Authors: Tobias Bolten 1; Regina Pohle-Fröhlich 1 and Klaus Tönnies 2
Affiliations: 1 Institute for Pattern Recognition, Hochschule Niederrhein, Krefeld, Germany; 2 Department of Simulation and Graphics, University of Magdeburg, Germany
Keyword(s): Dynamic Vision Sensor, Semantic Segmentation, PointNet++, UNet.
Abstract:
Neuromorphic Vision Sensors, also called Dynamic Vision Sensors, are bio-inspired optical sensors whose output paradigm differs fundamentally from that of classic frame-based sensors. Each pixel of these sensors operates independently and asynchronously, detecting only local changes in brightness. The output of such a sensor is a spatially sparse stream of events with a high temporal resolution. However, this novel output paradigm raises challenges for processing in computer vision applications, as standard methods are not directly applicable to the sensor output without conversion.
We therefore consider different event representations by converting the sensor output into classic 2D frames, highly multichannel frames, and 3D voxel grids, as well as a native 3D space-time event cloud representation. Using PointNet++ and UNet, these representations and processing approaches are systematically evaluated for generating a semantic segmentation of the sensor output stream. This involves experiments on two publicly available datasets from different application contexts (urban monitoring and autonomous driving).
In summary, PointNet++-based processing proved advantageous over a UNet approach on lower-resolution recordings with a comparatively low event count. Conversely, for recordings with ego-motion of the sensor and a resulting higher event count, UNet-based processing is advantageous.
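As a concrete illustration of the conversions mentioned in the abstract (not taken from the paper; the event layout, sensor resolution, and bin count below are illustrative assumptions), the following Python sketch converts a raw (x, y, t, p) event stream into a polarity-accumulated 2D frame, a temporally binned 3D voxel grid, and the native (x, y, t) space-time event cloud consumed by point-based networks such as PointNet++:

```python
import numpy as np

# Hypothetical event stream: one row per event with columns (x, y, t, p),
# where (x, y) is the pixel location, t the timestamp (e.g., microseconds),
# and p the polarity (+1 brightness increase, -1 decrease).
events = np.array([
    [10, 12, 100, +1],
    [10, 13, 250, -1],
    [40,  7, 300, +1],
], dtype=np.int64)

H, W = 64, 64   # assumed sensor resolution
T_BINS = 8      # assumed number of temporal bins for the voxel grid

def to_frame(ev, h, w):
    """Accumulate all events of a time window into a single 2D frame
    by summing polarities per pixel (one classic 2D conversion)."""
    frame = np.zeros((h, w), dtype=np.float32)
    np.add.at(frame, (ev[:, 1], ev[:, 0]), ev[:, 3])
    return frame

def to_voxel_grid(ev, h, w, t_bins):
    """Discretize the event stream into a (t_bins, h, w) voxel grid by
    binning timestamps, preserving coarse temporal structure."""
    t = ev[:, 2].astype(np.float64)
    # Normalize timestamps to [0, t_bins); clip the last event into the final bin.
    bins = ((t - t.min()) / (t.max() - t.min() + 1e-9) * t_bins).astype(np.int64)
    bins = np.clip(bins, 0, t_bins - 1)
    grid = np.zeros((t_bins, h, w), dtype=np.float32)
    np.add.at(grid, (bins, ev[:, 1], ev[:, 0]), ev[:, 3])
    return grid

frame = to_frame(events, H, W)            # dense 2D input for a UNet-style network
voxels = to_voxel_grid(events, H, W, T_BINS)
cloud = events[:, :3].astype(np.float32)  # native (x, y, t) space-time event cloud
                                          # for point-based processing (PointNet++)
```

The frame and voxel grid trade temporal detail for compatibility with standard convolutional architectures, while the event cloud keeps the sparse, asynchronous structure intact at the cost of irregular input sizes.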