Can We Detect Pedestrians using Low-resolution LIDAR?
Integration of Multi-frame Point-clouds

Yoshiki Tatebe¹, Daisuke Deguchi², Yasutomo Kawanishi¹, Ichiro Ide¹, Hiroshi Murase¹ and Utsushi Sakai³
¹Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya-shi, Aichi, Japan
²Information Strategy Office, Nagoya University, Furo-cho, Chikusa-ku, Nagoya-shi, Aichi, Japan
³DENSO CORPORATION, 1-1 Showa-cho, Kariya-shi, Aichi, Japan
Keywords:
LIDAR, Pedestrian Detection, Low-resolution.
Abstract:
In recent years, demand for pedestrian detection using inexpensive low-resolution LIDAR (LIght Detection And Ranging) is increasing, as it can be used to prevent traffic accidents involving pedestrians. However, it is difficult to detect pedestrians from the sparse point-cloud obtained by a low-resolution LIDAR. In this paper, we propose multi-frame features calculated by integrating point-clouds over multiple frames, which increases the point-cloud resolution and captures its temporal changes. By combining these features, the accuracy of pedestrian detection from low-resolution point-clouds can be improved. We conducted experiments using LIDAR data obtained in actual traffic environments. Experimental results showed that the proposed method could detect pedestrians accurately from low-resolution LIDAR data.
1 INTRODUCTION
Since 2007, the number of road traffic deaths has not decreased (WHO, 2015), illustrating the need for further countermeasures against traffic accidents. Furthermore, road traffic deaths among pedestrians remain high, accounting for 22% of the total. In addition, the demand for autopilot systems in consumer vehicles has grown considerably in the last decade (Shroff et al., 2013). For these reasons, a system that can recognize the surrounding environment of a vehicle and warn the driver about pedestrians in danger is in strong demand.
In the past few years, pedestrian detection systems have been actively developed. Among them, various methods that employ LIght Detection And Ranging (LIDAR) as an in-vehicle sensor have been proposed (Arras et al., 2007; Premebida et al., 2009; Navarro-Serment et al., 2010; Spinello et al., 2011; Kidono et al., 2011; Maturana and Scherer, 2015). For example, using state-of-the-art Deep Learning, Maturana and Scherer proposed VoxNet (Maturana and Scherer, 2015), a three-dimensional convolutional neural network for real-time object recognition. It takes as input an occupancy grid constructed from three models, which allows it to distinguish free space from unknown space. However, VoxNet requires a dense point-cloud obtained by the expensive Velodyne LIDAR¹. For this reason, it cannot be applied to pedestrian detection using a cheap low-resolution LIDAR that can be installed in consumer vehicles.
The low-resolution LIDAR used in our proposed method scans objects with six horizontal scanning lasers. The sensor therefore has a very low vertical resolution, while it is much cheaper than the Velodyne LIDAR. In addition, the number of lasers hitting an object changes with the distance between the LIDAR and the object. If the object is distant from the LIDAR, only four or fewer scanning lasers can hit it. In other words, the vertical resolution of the point-cloud is four or less, which is not sufficient to represent the characteristics of pedestrians. For this reason, pedestrian detection using a low-resolution LIDAR becomes very difficult, and the conventional methods using the Velodyne LIDAR cannot be applied. For low-resolution point-clouds, Kidono et al. proposed the slice feature and features related to the distribution of reflection intensity (Kidono et al., 2011).
¹http://velodynelidar.com/products.html
Figure 1: Extraction of the slice feature. (The point-cloud is projected by PCA onto its first principal component (First-PC) and second principal component (Second-PC), and the width w and depth d of each slice are measured.)
The former represents the rough shape of a pedestrian, and the latter represent the differences in reflection characteristics between a pedestrian and other objects. Fig. 1 shows the procedure to extract the slice feature when three horizontal scans hit the object. The three-dimensional point-clouds obtained from the LIDAR are divided into slices at multiple heights, and the width and the depth of each slice are calculated to represent the rough shape of a pedestrian. Since each object has its own distribution of reflection intensity, features related to the distribution of reflection intensity work well to distinguish a pedestrian from other objects. By combining these features, the detection accuracy for a distant pedestrian observed as a low-resolution point-cloud can be improved. However, if a pedestrian is distant from the vehicle, the detection rate is still low because the resolution is very low. In particular, it is very difficult to distinguish a pedestrian from non-pedestrians such as poles, trees, road signs, and so on, because their shapes are roughly similar to that of a pedestrian. In addition, all features proposed by Kidono et al. are extracted only from momentary information (a single frame); they do not consider temporal information. Therefore, by using multi-frame point-clouds, the proposed method tries to detect a pedestrian in a low-resolution point-cloud more accurately.
In this paper, we propose a pedestrian detection method which improves the detection rate for pedestrians observed as low-resolution point-clouds by utilizing temporal information. The contributions and novelty of this paper are as follows:
1. Density enhancement by integration of multi-
ple point-clouds: By integrating point-clouds ob-
tained from multiple frames, further details of
three-dimensional objects can be obtained be-
cause the point density increases.
2. Consideration of temporal changes of point-clouds: By considering temporal changes of point-clouds, objects whose shapes do not change over time (for example, poles, trees, road signs, and so on) can be distinguished from pedestrians.

Figure 2: Conceptual diagram of integrating point-clouds.
Section 2 describes the proposed feature extraction method based on the above two ideas, Section 3 describes the proposed pedestrian detection method, and Section 4 describes the evaluation experiments and discusses their results. We conclude this paper in Section 5.
2 FEATURE EXTRACTION
USING TEMPORAL
INFORMATION
In this paper, we propose two ideas in order to make
use of the temporal information of LIDAR data.
The first idea is to increase the density of point-clouds by integrating multiple frames. In general, the laser beams of a LIDAR hit different positions in each frame even when scanning the same object, because of the relative movement between the LIDAR and the objects. For this reason, aligning and merging multi-frame point-clouds, as shown in Figure 2, is expected to yield more detailed features.
The second idea is to make use of temporal
changes of features extracted from LIDAR data. In
general, the vehicle gradually approaches objects because the velocity of the vehicle equipped with the LIDAR is usually higher than that of a pedestrian. Moreover, the density of the three-dimensional point-cloud obtained from the LIDAR increases as an object gets closer. That is, the density of a newly obtained frame could be higher than that of an older one. The scanning angle also changes due to the relative movement of the vehicle and the objects. For these reasons, different information about the object can be obtained from each frame.
Applying feature extraction based on the above
two ideas, a pedestrian detection method using a low-resolution LIDAR is realized, taking the temporal information of LIDAR data into consideration. Sections 2.1 and 2.2 describe the details of extracting features from multi-frame point-clouds.
2.1 Feature Extraction using
Multi-frame Information
In this paper, L represents the number of horizontal
scans hitting an object.
A pedestrian candidate point-clouds sequence is represented hierarchically as follows:

$$\mathcal{P} = \{\mathbf{P}(t)\}_{t=1}^{T}, \qquad \mathbf{P}(t) = \{\mathbf{p}_{l}\}_{l=1}^{L}, \qquad \mathbf{p}_{l} = \{p_{l,i}\}_{i=1}^{I}, \qquad (1)$$
where P is a candidate point-clouds sequence with a length of T frames, P(t) is the point-cloud of the t-th frame, p_l is the point-cloud obtained by the l-th horizontal scan, and p_{l,i} is the i-th distance measurement of the l-th scan, with coordinates (x, y, z) obtained from the LIDAR. Figure 3 shows a graphical representation of this hierarchical structure. In the proposed method, features are extracted from M consecutive frames of the candidate point-clouds sequence P. If P includes K frames, the feature vectors are extracted K − M + 1 times for each kind of feature as
$$\mathbf{f}(j) = g\!\left(\{\mathbf{P}(k)\}_{k=j}^{j+M-1}\right), \qquad (2)$$

where f(j) is the j-th feature vector extracted from P using M frames, and g(·) is a feature extraction function over multiple frames. Finally, K − M + 1 feature vectors F = {f(j)} (j = 1, …, K − M + 1) are obtained from P.
Features extracted from multi-frame point-clouds
are hereinafter referred to as “multi-frame features”,
and those extracted from each single-frame point-
cloud are referred to as “single-frame features”. There
are two benefits derived from extracting features from
multi-frame point-clouds: (1) the density of point-
clouds can be increased, and (2) the changes of the
data between frames can be taken into account.
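As a minimal sketch of Eq. (2), assuming each frame is stored as an array of (x, y, z, reflection intensity) rows and that a multi-frame extractor g is supplied by the caller, the sliding-window extraction could look as follows (the names and data layout are illustrative; the original implementation was in MATLAB):

```python
import numpy as np

def extract_multiframe_features(sequence, M, g):
    """Slide a window of M consecutive frames over a candidate point-cloud
    sequence and apply a multi-frame feature extractor g (Eq. (2)).

    sequence : list of K per-frame point-clouds; each frame is assumed to be
               an (N_t, 4) array of (x, y, z, reflection intensity).
    M        : window length (M = 3 in the experiments).
    g        : callable mapping a list of M frames to a 1-D feature vector.

    Returns a (K - M + 1, D) array of feature vectors.
    """
    K = len(sequence)
    features = [g(sequence[j:j + M]) for j in range(K - M + 1)]
    return np.vstack(features)
```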
2.2 Multi-frame Features
In this section, the method for extracting multi-frame features from candidate point-clouds sequences is described. Before extracting each feature, pre-processing is performed. First, the candidate point-clouds of M frames are projected onto the road surface, and Principal Component Analysis (PCA) is applied to them in each frame.

Figure 3: Structure of the candidate point-clouds sequence.
Table 1: Multi-frame features and their dimensions.

          Feature's name                                                       Dim.
    f1    Approximated volume of a point-cloud                                 L
    f2    Maximum value of normalized reflection intensity voxel               4L
    f3    Mean of normalized reflection intensity voxel                        4L
    f4    Maximum value of reflection intensity voxel                          4L
    f5    Weighted mean of slice feature                                       2L
    f6    Weighted mean of relative slice position                             4(L-1)
    f7    Weighted mean of maximum value of normalized reflection intensity    1
    f8    Weighted mean of variance of normalized reflection intensity         1
Next, the center-of-gravity of each point-cloud is aligned so that all frames share the origin of the axes composed of the two principal components obtained by PCA (the first principal component x represents the width of a point-cloud, and the second principal component y represents the depth of a point-cloud). By this pre-processing, the orientation and the position of each point-cloud are normalized.
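A minimal sketch of this pre-processing, assuming each candidate point-cloud is an (N, 3) array and that the road surface corresponds to the x-y plane (both are assumptions about the data layout, not details given in the paper):

```python
import numpy as np

def normalize_point_cloud(points):
    """Pre-processing sketch: project a point-cloud onto the road surface,
    rotate it into its PCA axes, and center it at its center-of-gravity.

    points : (N, 3) array of (x, y, z) coordinates.
    Returns the points expressed in (first-PC, second-PC, z) coordinates.
    """
    xy = points[:, :2]                      # projection onto the road surface
    centroid = xy.mean(axis=0)
    centered = xy - centroid                # move the center-of-gravity to the origin
    # PCA: eigenvectors of the 2x2 covariance give the width/depth axes
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]       # first PC = largest variance (width)
    axes = eigvecs[:, order]
    rotated = centered @ axes
    return np.column_stack([rotated, points[:, 2]])
```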
Next, the method for extracting multi-frame fea-
tures from the normalized point-clouds will be de-
scribed. Table 1 shows the list of features used in
the proposed method and the number of dimensions
of each feature when the candidate point-cloud is hit
by L horizontal scans.
Reflection Intensity. The reflection intensity that
appears in Table 1 is the maximum value of the wave-
form representing the intensity of the reflected light at
each point obtained from LIDAR. The reflected light
is attenuated in inverse proportion to the square of the
distance to each point. Because of this, the reflection
intensity value is therefore normalized by multiplying it by the square of the distance; this normalized value is called the normalized reflection intensity.
Figure 4: Extraction of the relative slice position (L = 3). (Angles θ and φ between slices are measured with respect to the vertical direction along the First-PC and Second-PC axes.)
Figure 5: Integration of the point-cloud (L = 3). (Single-frame point-clouds are normalized by PCA and overlaid over M frames along the First-PC and Second-PC axes.)
Relative Slice Position. The relative slice position that appears in Table 1 represents the outline of a candidate point-cloud. Figure 4 shows the procedure to extract the relative slice position when L = 3. First, after calculating the slice feature (Kidono et al., 2011), the end points of the slices of each point-cloud p_l (l = 1, 2, …, L) are connected by straight lines. Then, the angles of these straight lines with respect to the vertical direction are obtained as the relative slice position. In other words, by capturing the relative positions between slices, the relative slice position represents the rough outline of a pedestrian or another object.
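The paper does not spell out the exact angle set, but given the 4(L − 1) dimensionality and Figure 4, one plausible reading is that, for each pair of adjacent slices, the lines joining corresponding slice end points (both ends, along both PCA axes) are measured against the vertical. A hedged sketch under that assumption, with an illustrative data layout:

```python
import numpy as np

def relative_slice_position(slice_endpoints, dz):
    """Hypothetical reconstruction of the relative slice position feature.

    slice_endpoints : (L, 2, 2) array; slice_endpoints[l, a] holds the
                      (min, max) end points of slice l along axis a
                      (a = 0: first PC / width, a = 1: second PC / depth).
    dz              : vertical spacing between adjacent slices.
    Returns 4*(L-1) angles (radians) of the end-point connecting lines
    with respect to the vertical direction.
    """
    L = slice_endpoints.shape[0]
    angles = []
    for l in range(L - 1):
        for a in range(2):            # first PC, second PC
            for e in range(2):        # lower / upper end point
                shift = slice_endpoints[l + 1, a, e] - slice_endpoints[l, a, e]
                angles.append(np.arctan2(shift, dz))
    return np.asarray(angles)
```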
In this paper, features inspired by the first idea, integration of point-clouds to increase their density (f1–f4), and by the second idea, temporal changes of features (f5–f8), are explored. The calculation procedure for each feature is described in the following sections.
2.2.1 Integration of Point-clouds (f1–f4)

In this section, the multi-frame features extracted after increasing the density of point-clouds by integrating them, based on the first idea, are described. Figure 5 shows the procedure for integrating point-clouds from M frames when L = 3: the M point-clouds go through the pre-processing described above and are then integrated by overlapping them along their common coordinate axes.
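Because the frames are already normalized to a common origin and axes, the integration itself reduces, in this sketch, to stacking the per-frame points (illustrative only, using the assumed pre-processing helper above; not the authors' exact implementation):

```python
import numpy as np

def integrate_point_clouds(frames):
    """First idea: merge M pre-processed (PCA-normalized) point-clouds into a
    single denser point-cloud by overlaying them along their shared axes.

    frames : list of M arrays, each (N_k, 3) in (first-PC, second-PC, z)
             coordinates, e.g. outputs of normalize_point_cloud().
    """
    return np.vstack(frames)
```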
Two kinds of features are extracted from the point-cloud obtained by the above integration procedure. Because they are computed after the point density has been increased, these features can use more points than a single-frame feature such as the slice feature, which only uses four points per point-cloud p_l:

- Approximated volume of a point-cloud: f1
- Feature voxel: f2–f4

However, these features can also be extracted from a single-frame point-cloud. For convenience, the following explanation assumes that the extraction is performed on a single-frame point-cloud.
Approximated Volume of a Point-Cloud (f1). The approximated volume of a point-cloud represents the shape of an object by the relative ratios of the points constituting each of its point-clouds p_l. The extraction procedure for this feature is as follows. First, c_l (l = 1, 2, …, L), the number of points constituting the point-cloud p_l, is counted. Next, this number of points is normalized by the total number of points in the input point-cloud, and the normalized value is referred to as α_l (l = 1, 2, …, L). The entire process is formulated as

$$\alpha_{l} = \frac{c_{l}}{\sum_{l'=1}^{L} c_{l'}}. \qquad (3)$$
Finally, the feature vector is obtained by concatenating all the normalized values; its number of dimensions varies with the number of scan hits L of the input point-cloud. One value is extracted from each point-cloud p_l, and the L-dimensional feature vector f1 is obtained as

$$\mathbf{f}_{1} = \{\alpha_{1}, \dots, \alpha_{l}, \dots, \alpha_{L}\}.$$
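A minimal sketch of Eq. (3), assuming the per-scan point counts c_l have already been tallied from the integrated point-cloud:

```python
import numpy as np

def approximated_volume(point_counts):
    """f1: approximated volume of a point-cloud (Eq. (3)).

    point_counts : length-L sequence, where point_counts[l] is the number of
                   points c_l obtained by the l-th horizontal scan.
    Returns the L-dimensional vector of ratios alpha_l.
    """
    counts = np.asarray(point_counts, dtype=float)
    return counts / counts.sum()
```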
Feature Voxel (f2–f4). The feature voxel is obtained by extracting a single-frame feature, namely the maximum value of the normalized reflection intensity, the mean of the normalized reflection intensity, or the maximum value of the reflection intensity, from divided regions of the point-cloud. The procedure for extracting the feature voxel is as follows. First, the point-cloud is divided into sub-regions as shown in Figure 6: along the coordinate axes obtained by pre-processing, each point-cloud p_l is divided into four sub-regions at its center-of-gravity.
Figure 6: Division of a point-cloud for the extraction of the feature voxel (L = 3). (Each point-cloud p_l is divided into four sub-regions d_{l,1}–d_{l,4} along the First-PC and Second-PC axes.)
By concatenating the features extracted from these sub-regions, a feature vector is obtained; its number of dimensions varies with the number of scan hits L of the input point-cloud. Specifically, conventional single-frame features (β_{l,1}, β_{l,2}, β_{l,3}, β_{l,4}) are extracted from the four sub-regions (d_{l,1}, d_{l,2}, d_{l,3}, d_{l,4}) of each p_l (l = 1, 2, …, L). Thus, the 4L-dimensional feature vectors f2–f4 are obtained as

$$\mathbf{f}_{i} = \{\beta_{1,1}, \beta_{1,2}, \beta_{1,3}, \beta_{1,4}, \dots, \beta_{l,1}, \beta_{l,2}, \beta_{l,3}, \beta_{l,4}, \dots, \beta_{L,1}, \beta_{L,2}, \beta_{L,3}, \beta_{L,4}\} \qquad (i = 2, 3, 4).$$
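As a hedged sketch (the quadrant numbering and the handling of empty sub-regions are assumptions not specified in the paper), the per-scan computation of one feature-voxel block could look like this; concatenating the outputs over l = 1, …, L gives the 4L-dimensional vector:

```python
import numpy as np

def feature_voxel(points, values, reduce=np.max):
    """Sketch of one feature-voxel block (f2-f4) for a single scan line.

    points : (N, 2) array of (first-PC, second-PC) coordinates of the points
             in one point-cloud p_l.
    values : (N,) array of the per-point quantity (e.g. normalized reflection
             intensity); the chosen quantity and reduction (max or mean)
             determine whether this yields a slice of f2, f3, or f4.
    Returns 4 values, one per quadrant around the center-of-gravity.
    """
    center = points.mean(axis=0)
    out = np.zeros(4)
    quadrants = [(False, False), (False, True), (True, False), (True, True)]
    for j, (right, front) in enumerate(quadrants):
        mask = ((points[:, 0] >= center[0]) == right) & \
               ((points[:, 1] >= center[1]) == front)
        out[j] = reduce(values[mask]) if mask.any() else 0.0
    return out
```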
2.2.2 Temporal Changes of Point-clouds (f5–f8)

In this section, to utilize the second idea of temporal changes of point-clouds, f5–f8 are extracted by calculating a weighted mean of single-frame features (the slice feature, the relative slice position, the maximum value of the normalized reflection intensity, and the variance of the normalized reflection intensity) extracted from M continuous frames of P, so that a newer frame's feature receives a higher weight. The weighted mean over M continuous frames is calculated by the recurrence relation

$$\bar{\mathbf{s}}(k) = \begin{cases} \mathbf{s}(1) & (k = 1), \\ 0.4\,\bar{\mathbf{s}}(k-1) + 0.6\,\mathbf{s}(k) & (k = 2, 3, \dots, M), \end{cases}$$

where s(k) is a single-frame feature extracted from the k-th frame of P, and s̄(k) is the weighted mean calculated from the 1st to the k-th frames. For example, the weighted mean s̄(2) is calculated from s(1) and s(2), and the weighted mean s̄(3) is calculated from s̄(2) and s(3). This calculation is repeated until s̄(M), which becomes the multi-frame feature vector (f5–f8). The dimensionality of the final feature vector is equal to that of the original single-frame feature, as shown in Table 1.
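The recurrence translates directly into code; a minimal sketch, assuming each single-frame feature is already available as a fixed-length vector:

```python
import numpy as np

def weighted_mean_feature(single_frame_features):
    """f5-f8: recency-weighted mean of a single-frame feature over M frames.

    single_frame_features : list of M equal-length 1-D arrays s(1)..s(M),
                            ordered from oldest to newest frame.
    Implements s_bar(1) = s(1); s_bar(k) = 0.4*s_bar(k-1) + 0.6*s(k).
    """
    s_bar = np.asarray(single_frame_features[0], dtype=float)
    for s_k in single_frame_features[1:]:
        s_bar = 0.4 * s_bar + 0.6 * np.asarray(s_k, dtype=float)
    return s_bar
```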
Figure 7: Process flow of the proposed method. (Training phase: LIDAR point-clouds → 3D object detection → manual labeling (pedestrian / others) → feature extraction → classifier construction. Detection phase: LIDAR point-clouds → 3D object detection → feature extraction → classification by the pedestrian classifier into pedestrian / others.)
3 PEDESTRIAN DETECTION
USING A LOW-RESOLUTION
LIDAR
Figure 7 shows the process flow of the proposed method. The proposed method consists of a training phase, in which a classifier to be used for pedestrian detection is built in advance, and a detection phase, in which pedestrians are detected from LIDAR data using the classifier constructed in the training phase.
3.1 Training Phase
3.1.1 3D Object Detection and Manual Labeling
The proposed method extracts point-clouds of three-dimensional objects from LIDAR data by applying a three-dimensional object detection and tracking algorithm (Ogawa et al., 2011) to LIDAR data collected in real-world environments. In this paper, the point-clouds obtained by this three-dimensional object detection are regarded as the candidate point-clouds, and a tracked series of them is regarded as a candidate point-clouds sequence. The candidate point-clouds contain pedestrians, trees, poles, traffic signs, and so on. These point-clouds are labeled manually as "pedestrian" or "other objects".
3.1.2 Feature Extraction
The proposed method extracts features related to
the shape and the reflection intensity of three-
dimensional objects described in Section 2.2 from
successive multi-frame point-clouds in a series of
point-clouds obtained by tracking three-dimensional
objects. The proposed feature extraction method is
based on two ideas using the temporal information of
multi-frame point-clouds.
3.1.3 Classifier Construction
In the proposed method, a pedestrian classifier is constructed by feeding the features calculated from point-clouds of pedestrians (positive samples) and other objects (negative samples) to a kernel SVM, which generally shows high classification performance. Here, the RBF (Radial Basis Function) kernel is used.
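A minimal sketch of this step using scikit-learn (an assumption for illustration; the paper's implementation was in MATLAB, and the SVM hyper-parameters and feature scaling are not reported):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_pedestrian_classifier(features, labels):
    """Classifier construction sketch: an RBF-kernel SVM.

    features : (N, D) array of multi-frame feature vectors.
    labels   : (N,) array, 1 for pedestrian, 0 for other objects.
    """
    # Feature standardization and default C/gamma are assumptions.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(features, labels)
    return clf
```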
3.2 Detection Phase
3.2.1 3D Object Detection
Candidate point-clouds sequences are obtained by ap-
plying the same three-dimensional object detection
and tracking algorithm (Ogawa et al., 2011) on point-
clouds obtained from LIDAR.
3.2.2 Feature Extraction
The same features as in the training phase are extracted from the candidate point-clouds sequences.
3.2.3 Classification
By inputting these features to the classifier con-
structed in the training phase, the candidate point-
clouds are classified as either a pedestrian or not. This
output is the detection result of the proposed method.
4 EXPERIMENTS
To confirm the effectiveness of the proposed method for pedestrian detection using a low-resolution LIDAR, experiments using in-vehicle low-resolution LIDAR point-clouds captured in real-world environments were conducted. The low-resolution LIDAR used in the experiments was mounted at the top of the rear-view mirror. The LIDAR obtained depth data in 6 vertical directions and 401 horizontal directions at 10 fps. The vertical detection angle was 6 degrees (1 degree pitch), and the horizontal detection angle was 40 degrees (0.1 degree pitch).
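For concreteness, a hedged sketch of converting one return of such a sensor to Cartesian coordinates, based only on the angular specification above (the actual sensor geometry, angle offsets, and mounting orientation are not reported in the paper):

```python
import numpy as np

def polar_to_cartesian(line_idx, az_idx, r,
                       v_pitch_deg=1.0, h_pitch_deg=0.1,
                       n_lines=6, n_az=401):
    """Hypothetical conversion of one LIDAR range measurement to (x, y, z).

    line_idx : vertical scan-line index (0..5), 1-degree pitch assumed.
    az_idx   : horizontal step index (0..400), 0.1-degree pitch assumed.
    r        : measured range in meters.
    """
    elev = np.deg2rad((line_idx - (n_lines - 1) / 2.0) * v_pitch_deg)
    azim = np.deg2rad((az_idx - (n_az - 1) / 2.0) * h_pitch_deg)
    x = r * np.cos(elev) * np.sin(azim)   # lateral
    y = r * np.cos(elev) * np.cos(azim)   # forward
    z = r * np.sin(elev)                  # up
    return np.array([x, y, z])
```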
4.1 Experimental Procedure
In this experiment, point-clouds of pedestrians and
other objects were collected by the low-resolution LI-
DAR in real-world environments. By applying the
three-dimensional object detection and tracking algo-
rithm (Ogawa et al., 2011) to the collected data, the
candidate point-clouds sequences were obtained.
Table 2: Number of point-clouds for each number of scan hits L.

                   L = 3   L = 4   L = 5    L = 6
    Pedestrians    1,002   5,269   6,934    8,189
    Others         5,726   5,904   7,236   13,535
Table 3: Single-frame features and their dimensions.

          Feature's name                                      Dim.
    f9    Slice feature                                       2L
    f10   Relative slice position                             4(L-1)
    f11   Maximum value of normalized reflection intensity    1
    f12   Mean of normalized reflection intensity             1
    f13   Variance of normalized reflection intensity         1
    f14   Maximum value of reflection intensity               1
All positive samples and hard negative samples (poles, trees, and so on) extracted from these sequences were input to the proposed method. Table 2 shows the number of samples aggregated by the number of scan hits L used in this experiment. The candidate point-clouds sequences were divided by L, and the performance of the proposed method was evaluated by five-fold cross-validation. To ensure a fair comparison, point-clouds obtained from the same object were never included in both the training samples and the test samples.
We evaluated the proposed method by using ROC (Receiver Operating Characteristic) curves and their AUC (Area Under the Curve). In this experiment, three methods were evaluated: proposed method 1, proposed method 2, and the comparison method. Proposed method 1 used the multi-frame features f1–f4 extracted from three frames (M = 3) for pedestrian classification, and proposed method 2 used the multi-frame features f1–f8 extracted from three frames (M = 3). These methods output a detection result every three frames, because three frames are required for the multi-frame feature extraction; for the comparison method, the average of the three ROC curves and AUCs obtained over the corresponding three frames was therefore used for the evaluation. The comparison method used the conventional single-frame features for classification (Kidono et al., 2011). Table 3 shows the list of single-frame features, and Table 4 shows the features used in each method. We implemented the feature extraction and classification methods in MATLAB, and used a conventional PC (Intel Core i7-4790, 3.6 GHz) in the experiment.
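To make the evaluation protocol concrete, a hedged sketch of computing the ROC curve and AUC for one cross-validation fold (with scikit-learn, as an assumption; the five-fold split by object is handled outside this function):

```python
from sklearn.metrics import roc_curve, roc_auc_score

def evaluate_detector(clf, test_features, test_labels):
    """Evaluation sketch: ROC curve and AUC on one fold.

    clf : trained classifier exposing decision_function(), e.g. the SVM
          pipeline sketched in Section 3.1.3.
    """
    scores = clf.decision_function(test_features)   # SVM margin as detection score
    fpr, tpr, _ = roc_curve(test_labels, scores)
    auc = roc_auc_score(test_labels, scores)
    return fpr, tpr, auc
```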
Figure 8: ROC curves (True Positive Rate vs. False Positive Rate) of each method by the number of scan hits L: (a) L = 3, (b) L = 4, (c) L = 5, (d) L = 6. (Each panel compares proposed method 1, proposed method 2, and the comparison method.)
Table 4: Evaluated methods and features.

              Prop. 1   Prop. 2   Comp.
    f1-f4        X         X
    f5-f8                  X
    f9-f14                            X
Table 5: AUC of each method.

    Method        L = 3   L = 4   L = 5   L = 6
    Proposed 1    0.937   0.953   0.971   0.960
    Proposed 2    0.952   0.963   0.986   0.983
    Comparison    0.926   0.891   0.958   0.954
4.2 Results & Discussions
Figures 8(a)–8(d) show the ROC curves of each method obtained in this experiment. When the False Positive Rate (FPR) was around 5%, the True Positive Rates (TPR) of proposed methods 1 and 2 were higher than that of the comparison method for all numbers of scan hits L. Note that the detection rate at low FPR is particularly important for pedestrian detection, and it is exactly in this region that the proposed methods outperform the comparison method. In addition, proposed method 2 achieved higher detection accuracy than proposed method 1.

Table 5 shows the AUC obtained in this experiment for each method. The AUC values of proposed method 2 were higher than those of proposed method 1 and the comparison method for all L, showing that proposed method 2 is superior to the other two methods in overall performance.

The computation times of feature extraction and classification for each candidate were 16.4 ms and 1.6 ms, respectively.
4.2.1 Comparison of Detection Accuracy
As shown in Figures 8(a)–8(d), by using a combination of the features discussed in this paper, the proposed method could detect pedestrians more accurately than by using conventional single-frame features.

Table 6: Rank of the greedy algorithm.

    Order   L = 3   L = 4   L = 5   L = 6
    1       f2      f7      f4      f4
    2       f1      f1      f5      f6
    3       f4      f2      f6      f5
    4       f6      f4      f1      f1
    5       f8      f3      f7      f2
    6       f3      f5      f2      f3
    7       f5      f6      f3      f8
    8       f7      f8      f8      f7
Furthermore, proposed method 1, using f1–f4 based on point-cloud density enhancement, greatly improved the pedestrian detection accuracy, especially under very low-resolution conditions (L = 3, 4). Therefore, idea 1, integrating point-clouds to increase their density, was effective when the point-cloud density was very low. On the other hand, proposed method 2 could detect pedestrians much more accurately than proposed method 1 under relatively high-resolution conditions (L = 5, 6). When L = 5 and 6, more detailed shape and reflection intensity information could be obtained from the LIDAR point-clouds, so f5–f8, based on temporal changes, worked well.
4.2.2 Contribution Analysis of Features

To analyze the contribution of each feature to the detection accuracy, a ranking of all features was calculated with a greedy algorithm. Table 6 shows the selection order (rank) of the features by the greedy algorithm; the order was different for each number of scan hits L.

Table 7 shows a rearranged version of Table 6 according to the proposed ideas (idea 1: integration of point-clouds to increase their density, and idea 2: temporal changes of features). In Table 7, the ranks of the features based on idea 1 are generally higher than those of the features based on idea 2.
Table 7: Rank of the greedy algorithm (rearranged according to the fundamental ideas).

    Idea      Feature's name                                                 L=3  L=4  L=5  L=6  Ave.
    Density   Maximum value of reflection intensity voxel                     3    4    1    1   2.3
    Density   Approximated volume of a point-cloud                            2    2    4    4   3.0
    Density   Maximum value of normalized reflection intensity voxel          1    3    6    5   3.8
    Density   Mean of normalized reflection intensity voxel                   6    5    7    6   6.0
    Change    Weighted mean of relative slice position                        4    7    3    2   4.0
    Change    Weighted mean of slice feature                                  7    6    2    3   4.5
    Change    Weighted mean of maximum value of normalized refl. intensity    8    1    5    8   5.5
    Change    Weighted mean of variance of normalized refl. intensity         5    8    8    7   7.0
This trend suggests that idea 1 was more effective than idea 2. This is likely because the movement of pedestrians and vehicles within the three frames used for multi-frame feature extraction was very small.
5 CONCLUSIONS
In this paper, we proposed a pedestrian detection method using multi-frame features extracted from low-resolution LIDAR data. We introduced multi-frame features extracted by combining point-clouds over multiple frames, which increases their resolution and captures their temporal changes. The proposed method detected pedestrians using classifiers trained on LIDAR data divided by the number of scan hits L.

Using data collected in real-world environments, experiments showed that the proposed method, using a combination of the proposed multi-frame features, could detect pedestrians more accurately than conventional single-frame features. We also analyzed the contribution of each feature to the performance improvement. The results showed that the idea of integrating point-clouds to increase their density was effective for pedestrian detection from low-resolution LIDAR.

Future work includes improving the proposed method by considering combinations of single-frame and multi-frame features simultaneously, constructing the classifier using the partial AUC (Narasimhan and Agarwal, 2013), and comparing against features learned by Deep Learning.
ACKNOWLEDGEMENTS
Parts of this research were supported by MEXT,
Grant-in-Aid for Scientific Research.
REFERENCES
World Health Organization. (2015). Global status report on
road safety 2015.
Arras, K. O., Mozos, O. M., and Burgard, W. (Apr. 2007).
Using boosted features for the detection of people in
2D range data. In Proc. 2007 IEEE Int. Conf. on
Robotics and Automation, pages 3402–3407.
Kidono, K., Miyasaka, T., Watanabe, A., Naito, T., and
Miura, J. (June 2011). Pedestrian recognition using
high-definition LIDAR. In Proc. 2011 IEEE Intelli-
gent Vehicles Symposium, pages 405–410.
Maturana, D. and Scherer, S. (Sept. 2015). VoxNet: A
3D convolutional neural network for real-time object
recognition. In Proc. 2015 IEEE/RSJ Int. Conf. on In-
telligent Robots and Systems, pages 922–928.
Narasimhan, H. and Agarwal, S. (Aug. 2013). SVM pAUC
tight: A new support vector method for optimizing
partial AUC based on a tight convex upper bound. In
Proc. 19th ACM SIGKDD Int. Conf. on Knowledge
Discovery and Data Mining, pages 167–175.
Navarro-Serment, L. E., Mertz, C., and Hebert, M. (Oct.
2010). Pedestrian detection and tracking using three-
dimensional LADAR data. Int. J. of Robotics Re-
search, vol.29, no.12, pages 1516–1528.
Ogawa, T., Sakai, H., Suzuki, Y., Takagi, K., and Morikawa,
K. (June 2011). Pedestrian detection and tracking
using in-vehicle LIDAR for automotive application.
In Proc. 2011 IEEE Intelligent Vehicles Symposium,
pages 734–739.
Premebida, C., Ludwig, O., and Nunes, U. (Oct. 2009). Ex-
ploiting LIDAR-based features on pedestrian detec-
tion in urban scenarios. In Proc. 2009 IEEE Int. Conf.
on Intelligent Transportation Systems, pages 1–6.
Shroff, D., Nangalia, H., Metawala, A., Parulekar, M., and
Padte, V. (Jan. 2013). Dynamic matrix and model pre-
dictive control for a semi-auto pilot car. In Proc. 2013
IEEE Int. Conf. on Advances in Technology and Engi-
neering, pages 1–5.
Spinello, L., Luber, M., and Arras, K. O. (May 2011).
Tracking people in 3D using a bottom-up top-down
detector. In Proc. 2011 IEEE Int. Conf. on Robotics
and Automation, pages 1304–1310.