Multi-Pedestrian Tracking and Map-Based Intention Estimation for
Autonomous Driving Scenario
Ali Dehghani and Lucila Patino Studencki
Coburg University of Applied Sciences and Arts, Faculty of Mechanical and Automotive Engineering, Coburg, Germany
Keywords:
Pedestrian Intention Estimation, Multiple Pedestrian Tracking, Situational Awareness, Autonomous Driving,
Autonomous Shuttle.
Abstract:
Pedestrian intention estimation and tracking have become essential for the development of autonomous vehicles (AVs).
These vehicles need to be aware of pedestrians to avoid fatalities, even in complex urban traffic.
This requires understanding the most probable trajectory of each pedestrian so that the vehicle's maneuvers can be planned accordingly.
It is a complex task that requires modeling how multiple pedestrians interact with each other and move depending on their environment.
This paper employs a Gaussian Mixture Probability Hypothesis Density Filter, enhanced by the Generalized Potential Field Approach (GMPHD-GPFA),
to simultaneously track multiple pedestrians and predict their behavior seconds ahead.
The model considers the static environment of the pedestrians to estimate their intentions and improve prediction accuracy.
The paper evaluates both the tracking efficiency of the algorithm and its capability to predict the intentions of multiple pedestrians.
1 INTRODUCTION
Intention estimation and tracking of pedestrians is a
fundamental aspect of Vehicle Environment Percep-
tion, a crucial component in the advancing field of
Intelligent Vehicle Technologies. An intelligent ve-
hicle should safely maneuver through complex envi-
ronments, including vulnerable road users (VRUs),
such as pedestrians. This capability is essential for
protecting VRUs and contributes to improving the
overall travel experience for passengers. By accurately understanding and predicting pedestrian behaviors,
intelligent vehicles can seamlessly integrate into urban traffic and adjust their navigation strategies accordingly.
Motivated by experiences with the Shuttle Modellregion Oberfranken (SMO) project in Kronach, Germany (SMO, 2022),
this study addresses the critical need for advanced pedestrian intention estimation in autonomous shuttle operations.
As stated in (Dehghani et al., 2023), the challenges encountered, particularly unforeseen pedestrian intentions
that often cause abrupt shuttle halts, illustrate the necessity for precise prediction of pedestrian goals.
Pedestrian movements depend on a multitude of factors, including the customs and informal regulations (social norms)
of each country, which significantly impact how people behave in traffic and how they communicate their intentions
(Färber, 2016). Furthermore, factors such as the street's width
and the presence of traffic signals impact pedestrian
behavior. In narrower or signalized areas, pedestri-
ans may become less cautious, often crossing without
checking for traffic (Rasouli et al., 2017). Consider-
ing all these factors, predicting pedestrian intentions
requires an accurate model. Nonetheless, this task is
complex due to the variability in the number of pedes-
trians and their reactions to environmental factors
such as traffic density, road conditions, regulations,
social influences, and other circumstances (Rasouli
et al., 2017). Figure 1 illustrates an urban traffic sce-
nario observed from the viewpoint of an autonomous
vehicle, highlighting the complex interaction of dif-
ferent dynamic elements in challenging environmen-
tal circumstances. The scene involves multiple pedes-
trians, each potentially following separate routes and
having different objectives, various moving vehicles,
plenty of traffic signs, and adverse weather condi-
tions, which raises critical questions about the final
objectives of pedestrians. What are all pedestrians’ fi-
nal intentions, and which pedestrian can cause a col-
lision?
Rudenko et al. have devised a taxonomy that orga-
nizes current solutions based on their motion model-
ing techniques and the degree of contextual informa-
tion utilized (Rudenko et al., 2020). They divided the
modeling approach for predicting pedestrian motion
Dehghani, A. and Studencki, L.
Multi-Pedestrian Tracking and Map-Based Intention Estimation for Autonomous Driving Scenario.
DOI: 10.5220/0012691700003702
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 10th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2024), pages 386-393
ISBN: 978-989-758-703-0; ISSN: 2184-495X
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
Figure 1: A complex autonomous driving scenario in Kronach, Germany, under challenging conditions.
into three types: physics-based, pattern-based, and planning-based.
Physics-based approaches model human movement using basic kinematic principles (Newton's laws of motion), capturing position, velocity, and acceleration.
They are simple and effective for short-term forecasts in stable conditions; for instance, (Elnagar, 2001) uses a Kalman Filter (KF) to track dynamic obstacles.
However, that work only makes predictions one step ahead and ignores contextual cues from the environment.
Pattern-based approaches use data collected from the environment or previously observed trajectories to predict motion patterns, which also improves accuracy (Chen et al., 2016).
Razali et al. (2021) present a vision-based system that integrates pedestrian localization, body pose estimation, and intention prediction using a multi-task convolutional neural network, offering enhanced precision in intention prediction.
However, the effectiveness of data-driven prediction methods largely depends on the quantity, quality, and variety of the data, which must cover factors such as age, gender, geographical landscapes, weather and lighting conditions, specific traffic scenarios, and cultural, legal, and social norms.
Consequently, acquiring and processing such a substantial volume of labeled training data is challenging in real-world scenarios because of the computational effort required (Keller and Gavrila, 2013).
Moreover, these methods mostly do not consider the interaction of multiple pedestrians in the scenario.
Planning-based approaches try to understand the intentions behind a pedestrian's movement by following a sense-reason-predict scheme that reasons about the likely goals and the possible paths to reach them.
They typically rely on optimization, either applying predefined cost functions (forward planning) or learning these functions from observed behavior (inverse planning).
Several approaches model the probabilities of future motion based on cost-to-go value estimates and propose a probabilistic goal-directed motion model that accounts for several goals in the environment (Best and Fitch, 2015; Vasquez, 2016).
While these approaches are suitable for scenarios where understanding the underlying intent or goal is crucial, they are less effective in dynamically changing environments where objects frequently appear and disappear, or when dealing with a large number of objects.
These methods can be extended to consider different contextual cues (map-based and dynamic environment cues) that affect pedestrian behavior.
This combination yields more accurate and context-sensitive forecasts by considering factors such as societal norms, traffic signals, environmental layout, and psychological conditions.
We propose a comprehensive solution for predict-
ing pedestrian motion using a technique that inher-
its physics-based and planning-based characteristics
that can simultaneously handle multiple pedestrians
in a complex automotive driving scenario. Such an
approach would leverage the accuracy of physics-
based models that adhere to Newton’s laws for move-
ment and the insight of planning-based models that
infer intentions and goals to forecast future paths.
This hybrid method would not only model the im-
mediate physical interactions but also incorporate a
deeper understanding of pedestrian behavior, mak-
ing predictions more robust in complex environments
where anticipating future movements is crucial. Our
algorithm enhances the Gaussian Mixture Probabil-
ity Hypothesis Density (GMPHD) Filter (Clark et al.,
2006) with the Generalized Potential Field Approach
(GPFA) (Particke et al., 2017). This hybrid prediction
approach creates a dynamic pedestrian motion model,
which integrates a broader range of influences, in-
cluding environmental layouts and individual pedes-
trian goals, into a unified framework to find the most
probable objective (intention) of all pedestrians. This
paper is structured as follows: Section 2 presents the proposed method, including the modeling of environmental data as a potential field,
a dynamic model for pedestrians, and the Probability Hypothesis Density (PHD) filter.
Section 3 reports the experiments that demonstrate the effectiveness of our algorithm,
and Section 4 concludes the paper and suggests directions for further research.
2 MULTI-PEDESTRIAN TRACKING
2.1 Tracking Algorithm
In general, as shown in Figure 2, tracking multiple pedestrians requires that they are first detected by some sensory input.
Advanced algorithms are then applied to interpret the raw data, distinguish pedestrians from other objects, and predict their intentions.
In this paper, we assume that pedestrian detection has already been performed and that positions in a 3D coordinate system are available.
We focus mainly on the Map to Potential Field, the Tracking Algorithm, and the Pedestrian Trajectory Prediction parts.
[Figure 2 block diagram: Pedestrian Detection; Map to Potential Field; Tracking Algorithm; Pedestrian Trajectory Prediction]
Figure 2: General Architecture of the Multi-Pedestrian Tracking System.
The concept of potential fields has been exten-
sively applied in various research areas, including
flocking behavior, trajectory planning, and pedestrian
crowd analysis. However, existing methods like the
social force model face limitations when dealing with
individual pedestrians or small groups, as they are op-
timized for pedestrian crowds. Moreover, the number
of parameters to be set is huge. To overcome these
challenges, the Generalized Potential Field Approach
(GPFA) was developed. It combines a potential field
with a kinematic motion model, ensures applicability
to single pedestrians and small groups, and simpli-
fies parametrization (Particke, 2020). To calculate the
potential field, every pedestrian is regarded as a test particle in several different potential fields.
Each field ($\varphi_N^k$) stands for a different information source, such as a map of the surrounding area.
Each potential field is made up of a variety of potential sources ($\varphi_i^k$), such as obstacles.
The potential at the pedestrian's position is calculated using the following equation:
$$\varphi_N^k = \sum_{i=1}^{n_k} p^k\left(d_{iN}^k\right) \varphi_i^k \tag{1}$$
In this equation, $p^k(d_{iN}^k)$ represents the weight of each potential source, which depends on the Euclidean distance $d_{iN}^k$
between the pedestrian and the potential source but is independent of time.
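As a minimal sketch of Eq. (1), the weighted sum over potential sources could be evaluated as shown below. The exponential weighting `exp_weight`, its length scale, and the two example sources are illustrative assumptions and are not taken from the paper, which only states that the weight depends on the Euclidean distance.

```python
import math

def potential_at(position, sources, weight):
    """Evaluate Eq. (1): sum over sources of weight(distance) * source potential.

    position: (x, y) of the pedestrian (test particle N).
    sources:  list of ((x, y), phi_i) pairs for one information source k.
    weight:   function p(d) of the Euclidean distance d (assumed form).
    """
    total = 0.0
    for (sx, sy), phi_i in sources:
        d = math.hypot(position[0] - sx, position[1] - sy)
        total += weight(d) * phi_i
    return total

def exp_weight(d, length_scale=2.0):
    """Hypothetical weighting: decays exponentially with distance (metres)."""
    return math.exp(-d / length_scale)

# Two illustrative sources: one at the pedestrian's position, one 4 m away.
sources = [((0.0, 0.0), 1.0), ((4.0, 0.0), 0.5)]
phi = potential_at((0.0, 0.0), sources, exp_weight)
```

Passing the weighting as a function keeps the sum itself independent of the particular distance law, so a different choice of weight can be swapped in without touching the summation.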
As demonstrated in (Particke, 2020), the influence of the potential field on the pedestrian can be modeled as an acceleration vector $\mathbf{a}_N^k$
of source $k$ at position $P_N$, which directly affects the pedestrian's dynamics.
The dynamic model of the pedestrian considers both the gradient of the potential field $\nabla\varphi_N^k$
and the flow resistance ($\mathbf{F}_W = c_w v_N^2 \mathbf{e}_{v_N}$):
$$\mathbf{a}_N^k = \frac{-\nabla\varphi_N^k - c_w v_N^2 \mathbf{e}_{v_N}}{m_p} \tag{2}$$
where the pseudo mass $m_p$ and the drag coefficient $c_w$ must be configured appropriately
to represent the expected dynamics of the pedestrian.
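The acceleration in Eq. (2) can be sketched in a few lines, assuming that the force points down the potential gradient and that the flow resistance opposes the direction of motion. The function name and the numerical values of the gradient, velocity, `c_w`, and `m_p` below are hypothetical and chosen only for illustration.

```python
import math

def acceleration(grad_phi, velocity, c_w, m_p):
    """Sketch of Eq. (2) for 2D vectors: a = (-grad(phi) - c_w * |v|^2 * e_v) / m_p.

    grad_phi: gradient of the potential at the pedestrian's position.
    velocity: current velocity vector of the pedestrian.
    c_w, m_p: drag coefficient and pseudo mass (tuning parameters).
    """
    speed = math.hypot(velocity[0], velocity[1])
    # c_w * |v|^2 * e_v equals c_w * |v| * v, which is zero for a pedestrian at rest
    drag = (c_w * speed * velocity[0], c_w * speed * velocity[1])
    return ((-grad_phi[0] - drag[0]) / m_p, (-grad_phi[1] - drag[1]) / m_p)

# Potential falling towards +x, pedestrian already walking in +x at 1 m/s.
a = acceleration(grad_phi=(-1.0, 0.0), velocity=(1.0, 0.0), c_w=0.5, m_p=2.0)
```

Writing the drag as `c_w * speed * velocity` avoids dividing by the speed to form the unit vector, so the expression stays well defined when the pedestrian stands still.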
Similar to (Particke et al., 2017), a constant velocity model was used as the dynamic model
of pedestrian movement in the Kalman Filter.
The PHD filter is a well-known method for multi-target tracking based on the ideas of random finite sets
and was first introduced by Mahler (Mahler, 2003). Later, Clark et al. (Clark et al., 2006) proposed the Gaussian Mixture PHD (GM-PHD) filter,
a computationally efficient implementation of the PHD filter.
The PHD filter is exceptionally well suited for handling an unknown and time-varying number of targets (Gao et al., 2021),
which is a frequent challenge when attempting to follow numerous pedestrians with various intentions.
Each object in a GM-PHD filter is presumed to follow a linear Gaussian model.
However, the components of the multiple-target posterior distribution need not have the same covariance matrices, so the posterior is a Gaussian mixture (GM).
Given a state $x_{k-1}$ at time $k-1$, the probability density of a transition to the state $x_k$ at time $k$
is the transition density, given by:
$$f_{k|k-1}(x_k \mid x_{k-1}) \tag{3}$$
In the context of the GM-PHD filter, the Kalman Filter is utilized for the state prediction of each target,
considering their acceleration. The state prediction equation in the Kalman Filter is given by:
$$x_k = F_k x_{k-1} + B_k u_k \tag{4}$$
Here, $x_k$ is the state vector at time $k$, which typically includes position and velocity.
$F_k$ is the state transition matrix, mapping the previous state $x_{k-1}$ to the current state $x_k$.
$B_k$ is the control input model. $u_k$ is the control vector, incorporating the acceleration ($a$) of the
pedestrian, obtained from the potential field.
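To make Eq. (4) concrete, the following sketch writes out the prediction for a planar constant-velocity model with the acceleration as control input. The state layout `[px, py, vx, vy]`, the time step, and the numerical values are assumptions for illustration; the paper does not specify the matrices.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def predict_state(x_prev, u, dt):
    """Sketch of Eq. (4): x_k = F_k x_{k-1} + B_k u_k.

    x_prev: previous state [px, py, vx, vy] (assumed layout).
    u:      control vector [ax, ay], e.g. the acceleration from the potential field.
    dt:     time step in seconds.
    """
    # Constant-velocity state transition matrix
    F = [[1.0, 0.0, dt, 0.0],
         [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    # Control input model mapping acceleration to position and velocity changes
    B = [[0.5 * dt * dt, 0.0],
         [0.0, 0.5 * dt * dt],
         [dt, 0.0],
         [0.0, dt]]
    return [fx + bu for fx, bu in zip(mat_vec(F, x_prev), mat_vec(B, u))]

# Pedestrian walking in +x at 1 m/s, pulled in +y with 2 m/s^2, over 0.5 s.
x_next = predict_state([0.0, 0.0, 1.0, 0.0], [0.0, 2.0], dt=0.5)
```

With a zero control vector the function reduces to the plain constant-velocity prediction, which is the baseline the GPFA term is added to.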
In addition, assuming a state $x_k$ at time $k$, the probability density of receiving the detection $z_k$
is given by the likelihood function:
$$g_k(z_k \mid x_k) \tag{5}$$
The probability density of the state $x_k$ given all observations up to time $k$ is the posterior density,
denoted $p_k(x_k \mid z_{1:k})$. Applying Bayes' recursion with an initial density $p_0(\cdot)$,
the posterior density is:
$$p_k(x_k \mid z_{1:k}) = \frac{g_k(z_k \mid x_k)\, p_{k|k-1}(x_k \mid z_{1:k-1})}{\int g_k(z_k \mid x)\, p_{k|k-1}(x \mid z_{1:k-1})\, dx} \tag{6}$$
In the GM-PHD filter, each target is treated as independent of the others with regard to the generation
of observations and its evolution. The equations above show that the PHD filter
eliminates the combinatorial calculations that arise when measurements
are not assigned to specific targets.
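The Bayes recursion in Eq. (6) can be illustrated on a discretized state space, where the integral in the denominator becomes a sum. This is only a sketch of the update rule, not the GM-PHD recursion itself; the three-state prior and the likelihood values are made up for the example.

```python
def bayes_update(predicted_prior, likelihood):
    """Discrete analogue of Eq. (6).

    predicted_prior: list of p_{k|k-1}(x_i | z_{1:k-1}) over discrete states x_i.
    likelihood:      list of g_k(z_k | x_i) for the current measurement z_k.
    Returns the normalized posterior p_k(x_i | z_{1:k}).
    """
    unnormalized = [g * p for g, p in zip(likelihood, predicted_prior)]
    evidence = sum(unnormalized)  # replaces the integral in the denominator
    if evidence == 0.0:
        raise ValueError("measurement has zero likelihood under every state")
    return [value / evidence for value in unnormalized]

# Illustrative numbers: the measurement favours the second state.
posterior = bayes_update([0.25, 0.25, 0.5], [0.1, 0.7, 0.2])
most_probable_state = posterior.index(max(posterior))
```

Feeding the returned posterior back in as the next predicted prior (after a prediction step) gives the recursive structure that the filter relies on.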
In the GMPHD-based GPFA, the tracking system is initialized with predefined system parameters and an
"Intention Map" that outlines the pedestrians' probable objectives.
The GPFA calculates the acceleration of each pedestrian toward each intended destination,
which serves as the control input for the PHD prediction step.
At each time step, the system measures the actual pedestrian movements, and during the association phase
it generates multiple hypotheses based on the prior predictions and the new measurements.
Finally, these hypotheses are refined in the update step.
Figure 3 shows the general workflow of the proposed PHD-GPFA algorithm.
It is assumed that the vehicle knows its own position and loads the topological map of its environment.
Based on this, the potential field map for each intention is calculated according to (2.1),
and the acceleration for each hypothetical intention is derived.
Using this information, the prediction for time point k is calculated.
Subsequently, the measurements are acquired and associated with each hypothesis based on the Mahalanobis distance.
The measurement update is then performed for each hypothesis, and finally the most probable track ('Confirmed Tracks')
is selected according to the likelihood.
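The association step based on the Mahalanobis distance could look roughly like the sketch below. The nearest-hypothesis rule, the gate of 3.0, and the diagonal innovation covariance are assumptions made for illustration; the paper only states that the Mahalanobis distance is used.

```python
import math

def mahalanobis_2d(z, z_pred, S):
    """Mahalanobis distance between measurement z and prediction z_pred.

    S is the 2x2 innovation covariance [[a, b], [c, d]], assumed symmetric
    and positive definite.
    """
    (a, b), (c, d) = S
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance is expected to be positive definite")
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    # Quadratic form r^T S^{-1} r using the closed-form 2x2 inverse
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

def associate(z, predictions, S, gate=3.0):
    """Return the index of the closest prediction inside the gate, or None."""
    best_index, best_distance = None, gate
    for index, z_pred in enumerate(predictions):
        distance = mahalanobis_2d(z, z_pred, S)
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index

S = [[0.04, 0.0], [0.0, 0.04]]          # hypothetical: 0.2 m standard deviation
predictions = [(1.1, 1.0), (2.0, 2.0)]   # predicted positions of two hypotheses
matched = associate((1.0, 1.0), predictions, S)
unmatched = associate((5.0, 5.0), predictions, S)
```

Measurements that fall outside the gate for every hypothesis return `None`, which leaves room for treating them as clutter or as a newly appearing pedestrian.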
2.2 Map to Potential Field
The primary aim of the GPFA is to improve the ef-
ficiency of models utilized in classic Kalman filter-
ing or Monte Carlo techniques. For this purpose,
an acceleration vector should be generated from in-
formation sources like attractive components (inten-
tions) and repulsive components. In simpler terms,
the pedestrians can be likened to test particles moving
within a potential field, where their movements are in-
fluenced by both attractions towards certain goals and
repulsions from obstacles.
[Figure 3 flow diagram: Initialization and Intention Map; GPFA-based Prediction (1. Calculate the potential field, 2. Gradient of potential field, 3. Derive acceleration vector (control input)); Measurements; Data Association (hypothesis generation based on Mahalanobis distance); Update; Confirmed Tracks]
Figure 3: PHD-GPFA Flow diagram.
The potential field described above is generated mainly from the topological description of the area around
the car. However, some assumptions regarding social force models and pedestrian behaviour are made:
• Pedestrians have specific intentions (destinations) and aim to reach them.
• Pedestrians tend to walk on sidewalks.
• Pedestrians tend to cross roads at zebra crossings or corners.
• Pedestrians tend to avoid collisions.
• Pedestrians tend to keep their travel direction.
Following these assumptions, map elements such as zebra crossings or sidewalks are described as attraction areas,
while map elements like buildings are described as repulsion areas.
Moreover, the walking direction must be taken into account to define plausible destinations.
The potential field generation is a key function of
the component termed map to potential field. For il-
lustration, Figure 4 demonstrates the concept within a
street scenario. Additionally, an example of a poten-
tial field for a driving scenario is depicted in Figure 5.
Using the defined potential field, the tracking and
posterior prediction of pedestrian trajectories are per-
formed.
How can we improve pedestrian trajectory predic-
tion using map data? To answer this question, we
Figure 4: Street scenario.
Figure 5: Potential field example for a street scenario.
need to model the map as a potential field by utilizing the map information.
The potential field is computed by evaluating the influence of conic components across a grid of points that spans the domain of interest.
The influence of repulsive components typically decays exponentially with distance, modulated by a scaling factor,
whereas the influence of the conic components that represent an "intention" exhibits a linear relationship with distance,
also modulated by a scaling factor (Particke et al., 2017). This influence decreases linearly with increasing distance.
Figure 6 depicts the resulting field as a 3D surface and as a 2D heatmap.
The strength of the field at a given point is shown as the Z-value in the 3D surface plot and as the color intensity in the 2D heatmap.
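The grid evaluation described above can be sketched as follows, assuming a conic well whose potential rises linearly with distance from the intention (so the minimum sits at the goal) and a repulsive term that decays exponentially with distance. The function names, the scaling factors, and the positions of the goal and obstacle are hypothetical.

```python
import math

def conic_well(x, y, goal, slope):
    """Attractive intention component: linear in the distance to the goal."""
    return slope * math.hypot(x - goal[0], y - goal[1])

def repulsion(x, y, obstacle, strength, decay):
    """Repulsive component: decays exponentially with distance to the obstacle."""
    return strength * math.exp(-decay * math.hypot(x - obstacle[0], y - obstacle[1]))

def potential_grid(xs, ys, goal, obstacles, slope=1.0, strength=5.0, decay=1.0):
    """Evaluate the combined field on a grid; result[j][i] belongs to (xs[i], ys[j])."""
    return [
        [
            conic_well(x, y, goal, slope)
            + sum(repulsion(x, y, obstacle, strength, decay) for obstacle in obstacles)
            for x in xs
        ]
        for y in ys
    ]

xs = [float(i) for i in range(11)]   # 0 m to 10 m in 1 m steps
ys = [float(j) for j in range(11)]
field = potential_grid(xs, ys, goal=(6.0, 8.0), obstacles=[(2.0, 2.0)])
lowest = min(min(row) for row in field)
```

A test particle that follows the negative gradient of such a field is drawn towards the goal while being pushed away from the obstacle, which is the behaviour the potential field is meant to encode.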
[Figure 6 panels: 3D surface plot and 2D heatmap of the potential field; axes X(m) and Y(m)]
Figure 6: Pedestrian in potential field map.
3 EXPERIMENTAL RESULTS
In this study, we utilized a publicly available pedes-
trian trajectory dataset from the University of Edin-
burgh’s School of Informatics (Majecka, 2009), cho-
sen for its overhead camera system that captures clear,
minimally noisy pedestrian paths in a public space.
The dataset’s precise, real-world coordinate trajecto-
ries and the variety of pedestrian movements provide
an ideal base for adding controlled noise (specifically
additive white Gaussian noise) for analysis. Its di-
verse path patterns, originating from different points
but diverging towards multiple destinations, accu-
rately represent the dynamic nature of pedestrian traf-
fic in real environments.
For the experiment, trajectories of three pedestri-
ans going to two target regions (two intentions) were
selected. The extracted data were used as ground
truth, and some artificial measurement noise (σ = 0.1)
was added to assess the performance of our approach.
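Generating noisy measurements from a ground-truth trajectory, as described above, could be done along these lines. The helper name, the fixed seed, and the short example trajectory are assumptions for illustration; only the additive white Gaussian noise with σ = 0.1 comes from the text.

```python
import random

def add_measurement_noise(trajectory, sigma=0.1, seed=0):
    """Add independent zero-mean Gaussian noise to every (x, y) position.

    trajectory: list of ground-truth (x, y) positions.
    sigma:      standard deviation of the noise (0.1 as in the experiment).
    seed:       fixed seed so that repeated runs give the same measurements.
    """
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in trajectory]

# Hypothetical straight-line ground truth, sampled at five points.
ground_truth = [(0.5 * k, 0.2 * k) for k in range(5)]
measurements = add_measurement_noise(ground_truth, sigma=0.1, seed=42)
```

Fixing the seed keeps comparisons between filters reproducible, because every method then sees exactly the same noisy measurements.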
The approach was first evaluated using the best-
case scenario, where one pedestrian with a known
intention is tracked. Subsequently, the algorithm’s
capability to identify the real intention of a pedes-
trian was evaluated by tracking one pedestrian with
three possible intentions (defined as hypotheses for
the PHD-GPFA). Finally, the algorithm was tested
on scenarios involving multiple pedestrians each with
multiple unknown intentions.
3.1 One Pedestrian with One Known Intention
The first experiment involves tracking a single pedestrian whose intention is assumed to be known
and located at x = 6.61 m and y = 11.11 m.
The acceleration vector calculated by the GPFA is used as a control input for the Kalman filter's state prediction stage.
The KF and KF-GPFA methods were used to estimate the next state based on the previous measurement and the motion model.
In addition, predictions for future time points ranging from 1 to 10 seconds were made using both methods.
These prediction horizons represent the time an autonomous vehicle would need to react in case of a collision,
together with the position uncertainty associated with that time frame.
The results of the estimation and prediction processes are presented in Figure 7.
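The RMSE over the prediction horizon can be computed as in the sketch below. It uses a one-dimensional synthetic track and a plain constant-velocity predictor as stand-ins, with horizons counted in samples rather than seconds; none of these choices are taken from the paper's experiment.

```python
import math

def rmse(errors):
    """Root mean square of a list of scalar errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def constant_velocity(track, k, h):
    """Predict the position h samples ahead from the last two samples."""
    velocity = track[k] - track[k - 1]   # displacement per sample
    return track[k] + h * velocity

def horizon_rmse(track, predict, horizons):
    """RMSE of predict(track, k, h) against track[k + h] for each horizon h."""
    result = {}
    for h in horizons:
        errors = [predict(track, k, h) - track[k + h] for k in range(1, len(track) - h)]
        result[h] = rmse(errors)
    return result

# Synthetic accelerating track, so the constant-velocity prediction drifts.
track = [0.05 * k * k for k in range(30)]
errors_by_horizon = horizon_rmse(track, constant_velocity, [1, 5, 10])
```

On this synthetic track the error grows with the horizon because the predictor ignores the acceleration, which mirrors the qualitative trend expected for longer prediction intervals.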
On closer inspection of the data, it is evident that the Kalman Filter provides a reasonably accurate estimate
of the pedestrian's position, with only minor deviations from the ground truth.
The KF-GPFA technique also tracks the ground truth closely, which demonstrates the potential benefit of
an enhanced method such as KF-GPFA for more precise pedestrian position estimation.
The lower part of Figure 7 shows the prediction capabilities of a vehicle, based on the estimated trajectory
and intention, for different time intervals. As expected, the error increases with the prediction interval.
In terms of forecasting pedestrian positions, the KF-GPFA displays a noticeable improvement over the standard Kalman Filter.
[Figure 7 panels: X Estimation and Y Estimation (x and y coordinate over Time(s); GT, KF Est, GPFA Est); X RMSE and Y RMSE (RMSE over prediction time (s); Kalman Filter, GPFA)]
Figure 7: Trajectory estimation and prediction using KF (state-of-the-art) and GPFA compared to Ground Truth (GT).
3.2 One Pedestrian with Three Possible Intentions
In the prior experiment, one limitation was the assumption that the pedestrian's intention or goal was known,
whereas in autonomous driving scenarios this information is typically unavailable.
Motivated by this discrepancy, the current experiment integrates an unknown intention into pedestrian tracking
and prediction, addressing the question of how pedestrians' intentions can be estimated in advance.
As mentioned before, the proposed solution in-
volves estimating the probability of each possible in-
tention. This is achieved by modeling the intentions
as conic well components in the potential field, which
is derived from the map of the immediate environ-
ment. These components are then evaluated at each
estimated pedestrian position. Figure 8 illustrates a
potential field map, highlighting three plausible inten-
tions (hypotheses) for a pedestrian.
The estimation result for the most probable hy-
pothesis is presented in the upper subplot of Figure 9.
The estimated positions closely align with the actual
Figure 8: Potential field map for the unknown intention,
where the three wells represent the intentions.
[Figure 9 panels: upper, X Position (m) and Y Position (m) with GT, Est, Hypo 1, Hypo 2, Hypo 3; lower, Weight (percent) over Time Step (s) for Hypo 1, Hypo 2, Hypo 3]
Figure 9: Tracking and probabilistic intention estimation of a pedestrian over time, compared to ground truth (GT), with
an analysis of hypothesis (Hypo) weights.
positions, indicating accurate estimation. The pedestrian's possible intentions are represented by polygons
of different colors, one per hypothesis. The lower subplot shows the weight of each hypothesis;
the larger the weight, the more probable the hypothesis.
This graph illustrates the efficacy of the GPFA algorithm in finding the most probable intention,
estimating positions, and tracking the evolution of hypothesis probabilities.
3.3 Multiple Pedestrians with Multiple Possible Intentions
While the previous experiment focused on a single pedestrian with multiple intentions,
real automotive driving scenarios require tracking the multiple intentions of several pedestrians simultaneously.
The goal of this experiment is to validate whether our algorithm can handle multiple pedestrians effectively.
To this end, we consider three pedestrians walking in the same area, each with two possible intention hypotheses,
labeled intention one and intention two.
The results of the tracking are presented in Figure 10 as a 2D plot and in Figure 11 as the root mean
square error (RMSE) for each of the six hypotheses.
[Figure 10 plot: X (m) versus Y (m); tracks Ped 1 - Int 1, Ped 1 - Int 2, Ped 2 - Int 1, Ped 2 - Int 2, Ped 3 - Int 1, Ped 3 - Int 2; ground truth Truth 1, Truth 2, Truth 3; goals Intention 1, Intention 2]
Figure 10: Track estimation for multi-pedestrian scenarios utilizing GMPHD-GPFA, involving three pedestrians each
with two potential intentions, yielding a total of six hypothesis scenarios.
Our observations reveal that the hypotheses corresponding to the correct intentions (Pedestrian 1 with
Intention 1, Pedestrian 2 with Intention 2, and Pedestrian 3 with Intention 1) show lower RMSE values,
in line with our expectations. This suggests that the algorithm can effectively differentiate
between the correct and incorrect intentions for each trajectory.
[Figure 11 bar chart, RMSE per hypothesis: Ped 1 - Int 1: 0.09; Ped 1 - Int 2: 0.11; Ped 2 - Int 1: 0.08; Ped 2 - Int 2: 0.06; Ped 3 - Int 1: 0.08; Ped 3 - Int 2: 0.12]
Figure 11: RMSE of multiple trajectories.
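Using the RMSE values reported in Figure 11, the selection of the most plausible intention per pedestrian can be reproduced with a short sketch. Picking the hypothesis with the lowest RMSE is used here only as an illustration of the comparison made in the text; the algorithm itself selects tracks according to their likelihood, and the helper name is hypothetical.

```python
# RMSE per (pedestrian, intention) hypothesis, as reported in Figure 11.
rmse_by_hypothesis = {
    ("Ped 1", "Int 1"): 0.09, ("Ped 1", "Int 2"): 0.11,
    ("Ped 2", "Int 1"): 0.08, ("Ped 2", "Int 2"): 0.06,
    ("Ped 3", "Int 1"): 0.08, ("Ped 3", "Int 2"): 0.12,
}

def lowest_rmse_intention(table):
    """For each pedestrian, return the intention whose hypothesis has the lowest RMSE."""
    best = {}
    for (pedestrian, intention), value in table.items():
        if pedestrian not in best or value < best[pedestrian][1]:
            best[pedestrian] = (intention, value)
    return {pedestrian: intention for pedestrian, (intention, _) in best.items()}

selected = lowest_rmse_intention(rmse_by_hypothesis)
```

The result assigns Intention 1 to Pedestrian 1, Intention 2 to Pedestrian 2, and Intention 1 to Pedestrian 3, matching the correct intentions named in the text.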
Analyzing the weights of the different intentions also shows that the correct intention is found, as illustrated in Figure 12.
After 40 to 60 time steps, corresponding to around 4 to 6 seconds, the real intention can be clearly distinguished.
Although this is a significant improvement with respect to the state-of-the-art algorithms,
this duration is long for a successful awareness of an autonomous vehicle.
This is because the algorithm makes its inferences mainly based on position data,
so the intention can be clearly determined only once the trajectories take notably different paths.
[Figure 12 plot: Weight (percent) over Time (s) for Ped 1 - Int 1, Ped 1 - Int 2, Ped 2 - Int 1, Ped 2 - Int 2, Ped 3 - Int 1, Ped 3 - Int 2]
Figure 12: Assessing Diverse Hypothesis Weights for Pedestrian Intention Detection.
Table 1 shows the RMSE results for both estimations and predictions:
Table 1: RMSE Comparison of the prediction capabilities in each experiment.
Trajectory   | Estimation        | Prediction
             | KF   | GMPHD-GPFA | 1s   | 2s   | 3s
Pedestrian 1 | 0.11 | 0.09       | 0.42 | 1.26 | 2.43
Pedestrian 2 | 0.07 | 0.06       | 0.49 | 1.42 | 2.50
Pedestrian 3 | 0.10 | 0.08       | 0.32 | 0.98 | 1.91
The first estimation column corresponds to the state-of-the-art algorithm based on KF and PHD,
and the second coincides with the results in Figure 11.
The prediction columns give the RMSE observed at prediction intervals of 1 to 3 seconds.
While the prediction error may seem high compared to the results for individual pedestrians,
it is crucial to note the differences between the experiments.
The intentions in the experiment involving multiple pedestrians are unknown, unlike in the single-pedestrian scenario.
Additionally, a higher number of hypotheses influences the estimation quality, adding complexity that the algorithm must manage.
Although the algorithm successfully tracks objects and infers intentions, it does
not directly consider changes in those intentions.
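As a small worked example on the estimation columns of Table 1, the relative RMSE reduction of GMPHD-GPFA with respect to the KF baseline can be computed as follows. The percentage measure and the helper name are our own illustration and are not reported in the paper.

```python
# Estimation RMSE from Table 1: (KF, GMPHD-GPFA) per pedestrian.
estimation_rmse = {
    "Pedestrian 1": (0.11, 0.09),
    "Pedestrian 2": (0.07, 0.06),
    "Pedestrian 3": (0.10, 0.08),
}

def relative_reduction(baseline, improved):
    """Relative reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline

reductions = {
    name: relative_reduction(kf, gmphd_gpfa)
    for name, (kf, gmphd_gpfa) in estimation_rmse.items()
}
```

For the three pedestrians this gives reductions of roughly 18, 14, and 20 percent, respectively.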
Despite the promising results, the prediction ca-
pability falls short of meeting the timing and accu-
racy requirements for autonomous vehicles operating
in urban environments. There is a need for further
development to enhance the model with faster inten-
tion detection techniques. Such improvements could
involve using gestures or other indicators, extending
beyond reliance solely on trajectory data.
4 CONCLUSION
This paper introduces an approach that combines physics-based and planning-based modeling for
tracking and predicting the positions and intentions of multiple pedestrians around an autonomous vehicle.
Utilizing a Probability Hypothesis Density (PHD) filter integrated with a Generalized Potential Field Approach (GPFA),
the proposed algorithm generates multiple hypotheses and continuously tracks them,
effectively identifying pedestrians' actual intentions.
This enables autonomous vehicles to forecast pedestrian movements accurately and replan their maneuvers accordingly.
However, accelerating the detection of intentions remains a challenge that requires further development.
The study also highlights the critical role of incorporating map information in defining tracking hypotheses,
which significantly enhances the model's precision and reliability.
ACKNOWLEDGEMENTS
The SMO project is supported by the Federal Ministry
of Transport and Digital Infrastructure of Germany.
For more information about the project, please see:
www.shuttle-modellregion-oberfranken.de
REFERENCES
Best, G. and Fitch, R. (2015). Bayesian intention infer-
ence for trajectory prediction with an unknown goal
destination. In 2015 IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), pages
5817–5823. IEEE.
Chen, Y., Liu, M., Liu, S.-Y., Miller, J., and How, J. P.
(2016). Predictive modeling of pedestrian motion pat-
terns with bayesian nonparametrics. In AIAA guid-
ance, navigation, and control conference, page 1861.
Clark, D. E., Panta, K., and Vo, B.-N. (2006). The gm-phd
filter multiple target tracker. In 2006 9th International
Conference on Information Fusion, pages 1–8. IEEE.
Dehghani, A., Salar, H., Srinivasan, S., Zhou, L., Arbeiter,
G., Lindner, A., and Patino-Studencki, L. (2023). En-
hancing availability of autonomous shuttle services: A
conceptual approach towards challenges and opportu-
nities. Manuscript under review.
Elnagar, A. (2001). Prediction of moving objects in dy-
namic environments using kalman filters. In Proceed-
ings 2001 IEEE International Symposium on Compu-
tational Intelligence in Robotics and Automation (Cat.
No. 01EX515), pages 414–419. IEEE.
Färber, B. (2016). Communication and communication
problems between autonomous vehicles and human
drivers. Autonomous driving: Technical, legal and social
aspects, pages 125–144.
Gao, Y., Jiang, D., Zhang, C., and Guo, S. (2021). A labeled
gm-phd filter for explicitly tracking multiple targets.
Sensors, 21(11):3932.
Keller, C. G. and Gavrila, D. M. (2013). Will the pedes-
trian cross? a study on pedestrian path prediction.
IEEE Transactions on Intelligent Transportation Sys-
tems, 15(2):494–506.
Mahler, R. P. (2003). Multitarget bayes filtering via first-
order multitarget moments. IEEE Transactions on
Aerospace and Electronic systems, 39(4):1152–1178.
Majecka, B. (2009). Statistical models of pedestrian be-
haviour in the forum. Master’s thesis, School of Infor-
matics, University of Edinburgh.
Particke, F. (2020). Predictive Pedestrian Awareness
with Intention Uncertainties for Autonomous Driving.
PhD thesis, Friedrich-Alexander-Universität
Erlangen-Nürnberg (FAU).
Particke, F., Patino-Studencki, L., Thielecke, J., and Feist,
C. (2017). Pedestrian tracking using a generalized po-
tential field approach. In VISIGRAPP (6: VISAPP),
pages 509–514.
Rasouli, A., Kotseruba, I., and Tsotsos, J. K. (2017). Un-
derstanding pedestrian behavior in complex traffic
scenes. IEEE Transactions on Intelligent Vehicles,
3(1):61–70.
Razali, H., Mordan, T., and Alahi, A. (2021). Pedes-
trian intention prediction: A convolutional bottom-up
multi-task approach. Transportation research part C:
emerging technologies, 130:103259.
Rudenko, A., Palmieri, L., Herman, M., Kitani, K. M.,
Gavrila, D. M., and Arras, K. O. (2020). Human mo-
tion trajectory prediction: A survey. The International
Journal of Robotics Research, 39(8):895–935.
SMO (2022). Shuttle Modellregion Oberfranken (SMO) project.
https://www.shuttle-modellregion-oberfranken.de/.
Vasquez, D. (2016). Novel planning-based algorithms
for human motion prediction. In 2016 IEEE In-
ternational Conference on Robotics and Automation
(ICRA), pages 3317–3322. IEEE.