Multi-Pedestrian Tracking and Map-Based Intention Estimation for

Autonomous Driving Scenario

Ali Dehghani and Lucila Patino Studencki

Coburg University of Applied Sciences and Arts, Faculty of Mechanical and Automotive Engineering, Coburg, Germany

Keywords:

Pedestrian Intention Estimation, Multiple Pedestrian Tracking, Situational Awareness, Autonomous Driving,

Autonomous Shuttle.

Abstract:

Pedestrian intention estimation and tracking have become essential for the development of autonomous vehicles (AVs). The vehicles need to be aware of pedestrians to avoid fatalities even in complex urban traffic.

This requires understanding the most probable trajectory of pedestrians to accordingly plan the vehicle’s ma-

neuvers. This complex task requires modeling how multiple pedestrians interact with each other and move

depending on their environment. This paper employs a Gaussian Mixture Probability Hypothesis Density Fil-

ter, enhanced by the Generalized Potential Field Approach (GMPHD-GPFA), to simultaneously track multiple

pedestrians and determine and predict their behavior seconds ahead. The model used considers the static envi-

ronment of the pedestrians to estimate their intentions and improve prediction accuracy. The paper evaluates

both the tracking efﬁciency of the algorithm and its capability to predict the intentions of multiple pedestrians.

1 INTRODUCTION

Intention estimation and tracking of pedestrians is a

fundamental aspect of Vehicle Environment Percep-

tion, a crucial component in the advancing ﬁeld of

Intelligent Vehicle Technologies. An intelligent ve-

hicle should safely maneuver through complex envi-

ronments, including vulnerable road users (VRUs),

such as pedestrians. This capability is essential for

protecting VRUs and contributes to improving the

overall travel experience for passengers. By accu-

rately understanding and predicting pedestrian behav-

iors, intelligent vehicles can seamlessly integrate into

urban trafﬁc and adjust their navigation strategies ac-

cordingly. Motivated by experiences with the Shut-

tle Modellregion Oberfranken (SMO) project in Kro-

nach, Germany (SMO, 2022), this study addresses the

critical need for advanced pedestrian intention estimation in autonomous shuttle operations. As stated in (Dehghani et al., 2023), the challenges encountered, particularly unforeseen pedestrian intentions that often cause the shuttle to halt abruptly, underline the need for precise prediction of pedestrian goals. Pedestrian movements depend on a multitude of factors, including country-specific customs and informal regulations (social norms) that significantly affect how people behave in traffic and how they communicate their intentions (Färber, 2016). Furthermore, factors such as the street's width

and the presence of trafﬁc signals impact pedestrian

behavior. In narrower or signalized areas, pedestri-

ans may become less cautious, often crossing without

checking for trafﬁc (Rasouli et al., 2017). Consider-

ing all these factors, predicting pedestrian intentions

requires an accurate model. Nonetheless, this task is

complex due to the variability in the number of pedes-

trians and their reactions to environmental factors

such as trafﬁc density, road conditions, regulations,

social inﬂuences, and other circumstances (Rasouli

et al., 2017). Figure 1 illustrates an urban trafﬁc sce-

nario observed from the viewpoint of an autonomous

vehicle, highlighting the complex interaction of dif-

ferent dynamic elements in challenging environmen-

tal circumstances. The scene involves multiple pedes-

trians, each potentially following separate routes and

having different objectives, various moving vehicles,

plenty of trafﬁc signs, and adverse weather condi-

tions, which raises critical questions about the ﬁnal

objectives of pedestrians. What are all pedestrians’ ﬁ-

nal intentions, and which pedestrian can cause a col-

lision?

Rudenko et al. have devised a taxonomy that orga-

nizes current solutions based on their motion model-

ing techniques and the degree of contextual informa-

tion utilized (Rudenko et al., 2020). They divided the

modeling approach for predicting pedestrian motion


Dehghani, A. and Studencki, L.

Multi-Pedestrian Tracking and Map-Based Intention Estimation for Autonomous Driving Scenario.

DOI: 10.5220/0012691700003702

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 10th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2024), pages 386-393

ISBN: 978-989-758-703-0; ISSN: 2184-495X

Figure 1: A complex autonomous driving scenario in Kronach, Germany under challenging conditions.

into three types: physics-based, pattern-based, and

planning-based. In physics-based approaches, many

motion prediction techniques model human movement using basic kinematic principles (Newton's laws of motion), capturing position, velocity, and acceleration. These models are simple and effective for short-term forecasts in stable conditions; for instance, (Elnagar, 2001) uses a Kalman Filter (KF) to track dynamic obstacles. However, that work only makes predictions one step ahead and ignores contextual cues from the environment. Pattern-based approaches, which use data collected from the environment or previously observed trajectories to predict motion patterns, also demonstrate enhanced accuracy (Chen et al., 2016).

Razali et al. (Razali et al., 2021) present a vision-

based system that integrates pedestrian localization,

body pose estimation, and intention prediction using a

multi-task convolutional neural network, offering en-

hanced precision in intention prediction. However,

the effectiveness of data-driven prediction methodolo-

gies largely depends on the quantity, quality, and va-

riety of data, including various factors such as age,

gender, geographical landscapes, weather conditions,

lighting conditions, speciﬁc trafﬁc scenarios, cultural

norms, legal norms and social norms. Consequently,

acquiring and processing such a substantial volume of

labeled training data poses a challenge in real-world

scenarios due to the computational intensity required

(Keller and Gavrila, 2013). Moreover, data-driven methods mostly do not consider the interaction of multiple pedestrians in the scenario. Planning-based approaches to

pedestrian motion prediction try to understand the in-

tentions behind a pedestrian’s movement by following

a sense-reason-predict scheme about the likely goals

and possible path to reach the goal. They typically

focus on using optimization techniques by applying

predeﬁned cost functions (forward planning) or learn-

ing these functions from observed behavior (inverse

planning). A number of approaches model the probabilities of future motion based on cost-to-go value estimates, proposing probabilistic goal-directed motion models that account for several goals in the environment (Best and Fitch, 2015; Vasquez, 2016). While these approaches are suitable for scenarios where understanding the underlying intent or

goal is crucial, they are not as effective in dynamically

changing environments where objects frequently ap-

pear and disappear or when dealing with a large num-

ber of objects. These methods can be expanded to

consider different contextual cues (map-based, and

dynamic environment cues) that impact pedestrian be-

havior. This combination facilitates the creation of

more accurate and contextually sensitive forecasts by

considering factors such as societal norms, trafﬁc sig-

nals, environmental layout, and psychological condi-

tions.

We propose a comprehensive solution for predict-

ing pedestrian motion using a technique that inher-

its physics-based and planning-based characteristics

that can simultaneously handle multiple pedestrians

in a complex automotive driving scenario. Such an

approach would leverage the accuracy of physics-

based models that adhere to Newton’s laws for move-

ment and the insight of planning-based models that

infer intentions and goals to forecast future paths.

This hybrid method would not only model the im-

mediate physical interactions but also incorporate a

deeper understanding of pedestrian behavior, mak-

ing predictions more robust in complex environments

where anticipating future movements is crucial. Our

algorithm enhances the Gaussian Mixture Probabil-

ity Hypothesis Density (GMPHD) Filter (Clark et al.,

2006) with the Generalized Potential Field Approach

(GPFA) (Particke et al., 2017). This hybrid prediction

approach creates a dynamic pedestrian motion model,

which integrates a broader range of inﬂuences, in-

cluding environmental layouts and individual pedes-

trian goals, into a uniﬁed framework to ﬁnd the most

probable objective (intention) of all pedestrians. This paper is structured as follows: Section 2 presents the proposed method, including the modeling of environmental data as a potential field, a dynamic model for pedestrians, and the Probability Hypothesis Density (PHD) filter. Section 3 presents the experiments that demonstrate the effectiveness of our algorithm, and Section 4 concludes the paper and suggests directions for further research.


2 MULTI-PEDESTRIAN TRACKING

2.1 Tracking Algorithm

In general, as shown in Figure 2, tracking multiple pedestrians requires that they are first detected by some sensory input. Advanced algorithms are then applied to interpret the raw data, distinguish pedestrians from other objects, and predict their intentions. In this paper, we assume that pedestrian detection has already been performed and that positions in a 3D coordinate system are available. We focus mainly on the Map to Potential Field, Tracking Algorithm, and Pedestrian Trajectory Prediction parts.

[Figure 2 block diagram with the blocks: Pedestrian Detection, Map to Potential Field, Tracking Algorithm, Pedestrian Trajectory Prediction.]

Figure 2: General Architecture of the Multi-Pedestrian Tracking System.

The concept of potential ﬁelds has been exten-

sively applied in various research areas, including

ﬂocking behavior, trajectory planning, and pedestrian

crowd analysis. However, existing methods like the

social force model face limitations when dealing with

individual pedestrians or small groups, as they are op-

timized for pedestrian crowds. Moreover, the number

of parameters to be set is huge. To overcome these

challenges, the Generalized Potential Field Approach

(GPFA) was developed. It combines a potential ﬁeld

with a kinematic motion model, ensures applicability

to single pedestrians and small groups, and simplifies parametrization (Particke, 2020). To calculate the potential field, every pedestrian is regarded as a test particle in several different potential fields. Each field ($\varphi_j$) stands for a different information source, such as a map of the surrounding area. Each potential field is made up of a variety of potential sources ($\varphi_i$), such as obstacles. The potential at the pedestrian's position is calculated using the following equation:

$$\varphi_j = \sum_{i=1}^{N} p_i(d_i)\,\varphi_i \qquad (1)$$

In this equation, $p_i(d_i)$ represents the weight of each potential source, which depends on the Euclidean distance $d_i$ between the pedestrian and the potential source but is independent of time.

As demonstrated in (Particke, 2020), the influence of the potential field on the pedestrian can be modeled as an acceleration vector $\vec{a}_k$ of source $k$ at position $P_N$, which directly affects the pedestrian's dynamics.

The dynamic model of the pedestrian considers both the gradient of the potential field $\vec{\nabla}\varphi$ and the flow resistance ($F = c\,\vec{e}\,v_N$):

$$m\,\vec{a} = -\vec{\nabla}\varphi - c\,\vec{e}\,v_N \qquad (2)$$

where the pseudo mass $m$ and the drag coefficient $c$ must be configured appropriately to represent the expected dynamics of the pedestrian.

Similar to (Particke et al., 2017), a constant velocity model was used as the dynamic model of the pedestrian movement in the Kalman Filter.
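As a minimal sketch of how the dynamic model in Equation (2) could be evaluated numerically, the Python snippet below assumes a single conic (linear) attractive potential centred on a goal, a drag term proportional to the velocity vector, and illustrative values for the pseudo mass and drag coefficient. The function names and parameter values are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def conic_potential(pos, goal, slope=1.0):
    """Hypothetical attractive potential: rises linearly with distance to the goal."""
    return slope * np.linalg.norm(np.asarray(pos, float) - np.asarray(goal, float))

def gradient(phi, pos, h=1e-4):
    """Central-difference gradient of a scalar field phi at a 2D position."""
    pos = np.asarray(pos, float)
    grad = np.zeros(2)
    for axis in range(2):
        step = np.zeros(2)
        step[axis] = h
        grad[axis] = (phi(pos + step) - phi(pos - step)) / (2.0 * h)
    return grad

def acceleration(pos, vel, goal, m=1.0, c=0.5):
    """a = (-grad(phi) - c * v) / m; m and c are assumed tuning parameters."""
    grad = gradient(lambda p: conic_potential(p, goal), pos)
    return (-grad - c * np.asarray(vel, float)) / m

# A pedestrian at rest is accelerated straight towards the goal.
a_rest = acceleration(pos=[0.0, 0.0], vel=[0.0, 0.0], goal=[3.0, 4.0])
# A moving pedestrian is additionally slowed down by the drag term.
a_moving = acceleration(pos=[0.0, 0.0], vel=[1.0, 0.0], goal=[3.0, 4.0])
print(a_rest, a_moving)
```

The resulting acceleration can then serve as the control input of the constant velocity model used in the filter's prediction step.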

The PHD filter is a well-known method for multi-target tracking based on the ideas of random finite sets and was first introduced by Mahler (Mahler, 2003). Later, Clark et al. (Clark et al., 2006) proposed the Gaussian Mixture PHD (GM-PHD) filter, a computationally efficient implementation of the PHD filter. The PHD filter is exceptionally well suited for handling an unknown and time-varying number of targets (Gao et al., 2021), which is a frequent challenge when attempting to follow numerous pedestrians with various intentions. Each object in a GM-PHD filter is presumed to follow a linear Gaussian model. However, the components of the multi-target posterior need not share the same covariance matrix, so the posterior is a Gaussian mixture (GM). Given a state $x_{k-1}$ at time $k-1$, the probability density of a transition to the state $x_k$ at time $k$ is the transition density, given by:

$$f_{k|k-1}(x_k \mid x_{k-1}) \qquad (3)$$

In the context of the GM-PHD filter, the Kalman Filter is utilized for the state prediction of each target, considering their acceleration. The state prediction equation in the Kalman Filter is given by:

$$x_k = F_k\,x_{k-1} + B_k\,u_k \qquad (4)$$

Here, $x_k$ is the state vector at time $k$, which typically includes position and velocity. $F_k$ is the state transition matrix, mapping the previous state $x_{k-1}$ to the current state $x_k$. $B_k$ is the control input model. $u_k$ is the control vector, incorporating the acceleration ($\vec{a}$) of the pedestrian, obtained from the potential field.
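The prediction step with an acceleration control input can be sketched as follows. This is an illustrative example rather than the authors' implementation: the state layout [x, y, vx, vy], the time step, and the process noise level `q` are assumptions chosen for the example.

```python
import numpy as np

def cv_matrices(dt):
    """Constant-velocity transition F and acceleration input model B
    for the assumed state layout [x, y, vx, vy]."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    B = np.array([[0.5 * dt**2, 0.0],
                  [0.0, 0.5 * dt**2],
                  [dt, 0.0],
                  [0.0, dt]])
    return F, B

def predict(x, P, u, dt, q=0.01):
    """Kalman prediction x_k = F x_{k-1} + B u_k with covariance propagation."""
    F, B = cv_matrices(dt)
    Q = q * np.eye(4)  # assumed process noise covariance
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

x0 = np.array([0.0, 0.0, 1.0, 0.0])  # at the origin, moving along x at 1 m/s
P0 = np.eye(4)
u = np.array([0.0, 2.0])             # acceleration taken from the potential field
x1, P1 = predict(x0, P0, u, dt=0.1)
print(x1)
```

Setting `u` to zero recovers the plain constant velocity prediction, which makes it easy to compare the filter with and without the potential field term.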

In addition, assuming a state $x_k$ at time $k$, the probability density of receiving the detection $z_k$ gives the likelihood function:

$$g_k(z_k \mid x_k) \qquad (5)$$


The probability density of state $x_k$ given all the prior observations is the posterior density, denoted $p_k(x_k \mid z_{1:k})$. Applying Bayes' recursion with an initial density $p_0(\cdot)$, and writing the likelihood function of Equation (5) as $g_k$, the posterior density is:

$$p_k(x_k \mid z_{1:k}) = \frac{g_k(z_k \mid x_k)\,p_{k|k-1}(x_k \mid z_{1:k-1})}{\int g_k(z_k \mid x)\,p_{k|k-1}(x \mid z_{1:k-1})\,dx} \qquad (6)$$

In the GM-PHD ﬁlter, each target is treated as in-

dependent from the others regarding the generation

of observations and its evolution. The two equa-

tions clearly demonstrate that the PHD ﬁlter effec-

tively eliminates the combinatorial calculations re-

sulting from the unassigned association of measure-

ments to speciﬁc targets.

In a GMPHD-based GPFA, the tracking system initializes with predefined system parameters and an "Intention Map" that outlines the pedestrians' probable objectives. The GPFA calculates the movement

acceleration of pedestrians toward each intended des-

tination, serving as the control input for the PHD pre-

diction process. At each time step, the system mea-

sures actual pedestrian movements, and during the

association phase, the system generates multiple hy-

potheses based on the prior predictions and new mea-

surements. Finally, these hypotheses undergo reﬁne-

ment in the update process.

Figure 3 shows the general workflow of the proposed PHD-GPFA algorithm. It is assumed that the vehicle knows its own position and loads the topological map of its environment. Based on this, the potential field map for each intention is calculated according to (2.1), and the acceleration for each hypothetical intention is derived. Using this information, the prediction for time point k is calculated. Subsequently, the measurements are acquired and associated with each of the hypotheses based on the Mahalanobis distance. The measurement update is then performed for each hypothesis, and finally the most probable track ('Confirmed Tracks') is selected according to the likelihood.
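The association, update, and selection steps can be illustrated with a simplified sketch for a single pedestrian with several intention hypotheses. It is not the full GM-PHD update (detection probability, clutter, pruning, and merging are omitted), and the measurement model `H`, the noise covariance `R`, the gate value, and the function names are assumptions made for the example.

```python
import numpy as np

# Assumed position-only measurement model and noise level (illustrative values).
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
R = 0.1**2 * np.eye(2)

def kalman_update(z, x_pred, P_pred):
    """Return updated state, covariance, Gaussian likelihood and
    squared Mahalanobis distance for one hypothesis."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    nu = z - H @ x_pred                       # innovation
    d2 = float(nu @ np.linalg.solve(S, nu))   # squared Mahalanobis distance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_upd = x_pred + K @ nu
    P_upd = (np.eye(len(x_pred)) - K @ H) @ P_pred
    lik = np.exp(-0.5 * d2) / (2.0 * np.pi * np.sqrt(np.linalg.det(S)))
    return x_upd, P_upd, lik, d2

def update_hypotheses(z, predictions, weights, gate=9.21):
    """Update all intention hypotheses of one pedestrian with measurement z.

    gate is the 99 % chi-square quantile for two degrees of freedom.
    """
    updated, likelihoods = [], []
    for x_pred, P_pred in predictions:
        x_upd, P_upd, lik, d2 = kalman_update(z, x_pred, P_pred)
        if d2 > gate:
            # Measurement lies outside the gate: keep the prediction and
            # assign a negligible likelihood to this hypothesis.
            x_upd, P_upd, lik = x_pred, P_pred, 1e-12
        updated.append((x_upd, P_upd))
        likelihoods.append(lik)
    w = np.asarray(weights, dtype=float) * np.asarray(likelihoods)
    return updated, w / w.sum()

P = 0.05 * np.eye(4)
predictions = [(np.array([1.0, 0.0, 0.0, 0.0]), P),   # hypothesis 1
               (np.array([0.0, 1.0, 0.0, 0.0]), P)]   # hypothesis 2
z = np.array([0.9, 0.1])
updated, weights = update_hypotheses(z, predictions, weights=[0.5, 0.5])
best = int(np.argmax(weights))  # index of the most probable hypothesis
print(weights, best)
```

Repeating this update at every time step lets the weight of the hypothesis whose predictions match the measurements grow, which is how the confirmed track and the most probable intention emerge over time.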

2.2 Map to Potential Field

The primary aim of the GPFA is to improve the ef-

ﬁciency of models utilized in classic Kalman ﬁlter-

ing or Monte Carlo techniques. For this purpose,

an acceleration vector should be generated from in-

formation sources like attractive components (inten-

tions) and repulsive components. In simpler terms,

the pedestrians can be likened to test particles moving

within a potential ﬁeld, where their movements are in-

ﬂuenced by both attractions towards certain goals and

repulsions from obstacles.

[Figure 3 flow diagram with the blocks: Initialization; Intention Map (control input); GPFA-based Prediction; Measurements; Data Association (hypothesis generation based on Mahalanobis distance); Update; Confirmed Tracks. Listed steps: 1. Calculate the potential field, 2. Gradient of potential field, 3. Derive acceleration vector.]

Figure 3: PHD-GPFA Flow diagram.

The potential field described above is generated mainly from the topological description of the area around the vehicle. However, some assumptions regarding social force models and pedestrian behaviour are made:

• Pedestrians have specific intentions (destinations) and aim to reach them
• Pedestrians tend to walk on sidewalks
• Pedestrians tend to cross the road at zebra crossings or corners
• Pedestrians tend to avoid collisions
• Pedestrians tend to keep their travel direction

Following these assumptions, map elements such as zebra crossings or sidewalks are described as attraction areas. Map elements like buildings are described as repulsion areas. Moreover, the walking direction must be taken into account to define plausible destinations.

The potential field generation is a key function of the component termed Map to Potential Field. For illustration, Figure 4 demonstrates the concept within a street scenario, and an example of a potential field for a driving scenario is depicted in Figure 5. Using the defined potential field, the tracking and subsequent prediction of pedestrian trajectories are performed.

Figure 4: Street scenario.

Figure 5: Potential field example for a street scenario.

How can we improve pedestrian trajectory prediction using map data? To answer this question, we need to model the map as a potential field by utilizing the map information. The potential field is computed by evaluating the influence of its components across a grid of points that spans the domain of interest. The influence of repulsive components typically decays exponentially with distance, modulated by a scaling factor, whereas the influence of the conic components that represent an "intention" has a linear relationship with distance, also modulated by a scaling factor (Particke et al., 2017); this influence decreases linearly with increasing distance. Figure 6 depicts the resulting field in both 3D surface and 2D heatmap formats: the strength of the field at a given point is shown as the Z-value in the 3D surface plot and as the color intensity in the 2D heatmap.
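The grid evaluation described above can be sketched in a few lines. The example assumes one conic well for a single intention and one exponentially decaying repulsive source; the grid extent, goal and obstacle positions, and scaling factors are hypothetical values picked only for illustration.

```python
import numpy as np

def conic_well(X, Y, goal, slope=1.0):
    """Attractive intention: potential rises linearly with distance from the goal."""
    return slope * np.hypot(X - goal[0], Y - goal[1])

def repulsive(X, Y, obstacle, amplitude=5.0, decay=1.0):
    """Repulsive source: potential decays exponentially with distance."""
    return amplitude * np.exp(-decay * np.hypot(X - obstacle[0], Y - obstacle[1]))

# Regular grid over an assumed 15 m x 10 m area with 0.1 m resolution.
xs = np.linspace(0.0, 15.0, 151)
ys = np.linspace(0.0, 10.0, 101)
X, Y = np.meshgrid(xs, ys)

goal = (12.0, 8.0)          # assumed intention (destination)
obstacles = [(6.0, 4.0)]    # assumed repulsive map element

field = conic_well(X, Y, goal)
for obstacle in obstacles:
    field += repulsive(X, Y, obstacle)

# The lowest point of the field lies at the intention.
iy, ix = np.unravel_index(np.argmin(field), field.shape)
print(xs[ix], ys[iy])
```

The array `field` can be shown as a 3D surface or as a 2D heatmap, and its negative gradient gives the direction in which a pedestrian at a given grid point is pulled.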

[Figure 6 panels: 3D surface and 2D heatmap of the potential field, with axes X (m) and Y (m).]

Figure 6: Pedestrian in potential field map.

3 EXPERIMENTAL RESULTS

In this study, we utilized a publicly available pedes-

trian trajectory dataset from the University of Edin-

burgh’s School of Informatics (Majecka, 2009), cho-

sen for its overhead camera system that captures clear,

minimally noisy pedestrian paths in a public space.

The dataset’s precise, real-world coordinate trajecto-

ries and the variety of pedestrian movements provide

an ideal base for adding controlled noise (speciﬁcally

additive white Gaussian noise) for analysis. Its di-

verse path patterns, originating from different points

but diverging towards multiple destinations, accu-

rately represent the dynamic nature of pedestrian traf-

ﬁc in real environments.

For the experiment, trajectories of three pedestri-

ans going to two target regions (two intentions) were

selected. The extracted data were used as ground

truth, and some artiﬁcial measurement noise (σ = 0.1)

was added to assess the performance of our approach.
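A minimal sketch of how additive white Gaussian noise with σ = 0.1 can be applied to a ground-truth trajectory is given below. The synthetic straight-line trajectory, the random seed, and the function name are assumptions for illustration; the experiments in the paper use trajectories from the dataset instead.

```python
import numpy as np

def add_measurement_noise(trajectory, sigma=0.1, seed=0):
    """Return a noisy copy of an (N, 2) trajectory using zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    trajectory = np.asarray(trajectory, dtype=float)
    return trajectory + rng.normal(0.0, sigma, size=trajectory.shape)

# Hypothetical ground truth: 100 positions along a straight line.
ground_truth = np.column_stack([np.linspace(0.0, 10.0, 100),
                                np.linspace(0.0, 5.0, 100)])
measurements = add_measurement_noise(ground_truth, sigma=0.1)
print(np.std(measurements - ground_truth))
```

Fixing the seed keeps the noisy measurements reproducible, so that different filters can be compared on exactly the same input.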

The approach was ﬁrst evaluated using the best-

case scenario, where one pedestrian with a known

intention is tracked. Subsequently, the algorithm’s

capability to identify the real intention of a pedes-

trian was evaluated by tracking one pedestrian with

three possible intentions (deﬁned as hypotheses for

the PHD-GPFA). Finally, the algorithm was tested

on scenarios involving multiple pedestrians each with

multiple unknown intentions.

3.1 One Pedestrian with One Known Intention

The first experiment involves tracking a single pedestrian whose intention is assumed to be known and located at x = 6.61 m, y = 11.11 m. The acceleration vector calculated by the GPFA is used as the control input for the Kalman filter's state prediction stage. Both the KF and the KF-GPFA methods were used to estimate the next state based on the previous measurement and the motion model. In addition, both methods were used to make predictions for future time points ranging from 1 to 10 seconds ahead. These horizons reflect the time an autonomous vehicle would need to react to a possible collision, together with the position uncertainty over that time frame. The results of the estimation and prediction processes are presented in Figure 7.
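One way to compute a per-axis RMSE for different prediction horizons is sketched below. The synthetic curved trajectory, the time step, and the simple constant velocity predictor are assumptions for illustration and do not reproduce the paper's data or filters.

```python
import numpy as np

def rmse_per_axis(predictions, targets):
    """Root mean square error for each coordinate axis separately."""
    err = np.asarray(predictions, float) - np.asarray(targets, float)
    return np.sqrt(np.mean(err**2, axis=0))

def constant_velocity(prev, curr, steps):
    """Extrapolate the last observed displacement over the given number of steps."""
    return curr + steps * (curr - prev)

def horizon_rmse(truth, dt, horizons, predictor):
    """For every horizon (in seconds), predict from each time step and
    compare the prediction with the ground truth at the target time."""
    truth = np.asarray(truth, float)
    results = {}
    for horizon in horizons:
        steps = int(round(horizon / dt))
        preds, targets = [], []
        for k in range(1, len(truth) - steps):
            preds.append(predictor(truth[k - 1], truth[k], steps))
            targets.append(truth[k + steps])
        results[horizon] = rmse_per_axis(preds, targets)
    return results

# Hypothetical curved path sampled at 10 Hz: x = t, y = 0.05 * t^2.
dt = 0.1
t = np.arange(0.0, 20.0 + dt / 2, dt)
truth = np.column_stack([t, 0.05 * t**2])

results = horizon_rmse(truth, dt, horizons=[1, 2, 3], predictor=constant_velocity)
for horizon, value in results.items():
    print(horizon, value)
```

On this curved path the constant velocity predictor has no error along x, while its error along y grows with the horizon, which mirrors the general trend that longer prediction intervals lead to larger errors.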

[Figure 7, upper panels: X Estimation and Y Estimation, showing the x and y coordinates over Time (s) for KF Est and GPFA Est. Lower panels: X RMSE and Y RMSE over the prediction time (s) for the Kalman Filter and GPFA.]

Figure 7: Trajectory estimation and prediction using KF (state-of-the-art) and GPFA compared to Ground Truth (GT).

On closer inspection of the data, it is evident that the Kalman Filter provides a reasonably accurate estimate of the pedestrian's position, with only minor deviations from the ground truth data. At the same time, the KF-GPFA technique demonstrates remarkable performance, which indicates the potential benefit of an enhanced method such as KF-GPFA for more precise pedestrian position estimation. The lower part of Figure 7 shows the prediction capabilities of a vehicle, based on the estimated trajectory and intention, for different time intervals. As expected, the error grows as the prediction interval increases. In terms of forecasting pedestrian positions, the KF-GPFA displays a noticeable improvement over the standard Kalman Filter.

3.2 One Pedestrian with Three Possible Intentions

In the prior experiment, one limitation was the assumption that the pedestrian's intention or goal was known, whereas in autonomous driving scenarios this information is typically unavailable. Motivated by this

discrepancy, the current experiment focuses on inte-

grating an unknown intention into pedestrian tracking

and prediction, aiming to address the question of how

pedestrians’ intentions can be estimated in advance.

As mentioned before, the proposed solution in-

volves estimating the probability of each possible in-

tention. This is achieved by modeling the intentions

as conic well components in the potential ﬁeld, which

is derived from the map of the immediate environ-

ment. These components are then evaluated at each

estimated pedestrian position. Figure 8 illustrates a

potential ﬁeld map, highlighting three plausible inten-

tions (hypotheses) for a pedestrian.

Figure 8: Potential field map for the unknown intention, where the three wells represent the intentions.

The estimation result for the most probable hypothesis is presented in the upper subplot of Figure 9. The estimated positions closely align with the actual positions, indicating accurate estimation. The possible intentions are drawn as polygons of different colors, one per hypothesis. The lower subplot shows the weight of each hypothesis over time; the larger the weight, the more probable the hypothesis. Together, the plots illustrate the efficacy of the GPFA algorithm in finding the most probable intention, estimating positions, and tracking the evolution of hypothesis probabilities.

[Figure 9, upper panel: estimated track (Est) and the intention regions Hypo 1, Hypo 2 and Hypo 3, with axes X Position (m) and Y Position (m). Lower panel: Weight (percent) over Time Step (s) for Hypo 1, Hypo 2 and Hypo 3.]

Figure 9: Tracking and probabilistic intention estimation of a pedestrian over time, compared to ground truth (GT), with an analysis of hypothesis (Hypo) weights.

3.3 Multiple Pedestrians with Multiple Possible Intentions

While the previous experiment focused on a single pedestrian with multiple intentions, real driving scenarios require tracking several pedestrians and their possible intentions simultaneously. The goal of this experiment is to validate whether our algorithm can effectively handle multiple pedestrians. To this end, we consider three pedestrians walking in the same area, with two possible intention hypotheses (intention one and intention two) given for each pedestrian. The tracking results are presented in Figure 10 as a 2D plot and in Figure 11 as the root mean square error (RMSE) for each of the six hypotheses.

[Figure 10 plot with axes X (m) and Y (m), showing the tracks Ped 1 - Int 1, Ped 1 - Int 2, Ped 2 - Int 1, Ped 2 - Int 2, Ped 3 - Int 1 and Ped 3 - Int 2, the ground truth trajectories Truth 1, Truth 2 and Truth 3, and the regions Intention 1 and Intention 2.]

Figure 10: Track estimation for multi-pedestrian scenarios utilizing GMPHD-GPFA; involving three pedestrians each with two potential intentions, yielding a total of six hypothesis scenarios.

Our observations reveal that the hypotheses corresponding to the correct intentions (Pedestrian 1 with Intention 1, Pedestrian 2 with Intention 2, and Pedestrian 3 with Intention 1) show a lower RMSE than the competing hypotheses, in line with our expectations. This suggests that the algorithm can effectively differentiate between the correct and incorrect intentions for each trajectory.

[Figure 11 bar chart of the RMSE per hypothesis: Ped 1 - Int 1: 0.09; Ped 1 - Int 2: 0.11; Ped 2 - Int 1: 0.08; Ped 2 - Int 2: 0.06; Ped 3 - Int 1: 0.08; Ped 3 - Int 2: 0.12.]

Figure 11: RMSE of multiple trajectories.

Analyzing the weights of the different intention hypotheses also shows that the correct intention is found (Figure 12). After some time steps (between 40 and 60), corresponding to around 4 to 6 seconds, the real intention can be clearly distinguished. Although this is a significant improvement over state-of-the-art algorithms, it is still too long for timely situational awareness in an autonomous vehicle. This is because the algorithm makes its inferences mainly from position data, so the intention can be clearly determined only once the candidate trajectories diverge notably.

[Figure 12 plot of Weight (percent) over Time (s) for the hypotheses Ped 1 - Int 1, Ped 1 - Int 2, Ped 2 - Int 1, Ped 2 - Int 2, Ped 3 - Int 1 and Ped 3 - Int 2.]

Figure 12: Assessing Diverse Hypothesis Weights for Pedestrian Intention Detection.

Table 1 shows the RMSE results for both estimations and predictions:

Table 1: RMSE Comparison of the prediction capabilities in each experiment.

Trajectory      Estimation              Prediction
                KF      GMPHD-GPFA      1 s     2 s     3 s
Pedestrian 1    0.11    0.09            0.42    1.26    2.43
Pedestrian 2    0.07    0.06            0.49    1.42    2.50
Pedestrian 3    0.10    0.08            0.32    0.98    1.91

The KF column corresponds to the state-of-the-art algorithm based on KF and PHD, and the GMPHD-GPFA column coincides with the results in Figure 11. The prediction columns give the RMSE observed at prediction intervals of 1 to 3 seconds. While these prediction errors may seem high compared to the results for an individual pedestrian, it is crucial to note the differences between the experiments. The intentions in the experiment involving multiple pedestrians are unknown, unlike in the single-pedestrian scenario. Additionally, a higher number of hypotheses influences the estimation quality, adding complexity that the algorithm must manage. Although the algorithm successfully tracks objects and infers intentions, it does not directly consider changes in those intentions.

Despite the promising results, the prediction ca-

pability falls short of meeting the timing and accu-

racy requirements for autonomous vehicles operating

in urban environments. There is a need for further

development to enhance the model with faster inten-

tion detection techniques. Such improvements could

involve using gestures or other indicators, extending

beyond reliance solely on trajectory data.

4 CONCLUSION

This paper introduces an approach that combines physics-based and planning-based modeling for tracking and predicting the positions and intentions of multiple pedestrians around an autonomous vehicle. Utilizing a Probability Hypothesis Density (PHD) filter integrated with a Generalized Potential Field Approach (GPFA), the proposed algorithm generates multiple hypotheses and continuously tracks them, effectively identifying pedestrians' actual intentions. This enables autonomous vehicles to forecast pedestrian movements accurately and to re-plan their maneuvers accordingly. However, accelerating the detection of

intentions remains a challenge that requires further

development. The study also highlights the criti-

cal role of incorporating map information in deﬁn-

ing tracking hypotheses, signiﬁcantly enhancing the

model’s precision and reliability.

ACKNOWLEDGEMENTS

The SMO project is supported by the Federal Ministry

of Transport and Digital Infrastructure of Germany.

For more information about the project, please see:

www.shuttle-modellregion-oberfranken.de

REFERENCES

Best, G. and Fitch, R. (2015). Bayesian intention infer-

ence for trajectory prediction with an unknown goal

destination. In 2015 IEEE/RSJ International Confer-

ence on Intelligent Robots and Systems (IROS), pages

5817–5823. IEEE.

Chen, Y., Liu, M., Liu, S.-Y., Miller, J., and How, J. P.

(2016). Predictive modeling of pedestrian motion pat-

terns with bayesian nonparametrics. In AIAA guid-

ance, navigation, and control conference, page 1861.

Clark, D. E., Panta, K., and Vo, B.-N. (2006). The gm-phd

ﬁlter multiple target tracker. In 2006 9th International

Conference on Information Fusion, pages 1–8. IEEE.

Dehghani, A., Salar, H., Srinivasan, S., Zhou, L., Arbeiter,

G., Lindner, A., and Patino-Studencki, L. (2023). En-

hancing availability of autonomous shuttle services: A

conceptual approach towards challenges and opportu-

nities. Manuscript under review.

Elnagar, A. (2001). Prediction of moving objects in dy-

namic environments using kalman ﬁlters. In Proceed-

ings 2001 IEEE International Symposium on Compu-

tational Intelligence in Robotics and Automation (Cat.

No. 01EX515), pages 414–419. IEEE.

Färber, B. (2016). Communication and communication problems between autonomous vehicles and human drivers. Autonomous driving: Technical, legal and social aspects, pages 125–144.

Gao, Y., Jiang, D., Zhang, C., and Guo, S. (2021). A labeled

gm-phd ﬁlter for explicitly tracking multiple targets.

Sensors, 21(11):3932.

Keller, C. G. and Gavrila, D. M. (2013). Will the pedes-

trian cross? a study on pedestrian path prediction.

IEEE Transactions on Intelligent Transportation Sys-

tems, 15(2):494–506.

Mahler, R. P. (2003). Multitarget bayes ﬁltering via ﬁrst-

order multitarget moments. IEEE Transactions on

Aerospace and Electronic systems, 39(4):1152–1178.

Majecka, B. (2009). Statistical models of pedestrian be-

haviour in the forum. Master’s thesis, School of Infor-

matics, University of Edinburgh.

Particke, F. (2020). Predictive Pedestrian Awareness with Intention Uncertainties for Autonomous Driving. PhD thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU).

Particke, F., Patino-Studencki, L., Thielecke, J., and Feist,

C. (2017). Pedestrian tracking using a generalized po-

tential ﬁeld approach. In VISIGRAPP (6: VISAPP),

pages 509–514.

Rasouli, A., Kotseruba, I., and Tsotsos, J. K. (2017). Un-

derstanding pedestrian behavior in complex trafﬁc

scenes. IEEE Transactions on Intelligent Vehicles,

3(1):61–70.

Razali, H., Mordan, T., and Alahi, A. (2021). Pedes-

trian intention prediction: A convolutional bottom-up

multi-task approach. Transportation research part C:

emerging technologies, 130:103259.

Rudenko, A., Palmieri, L., Herman, M., Kitani, K. M.,

Gavrila, D. M., and Arras, K. O. (2020). Human mo-

tion trajectory prediction: A survey. The International

Journal of Robotics Research, 39(8):895–935.

SMO (2022). Shuttle Modellregion Oberfranken (SMO) project. https://www.shuttle-modellregion-oberfranken.de/.

Vasquez, D. (2016). Novel planning-based algorithms

for human motion prediction. In 2016 IEEE In-

ternational Conference on Robotics and Automation

(ICRA), pages 3317–3322. IEEE.
