An Easy-to-Use System for Tracking Robotic Platforms Using
Time-of-Flight Sensors in Lab Environments
André Kirsch and Jan Rexilius
Bielefeld University of Applied Sciences and Arts, 32427 Minden, Germany
{andre.kirsch, jan.rexilius}@hsbi.de
Keywords:
Tracking, Robot, Drone, MAV, External, Time-of-Flight, LiDAR.
Abstract:
The acquisition of accurate tracking data is a common problem in scientific research. When developing new
algorithms and AI networks for the localization and navigation of mobile robots and MAVs, they need to be
evaluated against true observations. Off-the-shelf systems for capturing ground truth data often come at a high
cost, since they typically include multiple expensive sensors and require a special setup. We see the need
for a simpler solution and propose an easy-to-use system for small-scale tracking data acquisition in research
environments using one or multiple sensors, possibly with hardware that is already available. The system is able
to track mobile robots moving on the ground, as well as MAVs that are flying through the room. Our solution
works with point clouds and allows the use of Time-of-Flight based sensors like LiDAR. The results show that
the accuracy of our system is sufficient for use as ground truth data, with a mean error in the low-centimeter range.
1 INTRODUCTION
Ground truth data is an important requirement to cor-
rectly assess the accuracy of navigation and local-
ization algorithms and is also used for training rein-
forcement learning networks for mobile robots and
micro aerial vehicles (MAVs), hereafter summarized
as robotic platforms. The data contains positional
information and sometimes also information about
the orientation of the robotic platform. However, capturing valid tracking data typically requires some sort
of motion capturing system like OptiTrack (https://optitrack.com/), which comes at a high cost. There are
other, cheaper alternatives available, each having different advantages and disadvantages, like the Lighthouse
positioning system (Taffanel et al., 2021) and Ultra-Wideband (UWB) (Shule et al., 2020), which require
hardware mounted on the tracked object.
We focus on Time-of-Flight (ToF) based technolo-
gies like LiDAR, as they do not require any lighting in
the environment and can track unknown objects that
have no tags or special hardware attached to them.
Time-of-Flight is a technology that computes distance
from the time a light or sound pulse needs to travel
until it hits an object and bounces back. For light, a
typical wavelength is infrared. The light is often emitted using LEDs for low-range ToF devices or using
a laser in the case of LiDAR. Emitting the light as a laser has the advantage of reaching farther
distances and therefore covering a larger area.
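As a brief illustration of this measurement principle (the standard Time-of-Flight relation, not a formula taken from our implementation), the distance d follows from the measured round-trip time t_flight and the propagation speed c of the emitted signal:

d = \frac{c \cdot t_{flight}}{2}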
In this paper, we present a system that uses Time-of-Flight sensing for tracking data acquisition of
robotic platforms, as shown in Figure 1. The system
focuses on lab environments and can be used with
different types of Time-of-Flight-based sensors that
capture point clouds. Its main goal is to make cap-
turing position data more accessible since it does not
rely on a specific sensor. This enables the use of sen-
sors already available to a researcher, which might
have become unused after finishing an earlier research
project. This removes the need to invest in a costly motion capturing system. The proposed system should
be easy to install and already be usable with a single sensor, while being scalable to multi-sensor setups.
Multi-sensor setups can increase the recording area or capture the scene from a different angle. Please
note that the tracking accuracy, frequency, and range highly depend on the selected sensor. Furthermore,
there is no need to install additional hardware on the
robotic platform when using our system.
2 RELATED WORK
Camera-based motion capturing systems are the gold
standard for tracking objects in a confined space and
are often used to record ground truth data due to their
Figure 1: Visualization of the setup of the proposed system. Subfigure (b) shows the recording area with two installed sensors.
At least one sensor is required. Additional sensors can be used to either increase the size of the recording area or to capture
the recording area from a different angle, as shown in the image. The captured data is visualized in subfigure (a), where our
system is tracking the detected MAV. Subfigure (c) shows a real-life view of a part of the recording area captured from the
view of the main sensor.
millimeter-level accuracy, as shown in (Furtado et al.,
2019). Popular systems include OptiTrack, Vicon (https://www.vicon.com/), and Qualisys
(https://www.qualisys.com/), which come at a high cost and re-
quire a complex setup of multiple sensors in a des-
ignated area. They either use active or passive mark-
ers attached to the tracked object that emit or reflect
infrared (IR) light, respectively. This light is cap-
tured by multiple special cameras. Through triangu-
lation of the captured data, it is possible to calculate
the position of the markers in 3D space. A cheaper
open-source alternative that uses markers is Easy Ro-
Cap (Wang et al., 2024), which is specifically de-
signed for tracking mobile robots and MAVs. It uses
infrared sensors from multiple Intel RealSense depth
cameras. There are also commercial markerless mo-
tion capturing systems but these mainly focus on the
tracking of humans.
Another IR-based system for tracking MAVs is the
Lighthouse Positioning system (Taffanel et al., 2021).
Multiple base stations emit IR light through moving
sweep planes. This light is captured by multiple IR
receivers mounted on the tracked MAV. The position
is calculated on-board the MAV. At lower velocities and in a confined space, this method can reach
sub-centimeter accuracy. Ultra-Wideband is an-
other alternative, requiring active components on the
tracked hardware. As the name suggests, it utilizes
radio technology to determine the location of tracked
objects. An outline of this technology is given by
(Shule et al., 2020). As we want the location esti-
mation to be computed independently of the tracked
object, we focus on a different method. We use ToF
sensors that are part of the environment and capture
data without requiring communication with the tracked object.
2.1 Time-of-Flight-Based Methods
Compared to the aforementioned methods, Time-of-
Flight and especially LiDAR are used in a variety
of use cases. Their advantage over other approaches is
that they do not require any active or passive hardware mounted on the tracked object. Therefore, they
are a preferred technology for tracking unknown objects and have formed the research area of 3D single
object tracking (3D SOT), including the pioneering
works of SC3D (Giancola et al., 2019) and P2B (Qi
et al., 2020). SC3D uses a Siamese tracker, while
P2B leverages PointNet++ (Qi et al., 2017). More
recent solutions to the problem include PTTR and
PTTR++ (Zhou et al., 2022), as well as P2P (Nie
et al., 2024), which also use some sort of AI network.
The networks are commonly evaluated and compared
using outdoor LiDAR sequences recorded from a driving car, as in
the KITTI dataset (Geiger et al., 2013). Object classes
therefore include different types of cars, pedestrians,
and cyclists. None of the mentioned AI-based re-
search has been designed and tested with robotic plat-
forms in mind, which in our case are typically smaller
compared to their evaluated object classes.
The research on tracking mobile robots is much
sparser. This might be due to the large variety
in forms and sizes, but also because swarms of mo-
bile robots are typically connected through a network
and can exchange their position information. (Yilmaz
and Bayindir, 2022) and (Waşik et al., 2015) use a
2D LiDAR sensor to find mobile robots of the same
type as the carrier of the sensor in the environment,
also known as kin detection. (Pleterski et al., 2023)
do kin detection for miniature mobile robots using
an ultralow-resolution ToF sensor. They train a CNN
on the depth images to determine if a mobile robot is
present in the image.
With the recent publications of the LiDAR-based
Figure 2: Overview of the proposed pipeline using the Pepper mobile robot as an example. An input cloud is shown in (a).
The other subfigures (b)-(e) show the different stages of the pipeline as described in Sections 3.2-3.5. For visualization
purposes, the background is shown in yellow in all images. Note that we downsampled the point cloud for performance reasons.
MAV datasets by (Catalano et al., 2023b) for gen-
eral MAV detection and tracking, and by (Yuan et al.,
2024) for drone threat detection, there is a stronger
focus on detection and tracking of drones in point
clouds than on mobile robots. Compared to
3D SOT, both AI-based and traditional methods are
researched to solve this problem. Initial groundwork
has been done by (Dogru and Marques, 2022) and
(Wang et al., 2021). Other traditional methods include
works done by (Qingqing et al., 2021) and (Cata-
lano et al., 2023a), who integrate multiple scans us-
ing different frequencies to improve accuracy, as well
as (Gazdag et al., 2024), who use a particle filter for
tracking and utilize the scanning pattern of the LiDAR
sensor. For AI-based methods, (Sier et al., 2023) fuse
the 2D signal image provided by the sensor with point
clouds to detect MAVs. Point clouds are used in a
consecutive step to extract the MAV pose. (Chen
et al., 2024) detect MAVs in RGB image data and
again use point clouds for position estimation. The
tracking of MAVs by (Deng et al., 2024) is primar-
ily based on LiDAR captures and incorporates Radar
data. MAVs are detected through a combination of
an attention-based LSTM, a PointNet-based module,
and an MLP. Since detection and tracking of robotic
platforms using AI-based methods is still in its early
stages and often merges different sensor types, we fo-
cus on traditional approaches in our implementation.
3 CONCEPT
This paper proposes a system for tracking and de-
tecting mobile robots and MAVs in lab environments.
The main use case of the system is to capture position
data of a specified robotic platform. The user pro-
vides a captured stream or a live stream of point clouds to
the system. Since the system is implemented using
the Robot Operating System (ROS), different sensors
can be used directly. Compared to other solutions, our
system does not rely on specific hardware and can use
any type of Time-of-Flight based sensor available to
the user. Please note that the accuracy and frequency
heavily depend on the selected sensor. It is also possi-
ble to combine the recordings of multiple sensors and
feed the result into the system. Furthermore, the user
only needs to specify the dimensions of robotic plat-
forms to track. The system does not require any other
knowledge and does not make any additional assump-
tions about the platform to avoid falsifying the input
data. Figure 2 shows an overview of our approach.
The following subsections explain each part in more
detail.
3.1 Input Data
The input is a point cloud representing the tracking
area. For a single sensor, this input can be directly
forwarded to the next step. For multiple sensors, their
point clouds need to be aligned. The user needs to
specify an initial coarse alignment for every addi-
tional sensor, which is refined through point cloud
registration. This requires some overlap between the
point clouds. When working with any number of sen-
sors, note that different parts of an object are captured
at the edges of the field of view compared to the cen-
ter, which can lead to inaccuracies. With multiple sen-
sors, a similar situation can occur when the tracked
object enters the field of view of an additional sensor.
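As an illustration of the coarse-to-fine alignment described above, the following minimal sketch refines a user-supplied initial pose with ICP. It assumes the Open3D library and the helper name align_additional_sensor, both of which are illustrative choices and not part of our implementation.

```python
# Minimal sketch: refine a coarse sensor-to-sensor alignment with ICP (assumes Open3D).
import open3d as o3d

def align_additional_sensor(cloud_extra, cloud_main, coarse_pose, voxel=0.02):
    """Refine the user-supplied coarse pose of an additional sensor via point-to-point ICP."""
    src = cloud_extra.voxel_down_sample(voxel)   # downsample both clouds for speed
    dst = cloud_main.voxel_down_sample(voxel)
    result = o3d.pipelines.registration.registration_icp(
        src, dst,
        max_correspondence_distance=5 * voxel,   # relies on overlap between the clouds
        init=coarse_pose,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation                 # refined 4x4 transform

# Usage sketch: merged = cloud_main + cloud_extra.transform(refined_pose)
```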
3.2 Ground Removal
Ground removal is the first step in the pipeline. It
mainly consists of a pre-filtering step and standard
RANSAC-based plane fitting. Pre-filtering ensures
that plane fitting is only applied to points close to
the ground. Furthermore, it is possible to downsam-
ple the point cloud. This can negatively influence the
tracking accuracy, but allows for real-time computa-
tion times in all cases.
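A minimal sketch of such a ground removal step is shown below, assuming the cloud is given as an (N, 3) NumPy array in a frame whose z-axis points up; the function name and the threshold values are illustrative, not the settings of our system.

```python
# Minimal sketch: pre-filter points near the ground, fit a plane with RANSAC, drop its inliers.
import numpy as np

def remove_ground(points, z_prefilter=0.3, iters=200, dist_thresh=0.03):
    rng = np.random.default_rng(0)
    low = points[points[:, 2] < z_prefilter]      # plane fitting only on points close to the ground
    best_inliers, best_plane = 0, None
    for _ in range(iters):
        sample = low[rng.choice(len(low), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(normal) < 1e-9:
            continue                              # degenerate (collinear) sample
        normal /= np.linalg.norm(normal)
        d = -normal @ sample[0]
        inliers = int((np.abs(low @ normal + d) < dist_thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    if best_plane is None:
        return points
    normal, d = best_plane
    return points[np.abs(points @ normal + d) >= dist_thresh]   # keep everything off the plane
```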
3.3 Background Removal
Background removal is used to split moving fore-
ground points from static background points. A back-
ground cloud is built over time, storing the probability of points being part of the background. The background
cloud contains points in 3D space that store the background probability for their immediate surroundings
and are matched against the points of an input cloud. The background probability of the matched background
point defines whether the corresponding input point is part of the foreground or the background.
Data: cloud_new, cloud_bg
Result: cloud_fg
U ← [false, . . . ];
foreach p_new ∈ cloud_new do
    p_corr, θ_corr ← nearestPoint(p_new, cloud_bg);
    if ||p_new − p_corr|| < corr_dist then
        U[p_corr] ← true;
        if θ_corr < bg_thresh then
            add p_new to cloud_fg;
        end
    else
        add p_new to cloud_bg;
        add p_new to cloud_fg;
    end
end
update cloud_bg using U;
Algorithm 1: Algorithm for background removal.
Given a new input cloud and a background cloud, each point of the input cloud is checked for being
in the background by finding its corresponding background point, as shown in Algorithm 1, where θ_corr
is the background probability of p_corr. If no corresponding point is in close distance or if the
corresponding point is not part of the background, the current point is added to the foreground cloud.
Depending on the case, either a new point is added to the background cloud, if no corresponding point was
found, or the background probability of the corresponding point is increased. This update of the background
cloud is done at the end.
\theta^{t}_{corr} =
\begin{cases}
\min\left(\theta^{t-1}_{corr} + \Delta t / t_{in},\ 1\right), & U[p_{corr}] \\
\max\left(\theta^{t-1}_{corr} - \Delta t / t_{out},\ 0\right), & \neg U[p_{corr}]
\end{cases}
\quad (1)
The probability update for the background cloud points is shown in Equation 1. If a background point
was accessed in the current step (U[p_corr] = true), the probability of the point being part of the background
is increased. t_in is the time in seconds that is required for a point to reach maximum background probability.
Δt is the difference between the capture times of the current and previous input clouds. In the event that
the input cloud did not contain a corresponding point for a point in the background cloud, its probability is
decreased, with t_out being the fade-out time.
The background cloud is initialized using a down-
sampled version of the first input cloud. For faster
initialization, new points are added to the background
cloud with an initial probability value. This initial
value is the sum of the background threshold and t_in.
Note that robotic platforms are initially part of the
background and need to move to become detectable.
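The following sketch illustrates how Algorithm 1 and Equation 1 could be combined into a simple background model. It assumes NumPy/SciPy, an (N, 3) array representation, and illustrative parameter values; the handling of initial probabilities is simplified compared to the description above.

```python
# Minimal sketch of the background model: nearest-neighbour matching (Algorithm 1)
# and the probability update of Equation 1. All parameter values are illustrative.
import numpy as np
from scipy.spatial import cKDTree

class BackgroundCloud:
    def __init__(self, init_points, bg_thresh=0.5, t_in=2.0, t_out=5.0, corr_dist=0.05):
        self.points = init_points.copy()
        self.theta = np.full(len(init_points), bg_thresh)   # simplified initial probability
        self.bg_thresh, self.t_in, self.t_out, self.corr_dist = bg_thresh, t_in, t_out, corr_dist

    def split(self, cloud_new, dt):
        """Return the foreground points of cloud_new and update the background model."""
        dist, idx = cKDTree(self.points).query(cloud_new)   # corresponding background points
        matched = dist < self.corr_dist
        used = np.zeros(len(self.points), dtype=bool)
        used[idx[matched]] = True
        # foreground: unmatched points, or matched points whose probability is below the threshold
        fg_mask = ~matched | (self.theta[idx] < self.bg_thresh)
        # Equation 1: raise theta for accessed points, lower it for the others
        self.theta = np.where(used,
                              np.minimum(self.theta + dt / self.t_in, 1.0),
                              np.maximum(self.theta - dt / self.t_out, 0.0))
        # unmatched points are added to the background cloud as well
        self.points = np.vstack([self.points, cloud_new[~matched]])
        self.theta = np.concatenate([self.theta, np.full(int((~matched).sum()), self.bg_thresh)])
        return cloud_new[fg_mask]
```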
3.4 Cluster Detection
The cluster detection is based on Euclidean distance
clustering, but differentiates between a foreground
and a background. The point clouds are provided by
the prior step. To save computation time, both in-
put clouds are downsampled. New clusters are only
created for points in the foreground and can grow
using points from both foreground and background.
Clustering is finished when every foreground point is
part of a cluster. To counteract inaccuracies in back-
ground removal, i.e. having single foreground points
that are part of a large background cluster, we added a
foreground-to-background ratio. This classifies clus-
ters as invalid if the share of background points exceeds a certain threshold (here
50 %). Invalid clusters are discarded. For each valid
cluster, the centroid and the oriented bounding box
are calculated. Since dimensions in datasheets are
typically axis-aligned, we only consider the rotation of the bounding box about the up-axis.
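A minimal sketch of this clustering step is given below, assuming NumPy/SciPy and illustrative parameter values. Here a cluster is kept only if at least half of its points stem from the foreground, which is one way to realize the described foreground-to-background check.

```python
# Minimal sketch: Euclidean clustering seeded on foreground points, allowed to grow into
# background points, with a validity check on the foreground share. Values are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def detect_clusters(fg, bg, radius=0.1, min_fg_share=0.5):
    pts = np.vstack([fg, bg])
    is_fg = np.concatenate([np.ones(len(fg), bool), np.zeros(len(bg), bool)])
    tree = cKDTree(pts)
    assigned = np.zeros(len(pts), dtype=bool)
    clusters = []
    for seed in range(len(fg)):                      # new clusters start at foreground points only
        if assigned[seed]:
            continue
        stack, members = [seed], []
        assigned[seed] = True
        while stack:
            i = stack.pop()
            members.append(i)
            for j in tree.query_ball_point(pts[i], radius):
                if not assigned[j]:                  # grow using foreground and background points
                    assigned[j] = True
                    stack.append(j)
        members = np.array(members)
        if is_fg[members].mean() >= min_fg_share:    # discard clusters dominated by background
            clusters.append((pts[members].mean(axis=0), pts[members]))
    return clusters                                  # list of (centroid, member points)
```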
3.5 Robot Detection and Tracking
The tracking of robotic platforms is done by tracking-
by-detection, using only the dimension and position
information of detected clusters and a model. A
model of a robotic platform is provided by the user
and consists of the three values width, height, and
length, as stated in the datasheet of the robotic plat-
form. It is used to find the robotic platform within
a list of detected continuous clusters. The detected
clusters are compared against the model and against
each other using three figures of merit for the er-
ror. The first error E_a is the difference in aspect ratio (Equation 2), inspired by (Kaku et al., 2004).
Instead of the principal axis lengths, we use the bounding box dimensions.
E_a = \sqrt{\left(\frac{A_2}{A_1} - \frac{B_2}{B_1}\right)^2 + \left(\frac{A_3}{A_1} - \frac{B_3}{B_1}\right)^2}
\quad (2)
A_1, A_2, and A_3 are the bounding box dimensions of the first cluster, sorted by size with A_1 being the
largest dimension. Correspondingly, B_1, B_2, and B_3 are either the dimensions of the model or of the
second cluster.
E_v = \frac{1}{3} \left| \frac{A_1 \times A_2 \times A_3}{B_1 \times B_2 \times B_3} - 1 \right|
\quad (3)
The second error E_v captures the difference in volume between the first cluster and either the model or
the second cluster (Equation 3). The inputs for this equation are the same as in Equation 2. We divide this
error by three to reduce its influence on the overall error, as small changes can already lead to a high error
value. Also note that, compared to using Intersection over Union, we do not require a position, and our error
is more emphasized when the first cluster is larger than the model or the second cluster.
E_d = \left\lVert p_A - p_B \right\rVert
\quad (4)
The third error E_d is the Euclidean distance between the centers of the two clusters (Equation 4).
E = E_a + E_v + E_d, \qquad E_s = E_a + E_v
\quad (5)
As shown in Equation 5, the errors are summed to form the overall error E. When calculating the error
E_s using the model, the position error is omitted, as the model does not provide position data.
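The three figures of merit translate directly into code. The sketch below assumes the bounding box extents are given as length-3 arrays and applies the sorting convention A_1 ≥ A_2 ≥ A_3 from above inside the functions; the function names are illustrative.

```python
# Minimal sketch of Equations 2-5. The position term is skipped when comparing to the model.
import numpy as np

def error_aspect(dims_a, dims_b):                    # Equation 2
    a, b = np.sort(dims_a)[::-1], np.sort(dims_b)[::-1]      # A1 >= A2 >= A3, B1 >= B2 >= B3
    return np.sqrt((a[1]/a[0] - b[1]/b[0])**2 + (a[2]/a[0] - b[2]/b[0])**2)

def error_volume(dims_a, dims_b):                    # Equation 3
    return abs(np.prod(dims_a) / np.prod(dims_b) - 1.0) / 3.0

def error_distance(pos_a, pos_b):                    # Equation 4
    return float(np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)))

def total_error(dims_a, dims_b, pos_a=None, pos_b=None):     # Equation 5: E, or E_s without positions
    e = error_aspect(dims_a, dims_b) + error_volume(dims_a, dims_b)
    if pos_a is not None and pos_b is not None:
        e += error_distance(pos_a, pos_b)
    return e
```

For example, a detected cluster would be compared against the model via total_error(cluster_dims, model_dims), which omits the position term.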
To track robotic platforms, we use continuous clusters. A continuous cluster is a data structure that
stores information about a tracked object, including position, bounding box, and error data. It is updated
using detected clusters in new point clouds. For every new cluster, we try to find the best-fitting continuous
cluster using the error formula, as shown in Algorithm 2. If a fitting continuous cluster was found,
we update the continuous cluster based on the new cluster. If no fitting continuous cluster was found, we
return false and the cluster is inserted as a new continuous cluster. We use a constant maximum error
to prevent clusters from being inserted into a wrong continuous cluster. Also note that we differentiate
between normal clusters and sparse clusters, which contain no more than eight points. As they contain
little information, sparse clusters need to be handled differently: (1) they cannot form a new continuous
cluster, and (2) they can only update a continuous cluster if that cluster is the tracked robotic platform.
For a continuous cluster to be included in the search for the robotic platform, it must be valid. A
continuous cluster is considered valid when it has been detected in every frame by the cluster detection
for a certain amount of time. If a continuous cluster is valid, it is evaluated against the model by calculating
a mean error value over every cluster that was assigned to it using the cluster detection method described
above. The bounding boxes of the clusters are compared against the model dimensions.
Data: cluster_new, clusters_cont
Result: bool
cc_best ← None;
error_best ← MAX_ERROR;
foreach cc ∈ clusters_cont do
    error_cc ← calculateError(cluster_new, cc);
    if error_cc < error_best then
        cc_best ← cc;
        error_best ← error_cc;
    end
end
if cc_best ≠ None then
    if cc_best has not been updated this step then
        update cc_best with cluster_new;
    else
        revert cc_best update;
        update cc_best with cluster_new;
        rerun this algorithm for the reverted cluster;
    end
    return true;
end
return false;
Algorithm 2: Algorithm for updating a continuous cluster.
We limit the
number of evaluated clusters to avoid an infinitely in-
creasing cluster count. The continuous cluster with
the smallest error value is selected as the tracked
robotic platform. We avoid false selections by using a
minimum required error value. If a continuous cluster
has not been updated or has not been marked as valid
after a specified amount of time, it is removed from
the list of continuous clusters.
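The sketch below illustrates the continuous-cluster bookkeeping and the best-match assignment of Algorithm 2 in a simplified form; it omits the revert/rerun handling and the special treatment of sparse clusters, and it reuses the hypothetical total_error function from the previous sketch. All names and the MAX_ERROR constant are illustrative.

```python
# Minimal sketch: continuous clusters and the best-match assignment of Algorithm 2 (simplified).
from dataclasses import dataclass, field
import numpy as np

MAX_ERROR = 1.0                                   # illustrative limit against wrong assignments

@dataclass
class ContinuousCluster:
    position: np.ndarray                          # last estimated center
    dims: np.ndarray                              # bounding box extents
    errors: list = field(default_factory=list)    # per-frame errors against the model
    frames_seen: int = 0

def assign_cluster(new_pos, new_dims, continuous_clusters, error_fn):
    """Update the best-fitting continuous cluster; return False so the caller can insert a new one."""
    best, best_err = None, MAX_ERROR
    for cc in continuous_clusters:
        err = error_fn(new_dims, cc.dims, new_pos, cc.position)
        if err < best_err:
            best, best_err = cc, err
    if best is None:
        return False
    best.position, best.dims = new_pos, new_dims
    best.frames_seen += 1
    return True
```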
It is possible that the best continuous cluster is not
updated for the current input cloud when no fitting
cluster is found. There are two main reasons for this
to happen: (1) the tracked object has moved out of the field of view, and (2) the tracked object has not moved
for a certain amount of time and therefore faded into
the background. While, for the first reason, the ob-
ject can no longer be tracked, the tracked object is
still visible in the second case, and we need to update
it in case of small movements. Similar to the clus-
ter detection part, we can use the previous position
of the cluster as the single foreground point and find
its corresponding cluster using the additional back-
ground information. If a cluster was found in close
proximity, we can update the continuous cluster with
the new cluster. We can assume that the detected clus-
ter is the tracked object, as the chosen proximity value
only allows for minimal offset.
Position refinement is the last step of the robot detection and tracking procedure before publishing the final position.
Table 1: Position error (RMSE) for the Avia LiDAR sensor using the Multi-LiDAR Multi-UAV dataset (Unit: meter, Δt: 0.2 s).

MAV     | Stnd01 | Stnd02 | Stnd03 | Stnd04 |   01   |   02   |   03   |   04   |   05   | Mean
Holybro | 0.05   | 0.043  | 0.0446 | 0.0503 | 0.0731 | 0.0992 | 0.1263 | 0.0681 | 0.0777 | 0.0702
Autel   |   -    |   -    |   -    |   -    | 0.1313 | 0.1074 | 0.1007 | 0.0845 | 0.1089 | 0.1066
Tello   |   -    |   -    |   -    |   -    | 0.1032 | 0.0991 | 0.1223 | 0.1008 | 0.0965 | 0.1044
Table 2: Position error (RMSE) for the Avia LiDAR sensor using the Multi-LiDAR Multi-UAV dataset (Unit: meter, Δt: 0.1 s).

MAV     | Stnd01 | Stnd02 | Stnd03 | Stnd04 |   01   |   02   |   03   |   04   |   05   | Mean
Holybro | 0.0748 | 0.0701 | 0.0612 | 0.0854 | 0.0844 | 0.0837 | 0.1008 | 0.0611 | 0.0802 | 0.078
Autel   |   -    |   -    |   -    |   -    | 0.0844 | 0.0666 | 0.0713 | 0.0637 | 0.0658 | 0.0703
Tello   |   -    |   -    |   -    |   -    | 0.0578 | 0.0489 | 0.0585 | 0.0435 | 0.0493 | 0.0508
Because a downsampled version of the input cloud has been used from cluster detection onward, the
accuracy of the estimated position can still be improved. Therefore, based on the estimated position,
the cluster is extracted again from the high-resolution cloud provided by the ground removal step. This cluster
now includes edge points that might not have been considered in the downsampled version. Finally,
the cluster center is calculated using this new cluster.
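A minimal sketch of this refinement step is shown below. It assumes the high-resolution, ground-free cloud is an (N, 3) array and extracts the points within a fixed radius around the coarse estimate, which is an illustrative simplification of the re-extraction described above.

```python
# Minimal sketch: recompute the center from the high-resolution cloud around the coarse estimate.
import numpy as np

def refine_position(coarse_pos, highres_cloud, radius=0.5):
    dist = np.linalg.norm(highres_cloud - coarse_pos, axis=1)
    cluster = highres_cloud[dist < radius]        # includes edge points lost by downsampling
    return cluster.mean(axis=0) if len(cluster) else coarse_pos
```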
4 EVALUATION
We evaluate our proposed system in terms of accu-
racy using an MAV dataset (Catalano et al., 2023b)
for comparison with a motion capturing system and a
custom dataset for comparison with the Vive Tracker
system. By using these two datasets, we show that our
system is applicable to MAVs as well as mobile robots
and that it is able to process LiDAR data as well as
regular ToF data. In addition, the custom dataset high-
lights requirements in the system setup.
4.1 MLMU Dataset
The Multi-LiDAR Multi-UAV dataset (Catalano
et al., 2023b) is a dataset designed for detecting and
tracking MAVs using LiDAR sensors. It contains
MAV flights recorded by three different LiDAR sen-
sors, as well as the OptiTrack motion capturing sys-
tem for ground truth data. We only focus on the Livox
Avia sensor. Due to the distance between the sensors
and the MAVs, the point density is too low for the
other two sensors to provide accurate results. This
could be mitigated by reducing the distance between
the sensor and the tracked object, which we were un-
able to test. The three MAVs captured in the dataset
are the Holybro X500, the Autel Evo II, and the Tello.
We omitted the outdoor captures, since we focus on
tracking in indoor environments. Since a single Li-
DAR capture only contains partial information about
the environment, we preprocess the point clouds by
merging them over a certain timespan. We decided to test the values Δt = 0.1 s and Δt = 0.2 s. The
merged point clouds are published at a frequency of 16 Hz in both cases, so that a single capture contributes
to multiple merged point clouds.
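The merging can be realized as a simple sliding window over the incoming captures, as in the sketch below; the class name and the window value are illustrative.

```python
# Minimal sketch: merge LiDAR captures over a sliding time window before feeding the pipeline.
from collections import deque
import numpy as np

class ScanMerger:
    def __init__(self, window=0.2):               # e.g. a merge timespan of 0.2 s or 0.1 s
        self.window = window
        self.buffer = deque()                      # (timestamp, points) pairs

    def add_scan(self, stamp, points):
        """Add a capture and return the union of all captures inside the window."""
        self.buffer.append((stamp, points))
        while self.buffer and stamp - self.buffer[0][0] > self.window:
            self.buffer.popleft()                  # drop captures older than the window
        return np.vstack([p for _, p in self.buffer])
```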
The results of our evaluation are shown in Table 1 for Δt = 0.2 s and in Table 2 for Δt = 0.1 s. While
the error increased by 11 % with Δt = 0.1 s for the Holybro, the results of the Autel and Tello are better
(9-43 %) with the smaller timespan. While the larger timespan doubles the point density, we noticed that
higher values leave a point trail, which is especially noticeable with the smaller MAVs. On the other hand,
while the point density is sufficient for the smaller MAVs Autel and Tello with Δt = 0.1 s, on some occasions
the clustering fails for the Holybro and selects only parts of the MAV, which leads to inaccuracies.
Compared to the base results in (Catalano et al., 2023b), we achieve similar accuracies without
the focus on a specific sensor type. Using only their evaluated sequences, they perform better on the standard
sequences, possibly due to the use of a Kalman filter, while we achieve a better overall mean error of
0.0589 m compared to their result of 0.0655 m.
4.2 Custom Dataset
The custom dataset has been captured using two pico
Monstar Infrared ToF sensors, with ground truth data
being provided by a Vive Tracker system. The two
sensors have been placed as shown in Figure 1 (b),
with the second sensor capturing a side view. The
dataset contains recordings of the mobile robots Pep-
per and Turtlebot 2. An overview of the types of paths
recorded for both platforms is given in Figure 3. Se-
quences 7-8, which are not shown, also include ran-
Figure 3: Overview of some of the sequences recorded in the custom dataset with the Turtlebot and Pepper robots.
The path types include forward-backward (1), left-to-right (2), square (3), circle (4), slalom (5), and random (6).
dom movement with a focus on longer durations, like
in sequence 6. In sequence 0, the mobile robot stands still after a small initial forward movement, in order
to capture the noise recorded by the sensor. Compared to the
other dataset, this dataset has a stronger focus on the
use case. This includes a smaller distance between
the sensors and the tracked objects for a higher point
density, as well as using two sensors to evaluate the
difference between using one or more sensors. The
individual point clouds have been captured with a fre-
quency of 5 Hz and downsampled using a voxel size of
0.02 m. The downsampling is necessary in some cases
to ensure real-time performance, e.g. when the mo-
bile robot is close to the sensor. With smaller robotic
platforms, e.g. MAVs, this might not be necessary.
The results for the custom dataset are shown in Ta-
ble 3. The mean position errors are between 3 cm and
4 cm. The mean error is 10 % higher for the Pep-
per robot. This is due to the more complex shape
compared to the Turtlebot. Furthermore, small un-
controllable upper body movements of the robot can
also negatively impact the accuracy. The results also show a smaller error of 9-14 % when using two sensors,
as multiple sensors can better capture the full shape of the tracked object. Note that for sequence
1 (see Figure 3a), the accuracy improves when using a single sensor, as the second sensor alters the recorded
shape of the robotic platform due to the left-right movement, leading to inaccuracies. Another reason for
inaccuracies, as can be seen in Figures 3c and 3e, is the drop in accuracy on the outer parts of the tracking
area, where the visible part of the robot changes. This effect is also present with the Pepper robot and is
amplified by its change in shape when the robot rotates, as shown in Figures 3b, 3d, and 3f. The bounding
box shape depends on the orientation, which can lead to a shift in the estimated center of the robot.
Table 3: Position error (RMSE) for the pico Monstar sensor using the custom dataset for a single and for two sensors (Unit: meter).

Sequence   | Turtlebot Multi-view | Turtlebot Single-view | Pepper Multi-view | Pepper Single-view
0          | 0.0105               | 0.0054                | 0.0104            | 0.0138
1          | 0.0328               | 0.0306                | 0.0364            | 0.0361
2          | 0.0284               | 0.0316                | 0.0352            | 0.0429
3          | 0.0255               | 0.0305                | 0.0267            | 0.035
4          | 0.0175               | 0.0209                | 0.0292            | 0.0358
5          | 0.0337               | 0.0328                | 0.0284            | 0.0343
6          | 0.0289               | 0.0349                | 0.0394            | 0.0461
7          | 0.0418               | 0.059                 | 0.0357            | 0.0463
8          | 0.04                 | 0.0498                | 0.0426            | 0.0412
Mean (1-8) | 0.031                | 0.0362                | 0.0342            | 0.0397
5 CONCLUSION
In this paper, we presented a system for detecting and tracking robotic platforms to capture tracking data
in lab environments. The main purpose of
the system is to allow for easy capturing of position
data using already available sensors instead of invest-
ing in a costly motion capturing system. It uses Time-
of-Flight based sensors, like LiDAR, to capture point
clouds. Tracking is already possible using a single
sensor, which allows for a fast setup. Based only on the dimensions of a model, robotic platforms are
detected in these clouds, and their positions are esti-
mated. There is no need to mount additional hardware
or markers onto the robotic platform.
The evaluation shows that the system is capable
of detecting and tracking different robotic platforms
with accuracies in the low-centimeter range using one or two sensors and at low double-digit frequencies.
While this does not match the accuracies
and frequencies of high-quality motion capturing sys-
tems, we argue that our solution is still applicable
to many use cases. Future work will include an in-
tegration of orientation information provided by the
robotic platforms, as well as refinements on the track-
ing accuracy by including the model dimension infor-
mation in the final position estimation. Supplemen-
tary documentation and the source code are available
at https://github.com/IoT-Lab-Minden/RP Tracking.
REFERENCES
Catalano, I., Sier, H., Yu, X., Westerlund, T., and Quer-
alta, J. P. (2023a). Uav tracking with solid-state lidars:
Dynamic multi-frequency scan integration. arXiv
preprint arXiv:2304.12125.
Catalano, I., Yu, X., and Queralta, J. P. (2023b). Towards
robust uav tracking in gnss-denied environments: a
multi-lidar multi-uav dataset. In IEEE International
Conference on Robotics and Biomimetics (ROBIO).
Chen, H., Chen, X., Chuanlong Xie and, S. W., Zhou, Q.,
Zhou, Y., Wang, S., Su, H., and Quanfeng Xu and,
Y. L. (2024). Uav tracking and pose-estimation - technical report for cvpr 2024 ug2 challenge. In Con-
ference on Computer Vision and Pattern Recognition
(CVPR).
Deng, T., Zhou, Y., Wu, W., Li, M., Huang, J., Liu, S.,
Song, Y., Zuo, H., Wang, Y., Wang, H., and Chen,
W. (2024). Multi-modal uav detection, classification
and tracking algorithm - technical report for cvpr 2024
ug2 challenge. In Conference on Computer Vision and
Pattern Recognition (CVPR).
Dogru, S. and Marques, L. (2022). Drone detection using
sparse lidar measurements. IEEE Robotics and Au-
tomation Letters, 7(2).
Furtado, J. S., Liu, H. H., Lai, G., Lacheray, H., and
Desouza-Coelho, J. (2019). Comparative analysis of
optitrack motion capture systems. In Advances in Mo-
tion Sensing and Control for Robotic Applications.
Gazdag, S., Möller, T., Filep, T., Keszler, A., and Majdik,
A. L. (2024). Detection and tracking of mavs using
a lidar with rosette scanning pattern. arXiv preprint
arXiv:2408.08555.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013).
Vision meets robotics: The kitti dataset. International
Journal of Robotics Research (IJRR).
Giancola, S., Zarzar, J., and Ghanem, B. (2019). Lever-
aging shape completion for 3d siamese tracking. In
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR).
Kaku, K., Okada, Y., and Niijima, K. (2004). Similarity
measure based on obbtree for 3d model search. In In-
ternational Conference on Computer Graphics, Imag-
ing and Visualization (CGIV).
Nie, J., Xie, F., Zhou, S., Zhou, X., Chae, D.-K., and
He, Z. (2024). P2p: Part-to-part motion cues guide
a strong tracking framework for lidar point clouds.
arXiv preprint arXiv:2407.05238.
Pleterski, J., Škulj, G., Esnault, C., Puc, J., Vrabič, R., and Podržaj, P. (2023). Miniature mobile robot detec-
tion using an ultralow-resolution time-of-flight sensor.
IEEE Transactions on Instrumentation and Measure-
ment, 72.
Qi, C. R., Yi, L., Su, H., and Guibas, L. J. (2017). Point-
net++: Deep hierarchical feature learning on point sets
in a metric space. arXiv preprint arXiv:1706.02413.
Qi, H., Feng, C., Cao, Z., Zhao, F., and Xiao, Y. (2020).
P2b: Point-to-box network for 3d object tracking in
point clouds. In IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR).
Qingqing, L., Xianjia, Y., Queralta, J. P., and Westerlund, T.
(2021). Adaptive lidar scan frame integration: Track-
ing known mavs in 3d point clouds. In International
Conference on Advanced Robotics (ICAR).
Shule, W., Almansa, C. M., Queralta, J. P., Zou, Z., and
Westerlund, T. (2020). Uwb-based localization for
multi-uav systems and collaborative heterogeneous
multi-robot systems. Procedia Computer Science.
Sier, H., Yu, X., Catalano, I., Queralta, J. P., Zou, Z., and
Westerlund, T. (2023). Uav tracking with lidar as a
camera sensor in gnss-denied environments. In Inter-
national Conference on Localization and GNSS (ICL-
GNSS).
Taffanel, A., Rousselot, B., Danielsson, J., McGuire, K.,
Richardsson, K., Eliasson, M., Antonsson, T., and
Hönig, W. (2021). Lighthouse positioning system:
Dataset, accuracy, and precision for uav research.
arXiv preprint arXiv:2104.11523.
Waşik, A., Ventura, R., Pereira, J. N., Lima, P. U., and Mar-
tinoli, A. (2015). Lidar-based relative position esti-
mation and tracking for multi-robot systems. In Robot
2015: Second Iberian Robotics Conference.
Wang, H., Chen, C., He, Y., Sun, S., Li, L., Xu, Y., and
Yang, B. (2024). Easy rocap: A low-cost and easy-to-
use motion capture system for drones. Drones, 8(4).
Wang, H., Peng, Y., Liu, L., and Liang, J. (2021). Study
on target detection and tracking method of uav based
on lidar. In Global Reliability and Prognostics and
Health Management (PHM-Nanjing).
Yilmaz, Z. and Bayindir, L. (2022). Lidar-based robot de-
tection and positioning using machine learning meth-
ods. Balkan Journal of Electrical and Computer En-
gineering, 10(2).
Yuan, S., Yang, Y., Nguyen, T. H., Nguyen, T.-M., Yang, J.,
Liu, F., Li, J., Wang, H., and Xie, L. (2024). Mmaud:
A comprehensive multi-modal anti-uav dataset for
modern miniature drone threats. In IEEE Inter-
national Conference on Robotics and Automation
(ICRA).
Zhou, C., Luo, Z., Luo, Y., Liu, T., Pan, L., Cai, Z., Zhao,
H., and Lu, S. (2022). Pttr: Relational 3d point cloud
object tracking with transformer. In IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR).