An Extension of Orbslam for Mobile Robot Using Lidar and Monocular
Camera Data for SLAM Without Odometry
Rodrigo Lucas Santos a, Mateus Coelho Silva b and Ricardo A. R. Oliveira c
Departamento de Computação - DECOM, Universidade Federal de Ouro Preto - UFOP, Ouro Preto, Brazil
a https://orcid.org/0000-0003-3795-9218
b https://orcid.org/0000-0003-3717-1906
c https://orcid.org/0000-0001-5167-1523
Keywords:
SLAM, Mobile Autonomous Robot, Data Fusion, RPLIDAR, RGB Camera.
Abstract:
Mobile autonomous robots require accurate maps to navigate and make informed decisions in real-time. The
SLAM (Simultaneous Localization and Mapping) technique allows robots to build maps while they move.
However, SLAM can be challenging in complex or dynamic environments. This study presents a mobile
autonomous robot named Scramble, which uses SLAM based on the fusion of data from two sensors: an
RPLIDAR A1M8 LiDAR and an RGB camera. How can data fusion improve the accuracy of mapping, trajectory planning,
and obstacle detection for mobile autonomous robots? In this paper, we show that the fusion
of visual and depth data significantly improves the accuracy of mapping, trajectory planning, and obstacle
detection of mobile autonomous robots. This study contributes to the advancement of autonomous robot
navigation by introducing a data-fusion-based approach to SLAM. Mobile autonomous robots are used in a
variety of applications, including package delivery, cleaning, and inspection. The development of more robust
and accurate SLAM algorithms is essential for the use of these robots in challenging environments.
1 INTRODUCTION
The era of automation is advancing exponentially,
transforming fundamental sectors such as logistics,
space exploration, and public safety. In this rapidly
evolving scenario, autonomous navigation emerges as
one of the most significant and transformative challenges in robotics, with particularly disruptive potential in environments that challenge the reliability of traditional navigation systems, such
as GPS and odometry, which often encounter limi-
tations in complex and dynamic scenarios (Grisetti
et al., 2007).
At the heart of this revolution, Simultaneous Lo-
calization and Mapping (SLAM) represents a critical
technological innovation, enabling robots to achieve
unprecedented autonomy. SLAM is not just about
mapping the unknown; above all, it is the synergistic fusion of sensor data to build an understanding of the spatial environment in real time, where each measurement contributes to estimating the mobile robot's global position.
The Visual SLAM strategy stands out for its ability to work under limited illumination and produce
high-definition maps. However, it faces environmen-
tal adversities such as shadows and reflections, which
can distort the perceived reality (Cadena et al., 2016).
By adding depth to the equation with sensors like 3D
LIDAR, we can obtain precise details about the world
around the robot. Although quite effective, the com-
plexity and cost associated with these sensors often
make them inaccessible for generalized applications
(Weiss and Biber, 2011). A more viable solution, 2D LiDAR, offers an accessible alternative, providing accurate measurements of angle and distance in a single scanning plane.
Among the technologies highlighted in the liter-
ature, ORB-SLAM emerges with remarkable perfor-
mance, especially when compared to other monocu-
lar SLAM approaches (Zong et al., 2017; Mur-Artal and Tardós, 2016). This system not only exemplifies
precise execution of real-time localization and map-
ping but also extends its applicability from indoor to
outdoor environments, overcoming the limitations of
traditional SLAM approaches.
In response to the call for significant advances
in this field, this study presents a new extension for
Monocular ORB-SLAM. It differentiates itself by in-
tegrating an innovative sensor, the rpLIDAR 2D, cre-
ating a hybrid methodology that capitalizes on the fu-
sion of data from a LiDAR scanner and an RGB cam-
era. This synergy aims to improve the precision and
robustness of SLAM, tailoring it to face more com-
plex and dynamic environments.
To validate the proposed method, an autonomous
mobile robot was developed, equipped with a
SLAMTEC A1M8 LiDAR sensor and a Logitech
C270 webcam. Figure 1 shows the control schematic of the developed mobile robot. Data fusion
is carried out by an algorithm capable of merging and
correlating information captured by both the LiDAR
and the camera. This data fusion process plays a cru-
cial role in obtaining a more comprehensive and ac-
curate representation of the robot’s surrounding envi-
ronment.
The validity of the proposed method was verified through the development and testing of an autonomous mobile robot, demonstrating that the data fusion approach surpasses the limitations of methodologies based on a single sensor.
This contribution is a step forward in the develop-
ment of autonomous navigation, highlighting how the
integration of visual and depth data can unlock new
horizons for mapping, trajectory planning, and more
precise and reliable obstacle detection. This study not
only propels the field of autonomous robotics but also
paves the way for an era of applicable innovations in
a diverse and challenging range of operational envi-
ronments.
Figure 1: Mobile Robot Control Schematic.
2 THEORETICAL REFERENCES
2.1 Autonomous Mobile Robots
The essence of mobile robots lies in the concept of
requiring minimal or no human intervention during
their locomotion and navigation operations. These
devices build symbolic representations through interaction with the external and/or internal environment, attributing a unique meaning to each element. In indoor settings, mobile robots often rely on a combination of resources, including floor plans, sonar-based localization systems, and inertial measurement units (IMUs). These resources play a crucial role in enabling the robot
to create an internal representation of its surround-
ings, thereby ensuring its effective and autonomous
operation.
To ensure adequate performance, the construction
of such robots requires the integration of several sen-
sors, the combination of which provides an accurate
internal representation of the environment. Localization plays a crucial role in the operation of an Autonomous Mobile Robot (AMR), being
essential for mapping the environment and effectively
controlling its navigation.
However, noisy and/or confined environments
present significant challenges in obtaining location,
especially when using sensors such as GPS to ob-
tain the device’s position and orientation. The strat-
egy of obtaining the position and orientation of the
device, known as odometry, has the main objective of
developing a mathematical model that represents the
robot’s movements over time, resulting in a model of
the device’s current position and orientation. Further-
more, building maps is an essential task for the auton-
omy of mobile robots and is often closely linked to
the robot’s localization difficulties.
2.2 Visual Odometry
Visual odometry is a method for estimating the po-
sition and orientation of vehicles from camera im-
ages (Barducci and Scaramuzza, 2018). This method
is based on detecting visual features, such as land-
marks or objects, in successive images. Based on
the correspondence between these characteristics, it
is possible to estimate the vehicle’s movement. The
main objective of visual odometry is local consistency
and continuous estimation of vehicle trajectory and
poses. This means that the method must be able to ac-
curately estimate vehicle movement over time, even
in environments with changing lighting or obstruc-
tions.
The visual odometry process can be divided into
four main steps:
1. Detection of visual characteristics: in this step,
visual characteristics are identified in successive
images. These features can be landmarks, lines,
or other objects that are easily tracked over time.
2. Feature tracking: In this step, the identified vi-
sual features are tracked across successive images.
This is done by comparing the features of an im-
age with the features of previous images.
3. Motion estimation: in this step, the vehicle move-
ment is estimated based on the correspondence
between the visual characteristics tracked.
4. Position update: in this step, the vehicle position
is updated based on the information obtained in
the previous step.
Visual odometry is a promising technique for esti-
mating the position and orientation of vehicles (Cheng
et al., 2022). However, this method presents some
challenges, such as the accumulation of errors over
time and the difficulty of adapting to new environ-
ments.
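To make these four steps concrete, the following minimal sketch estimates frame-to-frame camera motion with OpenCV. It is an illustration only, not part of the robot described later: it assumes a calibrated camera with intrinsic matrix K and two consecutive grayscale frames, and the caller is responsible for accumulating the relative poses.

import cv2
import numpy as np

def estimate_motion(prev_gray, curr_gray, K):
    # 1. Detection of visual characteristics (ORB keypoints and descriptors)
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    # 2. Feature tracking by descriptor matching between consecutive frames
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # 3. Motion estimation from the epipolar geometry of the matched points
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # 4. Position update: the caller accumulates T_k = T_(k-1) * [R | t]
    return R, t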
2.3 SLAM - Simultaneous Localization
and Mapping
Localization of an Autonomous Mobile Robot
(AMR) plays a crucial role in its operation, as it is
essential for performing environment mapping and
controlling its navigation. However, dynamic and/or
confined environments pose significant challenges in
obtaining localization, especially when using sensors
such as GPS to obtain the device’s position and orien-
tation. The strategy of obtaining the device’s position
and orientation, known as odometry, has as its main
objective to develop a mathematical model that repre-
sents the robot’s movements over time, resulting in a
model of the device’s current position and orientation
(Dudek and Jenkin, 2010). In addition, map building
is an essential task for the autonomy of mobile robots
and is often closely linked to the robot’s localization
difficulties (da Cruz Júnior et al., 2021).
SLAM is a robotic process that addresses two es-
sential questions: ”Where am I?” and ”What is the
environment around me?” Addressing these questions
simultaneously represents a significant challenge due
to the complexity of the environment and the possi-
bility of new questions arising during the execution of
SLAM. SLAM algorithms have been developed to ad-
dress the challenge of mapping and localization, vary-
ing in the sensors and mathematical models used.
In general, sensors such as cameras, LiDAR,
radar, GNSS, and IMU are employed to collect infor-
mation from the environment. This data is processed
through optimization problems to solve the SLAM
problem (Kurt-Yavuz and Yavuz, 2012). The integra-
tion of multiple sensors into a single framework has
shown promise in robotics, especially in the naviga-
tion of autonomous vehicles. Although there are sev-
eral solutions in the literature that improve existing
techniques, only a few of them adopt truly innova-
tive approaches, such as data fusion, which combines
information from multiple sensors to increase the ac-
curacy and reliability of SLAM.
In summary, SLAM plays a crucial role in robotics
by addressing simultaneously localization and map-
ping in challenging environments, with different al-
gorithms and techniques aiming to improve the per-
formance and accuracy of this process.
3 RELATED WORKS
In (Ni et al., 2022) an improved adaptive ORB-SLAM
method with monocular vision in dynamic environ-
ments for robots is proposed. This method is capa-
ble of dynamically adapting to the presence of mov-
ing and dynamic objects, adjusting the robustness of
the visual characteristics used for SLAM. As a re-
sult, there is a significant improvement in the accu-
racy and reliability of the system in dynamic environ-
ments, which makes it attractive in real environments.
Additionally, in (Li et al., 2023) a method capa-
ble of performing a dense reconstruction of substa-
tion rooms using LSD (Large-Scale Direct) is pro-
posed. LSD SLAM is a SLAM method that uses di-
rect information from images to map the environment
and locate the mobile agent. That study aims to reconstruct indoor substation environments in detail; these are complex environments that require high mapping accuracy when implementing LSD SLAM in order to guarantee safety and efficiency. The experimental
results demonstrate the effectiveness of the method in
dense reconstruction of substation rooms, providing
accurate maps that can be useful for various practical
applications such as maintenance planning and safety
inspection. It is worth mentioning that this system requires high processing power and very fast GPUs to run as intended.
Although reliable and robust, the mentioned Vi-
sual SLAM techniques assume that environments
are static. However, in (Soares et al., 2022), a
new Visual SLAM method specially developed
for crowded human environments is introduced, em-
ploying person detection. In the literature, several
works address the implementation of SLAM in mo-
bile robots based on Raspberry Pi. One of these works
is (Serrata et al., 2017), which implements SLAM
technology using a low-cost Pixy camera, a robot
kit with an L298N motor board, and a Raspberry Pi
”V2.0”. The system was able to identify an average of
75% of reference points when detecting corners and
corridors, with an average power consumption of 1.14
W.
Another relevant work is (Miranto et al., 2019),
which implements the ORB-SLAM (Mur-Artal et al., 2015) method with a monocular camera via a webcam. However, although the system is based on a Raspberry Pi, it is integrated with and dependent on ROS (Robot Operating System). Using the ORB-SLAM method, it is possible to detect objects of size 31.5x56 cm at a distance of 70 cm, with an error of 1.21% and an accuracy of 98.79%. These
works show that SLAM can be effectively deployed
on Raspberry Pi-based mobile robots. However, there
is still room for improvement, such as increasing ob-
ject detection accuracy and reducing power consump-
tion.
Toroslu and Doğan (2018) proposed a sensor fusion method for au-
tonomous vehicles that uses a combination of ultra-
sonic, optical, and IMU sensors. The method was im-
plemented on a mobile robot with two motors, and
the results showed that the method is capable of es-
timating the position and orientation of the robot ac-
curately in environments with changes in lighting and
obstructions. To achieve this, it uses a sensor fusion
algorithm based on a Kalman filter, which is not al-
ways so simple to configure. Additionally, they used
an optical encoder instead of the accelerometer in an
attempt to avoid noise and measurement error issues.
Using the Pygame library, they calculated the coordi-
nates and location of objects detected during naviga-
tion.
4 METHODOLOGY
In this study, we introduce an innovative approach for
performing SLAM (Simultaneous Localization and
Mapping) using two main visual sensors: a Log-
itech Webcam and a 2D RPLIDAR model A1M8
from SLAMTEC. To validate our experiments, we de-
veloped an autonomous mobile robot, the details of
which will be thoroughly presented in the experimen-
tal results section. The methodology adopted is di-
vided into four distinct sections.
In the first section, we will address the robot’s
perception methods and the sensors involved in this
stage. The second section will discuss the robot’s
cognition model, while the third section will focus on
robot navigation, covering the concepts of mapping,
localization, and the SLAM method that we devel-
oped specifically for Robotic Navigation, the central
object of this research. Finally, the fourth section will
explore the proposed scenario for locomotion and ex-
periments, the results of which will be presented in
the subsequent section of this work.
4.1 Perception Methods
In the field of robotics, especially for autonomous
mobile robots, perception plays a crucial role, en-
abling the robot to acquire self-awareness and under-
stand the external context in which it operates (Ran
et al., 2021). The ability of a robot to navigate au-
tonomously depends significantly on the perception’s
capability to accurately gather information and fea-
tures from the environment, allowing the robot to
comprehend its surrounding area. Typically, this per-
ception is achieved through the combination of high-
resolution sensors and efficient algorithms to extract
information from these sensors. In the scope of this
work, to accomplish the robot’s perception stage, we employed two specific sensors: a SLAMTEC RPLIDAR A1M8 LiDAR and a Logitech C270 webcam.
The main objective of the camera implemented in
this work is to recognize and extract characteristics of
the environment and perceive fixed locations through
predetermined tags. This perception is fundamental
for decision-making in controlling the robot’s move-
ment. These objectives highlight the importance of
the camera as an essential component for perception
and decision-making in robotic environments, empha-
sizing its fundamental role in the efficiency and safety
of robotic operations.
Figure 2: Robot Perception Module.
Additionally, computer vision algorithms written
in Python were implemented in the perception module
so that the camera and lidar recognize real-world data
and represent it in the robot’s world. Figure 2 presents
the basic scheme of the robot perception module pro-
posed in this work.
4.1.1 Camera Data Collection and Processing
To capture and process data from the camera, we used
a webcam from the manufacturer Logitech, model
C270 RGB, connected to the Raspberry Pi model
3B. The images are processed using Python and the
OpenCV library (Open Source Computer Vision Li-
brary), an open-source computer vision tool that of-
fers an extensive variety of features (Čuljak et al., 2012), such as facial recognition, motion detection, pattern recognition, camera calibration, image segmentation, and feature detection in images.
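For reference, a minimal capture loop of the kind used in this module could look like the sketch below; the device index and resolution are assumptions rather than the exact settings used on the robot.

import cv2

cap = cv2.VideoCapture(0)                              # webcam enumerated as device 0
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)                 # modest resolution for the Raspberry Pi 3B
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
while True:
    ok, frame = cap.read()                             # one BGR frame per iteration
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # features are extracted on grayscale
    # ... pass 'gray' to the tag-detection step described below ...
cap.release()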
The algorithm operates as follows: the robot is ini-
tially positioned at an arbitrary point within a maze,
as detailed in section 4.3 of this work. During its tra-
jectory, the robot avoids obstacles and collisions with
the walls of the maze. Eight predefined images were
strategically placed throughout the environment, re-
ferred to in this study as ”tags”. The robot moves
through the maze, constantly seeking the greatest free
distance, validated by the LIDAR sensor. Simul-
taneously, the camera module, which integrates the
robot’s perception, captures videos using the OpenCV
library.
The identification of ”tags” is carried out through
a combination of the ORB (Rublee et al., 2011) and
RANSAC (Cantzler, 1981) algorithms. These algorithms allow the robot to perceive and identify the ”tags” in the environment at a fixed distance of 30 cm, employing descriptor point recognition techniques and correspondences between images.
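A minimal sketch of this ORB + RANSAC matching step is shown below. The thresholds, the minimum inlier count, and the function name are illustrative assumptions, not the exact values used on the robot.

import cv2
import numpy as np

def tag_is_visible(tag_gray, frame_gray, min_inliers=15):
    orb = cv2.ORB_create(1000)
    kp_t, des_t = orb.detectAndCompute(tag_gray, None)
    kp_f, des_f = orb.detectAndCompute(frame_gray, None)
    if des_t is None or des_f is None:
        return False
    # Match binary ORB descriptors between the stored tag and the current frame
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_t, des_f)
    if len(matches) < min_inliers:
        return False
    src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects outlier correspondences while fitting a tag-to-frame homography
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H is not None and int(mask.sum()) >= min_inliers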
When the robot finds one of these ”tags”, it locates
its position in the environment. After identifying the
”tag”, the robot makes a soft stop, captures an im-
age of the environment with the camera, and adds a
stamp containing temporal information, such as data
and time. Automatically, the robot starts mapping
around the current position using the LIDAR module
within a 5-second interval. After completing this lo-
cal mapping, the robot continues its navigation in the
maze, avoiding obstacles, until it finds another ”tag”
when it repeats the local mapping process.
4.1.2 LiDAR Data Collection and Processing
The LiDAR is a sensor that measures distances and angles using laser light (Ashraf et al., 2017). It comprises a laser that is emitted and reflected by surrounding objects, a detector that measures the time taken for the light to return, and a data processing system that, using the speed of light, calculates the distance to the object. By combining the collected distance and angle information, this system generates a two-dimensional representation of the environment.
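Since the text describes a time-of-flight principle, the range computation amounts to d = (c · Δt) / 2, where c is the speed of light and Δt is the measured round-trip time of the laser pulse.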
The device used in this study is the RPLIDAR
model A1M8 from Yahboom. Its measurement fre-
quency ranges from 8000 to 32000 times per second,
and its scanning frequency varies from 7 to 16 Hz,
easily adjustable by the operator. Widely employed
in robotics for mapping intricate environments, the
RPLIDAR is capable of creating high-precision grid
maps (Son et al., 2021). Each measurement performed by the RPLIDAR involves a 360° rotation, emitting a laser that sweeps the surroundings until it encounters an obstacle and immediately returning the distance and angle of the detected object.
Data reading is conducted through the Python lan-
guage using the RPLIDAR module, openly avail-
able on GitHub (Skoltech Robotics, 2024). Within
the robot’s perception module, a specific submodule
is dedicated to capturing and interpreting data from
the lidar sensor. In the scope of this work, the li-
dar performs two fundamental functions: it obtains
distance information from obstacles to improve the
robot’s navigation and performs local mapping after
the camera module identifies tags positioned arbitrar-
ily throughout a maze.
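A minimal reading loop with that module might look like the following sketch; the serial port path and the number of scans read are assumptions for illustration.

from rplidar import RPLidar

lidar = RPLidar('/dev/ttyUSB0')                  # A1M8 connected through its USB-serial adapter
try:
    for i, scan in enumerate(lidar.iter_scans()):
        # each scan is a list of (quality, angle_deg, distance_mm) tuples
        for quality, angle, distance in scan:
            print(f'{angle:8.2f} deg  {distance:8.0f} mm')
        if i >= 4:                               # read a few revolutions only in this example
            break
finally:
    lidar.stop()
    lidar.stop_motor()
    lidar.disconnect()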
The robot navigates through an environment,
avoiding obstacles identified by the lidar, until it lo-
cates one of the tags positioned along the path. Detec-
tion is carried out by the camera, which, upon identi-
fying the tags, immediately activates the robot’s con-
trol module. The robot then begins to map the location
using the lidar, located on the top of the robot. The li-
dar records the points collected in its current position
at 5-second intervals, allowing the robot to accurately
capture information on the distances and angles of ob-
jects in its surroundings, limited to a maximum radius
of 12 meters. After 5 seconds, the robot resumes navi-
gating the environment, avoiding obstacles, and, upon
finding another tag, interrupts to carry out local map-
ping with the lidar module, repeating the process dur-
ing navigation.
The data collected by the lidar in each mapping is
stored in a text file (.txt), where each line represents
a measurement taken by the lidar, including angle in-
formation in degrees and distance in millimeters. On
each measurement (line), a timestamp is added to en-
able accurate fusion of the lidar data with the camera
data. This timestamp is crucial for accurately synchronizing the two data streams.
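The exact file layout is not specified beyond angle, distance, and timestamp per line; a plausible sketch of the logging and of a nearest-timestamp association with a camera image is shown below, where the field order and the 0.1 s tolerance are assumptions.

import time

def log_measurement(log_file, angle_deg, distance_mm):
    # one lidar return per line: timestamp (s), angle (degrees), distance (millimeters)
    log_file.write(f'{time.time():.3f} {angle_deg:.2f} {distance_mm:.1f}\n')

def measurements_near(log_path, image_timestamp, tolerance=0.1):
    """Return the (angle, distance) pairs logged within 'tolerance' seconds of an image."""
    selected = []
    with open(log_path) as f:
        for line in f:
            ts, angle, dist = map(float, line.split())
            if abs(ts - image_timestamp) <= tolerance:
                selected.append((angle, dist))
    return selected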
4.2 Cognition Model
As the robot is exposed to the environment, it collects and accumulates data along its route, allowing it to become progressively familiar with the objects around it. This increasing familiarity allows the robot to recognize previously visited locations without the need for long processing times.
The robot is essentially composed of two dis-
tinct parts: the mechanical structure and the control
module. The control module covers data perception
through sensors, processing, and cognitive naviga-
tion. This approach allows the robot to use the re-
sources of its mechanical structure to move intelli-
gently, avoiding collisions and accidents. After pro-
cessing and interpreting the data by the perception
module, the cognition stage has two main objectives: avoiding obstacles and optimizing routes, allowing the robot to move towards a specific goal (Al-Araji et al., 2019). Furthermore, the cognitive system for independent mobile robots is composed of planning algorithms, dynamic mechanisms for strategy change, and a module for data fusion and protection (Štěpán et al., 1999). The robot designed in this research operates on the principle of avoiding obstacles in front of it. As it identifies the “tags”, the environment becomes more familiar, increasing the robot's confidence in its localization and thus allowing it to continue navigating the environment along the determined path.
4.3 Navigation
The operation of a mobile robot involves several fun-
damental actions, such as the ability to avoid obsta-
cles, which in turn can be subdivided into several
tasks, including the ability to bypass specific obsta-
cles (Cao et al., 1999). For a mobile robot, navigation
is understood as obtaining the necessary orientation
to reach a predetermined destination or to move along
a path in environments that have known elements, ref-
erence points, and distinct characteristics (Cao et al.,
1999). Autonomous navigation systems for mobile
robots employ dedicated algorithms to avoid obsta-
cles, identify environmental features through sensors,
and adjust direction autonomously during locomotion
(Khan and Ahmmed, 2016). In essence, navigation
encompasses four main areas of concern: Mapping,
Location, Route Planning, and the ability to avoid ob-
stacles (Alatise and Hancke, 2020).
4.3.1 Path Planning Based on the Strategy of
Avoiding Obstacles
The trajectory planning of the autonomous mobile
robot is based on the obstacle avoidance strategy. Ini-
tially, the robot starts from an unknown environment,
exclusively using the LIDAR sensor to collect points
related to detected objects. Each point captured in the
environment contains information about the angle in
degrees and distance in millimeters, referenced to the
front part of the robot, where the LIDAR sensor is
positioned.
In short, the navigation algorithm divides the
robot’s surrounding area into 12 equally sized seg-
ments. Initially considering only the points in front
of the robot, the algorithm analyzes the six corre-
sponding segments. It then converts these points into
lines extending from the center of the robot to the
point collected by the LIDAR, representing the dis-
tance of the detected object to the robot. Each of the
six front segments of the robot is made up of the sum
of the lines that constitute it. Subsequently, the algo-
rithm groups the lines of each segment into a single
line, maintaining the direction of the one with the highest value. The coordinates of the segment lines generated around the robot are computed by Equation 1:

C(d, θ) = ( d · cos(θ · π / 180), d · sin(θ · π / 180) )     (1)
After generating the 12 lines (one for each segment), the algorithm focuses on the points in front of the robot, analyzing the six frontal segments. If the sum of the lines of the segments to the left of the robot is greater than that to the right and in front of it, the robot turns to the left. If the sum of the lines of the segments to the right is greater, it turns to the right. Otherwise, it keeps moving forward. The lines are scaled to twice the size of the robot to avoid obstacles and ensure smooth maneuvering. Figure 3 illustrates the robot's decision-making process, the imaginary segments generated from the LIDAR points, and the autonomous navigation method; a code sketch of this decision rule is given after Figure 3.
Figure 3: Direction chosen by the robot.
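The sketch below condenses this decision rule together with the coordinate conversion of Equation 1. The segment indexing (which sectors count as front-left and front-right) and the use of twice the robot size as a clearance margin follow our reading of the description above, so the code should be taken as an illustration rather than the exact controller.

import math

NUM_SEGMENTS = 12                      # the surroundings are split into 12 sectors of 30 degrees
SEG_WIDTH = 360 / NUM_SEGMENTS

def to_cartesian(distance_mm, angle_deg):
    # Equation 1: convert a lidar return to Cartesian coordinates around the robot
    rad = angle_deg * math.pi / 180.0
    return distance_mm * math.cos(rad), distance_mm * math.sin(rad)

def segment_sums(scan):
    """scan: iterable of (angle_deg, distance_mm), with 0 degrees at the robot's front."""
    sums = [0.0] * NUM_SEGMENTS
    for angle, dist in scan:
        sums[int(angle % 360 // SEG_WIDTH)] += dist
    return sums

def choose_direction(scan, robot_size_mm=230.0):
    s = segment_sums(scan)
    # the six frontal sectors, assuming angles grow counter-clockwise from the front
    front_left = s[0] + s[1] + s[2]          # 0-90 degrees
    front_right = s[9] + s[10] + s[11]       # 270-360 degrees
    margin = 2 * robot_size_mm               # clearance of twice the robot size
    if front_left > front_right + margin:
        return 'left'
    if front_right > front_left + margin:
        return 'right'
    return 'forward'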
4.3.2 Mapping and Location
Several approaches and methods have been developed
to enable mobile robots to navigate efficiently while
performing specific tasks. Among the various exist-
ing approaches, the one based on maps has proven
to be the most effective. For an Autonomous Mobile
Robot (AMR), mapping involves creating a meaning-
ful representation of the environment that serves as a
model for the robot, providing it with the knowledge
necessary to make decisions and achieve its objectives
(Alatise and Hancke, 2020). The robot’s ability to
make decisions is intrinsically linked to knowledge of
the environment, which introduces the need to solve
the problem of robot localization.
These problems are interdependent: solving one
implies solving the other, and vice versa. In this con-
text, the problem is formally approached using SLAM
(Simultaneous Localization and Mapping), which in-
volves the simultaneous calculation of the different
positions of the robot and the environment model.
In the scope of this study, predefined labels spaced
throughout a maze are detected by the robot through
the camera module. These tags not only locate the robot in the environment but also confirm to the robot that it is navigating in an environment whose characteristics have already been detected. In other words, the tags indicate that the robot is currently navigating the maze. Upon detect-
ing a tag, the robot stops its movement, collects data
with the LIDAR sensor over a 5-second interval, and
then continues its trajectory avoiding obstacles. This
process is repeated when another tag is found, trigger-
ing the LIDAR again to map the area. After collect-
ing environmental information by LIDAR, the data is
stored in a text file with time stamp information. Us-
ing Python and the Matplotlib library, this data, orig-
inally represented in polar coordinates, is converted
to coordinates on a Cartesian plane. Subsequently, a
graph is generated that faithfully represents the local
mapping obtained.
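A minimal version of that conversion and plotting step, assuming the log format sketched in Section 4.1.2 (one timestamp, angle in degrees, and distance in millimeters per line), could be written as follows.

import math
import matplotlib.pyplot as plt

def plot_local_map(log_path, out_path='local_map.png'):
    xs, ys = [], []
    with open(log_path) as f:
        for line in f:
            _, angle, dist = map(float, line.split())   # timestamp is ignored for plotting
            rad = math.radians(angle)
            xs.append(dist * math.cos(rad))              # Equation 1: polar to Cartesian
            ys.append(dist * math.sin(rad))
    plt.figure(figsize=(5, 5))
    plt.scatter(xs, ys, s=2)
    plt.gca().set_aspect('equal')
    plt.xlabel('x (mm)')
    plt.ylabel('y (mm)')
    plt.title('Local map around the current tag')
    plt.savefig(out_path, dpi=150)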
4.4 Environment Preparation
In this section, we will detail the preparation of the
environment for the navigation and SLAM mapping
operations of the autonomous mobile robot. We opted
for a maze built with cardboard boxes, aiming to cre-
ate a significant challenge for the robotic system un-
der study. The choice of this maze seeks to simulate
realistic conditions, challenging the robot’s ability to
navigate a dynamic, three-dimensional environment.
The maze consists of four corridors and open ar-
eas, strategically constructed from cardboard boxes.
The arrangement of the boxes was planned to create
a challenging environment, testing the robot’s agility
and efficiency in detecting, navigating, and mapping
obstacles. For aesthetic purposes, the boxes were cov-
ered with sheets of white A4 paper.
The dimensions of the maze include two rectan-
gular walls of different sizes. The external wall mea-
sures 2.80 meters long by 2.10 meters wide, while
the internal wall is 1.50 meters long and 1.20 meters
wide. The height of the walls varies due to the pres-
ence of boxes of different sizes and arrangements, im-
itating a heterogeneous and diverse environment.
To support the walls, we used trusses made from
recyclable wooden sticks, fixed with hot glue. Fur-
thermore, along the four corridors, eight predefined
”tags” were positioned to locate the robot in the envi-
ronment. Four of them are at the ends of the maze (in
each corner), while the other four are centered on each
inner wall. This arrangement provides crucial refer-
ence points for robot navigation and mapping. Figure
4 illustrates the proposed environment for the experi-
ments in this study.
Figure 4: 2D Floor Plan of the Proposed Environment.
5 EXPERIMENTAL RESULTS
In this section, we present the results obtained during
the experiments conducted to evaluate the effective-
ness and performance of the proposed Simultaneous
Localization and Mapping (SLAM) approach based
on tag recognition using the ORB+RANSAC combi-
nation. The experiments were carried out in a real en-
vironment where the mobile robot was programmed
to operate autonomously, avoiding obstacles while in-
teracting with previously positioned tags.
5.1 Mobile Robot
The robot used in the experiments and its physical at-
tributes will be presented in this section.
5.1.1 Mechanical Characteristics - Physical
Structure
The basic structure of the robot is made of blue
acrylic, with dimensions of 5 mm thick, 18 cm long,
and 13 cm wide. Below it, there is a metal plate struc-
ture measuring 13 cm long, 8 cm wide, and extend-
ing diagonally for approximately 3 cm in two ends to
support the motors. The robot has two drive wheels, each connected to a motor, positioned at the upper right and lower left sides, respectively. Additionally, on each side (right and left), there is an idler wheel connected to a track that passes over the wheel driven by the corresponding motor. A frame is built
4 cm above the basic robot structure using the same
acrylic material, with dimensions 17 cm long and 11
cm wide, where the robot’s visual sensors are located.
The overall dimensions of the robot are 21 cm high,
23 cm long, and 18 cm wide. The robot can be viewed in a cavalier perspective in Figure 5.
Figure 5: Cavalier perspective view of the Scramble Robot.
5.1.2 Mechanical Characteristics - Hardware
The robot developed for this work is composed of two
main hardware components to control the actuators:
An Arduino Uno and an expansion board (Shield)
from the manufacturer Pololu model MC33926. This
expansion board (shield) is an H-bridge controller composed of two drivers that operate at a supply voltage between 5 and 28 volts with a continuous current of around 3 A, and is intended to control two direct current (DC) motors.
The robot has two DC motors that operate at a voltage between 6 and 12 volts. Each motor weighs approximately 100 grams, has a 39 mm cylindrical body, and its maximum speed can reach 350 rpm.
One of the motors is connected to the upper right
wheel and the other to the lower left wheel.
On the basic acrylic robot structure on the first
floor, the Arduino Uno and motor driver shield are po-
sitioned at the front, while the Raspberry Pi 3 Model
B is placed at the back. At the front of the sec-
ond floor, a Logitech C270 RGB camera and a cir-
cuit board for power distribution are located, while at
the rear is positioned the RPLIDAR A1M8 Pro, a 360° TOF lidar sensor with an 8 m scanning range. The
camera and Lidar are integrated into the robot’s per-
ception module and are fundamental for simultaneous
robot mapping and localization (SLAM).
The Raspberry Pi is responsible for interpreting
and processing the data obtained by visual sensors po-
sitioned on the second floor of the robot. The Rasp-
berry Pi is a fully functional minicomputer powered by a Broadcom BCM2837 system-on-chip (SoC), which houses four high-performance ARM Cortex-A53 processing cores. This device operates at a frequency of 1.2 GHz and has a cache memory of 32 kB at level 1 and 512 kB at level 2. Additionally, it includes an integrated graphics processor connected to a 1 GB LPDDR2 memory module. With four USB ports, a 40-pin input and output bus, Bluetooth Low Energy (BLE), and built-in Wi-Fi, the Raspberry Pi offers a versatile and powerful platform for a variety of projects and applications (Balon and Simić, 2019).
Figure 6: Original image captured from the Labyrinth with Tags Positioned.
5.2 Experiment Setup
The experiments were conducted with the mobile
robot autonomously navigating the proposed environ-
ment. The robot was equipped to recognize tags po-
sitioned in the environment and map the areas sur-
rounding each identified tag. At each tag detection,
four top-view images were captured, and local maps
were generated from the points collected by the lidar
sensor.
5.3 Visual Results
Figure 6 illustrates the originally captured image
of the environment showing the arrangement of the
boxes, the positioned tags, the corridor areas, and the
cardboard wall supports made from wooden sticks.
Figure 7 shows the robot’s vision at the moment
tag1 was found in the environment and the orb method
was used to find correspondence between the tags and
the robot’s momentary vision. For each tag detection,
an image of the top view of the tag was captured.
robot. The robot created an occupancy grid map to
represent the environment as it navigated and mapped
the area near the tag. Figure 8 illustrates one of these
images, highlighting the position of the robot when
ICEIS 2024 - 26th International Conference on Enterprise Information Systems
950
Figure 7: Robot vision when finding tag1.
Figure 8: Robot Position when Tag1 is Recognized.
Figure 9: Global Map Update - detecting Tag1.
Figure 10: Robot vision when finding tag2.
recognizing Tag1. After identifying Tag1, the robot
performed local mapping based on the view avail-
able at that time and then updated the global map, as
shown in Figure 9.
Similar steps were repeated for Tags 2, 3 and 4 after the robot recognized the corresponding tags, as illustrated in Figures 10 to 18. Each tag detection resulted in the capture of a top-view image, followed by the generation of a local map and an update of the global map of the environment. These results highlight the effectiveness of our SLAM method in creating accurate maps and reliably localizing the robot in a dynamic environment.
Figure 11: Robot Position when Tag2 is Recognized.
Figure 12: Global Map Update - detecting Tag2.
Figure 13: Robot vision when finding tag3.
Figure 14: Robot Position when Tag3 is Recognized.
Figure 15: Global Map Update - detecting Tag3.
Figure 16: Robot vision when finding tag4.
Figure 17: Robot Position when Tag4 is Recognized.
Figure 18: Global Map Update - detecting Tag4.
5.4 Quantitative Results
The quantitative results of the experiments included
metrics such as the accuracy of the robot’s location
relative to reference tags, the quality of the generated
maps, and the processing time required for mapping
and localization operations.
5.5 Discussion of Results
The experimental results demonstrate the effective-
ness of the proposed SLAM technique in creating ac-
curate maps and accurately locating the robot in a real
environment using tags as reference points. The inte-
gration of LiDAR scans, camera images, and tag in-
formation allowed the creation of detailed maps and
efficient autonomous navigation of the robot.
5.6 Manual Positioning Experiment
To complement the experiments conducted with au-
tonomous navigation, we carried out an additional ex-
periment where the robot was manually positioned in
various positions in the proposed environment. In this
experiment, the robot did not perform autonomous
navigation but was placed in specific locations to eval-
uate the ability of the SLAM approach to map the en-
vironment from different points of view.
5.6.1 Experiment Setup
The robot was manually positioned in several strate-
gic locations in the proposed environment. For each
position, the robot was programmed to scan the envi-
ronment with its LiDAR sensor. From this data, local
maps corresponding to each position of the robot were
generated. Figure 19 illustrates the positions in which the robot was placed.
Figure 19: Robot Position.
5.6.2 Results
The results of the manual positioning experiment
highlighted the effectiveness of the SLAM approach
in creating detailed maps of the environment from
different perspectives. Each placement of the robot was intended to generate a precise local map, highlighting the technique's ability to map the environment regardless of the robot's initial position. In Figure 20, for each position in which the robot was placed, we manually added the corresponding local map it produced. However, it was observed that the robot was unable to integrate the maps produced, resulting in a lack of continuity between them. During the experiment, the robot had difficulty localizing itself in the environment because a new reference map was created for each new position, compromising its navigation and orientation capabilities. The lack of visual perception through the camera contributed to this limitation. Furthermore, the processing time required for each new position of the robot was identified as a challenge, making the process time-consuming and increasing the likelihood of system interruption and inoperability, especially considering the limited hardware resources available.
Figure 20: Local map and robot positions.
5.6.3 Discussion
The inclusion of the manual positioning experiment
as a control strengthens the results obtained in the au-
tonomous navigation experiments. By eliminating the
autonomous navigation variable, it was possible to di-
rectly evaluate the ability of the SLAM approach to
map the environment from multiple fixed positions.
The consistent results obtained in this experiment fur-
ther corroborate the effectiveness and reliability of
the proposed SLAM technique from the first exper-
iment. However, observations made during the exper-
iment also highlight the limitations of the approach,
especially regarding map integration and continuous
robot localization. These aspects must be addressed
in our future work to ensure the improvement and ap-
plicability of the technique in dynamic and constantly
changing environments.
6 CONCLUSIONS
In conclusion, this study presents a mobile au-
tonomous robot, Scramble, equipped with an innova-
tive SLAM approach based on data fusion from an
RPLIDAR A1M8 LiDAR and an RGB camera. The
main objective was to improve the accuracy of map-
ping, trajectory planning, and obstacle detection for
autonomous mobile robots in complex and dynamic
environments through data fusion. The experimen-
tal results demonstrate the effectiveness of the pro-
posed SLAM approach. By taking advantage of vi-
sual and depth sensors, the robot successfully navi-
gated a maze-like environment, recognizing tags and
updating its global map. The fusion of visual and Li-
DAR data significantly improved the accuracy and ro-
bustness of the SLAM system, outperforming single-
sensor SLAM approaches in several scenarios.
Environmental preparation, including construc-
tion of the maze and placement of tags, aimed to
create a realistic and challenging configuration for
the robot. The experimental setup, involving au-
tonomous navigation with tag recognition, demon-
strated the robot’s ability to adapt and navigate dy-
namically, proving the effectiveness of the proposed
SLAM approach. However, the manual positioning
experiment revealed challenges related to map inte-
gration and continuous localization of the robot. Cre-
ating a new reference map for each position compro-
mised navigation capabilities, indicating the need for
improvements in map integration and localization in
dynamic environments.
The presented SLAM approach opens avenues
for future research and development. Overcoming
the identified limitations and improving the system’s
adaptability to changing environments will be cru-
cial for real-world applications of autonomous mobile
robots. The fusion of visual and LiDAR data promises
to create more accurate and robust maps, enabling
precise navigation in challenging scenarios. In sum-
mary, this research contributes to the advancement of
autonomous robot navigation by proposing and val-
idating a data fusion-based SLAM approach. As the
field continues to evolve, the findings of this study lay
the foundation for further exploration and innovation
in the domain of mobile autonomous robotics, pro-
moting the development of more reliable and efficient
systems for diverse applications.
ACKNOWLEDGEMENTS
The authors would like to thank FAPEMIG, CAPES,
CNPq, Instituto Tecnológico Vale, and the Federal
University of Ouro Preto for supporting this work.
This work was partially funded by CAPES (Finance
Code 001) and CNPq (308219/2020-1).
REFERENCES
Al-Araji, A. S., Ahmed, A. K., and Dagher, K. E. (2019).
A cognition path planning with a nonlinear controller
design for wheeled mobile robot based on an intelli-
gent algorithm. Journal of Engineering, 25(1):64–83.
Alatise, M. B. and Hancke, G. P. (2020). A review on chal-
lenges of autonomous mobile robot and sensor fusion
methods. IEEE Access, 8:39830–39846.
Ashraf, I., Hur, S., and Park, Y. (2017). An investigation of
interpolation techniques to generate 2d intensity im-
age from lidar data. IEEE Access, PP:1–1.
Balon, B. and Simić, M. (2019). Using raspberry pi
computers in education. In 2019 42nd International
Convention on Information and Communication Tech-
nology, Electronics and Microelectronics (MIPRO),
pages 671–676. IEEE.
Barducci, A. and Scaramuzza, D. (2018). Visual odom-
etry: A survey. IEEE Transactions on Robotics,
34(6):1225–1246.
Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza,
D., Neira, J., Reid, I., and Leonard, J. J. (2016). Past,
present, and future of simultaneous localization and
mapping: Toward the robust-perception age. IEEE
Transactions on robotics, 32(6):1309–1332.
Cantzler, H. (1981). Random sample consensus (ransac).
Institute for Perception, Action and Behaviour, Divi-
sion of Informatics, University of Edinburgh, 3.
Cao, J., Liao, X., and Hall, E. (1999). Reactive naviga-
tion for autonomous guided vehicle using neuro-fuzzy
techniques. 3837.
Cheng, J., Zhang, L., Chen, Q., Hu, X., and Cai, J. (2022).
A review of visual slam methods for autonomous driv-
ing vehicles. Engineering Applications of Artificial
Intelligence, 114:104992.
da Cruz Júnior, G. P. et al. (2021). Localização e mapeamento para robôs móveis em ambientes confinados baseado em fusão de lidar com odometrias de rodas e sensor inercial.
Dudek, G. and Jenkin, M. (2010). Computational principles
of mobile robotics. Cambridge university press.
Grisetti, G., Stachniss, C., and Burgard, W. (2007).
Improved techniques for grid mapping with rao-
blackwellized particle filters. IEEE transactions on
Robotics, 23(1):34–46.
Khan, S. and Ahmmed, M. K. (2016). Where am i?
autonomous navigation system of a mobile robot in
an unknown environment. 2016 5th International
Conference on Informatics, Electronics and Vision
(ICIEV), pages 56–61.
Kurt-Yavuz, Z. and Yavuz, S. (2012). A comparison of ekf,
ukf, fastslam2.0, and ukf-based fastslam algorithms.
In 2012 IEEE 16th International Conference on In-
telligent Engineering Systems (INES), pages 37–43.
IEEE.
Li, Z., Zhao, X., and Li, Y. (2023). Dense reconstruc-
tion of substation room with lsd slam. In Third In-
ternational Conference on Artificial Intelligence, Vir-
tual Reality, and Visualization (AIVRV 2023), volume
12923, pages 193–201. SPIE.
Miranto, H. A., Jati, A. N., and Setianingsih, C. (2019). Re-
alization of point cloud maps using ros & visual sen-
sor on raspberry pi 3 based mobile robot. 2019 4th
International Conference on Information Technology,
Information Systems and Electrical Engineering (ICI-
TISEE), pages 517–522.
Mur-Artal, R., Montiel, J. M. M., and Tardos, J. D. (2015).
Orb-slam: a versatile and accurate monocular slam
system. IEEE transactions on robotics, 31(5):1147–
1163.
Mur-Artal, R. and Tardós, J. D. (2016). Orb-slam2:
An open-source slam system for monocular, stereo,
and rgb-d cameras. IEEE Transactions on Robotics,
33:1255–1262.
Ni, J., Wang, X., Gong, T., and Xie, Y. (2022). An improved
adaptive orb-slam method for monocular vision robot
under dynamic environments. International Journal
of Machine Learning and Cybernetics, 13(12):3821–
3836.
Ran, T., Yuan, L., and Zhang, J. (2021). Scene perception
based visual navigation of mobile robot in indoor en-
vironment. ISA transactions, 109:389–400.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G.
(2011). Orb: An efficient alternative to sift or surf.
In 2011 International conference on computer vision,
pages 2564–2571. Ieee.
Serrata, A. A. J., Yang, S., and Li, R. (2017). An intelli-
gible implementation of fastslam2.0 on a low-power
embedded architecture. EURASIP Journal on Embed-
ded Systems, 2017:1–11.
Skoltech Robotics (2024). RPLidar Python Module Repos-
itory. https://github.com/SkoltechRobotics/rplidar.
Accessed on: 01/31/24.
Soares, J., Gattass, M., and Meggiolaro, M. (2022). Mapeamento e localização simultâneos em ambientes dinâmicos usando detecção de pessoas. In Anais Estendidos do XIV Simpósio Brasileiro de Robótica e XIX Simpósio Latino-Americano de Robótica, pages 109–120, Porto Alegre, RS, Brasil. SBC.
Son, D. T., The, A. M., Duong, T. D., Chuong, L. V.,
Cuong, T. H., and Phuong, H. S. (2021). The practice
of mapping-based navigation system for indoor robot
with rplidar and raspberry pi. 2021 International Con-
ference on System Science and Engineering (ICSSE),
pages 279–282.
Toroslu, I. and Doğan, M. (2018). Effective sensor fusion
of a mobile robot for slam implementation. 2018 4th
International Conference on Control, Automation and
Robotics (ICCAR), pages 76–81.
Čuljak, I., Abram, D., Pribanić, T., Džapo, H., and Cifrek, M. (2012). A brief introduction to opencv. 2012
Proceedings of the 35th International Convention
MIPRO, pages 1725–1730.
Štěpán, P., Kral, L., Kulich, M., and Přeučil, L. (1999).
Open control architecture for mobile robot. IFAC Pro-
ceedings Volumes, 32:8434–8439.
Weiss, U. and Biber, P. (2011). Plant detection and map-
ping for agricultural robots using a 3d lidar sensor.
Robotics and autonomous systems, 59(5):265–273.
Zong, W., Chen, L., Zhang, C., Wang, Z., and Chen, Q.
(2017). Vehicle model based visual-tag monocular
orb-slam. 2017 IEEE International Conference on
Systems, Man, and Cybernetics (SMC), pages 1441–
1446.