Deep Learning for Active Robotic Perception
Nikolaos Passalis, Pavlos Tosidis, Theodoros Manousis and Anastasios Tefas
Computational Intelligence and Deep Learning Group, AIIA Lab.,
Department of Informatics, Aristotle University of Thessaloniki, Greece
Keywords:
Active Perception, Deep Learning, Active Vision, Active Robotic Perception.
Abstract:
Deep Learning (DL) has brought significant advancements in recent years, greatly enhancing various challeng-
ing computer vision tasks. These tasks include but are not limited to object detection and recognition, scene
segmentation, and face recognition, among others. DL's advanced perception capabilities have also paved
the way for powerful tools in the realm of robotics, resulting in remarkable applications such as autonomous
vehicles, drones, and robots capable of seamless interaction with humans, notably in collaborative manufactur-
ing. However, despite these remarkable achievements, a significant limitation
persists: most existing methods adhere to a static inference paradigm inherited from traditional computer vi-
sion pipelines. Indeed, DL models typically perform inference on a fixed and static input, ignoring the fact
that robots possess the capability to interact with their environment to gain a better understanding of their
surroundings. This process, known as “active perception”, closely mirrors how humans and various animals
interact with and comprehend their environment. For instance, humans tend to examine objects from different
angles when uncertain, while some animals have specialized muscles that allow them to orient their ears
towards the source of an auditory signal. Active perception offers numerous advantages, enhancing both the
accuracy and efficiency of the perception process. However, incorporating deep learning and active perception
in robotics also comes with several challenges: the training process often requires interactive simulation
environments and dictates the use of more advanced approaches, such as deep reinforcement learning, while
deployment pipelines should be appropriately modified to enable control within the perception algorithms.
In this paper, we will go through recent breakthroughs in deep learning that facilitate active perception across
various robotics applications, as well as provide key application examples. These applications span from face
recognition and pose estimation to object detection and real-time high-resolution analysis.
1 INTRODUCTION
In recent years, Deep Learning (DL) has led to signif-
icant advancements in a range of challenging com-
puter vision and robotic perception tasks (LeCun
et al., 2015). These tasks encompass but are not re-
stricted to object detection and recognition (Redmon
et al., 2016), scene segmentation (Badrinarayanan
et al., 2017), and face recognition (Wen et al., 2016),
among others. DL's sophisticated perception capabil-
ities have also yielded potent tools for diverse robotics
applications, resulting in the emergence of impres-
sive use cases, such as self-driving vehicles (Bojarski
et al., 2016), unmanned aerial vehicles (drones) (Pas-
salis et al., 2018), and robots capable of seamless in-
teraction with humans, notably in collaborative manufacturing scenarios (Liu et al., 2019).
Despite the recent accomplishments of DL in
these domains, a notable limitation plagues most ex-
isting approaches since they adhere to a static infer-
ence paradigm, which follows the traditional com-
puter vision pipeline. Therefore, DL models perform
inference on a fixed and static input, ignoring the abil-
ity of robots, as well as cyber-physical systems (Li,
2018; Loukas et al., 2017), to interact with their en-
vironment in order to enhance their perception. For
example, we can consider the task of face recogni-
tion, where a robot captures a suboptimal profile view
of a subject to be recognized. A conventional static
perception-based DL model may struggle to identify
the subject from a specific angle, particularly if it
lacks training on profile face images for such angles.
However, it is often feasible for the robot to attain a
better and more distinguishing view by adjusting its
position relative to the human subject. Consequently,
in such scenarios, the same DL model will likely suc-
ceed in recognizing the subject after the robot repo-
sitions itself for a more suitable angle. This method-
ology, known as active perception (Aloimonos, 2013;
Bajcsy et al., 2018; Shen and How, 2019), enables
the manipulation of the robot or sensor to obtain a
clearer and more informative view or signal, ulti-
mately enhancing the perception capabilities and sit-
uational awareness of robotic systems. Note that this
process closely mirrors how humans and various ani-
mals engage with and perceive their surroundings. For
instance, humans tend to explore different perspec-
tives when processing complex visual stimuli, while
many mammals possess specialized ear muscles that
pivot their ears toward the source of an auditory sig-
nal in order to acquire a clearer version of the sig-
nal (Heffner and Heffner, 1992).
A number of recent, although relatively basic, ap-
proaches have illustrated that active perception can
indeed enhance the perceptual capabilities of various
models. For example, works such as (Ammirato
et al., 2017) and (Passalis and Tefas, 2020) demon-
strated that developing a deep learning system that
predicts the next best move for a robot can signif-
icantly improve the accuracy of various perception
tasks, such as object detection and face recognition,
where the viewing angle, occlusions and the scale of
each object can have a significant effect on the percep-
tion accuracy. Similar findings have also been doc-
umented in more recent research spanning a variety
of domains (Han et al., 2019; Tosidis et al., 2022;
Kakaletsis and Nikolaidis, 2023). It is also impor-
tant to emphasize that active perception methodolo-
gies can also enable the development of less compu-
tationally intensive deep learning models. This oc-
curs because these models are trained to address a
less complex problem. For example, in (Passalis and
Tefas, 2020), it is demonstrated that more lightweight
face recognition models can be used when DL mod-
els can actively interact with the environment in order
to acquire a more informative frontal view of the sub-
jects.
However, training active perception models dif-
fers significantly compared to traditional static per-
ception approaches, since models must also learn the
dynamics of the perception process in order to pro-
vide control feedback. For example, an active face
recognition model should also learn how perception
accuracy varies as the robot moves around a subject,
as well as the direction in which a robot should move
in order to improve the accuracy of face recognition.
Therefore, it becomes clear that training active per-
ception models introduces additional challenges, both
with respect to acquiring the necessary data for train-
ing, as well as for extending the traditional (usually
supervised) learning pipelines to support such setups.
The main aim of this paper is to introduce the key
active perception approaches used for training DL-
based models for different applica-
tions. To this end, we will first present and discuss
the different options for acquiring the necessary data
used for training active perception models. Next, we
will present different training approaches that extend
traditional supervised learning methods for active per-
ception or employ reinforcement learning methods
to provide active perception feedback. Finally, we
will discuss applications in various fields related to
robotics, as well as discuss implications and practical
issues.
The rest of this paper is structured as follows.
First, in Section 2 we present the different method-
ologies for acquiring data for active perception, while
in Section 3 we present different learning approaches
that are used for training active perception DL mod-
els. Then, in Section 4, we provide an overview of
active perception methods for different applications. Fi-
nally, Section 5 concludes this paper.
2 DATA FOR ACTIVE PERCEPTION
As discussed in Section 1, training active perception
models requires a shift from traditional static percep-
tion methods, presenting a distinctive challenge. This
distinction arises from the necessity for active percep-
tion models to not only grasp the static aspects of ob-
ject recognition but also to encompass the dynamics
inherent in the perception process, allowing them to
generate control feedback. For instance, when con-
sidering an active face recognition model, the model
should acquire knowledge concerning the optimal di-
rection in which the robot should navigate to enhance
the accuracy of face recognition. Unfortunately, a no-
table constraint emerges as a significant portion of
the available datasets does not inherently facilitate the
training of models for active perception tasks. Current
literature can be roughly categorized into three dis-
tinct methodologies for acquiring data
suitable for active perception: a) simulation-based
training, b) multi-view dataset-based training, and c)
on-demand (synthetic) data generation. An overview
of the different approaches, along with benefits and
drawbacks, is provided in Table 1.
Ideally, an active perception model would learn
as it interacts with its environment. However, get-
ting ground truth data in real-time is typically infea-
sible. Therefore, in most cases, active perception
models are trained in an offline fashion.

Table 1: Comparison of different approaches for acquiring data for training active perception models.

Approach | Benefits | Drawbacks | Examples
Simulation-based training | Flexible; any movement can be simulated | Computationally demanding; sim-to-real gap | (Ginargyros et al., 2023; Tzimas et al., 2020; Tosidis et al., 2022)
Multi-view dataset-based training | Real data used; no sim-to-real gap | Limited flexibility; limited number of control actions; missing data | (Passalis and Tefas, 2020; Georgiadis et al., 2023)
On-demand (synthetic) data generation through manipulation | Less susceptible to sim-to-real gap; faster than simulation-based training | Less accurate simulation of control actions; perception dynamics might not be accurately modeled | (Dimaridou et al., 2023; Passalis and Tefas, 2021; Kakaletsis and Nikolaidis, 2023; Manousis et al., 2023; Bozinis et al., 2021)

The first
category of methods employs realistic simulation en-
vironments, e.g., as in (Tosidis et al., 2022; Ginar-
gyros et al., 2023), in order to simulate the effect
of various movements and allow the agent to learn
how perception accuracy varies when performing dif-
ferent actions. This approach provides great flexi-
bility since any action can be simulated and the ef-
fect of the movement of a robot can be easily ob-
tained. However, such approaches are computation-
ally demanding, since they rely on realistic simula-
tion environments and graphics engines, such as We-
bots (Michel, 2004) and Unity (Haas, 2014), slow-
ing down the training process. Furthermore, these
approaches are also hindered by the so-called “sim-
to-real” gap (Zhao et al., 2020), since the agents are
trained using data generated by a simulator.
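To make the overall training loop concrete, the following minimal Python sketch outlines a gym-style interface that a simulation-based setup might expose. All class and function names here are illustrative assumptions, not the API of any particular simulator or of the cited works.

```python
import numpy as np


class ActivePerceptionEnv:
    """Gym-style wrapper around a simulator (e.g., Webots or Unity)."""

    def reset(self) -> np.ndarray:
        """Place the agent at a random pose and return the first observation."""
        raise NotImplementedError

    def step(self, action: np.ndarray):
        """Apply a movement command, advance the simulation, and return
        (observation, reward, done); the reward is typically derived from
        the perception model's confidence on the new observation."""
        raise NotImplementedError


def collect_episode(env: ActivePerceptionEnv, policy, max_steps: int = 20):
    """Roll out a policy and gather (observation, action, reward) tuples."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory
```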
The second category of approaches, called “multi-
view dataset-based training” in this paper, employs
datasets that contain multiple views of the same
scene. In this way, the effect of various movements
can be quantified by fetching the view that would
correspond to the result of the said movement. For
example, in (Passalis and Tefas, 2020), the multiple
views around a person are used to simulate the effect
of an agent moving around, enabling training active
perception models that learn how to maximize face
recognition accuracy. Such approaches can overcome
the issues of computational complexity and the “sim-
to-real” gap. However, they are often too restrictive,
since the datasets should already contain the images
that can be used for every possible action an agent
can perform. This often leads to huge datasets, as well
as to agents that can only be trained for a limited
number of control actions. Furthermore, such ap-
proaches often have to handle missing data, since,
in many cases, the corresponding multi-view datasets
do not contain every required view.
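As an illustration, the sketch below shows how the effect of a movement can be simulated by fetching the corresponding pre-recorded view. The discrete left/stay/right action set and the indexing scheme are simplifying assumptions, not the exact setup of (Passalis and Tefas, 2020).

```python
class MultiViewDataset:
    """Simulates movements by indexing pre-recorded views around a subject."""

    def __init__(self, views):
        # views[subject_id] is a list of images captured at evenly spaced
        # angles around the subject; some entries may be None (missing data).
        self.views = views
        self.n_angles = len(next(iter(views.values())))

    def apply_action(self, subject_id, angle_idx, action):
        """Simulate moving left (-1), staying (0), or right (+1) around the
        subject by fetching the view recorded at the resulting angle."""
        new_idx = (angle_idx + action) % self.n_angles
        view = self.views[subject_id][new_idx]
        if view is None:  # the dataset cannot support this action here
            return None, angle_idx
        return view, new_idx
```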
Then, methods that generate “on-demand” data
have also been proposed. Such approaches can try
to simulate the effect of various movements start-
ing from real data and then appropriately manipulat-
ing the data, e.g., simulating occlusions (Dimaridou
et al., 2023). Another approach is to generate multi-
ple views that can then be used either for deciding the
best course of action or training the agent (Kakaletsis
and Nikolaidis, 2023). These approaches fall in be-
tween simulation-based and multi-view dataset-based
approaches since they employ real images that have
been appropriately manipulated to simulate the effect
of active perception. Therefore, even though they are
less susceptible to the sim-to-real gap and they are
typically faster, they often simulate the effect of ac-
tive perception feedback less accurately,
leading to models that might fail to capture all details
of the dynamics of the active perception process.
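As a concrete example of such manipulation, the sketch below pastes a random rectangle over a face crop to simulate an occlusion whose severity would change as a robot moves. This simple occluder model is our assumption and is only loosely inspired by (Dimaridou et al., 2023).

```python
import numpy as np


def simulate_occlusion(image: np.ndarray, rng: np.random.Generator,
                       max_fraction: float = 0.4) -> np.ndarray:
    """Paste a random flat-gray rectangle over an image to mimic an occlusion."""
    h, w = image.shape[:2]
    occ_h = rng.integers(1, int(h * max_fraction) + 1)
    occ_w = rng.integers(1, int(w * max_fraction) + 1)
    top = rng.integers(0, h - occ_h + 1)
    left = rng.integers(0, w - occ_w + 1)
    occluded = image.copy()
    occluded[top:top + occ_h, left:left + occ_w] = 128  # flat gray occluder
    return occluded


# Example: occluded = simulate_occlusion(face_crop, np.random.default_rng(0))
```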
3 LEARNING METHODOLOGIES FOR ACTIVE PERCEPTION
Training active perception models also departs from
the typical supervised learning approach that is fol-
lowed in many perception applications, such as face
recognition (Wen et al., 2016), object detection (Red-
mon et al., 2016) and pose estimation (Zheng et al.,
2023). Active perception models should not only
analyze and understand their input but also provide
some kind of control feedback, that can be then sub-
sequently used for improving perception accuracy.
Therefore, they tend to incorporate elements typi-
cally found in planning (Sun et al., 2021) and con-
trol (Tsounis et al., 2020) approaches used in robotics
applications. The degree to which such elements
are part of each model depends on the specific ap-
plication requirements. In recent literature, two ap-
proaches are prevalent: a) deep reinforcement learning (DRL)-based
training and b) supervised training through carefully
designed ground truth. An overview of these two
approaches, along with their benefits and drawbacks,
is provided in Table 2.

Table 2: Comparison of different learning paradigms for training active perception models.

Approach | Benefits | Drawbacks | Examples
Deep Reinforcement Learning | Directly optimizes the active perception model | Slow convergence; low sample efficiency; (usually) requires simulation environments | (Bozinis et al., 2021; Tzimas et al., 2020; Tosidis et al., 2022)
Supervised Learning | Can work with any kind of data; easier and faster to train | Requires carefully designed heuristics to construct ground truth data | (Passalis and Tefas, 2020; Ginargyros et al., 2023; Dimaridou et al., 2023; Manousis et al., 2023)
DRL has achieved remarkable progress in recent
years, providing beyond human performance in many
cases (Mnih et al., 2013). Such approaches naturally
fit active perception, since they enable models to learn
how to provide control feedback to maximize percep-
tion accuracy through the interaction with an environ-
ment. Such approaches almost always require a simu-
lation environment for training.
Even though DRL methods enable discovering com-
plex policies that can directly optimize the objective
at hand, i.e., perception accuracy, they suffer from
low sample efficiency, long training times, and un-
stable convergence (Buckman et al., 2018). On the
other hand, the supervised method typically follows
an “imitation” learning training paradigm (Hua et al.,
2021), where the best actions to be performed are
found through an extensive search in the action space.
This is better understood with the following example.
A DRL agent trained to perform active face recogni-
tion, e.g., (Tosidis et al., 2022), would learn using the
reward signal from the environment, e.g., confidence
in correctly recognizing a person. On the other hand,
a supervised approach, such as (Passalis and Tefas,
2020), would first require simulating the effect of
various movements/actions and then provide ground
truth data on which action the agent should perform at
each step. This also enables supervised approaches to
work with any kind of available data, since the set of
actions to be evaluated can be restricted to those that
the dataset can support. There-
fore, even though supervised approaches can provide
more stable and faster convergence and typically do
not require a complex simulation environment, they
rely on hand-crafted heuristic-based approaches to
constructing the ground truth data.
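The following sketch illustrates how such imitation-style ground truth could be constructed through an exhaustive search over the supported actions. Here `simulate_action` and `confidence_fn` are hypothetical stand-ins for the dataset-specific machinery described above.

```python
import numpy as np


def best_action_label(state, actions, simulate_action, confidence_fn):
    """Return the action whose resulting view maximizes perception confidence.

    simulate_action(state, a) yields the view obtained after action a (or
    None if the data cannot support it); confidence_fn(view) scores the
    perception model's confidence on that view.
    """
    scores = []
    for a in actions:
        view = simulate_action(state, a)
        scores.append(-np.inf if view is None else confidence_fn(view))
    return actions[int(np.argmax(scores))]
```

The resulting (state, best action) pairs can then be used to train a control head with an ordinary cross-entropy loss, as in standard supervised learning.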
4 ACTIVE PERCEPTION FOR ROBOTIC APPLICATIONS
Several recent active perception approaches have
been proposed for a variety of different applications.
In the rest of this Section, we briefly overview meth-
ods proposed for different applications, as well as
discuss practical issues that often arise in robotics.
Among the most prominent applications of active per-
ception is face recognition. Indeed, early DL-based
approaches extended embedding-based face recogni-
tion methods into active ones by including an addi-
tional head that predicts the next best movement that
a robot should perform in order to increase face recog-
nition confidence (Passalis and Tefas, 2020). This ap-
proach assumes that the robot moves on a predefined
trajectory around the target in order to be compati-
ble with the multi-view dataset employed. Then, the
model is trained to both maximize face recognition
confidence, following a contrastive learning objec-
tive, as well as to regress the direction of movement
leading to the best face recognition accuracy. Note
that this direction is calculated by leveraging the mul-
tiple views available in the dataset and then selecting
the one that maximizes the confidence for the next
active perception step. The experimental evaluation
demonstrated the effectiveness of this approach over
static perception for a variety of different active per-
ception steps. However, this approach used a dataset
with a small number of individuals and a relatively
small number of possible control movements. Later
methods, such as (Dimaridou et al., 2023), build upon
this approach by a) simulating the effect of various oc-
clusions on large-scale face recognition datasets, and
b) regressing both the direction and distance the robot
should move.
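A minimal PyTorch sketch of this two-head design is given below. The layer sizes and the two-dimensional control output (e.g., direction and distance) are illustrative assumptions rather than the exact architectures of the cited works.

```python
import torch
import torch.nn as nn


class ActiveFaceModel(nn.Module):
    """Backbone shared by a recognition embedding and a control head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, emb_dim: int = 128):
        super().__init__()
        self.backbone = backbone                       # shared feature extractor
        self.embedding = nn.Linear(feat_dim, emb_dim)  # recognition embedding
        self.control = nn.Sequential(                  # next-best-move head
            nn.Linear(feat_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 2),                          # e.g., direction and distance
        )

    def forward(self, x):
        feats = self.backbone(x)
        # tanh bounds the control output, matching a [-1, 1] movement range
        return self.embedding(feats), torch.tanh(self.control(feats))
```

Training would then combine a contrastive loss on the embeddings with a regression loss (e.g., mean squared error) on the control output.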
A simple approach for training a DRL agent
to perform drone control in order to acquire frontal
views that can be used for face recognition was ini-
tially proposed in (Tzimas et al., 2020), highlighting
the potential of DRL methods for active perception
tasks. A more sophisticated approach was also re-
cently proposed building upon DRL in (Tosidis et al.,
2022). This approach leverages a realistic simulation
environment, built using Webots (Michel, 2004), and
directly trains a DRL agent to perform control in a
drone that flies around humans in order to maximize
face recognition confidence. The experimental eval-
uation demonstrated that the trained agent was able
to perform control in a variety of different situations.
However, the sim-to-real gap remains an issue for
DRL approaches and can be a limiting factor when
directly deploying them in real applications.
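A plausible reward signal for such an agent, stated here as an assumption consistent with the general recipe rather than the exact formulation of (Tosidis et al., 2022), rewards the confidence gain produced by each movement:

```python
def recognition_reward(confidence_before: float, confidence_after: float,
                       step_cost: float = 0.01) -> float:
    """Reward the face recognition confidence gain produced by the last
    movement, minus a small cost that discourages needless motion."""
    return (confidence_after - confidence_before) - step_cost
```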
A supervised approach was also proposed for ob-
ject detection in (Ginargyros et al., 2023), where a
rich dataset for potential movements was built us-
ing a simulation environment. This approach enabled
the models to learn the object detection confidence
manifold for different types of objects, e.g., cars and
humans, while taking into account possible occlu-
sions, allowing them to perform control tailored to the
unique characteristics of different cases. To this end, a
separate navigation proposal network was trained ac-
cording to the confidence manifold of each object, en-
abling the model to learn to propose trajectories that
will maximize object detection confidence. At the
same time, this paper also revealed limitations that
are often intrinsic to the current state-of-the-art ob-
ject detection models, since it provided a structured
approach for revealing the confidence manifold of ob-
ject detectors. A dataset that can support active vision
for object detection was also proposed in (Ammirato
et al., 2017), where a DRL agent was trained and
evaluated. This paper demonstrated that it is possible,
given the appropriate dataset and annotations, to di-
rectly train DRL agents to perform control for active
vision tasks.
Another line of research focuses on performing
virtual control, i.e., not physically altering the posi-
tion of a robot or the parameters of a physical sen-
sor, but rather selectively analyzing specific parts of
the input in order to improve perception accuracy,
while reducing the computational load. Such ap-
proaches can be especially useful in cases where high-
resolution input images must be analyzed, while the
object of interest lies only in a small area within the
input. An especially promising approach was pre-
sented in (Manousis et al., 2023), where the heat
map extracted from a low-resolution version of a
high-resolution image was used to drive the percep-
tion process. To this end, the proposed method first
identified a region of interest in the original image
by looking for potential activations (i.e., parts where
the DL model detects something, but not necessar-
ily with high confidence), in a low-resolution ver-
sion of the input, and then performed targeted crop-
ping into the high-resolution image in order to select
the area that needs to be analyzed. The experimen-
tal evaluation demonstrated that significant accuracy
and speed improvements can be acquired using the
proposed method. However, a limitation of such ap-
proaches is that as the size of the region of interest
grows, the performance benefit obtained using active
perception becomes smaller. It is worth noting that
such approaches can be also easily adjusted to per-
form control of the parameters of a camera, e.g., phys-
ical zoom, in order to acquire signals that are easier to
analyze.
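The following sketch illustrates the core of this heatmap-driven cropping idea; the thresholding rule and the fixed crop size are simplifying assumptions, not the exact procedure of (Manousis et al., 2023).

```python
import numpy as np


def select_crop(high_res: np.ndarray, heatmap: np.ndarray,
                crop_size: int = 512, threshold: float = 0.2):
    """Find the strongest activation in a low-resolution heatmap and crop the
    corresponding region from the high-resolution image (the crop is assumed
    to fit within the image)."""
    if heatmap.max() < threshold:
        return None  # nothing detected; fall back to full-frame analysis
    hy, hx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    sy = high_res.shape[0] / heatmap.shape[0]  # heatmap-to-image scale
    sx = high_res.shape[1] / heatmap.shape[1]
    cy, cx = int(hy * sy), int(hx * sx)
    top = int(np.clip(cy - crop_size // 2, 0, high_res.shape[0] - crop_size))
    left = int(np.clip(cx - crop_size // 2, 0, high_res.shape[1] - crop_size))
    return high_res[top:top + crop_size, left:left + crop_size]
```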
Figure 1: Active perception outputs can be represented in
a homogeneous way using an application-agnostic control
specification defined by OpenDR (Passalis et al., 2022).

Another significant issue when implementing ac-
tive perception models is the existence of a common
way of expressing the outcomes of active perception.
This is especially important, since in many robotics
systems, different models might be employed for dif-
ferent perception tasks. Having to handle a com-
pletely different form of output for different models
significantly complicates the development process.
The OpenDR toolkit (Passalis et al., 2022) has provided
a common application agnostic control specification
for standardizing such active perception outputs. This
specification ensures that algorithms designed for ac-
tive perception can effectively process the result. To
this end, four control axes have been identified, as
shown in Fig. 1. For all axes, it is assumed that the
robot moves on a sphere and a real value from −1 to
1 is provided for the movement on each axis. Using
this way of expressing the output of active percep-
tion approaches holds promise for simplifying the
development of active perception-enabled robotic
systems, by enabling the efficient re-use of compo-
nents related to handling and executing the feedback
provided by active perception algorithms.
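To illustrate what such a standardized output could look like in practice, the sketch below represents a movement command on four axes clipped to [−1, 1]. The field names are illustrative assumptions and do not reflect the actual OpenDR API.

```python
from dataclasses import dataclass


@dataclass
class ControlCommand:
    """Movement on four axes, each a real value clipped to [-1, 1]."""
    forward: float = 0.0
    lateral: float = 0.0
    vertical: float = 0.0
    yaw: float = 0.0

    def __post_init__(self):
        # Enforce the [-1, 1] range mandated by the control specification.
        for name in ("forward", "lateral", "vertical", "yaw"):
            setattr(self, name, max(-1.0, min(1.0, getattr(self, name))))
```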
5 CONCLUSIONS
DL has revolutionized computer vision and robotics
by enabling remarkable advancements in perception
tasks. However, as discussed in this paper, a signif-
icant limitation persists in many existing DL-based
systems: the static inference paradigm. Most DL
models operate on fixed, static inputs, neglecting the
potential benefits of active perception – a process that
mimics how humans and certain animals interact with
their environment to better understand it. Active per-
ception offers advantages in terms of accuracy and ef-
ficiency, making it a crucial area of exploration for
enhancing robotic perception. While the incorpora-
tion of deep learning and active perception in robotics
presents numerous opportunities, it also poses sev-
eral challenges. Training often necessitates interac-
tive simulation environments and more advanced ap-
proaches like deep reinforcement learning. Moreover,
deployment pipelines need to be adapted to enable
control within perception algorithms. These chal-
lenges highlight the importance of ongoing research
and development in this field.
ACKNOWLEDGMENTS
This work was supported by the European Union’s
Horizon 2020 Research and Innovation Program
(OpenDR) under Grant 871449. This publication re-
flects the authors’ views only. The European Com-
mission is not responsible for any use that may be
made of the information it contains.
REFERENCES
Aloimonos, Y. (2013). Active perception. Psychology Press.
Ammirato, P., Poirson, P., Park, E., Košecká, J., and Berg,
A. C. (2017). A dataset for developing and bench-
marking active vision. In Proceedings of the IEEE In-
ternational Conference on Robotics and Automation,
pages 1378–1385.
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017).
Segnet: A deep convolutional encoder-decoder ar-
chitecture for image segmentation. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
39(12):2481–2495.
Bajcsy, R., Aloimonos, Y., and Tsotsos, J. K. (2018).
Revisiting active perception. Autonomous Robots,
42(2):177–196.
Bojarski, M., Del Testa, D., Dworakowski, D., Firner,
B., Flepp, B., Goyal, P., Jackel, L. D., Monfort,
M., Muller, U., Zhang, J., et al. (2016). End to
end learning for self-driving cars. arXiv preprint
arXiv:1604.07316.
Bozinis, T., Passalis, N., and Tefas, A. (2021). Improv-
ing visual question answering using active perception
on static images. In Proceedings of the International
Conference on Pattern Recognition, pages 879–884.
Buckman, J., Hafner, D., Tucker, G., Brevdo, E., and Lee,
H. (2018). Sample-efficient reinforcement learning
with stochastic ensemble value expansion. Proceed-
ings of the Advances in Neural Information Process-
ing Systems, 31.
Dimaridou, V., Passalis, N., and Tefas, A. (2023). Deep ac-
tive robotic perception for improving face recognition
under occlusions. In Proceedings of the IEEE Sympo-
sium Series on Computational Intelligence (accepted),
page 1.
Georgiadis, C., Passalis, N., and Nikolaidis, N. (2023). Ac-
tiveface: A synthetic active perception dataset for face
recognition. In Proceedings of the International Work-
shop on Multimedia Signal Processing (accepted),
page 1.
Ginargyros, S., Passalis, N., and Tefas, A. (2023). Deep ac-
tive perception for object detection using navigation
proposals. In Proceedings of the IEEE Symposium Se-
ries on Computational Intelligence (accepted), page 1.
Haas, J. K. (2014). A history of the unity game engine.
Han, X., Liu, H., Sun, F., and Zhang, X. (2019). Active
object detection with multistep action prediction us-
ing deep q-network. IEEE Transactions on Industrial
Informatics, 15(6):3723–3731.
Heffner, R. S. and Heffner, H. E. (1992). Evolution of sound
localization in mammals. In The evolutionary biology
of hearing, pages 691–715.
Hua, J., Zeng, L., Li, G., and Ju, Z. (2021). Learning for
a robot: Deep reinforcement learning, imitation learn-
ing, transfer learning. Sensors, 21(4):1278.
Kakaletsis, E. and Nikolaidis, N. (2023). Using synthesized
facial views for active face recognition. Machine Vi-
sion and Applications, 34(4):62.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-
ing. Nature, 521(7553):436–444.
Li, J.-h. (2018). Cyber security meets artificial intelligence:
a survey. Frontiers of Information Technology & Elec-
tronic Engineering, 19(12):1462–1474.
Liu, Q., Liu, Z., Xu, W., Tang, Q., Zhou, Z., and Pham, D. T.
(2019). Human-robot collaboration in disassembly for
sustainable manufacturing. International Journal of
Production Research, 57(12):4027–4044.
Loukas, G., Vuong, T., Heartfield, R., Sakellari, G., Yoon,
Y., and Gan, D. (2017). Cloud-based cyber-physical
intrusion detection for vehicles using deep learning.
IEEE Access, 6:3491–3508.
Manousis, T., Passalis, N., and Tefas, A. (2023). Enabling
high-resolution pose estimation in real time using ac-
tive perception. In Proceedings of the IEEE Interna-
tional Conference on Image Processing, pages 2425–
2429.
Michel, O. (2004). Cyberbotics ltd. webots™: professional
mobile robot simulation. International Journal of Ad-
vanced Robotic Systems, 1(1):5.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing atari with deep reinforcement learn-
ing. arXiv preprint arXiv:1312.5602.
Passalis, N., Pedrazzi, S., Babuska, R., Burgard, W., Dias,
D., Ferro, F., Gabbouj, M., Green, O., Iosifidis, A.,
Kayacan, E., et al. (2022). OpenDR: An open toolkit
for enabling high performance, low footprint deep
learning for robotics. In Proceedings of the IEEE/RSJ
International Conference on Intelligent Robots and
Systems, pages 12479–12484.
Passalis, N. and Tefas, A. (2020). Leveraging active percep-
tion for improving embedding-based deep face recog-
nition. In Proceedings of the IEEE International
Workshop on Multimedia Signal Processing, pages 1–
6.
Passalis, N. and Tefas, A. (2021). Pseudo-active vision for
improving deep visual perception through neural sen-
sory refinement. In Proceedings of the IEEE Interna-
tional Conference on Image Processing, pages 2763–
2767.
Passalis, N., Tefas, A., and Pitas, I. (2018). Efficient cam-
era control using 2d visual information for unmanned
aerial vehicle-based cinematography. In Proceedings
of the IEEE International Symposium on Circuits and
Systems, pages 1–5.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time object
detection. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pages 779–
788.
Shen, M. and How, J. P. (2019). Active perception in ad-
versarial scenarios using maximum entropy deep re-
inforcement learning. In Proceedings of the Interna-
tional Conference on Robotics and Automation, pages
3384–3390. IEEE.
Sun, H., Zhang, W., Yu, R., and Zhang, Y. (2021). Motion
planning for mobile robots—focusing on deep rein-
forcement learning: A systematic review. IEEE Ac-
cess, 9:69061–69081.
Tosidis, P., Passalis, N., and Tefas, A. (2022). Active vi-
sion control policies for face recognition using deep
reinforcement learning. In Proceedings of the 30th
European Signal Processing Conference, pages 1087–
1091.
Tsounis, V., Alge, M., Lee, J., Farshidian, F., and Hut-
ter, M. (2020). Deepgait: Planning and control of
quadrupedal gaits using deep reinforcement learning.
IEEE Robotics and Automation Letters, 5(2):3699–
3706.
Tzimas, A., Passalis, N., and Tefas, A. (2020). Leveraging
deep reinforcement learning for active shooting under
open-world setting. In Proceedings of the IEEE Inter-
national Conference on Multimedia and Expo, pages
1–6.
Wen, Y., Zhang, K., Li, Z., and Qiao, Y. (2016). A discrim-
inative feature learning approach for deep face recog-
nition. In Proceedings of the European Conference on
Computer Vision, pages 499–515.
Zhao, W., Queralta, J. P., and Westerlund, T. (2020). Sim-
to-real transfer in deep reinforcement learning for
robotics: a survey. In Proceedings of the IEEE Sym-
posium Series on Computational Intelligence, pages
737–744.
Zheng, C., Wu, W., Chen, C., Yang, T., Zhu, S., Shen, J.,
Kehtarnavaz, N., and Shah, M. (2023). Deep learning-
based human pose estimation: A survey. ACM Com-
puting Surveys, 56(1):1–37.