Deep Driving with Additional Guided Inputs for Crossings in Pedestrian
Zones
Qazi Hamza Jan, Jan Markus Arnold Kleen and Karsten Berns
Robotic Research Lab, Department of Computer Science, Rheinland-Pfälzische Technische Universität,
Erwin-Schrödinger-Straße 52, 67663 Kaiserslautern, Germany
Keywords:
Autonomous Driving, Deep Driving, Pedestrian Zones, Simulation, Neural Networks, Directional Inputs.
Abstract:
Deep Neural Networks are being used in different applications to solve complex tasks with high precision. One
application, and the focus of this paper, is end-to-end driving. Generally, in an end-to-end approach, a neural
network learns to directly feed values to actuators based on sensor inputs. This paper uses an end-to-end
approach with images and additional direction inputs: left, right and straight, for imposing a certain direction
at the unstructured and arbitrary intersections of pedestrian zones. Predicting steering with high precision
in pedestrian zones is uncertain due to the atypical structures of intersections. The gain in accuracy is
investigated using direction inputs with three input variants for each of two approaches: a single and a parallel
model. Depth information was included to overcome shadow problems of RGB images in simulation, but it
worsened the driving behavior and was therefore removed in further experiments. The experiments are performed
in simulation to verify the utility of the proposed approaches and to narrow down the best models for the actual
hardware. From the experiments, it is seen that the parallel model with front images performed best. The model
drove well along the paths and followed the direction given by the user at the crossings. To keep the paper
within its length limit, only results for the parallel structures are discussed.
1 INTRODUCTION
Automotive industries are working towards safer, reli-
able and human-like driving. Over the past few years,
various aspects of autonomous driving are progress-
ing (Parekh et al., 2022). Researchers benefit from
Machine Learning (ML) algorithms for object detec-
tion (Tian et al., 2019; Erhan et al., 2014), semantic
segmentation (Garcia-Garcia et al., 2017; Yu et al.,
2018), vehicle control (Kuutti et al., 2020) etc. To
take advantage of such encouraging results, this work
uses an End-to-End deep neural network for driving
autonomous vehicles in pedestrian zones. A pedes-
trian zone, from a semantic perspective, offers a high vari-
ation in its architecture. It consists of unevenly con-
nected paths. One such example with similar charac-
teristics is shown in Figure 1. Autonomous vehicles
in pedestrian zones are getting more attention (Jan
et al., 2020b; Keen et al., 2021; Alvarez et al., 2019;
Li et al., 2021).
This work deals with the challenges of driving an
autonomous vehicle on uneven pathways and following a given direction at highly varying cross
sections. For initial validation of the technique pro-
posed for this work, experiments are performed in a
Figure 1: Top view of a small area of the RPTU campus. The left image is a simulated model of the campus shown in the right image. The blue line marks the test drive of the model trained on the entire campus.
simulation environment shown on the left of Figure 1,
a replica of the real environment on the right. The
environment is taken from our previous work in (Jan
et al., 2020a; Jan et al., 2021). The gray texture in the
simulation is the walking and driving zone connecting
buildings within the campus.
End-to-end deep driving (Tampuu et al., 2020;
Pomerleau, 1988; Muller et al., 2005; Bojarski et al.,
2016; Codevilla et al., 2018), also known as behavior reflex, is a methodology that processes the input sensory data directly and generates actuator values. The majority of researchers use RGB cameras as input (Kaur
et al., 2021; Parekh et al., 2022; Toromanoff et al.,
2018) and control the steering and velocity of the ve-
hicle. Driving in an urban street environment has the
simplicity of clear markings and lanes with fixed ge-
ometric crossings. On the contrary, pedestrian zones
not only offer anfractuous pathways but also irregular
crossways.
To solve the problem of selecting the desig-
nated path for the goal at intersections, authors have
used route planners such as TomTom and OSM
maps (Hecker et al., 2018). They claim that using a
route planner gives better prediction of steering angle.
For complex navigation systems, multiple approaches are used, each carrying its own syntax for giving directions to the destination. Such syntax cannot always be integrated into a higher-level map. To tackle this problem, this work implements generic directions: left, right and straight are applied for signaling at the
intersections. For this work, the type of turning is ig-
nored, for example, sharp turn, fork turn, etc. Since
there are various shapes of crossing, it is not possible
to assign them to a particular category. Hence, fixed
generic commands were chosen based on driver intu-
ition with respect to the crossing angle.
The vehicle used for this work is a type of
minibus, which is equipped with three cameras. The
configuration is explained in Section 2. This study
drew partial inspiration from (Codevilla et al., 2018).
To realize the effect of different variables in the train-
ing process, multiple inputs are used: three cameras
(RGB and depth images), speed, direction, steering
values and throttle. Details are given in Section 2.
For this work, two types of approaches, single and parallel models, are exploited, with three variants based on the number of cameras and the use of a velocity input.
These approaches are explained in Section 3.
Note. All the variants include direction inputs along with the combination of other inputs. Since no existing dataset provides these particular inputs, the data-gathering process is also part of this work and is discussed in the next section. Detailed experiments are given in Section 4. From the aforementioned discussion, the novelty of this work can be articulated as follows:
Data collection
Model creation
Examining the effect of single and parallel models
including the combination of all the input variants.
Interpreting the effect of shadows
Deciding on the best approach
Figure 2: A virtual replica of vehicle and campus in sim-
ulation. The vehicle is a driver-less minibus used in the
campus (Jan and Berns, 2021).
2 DATA COLLECTION
Unreal Engine (UE, https://www.unrealengine.com/), a game development tool, is
used for training and testing models in this work. UE
offers a realistic rendering which supports the use of
visual sensors. The vehicle and RPTU campus were
recreated in the simulation as shown in Figure 2. A
comparison of the virtual and real world can be seen
in Figure 1. The proposed work was conducted in
simulation due to the following reasons:
Simplicity. Since the neural network requires a huge amount of data for training, it is far less
Figure 3: Camera configuration on the bus. It consists of
front, rear-left and rear-right cameras. The configuration is
similar to the one used for the actual robot (Jan et al., 2022).
laborious, compared to real systems, to do such initial testing in simulation.
Accessibility. It was possible to collect data in
locations where a real vehicle can not drive due to
temporary restrictions or constructions.
Safety. The most concerning aspect of driving in a pedestrian zone is the pedestrians. In view of the fact that vehicles and pedestrians have to share the same space, safe driving becomes critical. In the real world, such concerns hinder testing.
Exploitation. Simulation allows us to recreate and manipulate every feature of the environment. Therefore, the validation of reasoning becomes easy.
For data collection, the vehicle was driven by a hu-
man driver with an average speed of 6 km/h (restricted
speed in pedestrian area). The vehicle was equipped
with three cameras, configuration shown in Figure 3.
Using three cameras widens the view of the surroundings relative to the front of the vehicle. To cater for the shadow problem, additional depth images were taken into account, with the aim of enabling the system to drive under varying lighting conditions.
Data was collected every 0.5 seconds. Each record consisted of the following information (a minimal logging sketch follows the list):
RGB images from all the three cameras shown in
Figure 3
Depth image from the front camera
Current speed of the vehicle
Current throttle as a floating value from -1 to 1
Three direction commands as integer: 0-straight,
1-left and 2-right
Timestamp
Additional comments
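As an illustration, the per-frame logging could be organized as in the following minimal Python sketch. The simulator interface (`sim`), the `DataRecord` fields and the helper names are hypothetical and only mirror the list above; they are not the actual implementation:

```python
import time
from dataclasses import dataclass, asdict

# Direction commands as used in this work: 0 = straight, 1 = left, 2 = right.
STRAIGHT, LEFT, RIGHT = 0, 1, 2

@dataclass
class DataRecord:                    # illustrative record layout
    rgb_front: str                   # file paths of the saved images
    rgb_rear_left: str
    rgb_rear_right: str
    depth_front: str
    speed: float                     # current speed in km/h
    throttle: float                  # floating value in [-1, 1]
    steering: float                  # recorded from the joystick
    direction: int                   # 0, 1 or 2 as above
    timestamp: float
    comment: str                     # e.g. "pedestrians present, shadow"

def log_loop(sim, writer, period=0.5):
    """Poll the (hypothetical) simulator every 0.5 s and append one record."""
    while sim.is_running():
        rec = DataRecord(
            rgb_front=sim.save_rgb("front"),
            rgb_rear_left=sim.save_rgb("rear_left"),
            rgb_rear_right=sim.save_rgb("rear_right"),
            depth_front=sim.save_depth("front"),
            speed=sim.speed(),
            throttle=sim.throttle(),
            steering=sim.steering(),
            direction=sim.current_direction(),   # set by operator key press
            timestamp=time.time(),
            comment=sim.comment(),
        )
        writer.writerow(asdict(rec))             # e.g. a csv.DictWriter
        time.sleep(period)
```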
During the recording of the training data, since a human was driving the vehicle with a chosen direction and average speed, throttle, steering and direction were recorded directly from the user. Situations and conditions such as the presence of pedestrians, the presence of shadow, the type of configuration, the variant kind and so on were encoded in the additional comments. A total of 76 driving sequences were per-
formed with more than 30000 frames recorded in to-
tal. Driving itself was based solely on the opera-
tor. The operator steered the vehicle using a joystick.
The speed of the vehicle was usually kept constant at human walking speed. In order to give a direction, the
operator was told to press the given key number as-
signed to the respective direction once the intersection
was fully visible. By default, the straight direction was enabled, regardless of the size or curvature of a standard path.
One particular scene, with possible turning, is over-
laid in Figure 2. The blue arrow shows the driver’s
intuition of a possible driving path at the cross sec-
tion.
Figure 4: Comparison of RGB data with the corresponding depth data. The shadow is indiscernible in the depth image.
(a) Network structure for all images and speed.
(b) The single network structure from the image above, used separately for the different directions.
Figure 5: (a) Single model: single network structure for input with all images and speed. (b) Parallel model: the left, right and straight modules are each equivalent to the single structure in the top image, except for the command input, which is used to switch between the three models.
3 APPROACH
Unlike single images given as input for classification
and detections (Hoiem et al., 2005; Zhu et al., 2014),
the model implemented for this work uses multiple
inputs with varied configurations. One of the objectives was to explore different combinations of inputs and model types for better results. The division of applied approaches is given in the flow chart of Figure 6. Firstly, the approach was bifurcated into a single and a paral-
Figure 6: Block diagram for the two model variations, single and parallel, including the direction inputs, with three input variants: front camera, all cameras, and all cameras with velocity.
lel model. The basis of this bifurcation was how a model is trained with the direction data. The single structure, as the name suggests, has one model trained with all the direction inputs. For the parallel model, sub-models, each consisting of one single structure, were instantiated specific to one direction input. One sub-model is activated based on the direction input by a switching technique. The design of both models can be seen in Figure 5. The convolutional neural networks are adapted to the given inputs. After concatenation and flattening, dense layers are used for the output prediction of steering. For clarity, the variants are segregated into different color segments shown in Figure 5a. The green segment shows the inputs for the first variant, the blue segment shows the additional inputs for the second variant, and the maroon segment shows the inputs for the third variant, which includes the green and blue segments. A sketch of the single structure is given below.
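The exact layer configuration is not the focus here; the following Keras sketch illustrates one plausible realization of the single structure, with per-camera convolutional branches concatenated with the speed and the one-hot direction command before dense layers predict the steering value. All layer sizes, names and activation choices are assumptions for illustration only:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_branch(name):
    """One convolutional branch per camera image (sizes are assumptions)."""
    inp = layers.Input(shape=(120, 160, 3), name=name)
    x = layers.Conv2D(24, 5, strides=2, activation="relu")(inp)
    x = layers.Conv2D(36, 5, strides=2, activation="relu")(x)
    x = layers.Conv2D(48, 3, strides=2, activation="relu")(x)
    x = layers.Flatten()(x)
    return inp, x

def build_single_model():
    # Three camera branches (front, rear-left, rear-right).
    cam_inputs, cam_feats = zip(*(conv_branch(n) for n in
                                  ("front", "rear_left", "rear_right")))
    speed_in = layers.Input(shape=(1,), name="speed")
    # Direction command 0/1/2 encoded as a one-hot vector.
    dir_in = layers.Input(shape=(3,), name="direction")

    x = layers.Concatenate()([*cam_feats, speed_in, dir_in])
    x = layers.Dense(100, activation="relu")(x)
    x = layers.Dense(50, activation="relu")(x)
    steering = layers.Dense(1, activation="tanh", name="steering")(x)

    model = Model(inputs=[*cam_inputs, speed_in, dir_in], outputs=steering)
    model.compile(optimizer="adam", loss="mse")   # MSE as used in Section 4.1
    return model
```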
The last layer of Figure 6 gives the types of inputs used for testing precision. The models were tested first with only the front camera, then with all three cameras, and finally with an additional speed input alongside all cameras. Using all cameras tests the effect of a wide view on driving, and speed was added to check its impact on the steering values. A sketch of the parallel switching mechanism is given below.
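The switching of the parallel model can be sketched as follows, reusing the illustrative `build_single_model` helper from above. In the actual design the command input is only used to select the sub-model; here a constant one-hot vector is fed so the sub-models can share the sketched single structure unchanged:

```python
import numpy as np

STRAIGHT, LEFT, RIGHT = 0, 1, 2

class ParallelModel:
    """Holds one sub-model per direction and switches between them."""

    def __init__(self):
        # Each sub-model is trained only on frames recorded with its direction.
        self.sub_models = {d: build_single_model()
                           for d in (STRAIGHT, LEFT, RIGHT)}

    def predict_steering(self, images, speed, direction):
        """Route the inputs to the sub-model chosen by the user command."""
        model = self.sub_models[direction]
        # Constant one-hot only to match the shared input signature.
        one_hot = np.eye(3, dtype=np.float32)[direction][None, :]
        inputs = [img[None, ...] for img in images]          # add batch axis
        inputs += [np.array([[speed]], dtype=np.float32), one_hot]
        return float(model.predict(inputs, verbose=0)[0, 0])
```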
4 EXPERIMENTS
The experiments were performed in the same campus
environment by giving different directions at cross-
ings and adding other props to reason about unusual behavior. This section explains the prediction errors for all the models, and based on the best performance, the respective model is chosen for further evaluation.
4.1 Prediction Errors
As a first step in evaluating the network, model pre-
dictions were compared to the ground truth that was
recorded when gathering the training data. For each model, a subset of data points is randomly selected and compared to human driving (the training set). The Mean Squared Error (MSE) is calculated on this data and reported in Table 1; a short sketch of this per-direction evaluation is given below.
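As a concrete illustration of how the per-direction errors of Table 1 can be computed, the following sketch assumes NumPy arrays of predicted and recorded steering values together with the direction command per frame:

```python
import numpy as np

STRAIGHT, LEFT, RIGHT = 0, 1, 2

def mse_per_direction(pred_steering, true_steering, directions):
    """Overall and per-direction MSE between model and human steering."""
    pred = np.asarray(pred_steering, dtype=np.float64)
    true = np.asarray(true_steering, dtype=np.float64)
    dirs = np.asarray(directions)

    results = {"all": float(np.mean((pred - true) ** 2))}
    for name, d in (("straight", STRAIGHT), ("left", LEFT), ("right", RIGHT)):
        mask = dirs == d
        results[name] = float(np.mean((pred[mask] - true[mask]) ** 2))
    return results
```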
Figure 7: Comparison plot of the trajectories for the test drive and the ground truth.
Figure 8: Relationship between the steering predictions of
Single Front Images model and steering of human driver.
Investigating Table 1 further, the single model shows, across all variants, high variance in performance between the different directions; the spread can be seen
Table 1: Mean Squared Error of predictions and ground-truth.

Model     Camera  Speed Input   All      Straight   Left     Right
Single    Front   No            0.0181   0.0169     0.0214   0.0229
Single    All     No            0.0214   0.0183     0.0185   0.0418
Single    All     Yes           0.0290   0.0247     0.0497   0.0385
Parallel  Front   No            0.0219   0.0233     0.0237   0.0206
Parallel  All     No            0.0174   0.0168     0.0204   0.0201
Parallel  All     Yes           0.0225   0.0230     0.0213   0.0202
Table 2: Minor and major errors observed during real-time testing.

Model                          Minor Errors   Major Errors
Single: Front Image            1              5
Single: All Images             5              4
Single: All Images, Speed      4              7
Parallel: Front Image          0              0
Parallel: All Images           3              0
Parallel: All Images, Speed    2              1
in Figure 8. One reason for this could be the unequal distribution of the training data. The unequal distribution of data in terms of direction stems from the fact that crossings are encountered far less often than the paths between the crossings in pedestrian zones. Hence, the model over-fits to the dominant direction. On the other hand, parallel models, in which each sub-model is trained for a particular direction, do not suffer drastically from this problem. The unequal distribution does not affect the trained model because every sub-model is dedicated to one direction.
The inclusion of speed data reduces the overall accuracy for both structures. Incorporating the left and right cameras along with the front camera leads to ambiguity in interpreting the results: although using additional cameras increases the field of view, it also introduces more unnecessary information, such as the patches enclosing the pedestrian walk. Despite the sub-models of the parallel structure being trained individually on less data than the single structure, the parallel structure performs similarly if not better. This is established in the next subsection.
4.2 Real-Time Testing
Having gained insight into all the models by comparing their predictions with the ground truth, this alone is not sufficient to claim the credibility of a model in terms of actual driving. Further performance evaluation is therefore done by letting the model drive the vehicle in the simulated environment, with the user giving one of the three direction inputs at intersections.
4.2.1 All-Model Testing
The tests were done on different routes of the campus, but for the sake of comparison, a specific path was chosen, marked in blue in Figure 1. A trajectory comparison for the parallel model with front images is given in Figure 7. To quantify the results, the terms minor and major error were established: the vehicle going off-path during driving was considered a minor error, whereas wrong turns or collisions with obstacles were defined as major errors. The comparison can be seen in Table 2.
(a) Green dotted line shows the path driven by user during
the training process.
(b) Red dotted line shows the path of driving during the test
phase.
Figure 9: The images show the top view of driving path by
human (top) and model (bottom). The curve in the bottom
image shows the effect of shadow on the network.
Figure 10: Effect of shadow on driving. The green arrow
shows the consistent path driven by the user. Red arrow
indicates the path driven by the network.
Figure 11: Attention maps from the model for two similar scenarios, taken from two convolution layers. One scenario contains shadow and one does not.
4.2.2 Shadows
Shadows were the major problem identified during
the phase of testing. The network was unable to de-
tect the true path in the presence of a shadow. One
particular example is shown from the top view in Fig-
ure 9. The top image shows the vehicle driven by a
human. Shadows were always ignored by the human driver throughout the campus. Strikingly, the model interpreted the contrast change between shadowed and non-shadowed regions as the boundary of the path.
This is implied by the slight curve of the red dotted
line in Figure 9b. To justify this interpretation, another example with a similar context is presented in Figure 10. The green arrow shows the user-driven pathway, and the red arrow shows the path wrongly driven by the network. To visualize the effect of shadows, attention maps from network layers were overlaid on the RGB image, as seen in Figure 11. The second row from the top in Figure 11 shows the attention map from a shallow layer of the network, while the fourth row shows the attention map from a deep layer. The color gradient shows that the system perceives shadows as boundary regions and does not consider them drivable. The red regions, in turn, keep the vehicle on a straight path. A sketch of how such attention overlays can be produced is given below.
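Such attention overlays can be produced in several ways; one minimal sketch, assuming the Keras-style model from Section 3 and using OpenCV for blending (the layer name, colormap and blending weight are illustrative choices), is:

```python
import cv2
import numpy as np
import tensorflow as tf

def attention_overlay(model, layer_name, inputs, rgb_image, alpha=0.5):
    """Overlay the mean activation of one conv layer onto the RGB frame.

    rgb_image is expected as an 8-bit H x W x 3 array; inputs is the batched
    list of model inputs for this frame.
    """
    # Sub-model that exposes the activations of the requested layer.
    probe = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer(layer_name).output)
    activations = probe.predict(inputs, verbose=0)[0]      # (h, w, channels)

    # Channel-averaged attention map, normalized to [0, 255].
    attention = activations.mean(axis=-1)
    attention -= attention.min()
    attention /= attention.max() + 1e-8
    heat = cv2.applyColorMap(np.uint8(255 * attention), cv2.COLORMAP_JET)

    # Resize to the camera resolution and blend with the RGB image.
    heat = cv2.resize(heat, (rgb_image.shape[1], rgb_image.shape[0]))
    return cv2.addWeighted(rgb_image, 1.0 - alpha, heat, alpha, 0)
```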
4.2.3 Depth Images
Researchers have used depth images to tackle shad-
ows in RGB images (Krywolt, 1993; Bi et al., 2022;
Xiao et al., 2014). Depth images are invariant to shad-
ows existing in color images. A clear example is il-
lustrated in Figure 4. The comparison shows that the depth image clearly ignores the disparity caused by shadows. Initially, it was therefore proposed to use depth images as an additional input. Throughout the tests, however, the driving recurrently skewed away from the given pathway. Observing the depth images, it is not easy to differentiate between pathways and surrounding areas. In other words, there exist no clear bound-
aries. Environments such as the one used for this work have flat grass surrounding the paths; from a depth perspective, it is difficult to differentiate between such regions. Figure 12 extracts the ground silhouette from the depth image of Figure 4 to show this relation. For this scene, the silhouette rises only slightly at the corners. As a result, the depth images were removed from further testing. A sketch of this silhouette extraction is given after Figure 12.
Figure 12: Silhouette for ground region from depth image
shown in Figure 4.
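The silhouette of Figure 12 can be reproduced, for example, by plotting the depth profile along one image row crossing the ground region; the row index and the file name in the following sketch are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_ground_silhouette(depth, row=None):
    """Plot the depth profile of one image row to expose (missing) boundaries."""
    depth = np.asarray(depth, dtype=np.float32)
    if row is None:
        row = int(depth.shape[0] * 0.75)   # a row in the lower (ground) region
    profile = depth[row, :]

    plt.plot(profile)
    plt.xlabel("image column")
    plt.ylabel("depth [m]")
    plt.title(f"Ground silhouette along row {row}")
    plt.show()

# Hypothetical usage with a depth frame stored as a NumPy array:
# plot_ground_silhouette(np.load("front_depth.npy"))
```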
4.2.4 Blockage Effect
In a specific region, there was one instance where the
system experienced difficulty in identifying the cor-
rect path and accurately interpreting commands, as
depicted in Figure 13. Evidently, the system mis-
takenly perceived the large paved area ahead of the
vehicle as a dead end, akin to a parking space, and
consequently interpreted it as the designated route to
drive on when instructed to proceed straight. The en-
vironment in question comprises authentic dead ends
that the system is trained to navigate through. No-
tably, the straight command directs the system to fol-
low the path in a manner consistent with human in-
terpretation of following the correct curve. However,
it is important to note that the same command can be
issued by an operator intending to park the vehicle in
this space, which underscores the potential ambigu-
ity of the commands. Thus, there is likely a need for
additional commands to enhance the system’s func-
tionality. The green arrow shows the drivable path for
this region; the red arrow shows the predicted path.
Figure 13: The system cannot identify the right curve as the correct choice and drives in a straight line, as shown by the red arrow, colliding with the grass hill if not stopped. Changing the command to "right" causes the vehicle to follow the right curve, illustrated by the blue arrow, but this should not be needed, as the straight command means following the path.
To explore this issue in greater depth, an obstruc-
tion in the form of a wall was introduced in the front
area. Despite the absence of such a blockage in the
training phase, the system successfully identified the
correct path and operated in accordance with it. As
depicted in Figure 14, the driving path is indicated by
the blue arrow. One possible explanation for this out-
come is that the system regarded the stairs located on
the opposite side of the drivable area as a viable path
and thus took a middle course. The introduction of
the obstruction enabled the system to clearly discern
a single path to follow.
Figure 14: The same situation as in Figure 13, but the front
and right stair region is blocked off. The system is now able
to identify the right curve (blue arrow) as the correct choice
with straight input.
5 CONCLUSIONS
This paper explores the possibility of using direc-
tions as input to a neural network for crossings in
pedestrian zones. The network was able to follow
the commands correctly and navigate throughout the
path. Based on the number of cameras, speed input
and configuration possibility, different models were
developed and evaluated against each other. From the
findings, it is suggested that using a parallel approach (having direction-specific sub-models) performs better than a single model trained on all the given
direction inputs. While the results are encouraging, the performance lags under some conditions, such as difficult lighting. Depth images, the proposed solution to the shadow problem in simulation, did not assist in overcoming the lighting problem; in fact, depth added more uncertainty. These shortcomings could be resolved
by further training the system in various environments
and situations, especially situations where the vehicle
is required to recover from a suboptimal position.
REFERENCES
Alvarez, W. M., De Miguel, M. A., García, F., and Olaverri-
Monreal, C. (2019). Response of vulnerable road
users to visual information from autonomous vehicles
in shared spaces. In 2019 IEEE Intelligent Transporta-
tion Systems Conference (ITSC), pages 3714–3719.
IEEE.
Bi, X., Yang, S., and Tong, P. (2022). Moving object de-
tection based on fusion of depth information and rgb
features. Sensors, 22(13):4702.
Bojarski, M., Del Testa, D., Dworakowski, D., Firner,
B., Flepp, B., Goyal, P., Jackel, L. D., Monfort,
M., Muller, U., Zhang, J., et al. (2016). End to
end learning for self-driving cars. arXiv preprint
arXiv:1604.07316.
Codevilla, F., Müller, M., López, A., Koltun, V., and Doso-
vitskiy, A. (2018). End-to-end driving via conditional
imitation learning. In 2018 IEEE international confer-
ence on robotics and automation (ICRA), pages 4693–
4700. IEEE.
Erhan, D., Szegedy, C., Toshev, A., and Anguelov, D.
(2014). Scalable object detection using deep neural
networks. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 2147–
2154.
Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-
Martinez, V., and Garcia-Rodriguez, J. (2017). A re-
view on deep learning techniques applied to semantic
segmentation. arXiv preprint arXiv:1704.06857.
Hecker, S., Dai, D., and Van Gool, L. (2018). End-to-end
learning of driving models with surround-view cam-
eras and route planners. In Proceedings of the eu-
ropean conference on computer vision (eccv), pages
435–453.
Hoiem, D., Efros, A. A., and Hebert, M. (2005). Geomet-
ric context from a single image. In Tenth IEEE Inter-
national Conference on Computer Vision (ICCV’05)
Volume 1, volume 1, pages 654–661. IEEE.
Jan, Q. H. and Berns, K. (2021). Safety-configuration of
autonomous bus in pedestrian zone. In VEHITS, pages
698–705.
Jan, Q. H., Kleen, J. M. A., and Berns, K. (2020a). Self-
aware pedestrians modeling for testing autonomous
vehicles in simulation. In VEHITS, pages 577–584.
Jan, Q. H., Kleen, J. M. A., and Berns, K. (2021). Sim-
ulated pedestrian modelling for reliable testing of au-
tonomous vehicle in pedestrian zones. In Smart Cities,
Green Technologies, and Intelligent Transport Sys-
tems: 9th International Conference, SMARTGREENS
2020, and 6th International Conference, VEHITS
2020, Prague, Czech Republic, May 2-4, 2020, Re-
vised Selected Papers 9, pages 290–307. Springer.
Jan, Q. H., Klein, S., and Berns, K. (2020b). Safe and
efficient navigation of an autonomous shuttle in a
pedestrian zone. In Advances in Service and Indus-
trial Robotics: Proceedings of the 28th International
Conference on Robotics in Alpe-Adria-Danube Re-
gion (RAAD 2019) 28, pages 267–274. Springer.
Jan, Q. H., Wolf, P., Berns, K., Reich, J., and Wellstein, M.
(2022). Integration of human skeleton posture models
into reaction for realizing dynamic risk management.
Kaur, P., Taghavi, S., Tian, Z., and Shi, W. (2021). A
survey on simulators for testing self-driving cars. In
2021 Fourth International Conference on Connected
and Autonomous Driving (MetroCAD), pages 62–70.
IEEE.
Keen, H. E., Jan, Q. H., and Berns, K. (2021). Drive
on pedestrian walk. TUK campus dataset. In 2021
IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), pages 3822–3828. IEEE.
Krywolt, R. W. (1993). Post-Processed Shadow Determi-
nation for Composition of Depth Images. PhD thesis,
University of British Columbia.
Kuutti, S., Bowden, R., Jin, Y., Barber, P., and Fallah, S.
(2020). A survey of deep learning applications to au-
tonomous vehicle control. IEEE Transactions on In-
telligent Transportation Systems, 22(2):712–733.
Li, Y., Cheng, H., Zeng, Z., Liu, H., and Sester, M. (2021).
Autonomous vehicles drive into shared spaces: ehmi
design concept focusing on vulnerable road users. In
2021 IEEE International Intelligent Transportation
Systems Conference (ITSC), pages 1729–1736. IEEE.
Muller, U., Ben, J., Cosatto, E., Flepp, B., and Cun, Y.
(2005). Off-road obstacle avoidance through end-to-
end learning. Advances in neural information process-
ing systems, 18.
Parekh, D., Poddar, N., Rajpurkar, A., Chahal, M., Ku-
mar, N., Joshi, G. P., and Cho, W. (2022). A review
on autonomous vehicles: Progress, methods and chal-
lenges. Electronics, 11(14):2162.
Pomerleau, D. A. (1988). Alvinn: An autonomous land
vehicle in a neural network. Advances in neural infor-
mation processing systems, 1.
Tampuu, A., Matiisen, T., Semikin, M., Fishman, D., and
Muhammad, N. (2020). A survey of end-to-end driv-
ing: Architectures and training methods. IEEE Trans-
actions on Neural Networks and Learning Systems.
Tian, Y., Yang, G., Wang, Z., Wang, H., Li, E., and Liang,
Z. (2019). Apple detection during different growth
stages in orchards using the improved yolo-v3 model.
Computers and electronics in agriculture, 157:417–
426.
Toromanoff, M., Wirbel, E., Wilhelm, F., Vejarano, C., Per-
rotton, X., and Moutarde, F. (2018). End to end ve-
hicle lateral control using a single fisheye camera. In
2018 IEEE/RSJ International Conference on Intelli-
gent Robots and Systems (IROS), pages 3613–3619.
IEEE.
Xiao, Y., Tsougenis, E., and Tang, C.-K. (2014). Shadow
removal from single rgb-d images. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 3011–3018.
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N.
(2018). Bisenet: Bilateral segmentation network for
real-time semantic segmentation. In Proceedings of
the European conference on computer vision (ECCV),
pages 325–341.
Zhu, M., Derpanis, K. G., Yang, Y., Brahmbhatt, S., Zhang,
M., Phillips, C., Lecce, M., and Daniilidis, K. (2014).
Single image 3d object detection and pose estima-
tion for grasping. In 2014 IEEE International Con-
ference on Robotics and Automation (ICRA), pages
3936–3943. IEEE.