deploying the trained models to real and simulation environments. Our results cast new light on the performance and accuracy of the trained models: overall accuracy and performance increased in proportion to the amount of simulation data added.
7 CONCLUSIONS AND FUTURE WORK
In this paper, we have investigated the ability of CNN models to generalize and learn from a mixture of real and artificial data. Two CNN models were evaluated: the first was trained from scratch, and the second was fine-tuned to fit the dataset classes of the lane-tracking task. The CNN models were trained six times, each time with a different combination of the collected real and simulation datasets. The results show that models trained on a dataset collected from a particular environment work only in that environment and fail when transferred to an unseen target environment. Another promising finding was that, once simulation data was included, the models' performance increased significantly and they were able to generalize to both real and simulation environments. On this basis, we conclude that a mixture of simulation and real data can help CNN models generalize when datasets are scarce and when models trained in a particular domain are transferred to an unseen target domain. This paper provides a good starting point for further research; in future work, we intend to examine more complex model architectures and environments.
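For concreteness, the sketch below illustrates one way the mixed-data training summarized above could be set up. It is a minimal PyTorch sketch, not our exact configuration: the dataset paths, the GoogLeNet backbone, and all hyperparameters are illustrative assumptions.

# Minimal sketch (PyTorch) of training a CNN on a mixture of real and
# simulation data. Paths, the GoogLeNet backbone, and hyperparameters
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Real and simulation images share the same lane-tracking labels, so the
# two datasets can simply be concatenated into one training set.
real_data = datasets.ImageFolder("data/real", transform=transform)
sim_data = datasets.ImageFolder("data/simulation", transform=transform)
loader = DataLoader(ConcatDataset([real_data, sim_data]),
                    batch_size=32, shuffle=True)

# Fine-tuning variant: start from ImageNet weights and replace the
# classifier head to match the lane-tracking classes.
model = models.googlenet(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, len(real_data.classes))

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch over the mixed dataset
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

The training-from-scratch variant differs only in initializing the backbone with random weights instead of loading pretrained ones; the six dataset combinations correspond to different choices of which real and simulation subsets feed the ConcatDataset.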