Using the CGAN Model Extend Encounter Targets Image Training
Ruolan Zhang
and Masao Furusho
Graduate School of Maritime Sciences, Kobe University, 5-1-1 Fukae-Minami, Higashinada, Kobe, Japan
Keywords: Data Generated, Target Ships, Training Set, Unmanned Navigation, Autonomous Decision.
Abstract: A fully capable unmanned ship navigation requires full autonomous decision-making, large-scale decision
model training data to answer for these conditions is essential. However, it is difficult to obtain enough scenes
training data in a real sea navigation environment. In response to possible emergency situations even no shore-
station support, this paper proposes a method using conditional generative adversarial networks (CGAN) to
generate the most executable large-scale target ships image set, which can be used to training various sea
conditions autonomous decision-making model. In practice, most of the current research on unmanned ships
are based onshore remote control or monitoring. Nonetheless, in some extremely special circumstances, such
as communication interruption, or if the ship cannot be guided or remotely controlled in real time on the shore,
the unmanned ship must make an appropriate decision and form new plans according to the encounter targets
and the whole current situation. The CGAN model is a novel means to generate the target ships to construct
the whole encounter sea scenes situation. The generated targets training image set can be used to train decision
models, and explore a new way to approach large-scale, fully autonomous navigation decisions.
The equipment used in modern ocean-going vessels
can be roughly divided into two types: navigation aids
that help the crew to make the right decisions and
control equipment that the seafarer’s control. To
reduce running costs and human factors in accidents,
unmanned vessels with autonomous perception and
decision-making are the future development
direction. For this, a model or system is required to
receive data from the navigation aids and make
appropriate decisions for the obtained data through
the control device to complete the autonomous
navigation. The heavy sea environment may interrupt
satellite communication and result in loss of remote-
control capacity. To achieve long-distance and
completely autonomous, unmanned merchant ship
transportation, the system must be able to make its
decisions at any time in response to
emergencies, change the established strategy, and
eliminate the danger.
As shown in Figure 1 on the following page, the
ship is sailing along the coast, the target ship
represented by T1~T3, and T* represents the fishing
boat group that performs fishing operations in one
area. Short-distance path replanning is possible,
especially in extreme navigation environments, and
without remote assistance from the shore, the
unmanned ship must rely on the limited data
information to make appropriate decisions (Liu and
Bucknall, 2015). From the perspective of the bridge,
the confirmation of the target ship and its trajectory
are not easily presented in a three-dimensional
manner. Furthermore, the relative positional
relationship between the target ship and the ship is
critical to the training of decision-making neural
networks, so it is important to construct enough
confrontation scenarios to train large-scale
unsupervised decision models. Another problem of
unsupervised learning is the determining of how to
generate a new path for no-human participation in
decision making; this problem sets high demands on
the model.
First, the available models built using
unsupervised methods are reviewed. The most
straightforward idea is to estimate the sample
distribution p(x) from the training set and sample p(x)
to generate a new sample “similar to the training set”.
Zhang, R. and Furusho, M.
Using the CGAN Model Extend Encounter Targets Image Training Set.
DOI: 10.5220/0007676803270332
In Proceedings of the 5th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2019), pages 327-332
ISBN: 978-989-758-374-2
2019 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Figure 1: Real sea encounter environment.
For low-dimensional samples, a simple
probabilistic model with only a few parameters (such
as Gaussian) can be fitted to p(x), whereas high-
dimensional samples (such as images) are difficult to
implement. In the extreme navigation environment,
the input of the sensing device may be encoding low-
latitude information, or other high-latitude
information such as images. A classic method is to
construct an undirected graph by using the Restricted
Boltzmann Machine (RBM). The energy values and
node probability of the graph have an exponential
relationship (Nair and Hinton, 2010). The training set
is used to set the coefficients of nodes and edges in
the graph to express the relationship between
individual elements and connected elements in x.
This method is cumbersome and computationally
complex. The mixing speed of the Markov chain is
very slow when sampling (Neal, 2000). Another
method is the use of deep belief networks (DBNs), in
which a single RBM and several directed layers are
used to form a network. This method has the same
computational complexity (Hinton et al., 2006).
Another popular method is the use of convolutional
neural networks (CNNs). Although CNNs show
immediate results in supervised learning including
classification and segmentation, how to conduct
unsupervised learning has always been a problem
(Wang and Gupta, 2015). Generative adversarial nets
(GAN) can solve this problem systematically.
2.1 Generative Adversarial Nets (GAN)
GAN is a new method of training the generation
model proposed by Goodfellow et al., (2014); the
method includes the generation and discrimination of
two “adversarial” models. The generated model (G)
is used to capture the data distribution, and the
discriminant model (D) is used to estimate the
probability that a sample is derived from real data
rather than generating samples. Both the generator
and discriminator are common convolutional
networks as well as fully connected networks. The
generator generates a sample from the random vector,
and the discriminator discriminates between the
generated sample and training set sample. Both train
simultaneously as shown in equation (1):
When training the discriminator, discriminant
model D is fixed, while the parameters of generator G
are adjusted to minimize the expectation of  
 as shown in equation (2):
where model G is fixedly generated and the
parameters of the D are adjusted to maximize the
expectation of 
   . As
shown in equation (3)
This optimization process can be attributed to a
“two-player minimax game” problem. Both purposes
can be achieved through a backpropagation method.
A well-trained generation network can transform any
noise vector into a sample similar to the training set.
This noise can be seen as the encoding of the sample
in a low dimensional space. The generator generates
meaningful data based on random vectors. In contrast,
the discriminator learns how to determine real and
generated data, and then passes the learning
experience to the generator, enabling the generator to
generate more workable data based on random
vectors. Such a trained generator can have many uses,
one of them being environmental generation in
automatic navigation. This paper proposes the
feasibility of this method and applies it in image set
generation to allow unmanned ships to achieve fully
autonomous decision-making processes.
2.2 Generated Image Set Data
The acquiring of real data on critical sea conditions is
difficult; therefore, acquiring data similar to real
scenes is important, especially when data is scarce.
When acquiring data for training the automatic
driving system, according to the concept used in this
study, the use of GAN can be extended to replace the
real image according to the virtual image generated
  
  
  
  
  
VEHITS 2019 - 5th International Conference on Vehicle Technology and Intelligent Transport Systems
from the generator. (Yang et al., 2018) used an
opposing concept, in which the scene image obtained
was directly acquired through real driving by using
unsupervised learning to remove the details unrelated
to the prediction of driving behavior. It was simplified
to the refinement specification representation in the
virtual domain. Accordingly, the ship-driving
instructions were predicted to form a new training
program that is more efficient and accurate.
Unmanned surface navigation involves a type of
driverless navigation. The deep Q-network (DQN)
algorithm can be used to train the unmanned ship
navigation model (Mnih et al., 2015); this is the
embodiment of the wider application of GAN.
Goodwin (Goodwin 1975) derived, training data from
sensory data collected by the ship’s real navigation,
and real navigation seafarers inevitably maintained
sufficient safety distance. Therefore, when an
unmanned ship encounters a dangerous situation, the
experience replay utilized is actually not sufficient.
This is because adopting appropriate decision-
making and behavioral judgment based on the
previous data-training results is difficult when the
unmanned ship actually encounters a sea state that is
different from typical sea conditions (Schaul et al.,
The purpose of the present study involves
generating data containing more similar critical sea
conditions through the GAN model by using a small
number of maritime-navigation real data in critical
situations. The aforementioned data potentially
corresponds to pre-collision scenes encountered by
two ships [including pictures, Automatic
identification system (AIS) data, and radar data]. For
example, the data can also correspond to a scene that
occurred prior to when a ship is stranded in the waters
with insufficient water depth (the most important
factor corresponds to the water-depth data). The
problem for which the DQN algorithm does not learn
from experience in the case of the aforementioned
data sparseness is solved using the GAN algorithm.
3. 1 Conditional GANs
Conditional GANs (CGAN) is an extension of the
original GAN, in which both the generator and
discriminator add additional information y to the
condition. Here, y can be any information, such as
category information or any other modality
information data as shown equation (4). (Mirza and
Osindero, 2014). If condition variable y is a category
label, CGAN can be considered as an improved
supervised model of the pure unsupervised GAN.
This simple and straightforward improvement has
proven to be very effective and widely used in
subsequent related work (Denton et al, 2015; Radford
et al., 2015).
The conditional GAN is achieved by feeding
additional information y to the discriminator and
generator models as part of the input layer. In the
generator model, the input a priori noise p(z) and
condition information y are combined to form a joint
hidden-layer representation. The adversarial training
framework is relatively flexible in terms of the
composition of the hidden-layer representation.
Similarly, the objective function of conditional GAN
is a “Conditional two-player minimax game.”
3.2 The Process of CGAN in
The specific process to obtain various sea-condition
scenarios is shown in Figure 2. First, condition
information Y is entered into the generator and
discriminator, and then a few random vectors are
input to the generator network, and fake data are
subsequently generated by the generator. These fake
data can correspond to a few ship-state pictures or a
few other navigation data, such as AIS data of nearby
  
  
Figure 2: Applying CGAN algorithm to generate encounter targets image training set.
Using the CGAN Model Extend Encounter Targets Image Training Set
Figure 3: Generated lifeboat from submarine by Big-GAN.
ship or the path planning data after the ship route is
updated. We inputted the fake data to the
discriminator, which determines whether the input
data are real or generated by the generator based on a
random comparison with the real data. The similarity
between the data by the generator and the real data
from the discriminator progressively increases, thus
increasing the discriminating ability required by the
discriminator. Additionally, the generator and
discriminator share a mutually competitive and
adversarial relationship. The generated data are
considered to sufficiently mirror real data when the
fake data input by the generator appears sufficiently
realistic, and the accuracy of the discriminator at this
time is approximately 50%. This corresponds to the
sea-scene data required in critical sea situations.
3.3 A Case Study for Generated Image
CGAN model image generation as shown in figure3.
This section mainly demonstrates how to generate a
lifeboat using images of submarines taken from
different angles using the Big-GAN model. Big-GAN
model proposed by Andrew Brock from Heriot-Watt
University (Brock et al., 2018). The authors proposed
a model (Big-GAN) with modifications focused on
the following three aspects: a. improving
conditioning by applying orthogonal regularization to
the generator; b. The orthogonal regularization
applied to the generator makes the model amenable to
the “truncation trick” so that fine control of the trade-
offs between fidelity and variety is possible by
truncating the latent space; c. stability is very
significant for large-scale image generate.
Figure 4: Generate lifeboat images by Big-GAN model.
Figure 5: A portion of the generated lifeboat image set.
Figure 6: A portion of the generated ocean liner image set.
We can input different random vectors and
combine the real data input by the discriminator to get
a large number of Real enough image data sets. This
study, we input random vectors are submarine. As
shown in Figure 4, we get three different types of
lifeboats, more importantly, the different background
environments of lifeboats can also achieve various
changes. It can be provided a large amount of training
data for our unsupervised decision model. It is much
VEHITS 2019 - 5th International Conference on Vehicle Technology and Intelligent Transport Systems
richer than the target ship data collected from real sea
4.1 Generated Target Image Set
The most important part of this study is to obtain a
data set of the target ship with sufficient quality and
quantity. As shown in Figure 5, a portion of the entire
large-scale lifeboat target image data set is shown.
These images were not taken by the camera and were
generated entirely from our GAN model. Using our
model, we can generate various situations at sea
scenes and the various forms that the own ship may
encounter; even various types of accidents, such as
collisions, stranding, fire, loss of goods, etc. Not only
the training data set for lifeboats, but also the ocean
liner data set shown in Figure 6, as well as data sets
for various other types of marine moving targets.
4.2 Generated Data for Unsupervised
Decision Model Training
GAN is easy to embed into the framework of
reinforcement learning. For example, when using
Deep Q-Network to solve collision avoidance
problems, GAN can be used to learn the conditional
probability distribution of an action. The agent can
select reasonable images based on the response of the
generated model to different actions.
In the training of image recognition models of
convolutional neural networks and the training of
decision models such as deep reinforcement learning,
the quality of the input data greatly affects the effect
of the training results. The target image dataset
generated by CGAN has the same image size and the
same image density, which can easily solve the
problem of inconsistent input data during the training
process. In addition, the CGAN model solves many
of the scene data that are difficult to obtain in a real
navigation environment, making it possible to use
large-scale data input for deep learning.
This paper using the Conditional Generative
Adversarial Networks to generate image set of the
available target ships and improve the quantity and
quality of training data. The surrounding environment
data of the own ship obtained by the sensors, mainly
includes AIS data, radar data, and image data. Small
vessels, especially those in some areas, do not have
AIS data, radar data is greatly affected by the weather.
Therefore, training automatic driving unmanned
ships are inseparable from the support of image data
sets, especially the image data of small ships in
various states. This paper uses the image data of
target ship as a sample, which obtained from the
perspective of the ship’s bridge, using the CGAN
algorithm to generate more, and the same type of the
target ship image data to support model training.
According to different condition information, through
the CGAN model, it is possible to generate more
different environmental states, such as different near-
shore backgrounds, different city lighting pollution,
different weather conditions, and even different
seasons, new images of different ocean wave levels.
This method can greatly expand the quantity and
quality of the training data set, therefore, easy for
completing the construction of a better autonomous
unsupervised decision model.
Brock A, Donahue J, Simonyan K. Large scale gan training
for high fidelity natural image synthesis[J]. arXiv
preprint arXiv:1809.11096, 2018.
Denton, E. L., Chintala, S., and Fergus, R. (2015): Deep
generative image models using a laplacian pyramid of
adversarial networks. In Advances in neural
information processing systems.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., ... and Bengio, Y. (2014):
Generative adversarial nets. In Advances in neural
information processing systems.
Goodwin, E. (1975): A statistical study of ship domains.
The Journal of Navigation.
Hinton, G. E., Osindero, S., and Teh, Y. W. (2006): A fast
learning algorithm for deep belief nets. Neural
Liu, Y. and Bucknall, R. (2015): Path planning algorithm
for unmanned surface vehicle formations in a practical
maritime environment. Ocean Engineering.
Mirza, M., and Osindero, S. (2014): Conditional generative
adversarial nets. arXiv preprint arXiv:1411.1784.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A.,
Veness, J., Bellemare, M. G., ... and Petersen, S. (2015):
Human-level control through deep reinforcement
learning. Nature.
Nair, V. and Hinton, G. E. (2010): Rectified linear units
improve restricted boltzmann machines. In
Proceedings of the 27th international conference on
machine learning (ICML-10).
Neal, R. M. (2000): Markov chain sampling methods for
Dirichlet process mixture models. Journal of
computational and graphical statistics.
Radford, A., Metz, L., and Chintala, S. (2015):
Unsupervised representation learning with deep
Using the CGAN Model Extend Encounter Targets Image Training Set
convolutional generative adversarial networks. arXiv
preprint arXiv:1511.06434.
Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015):
Prioritized experience replay. arXiv preprint
Wang, X. and Gupta, A. (2015): Unsupervised learning of
visual representations using videos. In Proceedings of
the IEEE International Conference on Computer
Yang, L., Liang, X., and Xing, E. (2018): Unsupervised
Real-to-Virtual Domain Unification for End-to-End
Highway Driving. arXiv preprint arXiv: 1801.03458.
VEHITS 2019 - 5th International Conference on Vehicle Technology and Intelligent Transport Systems