
2 RELATED WORK
Research in the field of machine learning (ML) and
artificial neural networks (ANNs) has become increasingly
popular over the last few years. However, there
is still only a limited number of projects that address
the use of machine learning in the context of
software engineering. This holds in particular for
the analysis of (handwritten) UML diagrams. In the
following, we give an overview of related
work in this context.
In (Gosala et al., 2021) the following binary clas-
sification problem is studied: For a given image the
network should decide whether it contains a UML
class diagram or not. A classifier solving this problem
may have several applications. In different phases of
the software development process, different types of
diagrams are used, including class diagrams. An au-
tomated analysis of diagrams defined for a project al-
lows for a quantification of the use of class diagrams
for a given phase in the development process. Fur-
thermore, the classifier may be used to automatically
build a collection of class diagrams generated from
images taken from the internet. These diagrams may
serve as examples for novice developers.
The classifier introduced in (Gosala et al., 2021)
solves the aforementioned classification problem, and
the authors evaluate its results on a test set. The
classifier is based on a convolutional neural network
(CNN), a type of ANN which is popular for
image-related classification tasks. It consists of four
convolutional layers and two fully connected layers as
output layers.
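To make such a layout more concrete, the following minimal Python sketch traces the tensor shapes through four convolutional layers followed by two fully connected layers. Only the layer counts and the two-class output come from the description above. The input resolution, channel widths, kernel size, padding, pooling and hidden-layer size are assumptions chosen for illustration and are not taken from (Gosala et al., 2021).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Conv:
    """One convolutional layer followed by max pooling (all values assumed)."""
    out_channels: int
    kernel: int = 3
    padding: int = 1
    pool: int = 2


def conv_output_size(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Standard formula for the spatial output size of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size: int, convs: List[Conv],
                 hidden_units: int, num_classes: int) -> Tuple[int, List[tuple]]:
    """Return the flattened feature length and the output shape of every layer."""
    shapes: List[tuple] = []
    size = input_size
    channels = 0
    for layer in convs:
        size = conv_output_size(size, layer.kernel, layer.padding) // layer.pool
        channels = layer.out_channels
        shapes.append((channels, size, size))
    flat = channels * size * size      # input length of the first dense layer
    shapes.append((hidden_units,))     # first fully connected layer
    shapes.append((num_classes,))      # second fully connected layer: diagram / no diagram
    return flat, shapes


# Hypothetical configuration: 128x128 input, channel width doubling per layer.
convs = [Conv(16), Conv(32), Conv(64), Conv(128)]
flat, shapes = trace_shapes(128, convs, hidden_units=256, num_classes=2)
for shape in shapes:
    print(shape)
```

Under these assumptions, each convolution keeps the spatial size and each pooling step halves it, so the last convolutional layer hands 128 x 8 x 8 = 8192 values to the first fully connected layer.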
The problem of offline recognition of handwrit-
ten diagrams – i. e. having no additional information
about how the text was created by the writer – is described
in (Schäfer et al., 2021). The tool introduced
in the paper is based on a sophisticated ANN (called
Arrow R-CNN in the paper) and, owing to its generic
approach, can be applied to a large number of different
diagram types. It is not limited to a certain diagram
type, e. g. class diagrams, but it requires a large
amount of labeled training data for each type of diagram.
The tool consists of two parts: In
the first part, an ANN is used to detect and classify the
different shapes that are contained in the image. In a
second processing step, these shapes are passed to a
diagram-specific algorithm which produces a digital
representation of the diagram.
The Arrow R-CNN network consists of three components.
First, a CNN is used for feature extraction
from the images. The result is then fed into an ANN,
called a Region Proposal Network, which is used to
calculate a large number of Regions of Interest (RoIs).
Each RoI consists of a feature map which is passed to
an ANN consisting of fully connected layers. For each
RoI a corresponding class is determined, which yields
the respective type of model element.
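The last of these steps, assigning a class to a single RoI, can be illustrated with a strongly simplified Python sketch. It flattens the feature map of one RoI, applies a single fully connected layer and picks the class with the highest softmax probability. The class names, the 2x2 feature map and the weights are hypothetical toy values. The actual head described in (Schäfer et al., 2021) is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def fully_connected(features: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """One dense layer: one output score per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the maximum for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_roi(feature_map: Sequence[Sequence[float]],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float],
                 class_names: Sequence[str]) -> Tuple[str, float]:
    """Flatten the RoI feature map and return the most probable class."""
    flat = [value for row in feature_map for value in row]
    probs = softmax(fully_connected(flat, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical toy values, not taken from the paper.
CLASS_NAMES = ["rectangle", "arrow", "text"]
roi_features = [[1.0, 0.0],
                [0.0, 1.0]]
weights = [[2.0, 0.0, 0.0, 2.0],   # scores "rectangle"
           [0.0, 2.0, 2.0, 0.0],   # scores "arrow"
           [0.5, 0.5, 0.5, 0.5]]   # scores "text"
bias = [0.0, 0.0, 0.0]

label, confidence = classify_roi(roi_features, weights, bias, CLASS_NAMES)
print(label, round(confidence, 3))
```

In a trained network the weights are learned from the labeled training data, and the same head is applied to every RoI proposed for an image.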
3 ARCHITECTURE
This section describes the architecture of our tool.
We employ techniques from computer vision to detect
the classes, their features and the relationships between
classes. Details of the implementation of these
steps are discussed in Section 4. Apart from classical
algorithms and concepts from the field of computer
vision, two ANN-based classifiers were implemented:
one detects the hand-written text, the other the numbers
and cardinality symbols used for association ends.
Therefore, we present a short overview of
their specifics in the following paragraphs.
3.1 Classifier for Detecting Multiplicity Symbols and Numbers
There are already many approaches that tackle the
problem of recognizing hand-written numbers. In
particular, extensive research has been done on the
classification problem based on the MNIST data set. Results
listed in (LeCun et al., ) reveal that classifiers using
ANNs achieve the best results, especially when CNNs
are used in the first step for feature extraction. Consequently,
our classifier follows this approach. We use
a data set containing hand-written numbers. These
are written in the European style, in contrast to those of
the American-style MNIST data set¹. Furthermore, the
data set is augmented with images of the hand-written
* symbol used for representing unbounded multiplicity
in UML. This data set is referred to as ESHWD
(European-style hand-written digits) in the remainder
of this paper.
3.1.1 Preprocessing
The neural network takes a binary image of 28x28 pixels
showing a symbol or a number as input.
In order to meet this precondition, the images taken
from the ESHWD data set need to undergo several
preprocessing steps: (1) The grey-scale images are
binarized, before (2) artefacts are removed. Since the
line width of the numbers is usually not large enough,
it is (3) increased using dilation. In order to meet the
size requirements, each image is (4) resized to 18x18
pixels, and 5 black pixels are added in each direction
¹ https://github.com/kensanata/numbers
MODELSWARD 2024 - 12th International Conference on Model-Based Software and Systems Engineering