2 RELATED WORK
Many papers propose that requirements engineering should be supported by tools based on linguistic approaches.
In (Rolland and Proix, 1992), the authors introduce a tool called OISCI that processes French natural language. The presented approach focuses on characterizing the parts of sentence patterns that are subsequently matched against the input text. For validation purposes, OISCI also generates natural-language text back from the conceptual specification.
In (Ambriola and Gervasi, 1997), a web-based system called Circe is presented that primarily processes Italian natural language (but can also be adapted to other languages). It consists of several component tools. For our purposes, the most interesting is the main tool, called Cico. Cico recognizes natural-language sentences and prepares inputs for the other tools, which provide graphical representation, metrication, and analysis. The paper presents the idea that a requirements specification may be connected with a corresponding glossary describing all the domain-specific terms used in the requirements. The glossary also handles synonyms of terms. Similarly to the previous paper, Cico matches predefined patterns against the sentences of the requirements specification.
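The pattern-matching idea with a synonym-resolving glossary can be illustrated by a small sketch (the pattern, glossary entries, and function names below are our own illustrative assumptions, not Cico's actual implementation):

```python
import re

# Hypothetical glossary mapping domain terms to canonical forms,
# including synonyms (illustrative only).
GLOSSARY = {
    "customer": "customer",
    "client": "customer",   # synonym resolved to the canonical term
    "order": "order",
}

# A predefined sentence pattern: "<entity> places <entity>".
PATTERN = re.compile(
    r"^(?:The|A)\s+(\w+)\s+places\s+(?:an?|the)\s+(\w+)\.?$",
    re.IGNORECASE,
)

def match_sentence(sentence):
    """Match a requirement sentence against the pattern and
    normalize the captured terms through the glossary."""
    m = PATTERN.match(sentence.strip())
    if not m:
        return None
    subject, obj = (GLOSSARY.get(w.lower(), w.lower()) for w in m.groups())
    return subject, "places", obj

print(match_sentence("The client places an order."))
# → ('customer', 'places', 'order')
```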
In (Kof, 2004) and (Kof, 2005), NLP approaches are divided into three groups: lexical methods; syntactic methods (e.g., part-of-speech tagging); and semantic methods, i.e., methods that interpret each sentence as a logical formula or search for predefined patterns.
Linguistic assistant for Domain Analysis (LIDA) is a tool presented in (Overmyer et al., 2001). In contrast to the previous tools, LIDA is conceived as a supportive tool: it can recognize multi-word phrases, retrieve the base forms of words (stemming and lemmatization), present word frequencies, etc., but it does not contain algorithms for automatic recognition of model elements. The modeling decisions are made entirely by the user, i.e., the user marks candidates for entities, attributes, and relations (including operations and roles).
In (Arellano et al., 2015), the tool TextReq is presented. It is based on the Natural Language Toolkit (NLTK), an open-source platform for natural language processing in Python and an alternative to the Java-based Stanford CoreNLP. Of the papers mentioned above, the concept presented in this one is the closest to our approach.
There are also papers on generating dynamic diagrams, e.g., activity and sequence diagrams (Gulia and Choudhury, 2016). A survey of work in this field is given in (Dawood and Sahraoui, 2017).
3 PROBLEMS OF TEXTUAL
REQUIREMENTS
SPECIFICATIONS
The process of textual requirements processing, as implemented in our tool TEMOS, is depicted in Fig. 1. The schema contains swim lanes that visualize which parts are computed by our algorithms (the TEMOS swim lanes) and which part is provided by the Stanford CoreNLP framework (the middle swim lane). We recall that our tool accepts any free text as input.
In the first phase, the text is treated as a plain sequence of characters. We have to identify some cases that are not properly handled by the Stanford CoreNLP system, e.g., "his/her interpretation".
3.1 Natural Language Processing using Stanford CoreNLP
The text processing part of our tool is based on Stanford CoreNLP (Manning et al., 2014) and its annotators. Annotators are procedures that solve the different parts of the linguistic processing and generate annotations describing the results. The tokenization annotator parses the input text and performs lexical analysis. The Words to Sentence annotator groups the tokens into sentences. The POS Tagger annotator provides the part-of-speech (POS) tag of every token, such as noun, verb, adjective, etc. Punctuation and other special characters are annotated with the character they represent. The Morpha annotator generates the base form (lemma) of every token.
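The first two stages of this pipeline can be illustrated with a toy pure-Python sketch; it mimics only the tokenization and sentence-splitting steps and is not the CoreNLP implementation:

```python
import re

def tokenize(text):
    """Toy lexical analysis: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def split_sentences(tokens):
    """Toy sentence splitting: start a new sentence after ., ! or ?."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # keep a trailing unterminated sentence, if any
        sentences.append(current)
    return sentences

tokens = tokenize("The hotel has bedrooms. Each bedroom has a bed.")
print(split_sentences(tokens))
# → [['The', 'hotel', 'has', 'bedrooms', '.'],
#    ['Each', 'bedroom', 'has', 'a', 'bed', '.']]
```

In the real pipeline, the POS tags and lemmas of each token would then be attached by the subsequent annotators.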
From our point of view, the most interesting annotator is the Dependency Parse annotator. It analyzes the grammatical structure of a sentence and looks for relationships between words (nominal subject(s) of a verb, direct object(s) of a verb, etc.). Fig. 2 presents the output of the dependency annotation. The direction of a dependency is indicated by an arrow. Every sentence has one or more root words; these are the words that have no incoming dependencies. In Fig. 2, there is one root word, bedroom. We can see that the word bedroom is connected by a compound dependency to the word hotel. It may indicate that hotel bedroom is a multi-word term, similarly to a
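The notions of root words and compound-based multi-word terms can be sketched as follows; the dependency triples are hand-written for the example above, whereas a real system would obtain them from the Dependency Parse annotator:

```python
# Dependencies as (head, relation, dependent) triples for the
# example words "hotel bedroom" (hand-written, illustrative only).
DEPS = [
    ("bedroom", "compound", "hotel"),
]
WORDS = ["hotel", "bedroom"]

def root_words(words, deps):
    """Root words are those with no incoming dependency,
    i.e., they never appear as a dependent."""
    dependents = {dep for _, _, dep in deps}
    return [w for w in words if w not in dependents]

def compound_terms(deps):
    """A compound dependency suggests a multi-word term:
    the dependent modifies its head (e.g., 'hotel bedroom')."""
    return [f"{dep} {head}" for head, rel, dep in deps if rel == "compound"]

print(root_words(WORDS, DEPS))   # → ['bedroom']
print(compound_terms(DEPS))      # → ['hotel bedroom']
```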
ICSOFT 2018 - 13th International Conference on Software Technologies