Authors:
Takwa Kochbati
;
Shuai Li
;
Sébastien Gérard
and
Chokri Mraidha
Affiliation:
Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France
Keyword(s):
User Story, Machine Learning, Word Embedding, Clustering, Natural Language Processing, UML Use-case.
Abstract:
In modern software development, manually deriving architecture models from software requirements expressed in natural language becomes a tedious and time-consuming task particularly for more complex systems. Moreover, the increase in size of the developed systems raises the need to decompose the software system into sub-systems at early stages since such decomposition aids to better design the system architecture. In this paper, we propose a machine learning based approach to automatically break-down the system into sub-systems and generate preliminary architecture models from natural language user stories in the Scrum process. Our approach consists of three pillars. Firstly, we compute word level similarity of requirements using word2vec as a prediction model. Secondly, we extend it to the requirement level similarity computation, using a scoring formula. Thirdly, we employ the Hierarchical Agglomerative Clustering algorithm to group the semantically similar requirements and provide
an early decomposition of the system. Finally, we implement a set of specific Natural Language Processing heuristics in order to extract relevant elements that are needed to build models from the identified clusters. Ultimately, we illustrate our approach by the generation of sub-systems expressed as UML use-case models and demonstrate its applicability using three case studies.
(More)