cept mining (Abebe and Tonella, 2010) use NLP to
cluster features/concepts. Ratiu et al. (2008) build
domain ontologies as the intersection of graphs of
APIs, but do not focus on the statistical dimension
of the problem. Metamodel recovery (Javed et al., 2008)
is another approach; it assumes a metamodel that once
existed but was somehow lost, which does not hold for
our scenario. Dijkman et al. (2011) apply a technique
similar to ours, but specifically to business process
models using process footprints, so it lacks the
generality of our approach. Note that a thorough
literature study beyond the technological space of
MDE, for instance regarding data schema matching and
ontology matching/alignment, is out of scope for this
paper and is therefore omitted.
6 CONCLUSIONS AND FUTURE WORK
In this paper, we have presented a new perspective
on the N-way comparison and analysis of models as
a first step in model recovery. We have proposed a
generic approach that uses the IR techniques VSM and
tf-idf to represent multiple models uniformly, and
applies statistical analysis with K-means and
hierarchical clustering. Using a model mutation
framework, we have synthetically generated a dataset
to apply our method and demonstrate its potential uses.
The results indicate that our approach is a promising
first step for analysing large datasets: it is generic
and, using R, scalable and efficient.
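The pipeline summarised above can be illustrated with a minimal sketch. It is written in Python with only the standard library for illustration and is not the R implementation used in this work. The toy models, the idf variant log(N/df), the cosine distance, and the average-linkage criterion are all assumptions chosen for the example; each model is reduced to a bag of element names, weighted with tf-idf, and clustered hierarchically.

```python
import math
from collections import Counter

# Hypothetical toy dataset: each model is reduced to the names of its elements.
models = {
    "bank_v1": ["Account", "Customer", "Bank", "Transaction"],
    "bank_v2": ["Account", "Customer", "Bank", "Loan"],
    "library_v1": ["Book", "Member", "Library", "Loan"],
    "library_v2": ["Book", "Member", "Library", "Author"],
}


def tfidf_vectors(docs):
    """Build the vector space: one tf-idf vector per model over a shared vocabulary."""
    vocab = sorted({t for terms in docs.values() for t in terms})
    n = len(docs)
    # Document frequency: in how many models each term occurs.
    df = Counter(t for terms in docs.values() for t in set(terms))
    # Assumed idf variant: log(N / df); other weightings are possible.
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vectors[name] = [tf[t] * idf[t] for t in vocab]
    return vocab, vectors


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[name] for name in vectors]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Merge the two clusters with the smallest average pairwise distance.
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


vocab, vectors = tfidf_vectors(models)
clusters = agglomerative(vectors, k=2)
print(clusters)
```

On this toy input the two bank variants and the two library variants end up in separate clusters, even though `bank_v2` and `library_v1` share the element name `Loan`. For real datasets, the choice of distance measure and linkage would need the careful assessment noted as future work.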
As future work, the most important goal is to work
with real datasets of possibly heterogeneous models,
rather than a synthetic one. A real dataset (e.g. class
diagrams of multiple domain tools) can be acquired
through reverse engineering, which we have omitted
in this paper. NLP and semantic issues pose the
next set of challenges to tackle. Different and more
advanced options for model representation, distance
measures, and clustering techniques need careful
assessment in order to increase the accuracy and
efficiency of our approach. Although this work is
presented as an exploratory step, future work can also
investigate how VSM and clustering information can be
used for model merging and domain model recovery.
REFERENCES
Abebe, S. L. and Tonella, P. (2010). Natural language pars-
ing of program element names for concept extraction.
In Program Comprehension (ICPC), 2010 IEEE 18th
International Conference on, pages 156–159. IEEE.
Altmanninger, K., Seidl, M., and Wimmer, M. (2009). A
survey on model versioning approaches. International
Journal of Web Information Systems, 5(3):271–304.
Babur, Ö., Smilauer, V., Verhoeff, T., and van den Brand, M.
(2015a). Multiphysics and multiscale software frameworks:
An annotated bibliography. Technical Report
15-01, Dept. of Mathematics and Computer Science,
Technische Universiteit Eindhoven, Eindhoven.
Babur, Ö., Smilauer, V., Verhoeff, T., and van den Brand,
M. (2015b). A survey of open source multiphysics
frameworks in engineering. Procedia Computer Science,
51:1088–1097.
Brunet, G., Chechik, M., Easterbrook, S., Nejati, S., Niu,
N., and Sabetzadeh, M. (2006). A manifesto for model
merging. In Proc. of the 2006 Int. Workshop on Global
Integrated Model Management, pages 5–12. ACM.
Budinsky, F. (2004). Eclipse modeling framework: a devel-
oper’s guide. Addison-Wesley Professional.
Deissenboeck, F., Hummel, B., Juergens, E., Pfaehler, M.,
and Schaetz, B. (2010). Model clone detection in prac-
tice. In Proc. of the 4th Int. Workshop on Software
Clones, pages 57–64. ACM.
Dijkman, R., Dumas, M., Van Dongen, B., Käärik, R.,
and Mendling, J. (2011). Similarity of business process
models: Metrics and evaluation. Inf. Systems,
36(2):498–516.
Jain, A. K. and Dubes, R. C. (1988). Algorithms for clus-
tering data. Prentice-Hall, Inc.
Javed, F., Mernik, M., Gray, J., and Bryant, B. R. (2008).
Mars: A metamodel recovery system using grammar
inference. Inf. and Software Tech., 50(9):948–968.
Klint, P., Landman, D., and Vinju, J. (2013). Exploring the
limits of domain model recovery. In Software Mainte-
nance (ICSM), 2013 29th IEEE International Confer-
ence on, pages 120–129. IEEE.
Kolovos, D. S., Ruscio, D. D., Pierantonio, A., and Paige,
R. F. (2009). Different models for model matching:
An analysis of approaches to support model differenc-
ing. In Comparison and Versioning of Software Mod-
els, 2009. ICSE Workshop on, pages 1–6. IEEE.
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., and
Hornik, K. (2013). cluster: Cluster Analysis Basics
and Extensions. R package version 1.14.4.
Manning, C. D., Raghavan, P., Schütze, H., et al. (2008).
Introduction to information retrieval, volume 1.
Cambridge University Press, Cambridge.
Ratiu, D., Feilkas, M., and Jürjens, J. (2008). Extracting
domain ontologies from domain specific APIs. In Software
Maintenance and Reengineering, 2008. CSMR
2008. 12th European Conf. on, pages 203–212. IEEE.
Reinhartz-Berger, I. (2010). Towards automatization of
domain modeling. Data & Knowledge Engineering,
69(5):491–515.
Rubin, J. and Chechik, M. (2013). N-way model merging.
In Proc. of the 2013 9th Joint Meeting on Foundations
of Software Engineering, pages 301–311. ACM.
She, S., Lotufo, R., Berger, T., Wąsowski, A., and Czarnecki,
K. (2011). Reverse engineering feature models.
In Software Engineering (ICSE), 2011 33rd International
Conference on, pages 461–470. IEEE.
MODELSWARD 2016 - 4th International Conference on Model-Driven Engineering and Software Development