portant for applying model clone detection in prac-
tice (Deissenboeck et al., 2010; Stephan and Cordy,
2014). In this work, we fix the scope at the EClass level
and thus do not run into the nested clones problem.
The problem would be somewhat relevant for, e.g., the EPackage
scope, and even more so for other types of models.
Other aspects, including clone ranking, reporting,
inspection and visualisation, are also left as future work.
Threats to Validity. A threat to validity due to the
preliminary nature of this study is the limited evalu-
ation on a synthetic dataset using a single mutation
analysis approach. Further evaluation is needed, using chains
of mutations that lead to Type C clones and eventually
a real dataset. At this stage it is not clear to us how
frequently the various changes (i.e. mutations) occur in
practice, which directly affects the overall accuracy of our approach
(cf. the weakness of bigrams for certain mutations).
An evaluation on a real dataset, combined with an additional
comparative evaluation against existing model clone
detectors, would be necessary to properly assess the
precision and, more importantly, our relative recall.
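To make these two measures concrete, the following is a minimal sketch of how precision and relative recall could be computed when several detectors are compared. It assumes that relative recall is measured against the pool of validated clones found by at least one of the tools, a common convention when the complete set of true clones is unknown. The tool names, clone pairs and function names are purely hypothetical and do not come from our evaluation.

```python
from typing import Dict, FrozenSet, Set

ClonePair = FrozenSet[str]  # an unordered pair of model element names


def pair(a: str, b: str) -> ClonePair:
    """Build an unordered clone pair, so that (a, b) equals (b, a)."""
    return frozenset((a, b))


def precision(reported: Set[ClonePair], validated: Set[ClonePair]) -> float:
    """Fraction of the reported pairs that were validated as true clones."""
    if not reported:
        return 0.0
    return len(reported & validated) / len(reported)


def relative_recall(tool: str,
                    reports: Dict[str, Set[ClonePair]],
                    validated: Set[ClonePair]) -> float:
    """Recall against the pool of true clones found by at least one tool."""
    pool = set().union(*reports.values()) & validated
    if not pool:
        return 0.0
    return len(reports[tool] & pool) / len(pool)


# Hypothetical data: manually validated clones and the reports of two tools.
validated = {pair("A", "B"), pair("C", "D"), pair("E", "F")}
reports = {
    "tool_x": {pair("A", "B"), pair("C", "D"), pair("G", "H")},
    "tool_y": {pair("C", "D"), pair("E", "F")},
}

for name in sorted(reports):
    print(name,
          precision(reports[name], validated),
          relative_recall(name, reports, validated))
```

In this sketch a true clone that no tool reports never enters the pool, which is why the measure is only relative: it can rank tools against each other but cannot bound the absolute recall.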
7 CONCLUSION AND FUTURE WORK
In this paper we present a novel model clone detec-
tion approach based on SAMOS using information
retrieval and machine learning techniques. We have
extended SAMOS with additional scoping, compari-
son schemes, customised distance measures and clus-
tering algorithms in the context of metamodel clone
detection. We have evaluated our approach using mu-
tation analysis and identified the strengths and weak-
nesses of our approach in a case-based manner.
As future work, we plan to further extend SAMOS
with additional features (e.g. n-grams with n > 2 and
subtrees), customised, improved weighting schemes,
distance measures and statistical algorithms. A further
step is to extend state-of-the-art model clone
detectors such as Simone, ConQAT and MQlone to
metamodel clone detection, so that those tools can be evaluated
separately and compared with our approach in terms of
precision and relative recall.
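As an illustration of the planned n-gram extension, the following minimal sketch enumerates n-grams for arbitrary n over a small directed graph. It assumes, as a simplification, that an n-gram is a simple path of n vertex names and that edge labels are ignored; the graph, its element names and the function `ngrams` are hypothetical and do not reproduce the actual SAMOS feature extraction.

```python
from typing import Dict, List, Tuple

# Adjacency list: every vertex appears as a key, mapped to its successors.
Graph = Dict[str, List[str]]


def ngrams(graph: Graph, n: int) -> List[Tuple[str, ...]]:
    """Enumerate all simple paths with exactly n vertices."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        if len(path) == n:
            result.append(tuple(path))
            return
        for successor in graph.get(path[-1], []):
            if successor not in path:  # skip cycles, keep paths simple
                extend(path + [successor])

    for start in sorted(graph):  # sorted for a deterministic order
        extend([start])
    return result


# Hypothetical metamodel fragment: containment and reference edges.
example: Graph = {
    "Library": ["Book", "Member"],
    "Book": ["Author"],
    "Member": [],
    "Author": [],
}

print(ngrams(example, 2))
print(ngrams(example, 3))
```

For this fragment the bigrams cover each edge separately, whereas the only trigram is the chain from Library via Book to Author, which shows how larger n captures more structural context at the cost of fewer, more specific features.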
REFERENCES
Alalfi, M. H., Cordy, J. R., Dean, T. R., Stephan, M., and
Stevenson, A. (2012). Models are code too: Near-
miss clone detection for simulink models. In Software
Maintenance (ICSM), 2012 28th IEEE Int. Conf. on,
pages 295–304. IEEE.
Babur, Ö. (2016). Statistical analysis of large sets of mod-
els. In 31st IEEE/ACM Int. Conf. on Automated Soft-
ware Engineering, pages 888–891.
Babur, Ö. and Cleophas, L. (2017). Using n-grams for the
automated clustering of structural models. In 43rd
Int. Conf. on Current Trends in Theory and Practice
of Computer Science, pages 510–524.
Babur, Ö., Cleophas, L., and van den Brand, M. (2016). Hi-
erarchical clustering of metamodels for comparative
analysis and visualization. In Proc. of the 12th Eu-
ropean Conf. on Modelling Foundations and Applica-
tions, 2016, pages 3–18.
Babur, Ö., Cleophas, L., van den Brand, M., Tekinerdogan,
B., and Aksit, M. (2017). Models, more models and
then a lot more. In Grand Challenges in Modeling, to
appear.
Deissenboeck, F., Hummel, B., Juergens, E., Pfaehler, M.,
and Schaetz, B. (2010). Model clone detection in prac-
tice. In Proc. of the 4th Int. Workshop on Software
Clones, pages 57–64. ACM.
Deissenboeck, F., Hummel, B., Jürgens, E., Schätz, B.,
Wagner, S., Girard, J.-F., and Teuchert, S. (2008).
Clone detection in automotive model-based devel-
opment. In Software Engineering, 2008. ICSE’08.
ACM/IEEE 30th Int. Conf. on, pages 603–612. IEEE.
Deza, M. M. and Deza, E. (2009). Encyclopedia of Dis-
tances. Springer.
Dijkman, R., Dumas, M., van Dongen, B., Käärik, R.,
and Mendling, J. (2011). Similarity of business pro-
cess models: Metrics and evaluation. Inf. Systems,
36(2):498–516.
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996).
A density-based algorithm for discovering clusters in
large spatial databases with noise. In Proc. of the
Second Int. Conf. on Knowledge Discovery and Data
Mining, KDD’96, pages 226–231. AAAI Press.
Gómez-Abajo, P., Guerra, E., and de Lara, J. (2016). Wodel:
a domain-specific language for model mutation. In
Proceedings of the 31st Annual ACM Symposium on
Applied Computing, pages 1968–1973. ACM.
Liu, H., Ma, Z., Zhang, L., and Shao, W. (2006). Detect-
ing duplications in sequence diagrams based on suffix
trees. In Software Engineering Conf., 2006. APSEC
2006. 13th Asia Pacific, pages 269–276. IEEE.
Manning, C. D., Raghavan, P., Schütze, H., et al. (2008).
Introduction to information retrieval, volume 1. Cam-
bridge University Press.
Manning, C. D. and Schütze, H. (1999). Foundations of
Statistical Natural Language Processing. MIT Press.
Pham, N. H., Nguyen, H. A., Nguyen, T. T., Al-Kofahi,
J. M., and Nguyen, T. N. (2009). Complete and ac-
curate clone detection in graph-based models. In Pro-
ceedings of the 31st Int. Conf. on Software Engineer-
ing, pages 276–286. IEEE Computer Society.
Roy, C. K., Cordy, J. R., and Koschke, R. (2009). Compari-
son and evaluation of code clone detection techniques
MODELSWARD 2018 - 6th International Conference on Model-Driven Engineering and Software Development