lines of source code to ensure that we analyze large
amounts of code and different coding styles. However,
we cannot generalize our results beyond our current
dataset and learning algorithm.
8 CONCLUSIONS
In this paper, we present the first benchmark for
analyzing the trade-offs between three pattern types
(sequential, partial-order, and no-order) with respect
to real code. Our approach consists of three steps:
the transformation of source code into a stream of
events, the adaptation of an event mining algorithm to
the special context of pattern mining for software
engineering, and the filtering of the resulting patterns.
Our empirical investigation shows that different
types of patterns are learned from code repositories.
While there are trade-offs between pattern types in
terms of expressiveness, consistency, and
generalizability, they are comparable in terms of
pattern size and number of API types. Our results
empirically show that partial-order patterns are the
sweet spot: they are a superset of sequential-order
patterns and, unlike no-order patterns, do not lose
valuable information. Partial-order mining finds
additional patterns that are missed by sequence mining
and that generalize across repositories. Compared to
no-order mining, partial-order mining learns a smaller
percentage of cross-repository patterns (48% vs. 58%
for no-order mining), due to the order constraints
between events within a pattern. The evaluation results
show that all three configurations end up learning only
repository-specific patterns for pattern sizes of six
or more events. Furthermore, our results empirically
show the consistency of order information in sequential
and partial-order patterns: on average 90% and 96%,
respectively.
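
The relation between the three pattern types can be
illustrated with a small sketch. It is not the mining
algorithm evaluated in this paper; it only checks whether
a given pattern occurs in one method's event stream. The
function names, the event names, and the use of first
occurrences to decide order are all assumptions made for
illustration, and each event is assumed to appear at most
once in a pattern.

```python
from itertools import combinations

def matches_no_order(events, stream):
    """No-order pattern: every event occurs somewhere in the stream."""
    return set(events) <= set(stream)

def matches_partial_order(events, before, stream):
    """Partial-order pattern: all events occur and every (a, b) in `before`
    holds, judged here by first occurrences (a simplifying assumption)."""
    if not matches_no_order(events, stream):
        return False
    first = {e: stream.index(e) for e in events}
    return all(first[a] < first[b] for a, b in before)

def matches_sequential(events, stream):
    """Sequential pattern: a partial order in which every pair is ordered."""
    return matches_partial_order(events, set(combinations(events, 2)), stream)

# Hypothetical events and method bodies, for illustration only.
events = ["open", "read", "write", "close"]
# open comes first, close comes last, read and write may swap.
partial = {("open", "read"), ("open", "write"),
           ("read", "close"), ("write", "close")}

read_then_write = ["open", "read", "write", "close"]
write_then_read = ["open", "write", "read", "close"]
close_first = ["close", "open", "read", "write"]

for stream in (read_then_write, write_then_read, close_first):
    print(matches_sequential(events, stream),
          matches_partial_order(events, partial, stream),
          matches_no_order(events, stream))
```

For the three streams this prints `True True True`,
`False True True`, and `False False True`. The sequential
pattern misses the usage in which read and write are
swapped, the no-order pattern also accepts the stream in
which close comes first, and the partial-order pattern
accepts both valid usages while rejecting the third.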
Our findings are useful indications for researchers
who work with code patterns in applications such as
code recommendation and misuse detection.
ACKNOWLEDGEMENTS
This work has been supported by the European Research
Council with grant No. 321217, and by the German
Science Foundation (DFG) in the context of the
CROSSING Collaborative Research Center (SFB #1119,
project E1). The authors want to thank Raajay
Viswanathan for the technical support with the episode
mining algorithm, and Ulf Brefeld for the useful
suggestions on the analyses of the data presented in
this paper. The authors take full responsibility for the
content of the paper.
REFERENCES
Achar, A., Laxman, S., Viswanathan, R., and Sastry, P.
(2012). Discovering injective episodes with general
partial orders. Data Mining and Knowledge Discovery,
pages 67–108.
Achar, A. and Sastry, P. (2015). Statistical significance
of episodes with general partial orders. Information
Sciences, pages 175–200.
Acharya, M. and Xie, T. (2009). Mining API error-handling
specifications from source code. In International
Conference on Fundamental Approaches to Software
Engineering, pages 370–384.
Acharya, M., Xie, T., Pei, J., and Xu, J. (2007). Mining
API patterns as partial orders from source code: from
usage scenarios to specifications. In European Software
Engineering Conference and the ACM SIGSOFT Symposium
on The Foundations of Software Engineering, pages 25–34.
Agrawal, R., Imieliński, T., and Swami, A. (1993). Mining
association rules between sets of items in large
databases. In ACM SIGMOD, pages 207–216.
Buse, R. P. and Weimer, W. (2012). Synthesizing API usage
examples. In Proceedings of the 34th International
Conference on Software Engineering, pages 782–792.
IEEE Press.
De Roover, C., Lämmel, R., and Pek, E. (2013).
Multi-dimensional exploration of API usage. In Program
Comprehension (ICPC), 2013 IEEE 21st International
Conference on, pages 152–161. IEEE.
Gabel, M. and Su, Z. (2008). Javert: fully automatic mining
of general temporal properties from dynamic traces.
In ACM SIGSOFT International Symposium on
Foundations of Software Engineering, pages 339–349.
Haase, J. and Brefeld, U. (2014). Mining positional data
streams. In International Workshop on New Frontiers
in Mining Complex Patterns, pages 102–116.
Ma, H., Amor, R., and Tempero, E. (2006). Usage patterns
of the Java standard API. In Software Engineering
Conference, 2006, pages 342–352.
Mannila, H., Toivonen, H., and Inkeri Verkamo, A. (1997).
Discovery of frequent episodes in event sequences.
Data Mining and Knowledge Discovery, pages 259–289.
Martin, R. C. (2003). Agile software development:
principles, patterns, and practices. Prentice Hall PTR.
Mendez, D., Baudry, B., and Monperrus, M. (2013).
Empirical evidence of large-scale diversity in API usage
of object-oriented software. In Source Code Analysis
and Manipulation, pages 43–52.
Michail, A. (2000). Data mining library reuse patterns using
generalized association rules. In International
Conference on Software Engineering, pages 167–176.
Montandon, J. E., Borges, H., Felix, D., and Valente, M. T.
(2013). Documenting APIs with examples: Lessons
learned with the APIMiner platform. In WCRE, pages
401–408.
Negara, S., Codoban, M., Dig, D., and Johnson, R. E.
(2014). Mining fine-grained code changes to detect
Investigating Order Information in API-Usage Patterns: A Benchmark and Empirical Study