An accuracy evaluation based on five real-world
data sets with externally provided ground truth showed
that the proposed algorithm exhibits superior accuracy.
In the best case, the algorithm exhibited an
average F-measure of 0.9969. In the future, we plan
to deal with the problem of incremental discovery and
message type evolution, which is somewhat inherent to
the logging domain. We plan to address this by altering
the algorithm so that it can work in an online manner,
i.e., the message types would be discovered on the fly,
and the pattern set for regex matching would also be
updated dynamically.
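To make the intended online mode concrete, the following is a minimal sketch of a pattern set that is updated as messages arrive. It is not the algorithm proposed in this paper: the class name `OnlinePatternSet`, the `<*>` wildcard notation, and the rule that any token containing a digit is a variable parameter are assumptions introduced only for illustration, standing in for the actual message type discovery step.

```python
import re


class OnlinePatternSet:
    """Hypothetical online pattern set (illustration only, not the paper's algorithm)."""

    def __init__(self):
        # Each entry is (template string, compiled regex), in discovery order.
        self.patterns = []

    @staticmethod
    def _template(message):
        # Assumption: a token containing a digit is a variable parameter.
        return " ".join(
            "<*>" if any(c.isdigit() for c in tok) else tok
            for tok in message.split()
        )

    @staticmethod
    def _compile(template):
        # Wildcards become capture groups; static tokens are matched literally.
        parts = [
            r"(\S+)" if tok == "<*>" else re.escape(tok)
            for tok in template.split(" ")
        ]
        return re.compile(r"\s+".join(parts))

    def process(self, message):
        """Match a message against known patterns, discovering a new one on a miss."""
        text = message.strip()
        for template, regex in self.patterns:
            match = regex.fullmatch(text)
            if match:
                return template, list(match.groups())
        # No known pattern matched: derive a new message type on the fly
        # and add it to the pattern set.
        template = self._template(text)
        regex = self._compile(template)
        self.patterns.append((template, regex))
        return template, list(regex.fullmatch(text).groups())
```

In this sketch, calling `process` on two messages that differ only in a numeric token yields the same template and leaves a single entry in the pattern set, while a structurally different message adds a second entry. A real online variant would also have to refine or merge existing patterns as message types evolve, which this sketch does not attempt.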
ACKNOWLEDGEMENTS
The publication of this paper and the follow-up
research were supported by the ERDF „CyberSecurity,
CyberCrime and Critical Information Infrastructures
Center of Excellence“ (No.
CZ.02.1.01/0.0/0.0/16_019/0000822).
ICSOFT 2019 - 14th International Conference on Software Technologies