management projects dramatically. It can also shine a
light on previously undiscovered log file entries as it
allows their exploration in a reasonable time frame.
Additionally, frequently logged variables such as
hostnames or users can be identified for further
investigation. These capabilities are highly
interesting for small or medium enterprises that intend
or have to use log management but do not have the
necessary personnel to successfully implement such a
practice. LAMaLearner is not limited to information
technology security logs; it can also be used in
industrial applications to reveal hidden patterns
in such environments. Because of its practical
implementation, LAMaLearner can be introduced
into any relevant system architecture and can handle
large amounts of data because it was designed to
scale out from the beginning. The fact that no
connection to an external provider is necessary and
explanations for labeling decisions can be generated
in the same way as described by Eljasik-Swoboda et al.
(2019) makes this software safe to use under strict
privacy legislation like the European Union’s GDPR
(EU, 2016).
In future work, we will use LAMaLearner-generated
event labeling results as input for time series
anomaly detection and regression computation.
The current version is limited to identifying named
entities from single words; word n-grams cannot yet
be analyzed. Therefore, additional ways to create a
named entity recognition (NER) component that
minimize the manual effort in its definition will
also be researched.
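The single-word limitation can be illustrated with a minimal sketch. It is not LAMaLearner code: the helper `word_ngrams` and the sample log line are hypothetical and chosen only to show what whitespace-tokenized unigrams can capture compared with word n-grams.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical log line, not taken from the evaluation data.
line = "Failed password for invalid user admin from 10.0.0.5"
tokens = line.split()

# Single-word analysis only sees isolated tokens such as "admin".
unigrams = word_ngrams(tokens, 1)

# Word bigrams expose multi-word candidates such as ("invalid", "user").
bigrams = word_ngrams(tokens, 2)

# Counting n-grams across many lines would surface frequent multi-word entities.
bigram_counts = Counter(bigrams)
```

Under these assumptions, a multi-word entity such as "invalid user" only becomes visible once n is at least 2, which is why an n-gram-capable NER component is listed as future work.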
REFERENCES
Chang, C., Lin, C., 2011. LIBSVM: A library for support
vector machines, ACM Transactions on Intelligent Systems
and Technology, volume 2, issue 3, pp. 27:1-27:27
Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W.
P., 2002. SMOTE: Synthetic Minority Over-sampling
Technique, Journal of Artificial Intelligence Research,
volume 16, pp. 321-357
Dean, J., Ghemawat, S., 2008. MapReduce: simplified data
processing on large clusters. In Communications of the
ACM, volume 51, pp. 107-113.
Dropwizard, 2019. Production-ready, out of the box.
https://dropwizard.io Accessed September 12, 2019
Elastic, 2019. Open Source Search & Analytics
Elasticsearch | Elastic https://elastic.co Accessed
September 12, 2019
Eljasik-Swoboda, T., Engel, F., Hemmje, M., 2019. Using
Topic Specific Features for Argument Stance
Recognition. In: Proceedings of the 8th International
Conference on Data Science, Technology and Applications
(DATA 2019), DOI: 10.5220/0007769700130022
EU, 2016. Regulation (EU) 2016/679 of the European
Parliament and of the Council of 27 April 2016 on the
protection of natural persons with regard to the
processing of personal data and on the free movement of
such data, and repealing Directive 95/46/EC (General
Data Protection Regulation) (Text with EEA relevance);
OJ L 119, 4.5.2016, p. 1–88.
Gartner, 2011. Gartner Says Solving ‘Big Data’ Challenge
Involves More Than Just Managing Volumes of Data,
http://www.gartner.com/newsroom/id/1731916
Published June 27, 2011 Accessed May 2, 2016
Graylog, 2019. Industry Leading Log Management | Graylog
https://graylog.org Accessed September 12, 2019
IBM, 2019. IBM QRadar SIEM – Overview,
https://www.ibm.com/us-en/marketplace/ibm-qradar-siem
Accessed September 12, 2019
Jurafsky, D., Martin, J. H., 2009. Speech and language
processing. An introduction to natural language
processing, computational linguistics and speech
recognition. 2nd edition, Upper Saddle River, N.J.,
London: Pearson Prentice Hall (Prentice Hall series in
artificial intelligence), pp 761 ff.
Kent, K., Souppaya, M., 2006. Guide to Computer Security
Log Management, Recommendations of the National
Institute of Standards and Technology (NIST), DOI:
10.6028/NIST.SP.800-92
Logentries, 2019. Logentries: Log Management & Analysis
Software Made Easy. https://logentries.com Accessed
September 12, 2019
Loggly, 2019. Log Analysis | Log Management by Loggly
https://loggly.com Accessed September 12, 2019
Singh, D., Reddy, C. K., 2014. A survey on platforms for
big data analytics, Journal of Big Data. DOI:
10.1186/s40537-014-0008-6
Splunk, 2019. SIEM, AIOps, Application Management,
Log Management, Machine Learning, and Compliance.
https://splunk.com Accessed September 12, 2019
Sumo Logic, 2019. Log Management & Security Analysis,
Continuous Intelligence, Sumo Logic.
https://sumologic.com Accessed September 12, 2019
Swift, D., 2010. Successful SIEM and Log Management
Strategies for Audit and Compliance, White Paper
SANS Institute,
https://www.sans.org/reading-room/whitepapers/auditing/paper/33528
Accessed September 5, 2019
Swoboda, T., Kaufmann, M., Hemmje, M. L., 2016. Toward
Cloud-based Classification and Annotation Support,
Proceedings of the 6th International Conference on
Cloud Computing and Services Science (CLOSER
2016) – Volume 2, pp. 131-237
Teixeira, A., 2017. Get over SIEM event normalization.
https://medium.com/@ateixei/get-over-siem-event-normalization-595fc36559b4
Accessed September 16, 2019
Vaarandi, R., 2003. A Data Clustering Algorithm for
Mining Patterns from Event Logs. In: Proceedings of
the 2003 IEEE Workshop on IP Operations and
Management, ISBN: 0-7803-8199-8
Williams, A. T., Nicolett, M., 2005. Improve IT Security
with Vulnerability Management, Gartner Research ID
G00127481