Authors:
Roy B. Ofer (1); Adi Eldar (1); Adi Shalev (2) and Yehezkel S. Resheff (3)
Affiliations:
(1) Microsoft ILDC, Israel; (2) Hebrew University of Jerusalem, Israel; (3) Hebrew University, Israel
Keyword(s):
Data Mining, Pattern Mining, Software Telemetry, Failure Analysis, Subspace Clustering.
Related Ontology Subjects/Areas/Topics:
Applications; Clustering; Data Engineering; Information Retrieval; Learning in Process Automation; Learning of Action Patterns; Ontologies and the Semantic Web; Pattern Recognition; Software Engineering; Theory and Methods
Abstract:
As the cost of collecting and storing large amounts of data continues to drop, the amount of telemetry data collected by software applications and services keeps rising. With the data mounting up, there is an increasing need for algorithms that automatically and efficiently mine insights from it. One interesting case is the description of large tables using frequently occurring patterns, with implications for failure analysis and customer engagement. Finding frequently occurring patterns has applications both in interactive use, where an analyst repeatedly queries the data, and in a completely automated process that queries the data periodically and generates alerts and/or reports based on the mining. Here we propose two novel mining algorithms for computing such predominant patterns in relational data. The first method is a fast heuristic search, and the second is based on an adaptation of the Apriori algorithm. Our methods are demonstrated on real-world datasets, and extensions to some additional fundamental mining tasks are discussed.
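To make the idea of frequent patterns in a relational table concrete, the following is a minimal sketch of the textbook level-wise Apriori scheme applied to column=value pairs. It is not the adaptation proposed in the paper, whose details the abstract does not give. The function name `frequent_patterns`, the `min_support` parameter, and the telemetry-style sample rows are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(rows, min_support):
    """Textbook Apriori over (column, value) pairs (illustrative only).

    rows: list of dicts mapping column name -> hashable value.
    min_support: minimum fraction of rows a pattern must match.
    Returns a dict: frozenset of (column, value) pairs -> support fraction.
    """
    n = len(rows)
    if n == 0:
        return {}
    # Each row becomes a set of (column, value) items.
    transactions = [frozenset(row.items()) for row in rows]
    min_count = min_support * n

    # Level 1: frequent single column=value pairs.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_count}

    result = {}
    k = 1
    while current:
        result.update({p: c / n for p, c in current.items()})
        # Candidate generation: join frequent k-patterns into (k+1)-patterns.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # A row holds one value per column, so skip repeated columns.
            if len({col for col, _ in union}) != k + 1:
                continue
            # Apriori pruning: every k-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k)):
                candidates.add(union)
        # Count the support of the surviving candidates.
        next_counts = Counter()
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    next_counts[cand] += 1
        current = {p: c for p, c in next_counts.items() if c >= min_count}
        k += 1
    return result


# Hypothetical telemetry rows, made up for this example.
rows = [
    {"os": "win", "version": "1.0", "status": "fail"},
    {"os": "win", "version": "1.0", "status": "fail"},
    {"os": "win", "version": "2.0", "status": "ok"},
    {"os": "mac", "version": "1.0", "status": "ok"},
]
patterns = frequent_patterns(rows, min_support=0.5)
```

On these sample rows, the pattern os=win, version=1.0, status=fail matches half of the table, which is the kind of predominant pattern that could point an analyst toward a failure cluster. The pruning step relies on the fact that a pattern can only be frequent if all of its sub-patterns are frequent, which keeps the candidate set small.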