Algorithms for Telemetry Data Mining using Discrete Attributes

Roy B. Ofer, Adi Eldar, Adi Shalev, Yehezkel S. Resheff

2017

Abstract

As the cost of collecting and storing large amounts of data continues to drop, the amount of telemetry data collected by software applications and services keeps rising. With the data piling up, there is an increasing need for algorithms that automatically and efficiently mine insights from it. One interesting case is the description of large tables using frequently occurring patterns, with implications for failure analysis and customer engagement. Finding frequently occurring patterns has applications both in interactive use, where an analyst repeatedly queries the data, and in a completely automated process that queries the data periodically and generates alerts and/or reports based on the mining. Here we propose two novel mining algorithms for computing such predominant patterns in relational data. The first method is a fast heuristic search, and the second is based on an adaptation of the Apriori algorithm. Our methods are demonstrated on real-world datasets, and extensions to some additional fundamental mining tasks are discussed.
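
To make the notion of a frequently occurring pattern in a table concrete, the following is a minimal sketch of the textbook level-wise Apriori search applied to rows of discrete attributes. It is not the adaptation proposed in the paper, which the abstract does not describe. The function name `frequent_patterns`, the `min_support` parameter, and the sample telemetry rows are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(rows, min_support):
    """Textbook Apriori over (column, value) items; an illustrative sketch only.

    rows: list of dicts mapping column name -> discrete value.
    min_support: minimum fraction of rows a pattern must cover.
    Returns a dict mapping frozenset of (column, value) pairs -> support.
    """
    n = len(rows)
    # Treat every row as a set of (column, value) items.
    transactions = [frozenset(row.items()) for row in rows]

    # Level 1: keep the single items that meet the support threshold.
    counts = Counter(item for t in transactions for item in t)
    current = {
        frozenset([item]): c / n
        for item, c in counts.items()
        if c / n >= min_support
    }
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-patterns into candidate k-patterns.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # A pattern cannot assign two values to the same column.
            if len({col for col, _ in union}) != k:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)

        # Count the support of each surviving candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Hypothetical telemetry rows, for illustration only.
rows = [
    {"os": "win", "version": "1.0", "status": "fail"},
    {"os": "win", "version": "1.0", "status": "fail"},
    {"os": "win", "version": "2.0", "status": "ok"},
    {"os": "mac", "version": "1.0", "status": "ok"},
]
patterns = frequent_patterns(rows, min_support=0.5)
for pattern, support in sorted(patterns.items(), key=lambda kv: -len(kv[0])):
    print(sorted(pattern), support)
```

In this toy table the largest pattern that survives combines `os=win`, `version=1.0` and `status=fail`, the kind of description that could point an analyst toward a failure cluster. The pruning step relies on the fact that a pattern can be frequent only if all of its sub-patterns are frequent.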


Paper Citation


in Harvard Style

Ofer R. B., Eldar A., Shalev A. and Resheff Y. S. (2017). Algorithms for Telemetry Data Mining using Discrete Attributes. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-222-6, pages 309-317. DOI: 10.5220/0006117903090317


in Bibtex Style

@conference{icpram17,
author={Roy B. Ofer and Adi Eldar and Adi Shalev and Yehezkel S. Resheff},
title={Algorithms for Telemetry Data Mining using Discrete Attributes},
booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2017},
pages={309-317},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006117903090317},
isbn={978-989-758-222-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Algorithms for Telemetry Data Mining using Discrete Attributes
SN - 978-989-758-222-6
AU - Ofer, Roy B.
AU - Eldar, Adi
AU - Shalev, Adi
AU - Resheff, Yehezkel S.
PY - 2017
SP - 309
EP - 317
DO - 10.5220/0006117903090317