terns from both runs to include the proportion differ-
ence of each pattern between the two tables.
Finally, the scoring and selection of the winning
segment(s) remain unchanged, although the score
functions best suited for this task naturally differ
slightly from those of the single-table case. We defer the
remaining details of this variant to future research.
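As a rough illustration of the proportion difference mentioned above, the following minimal sketch compares the relative frequency of each pattern in two tables. The function name, the dictionary-based representation of pattern counts, and the example figures are assumptions made for illustration only; they are not the representation used by the methods in this paper.

```python
from typing import Dict, Hashable


def proportion_differences(
    counts_a: Dict[Hashable, int],
    total_a: int,
    counts_b: Dict[Hashable, int],
    total_b: int,
) -> Dict[Hashable, float]:
    """Return, for every pattern seen in either table, the difference
    between its proportion in table A and its proportion in table B.

    Hypothetical helper: counts_* map a pattern to its row count and
    total_* is the number of rows in the corresponding table.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both tables must contain at least one row")
    patterns = set(counts_a) | set(counts_b)
    return {
        p: counts_a.get(p, 0) / total_a - counts_b.get(p, 0) / total_b
        for p in patterns
    }


# Illustrative counts only (not taken from the experiments).
diffs = proportion_differences(
    {"os=win": 30, "os=mac": 10}, 100,
    {"os=win": 10}, 50,
)
```

A pattern that is missing from one table is treated as having a count of zero there, so it still receives a difference value.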
7 CONCLUSION
This paper describes the mining task of finding dense
segments in application and service telemetry data,
corresponding to interesting regions to be further an-
alyzed by the user. We propose a novel heuristic
method that locally searches for segments in order
to optimize a segment scoring function, as well as
an adaptation of the Apriori algorithm that is guaranteed to
find all frequent segments, which are then ranked and filtered
according to their scores. Requiring only lenient constraints
from the scoring function leaves a relatively
large degree of freedom for score variants and allows
the end results to be customized for the
specific mining task without changing the algorithms
themselves.
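The separation between segment enumeration and scoring can be sketched as follows. This is a simplified, generic Apriori-style enumeration over attribute-value pairs, not the exact adaptation described in this paper. The names `frequent_segments` and `rank_segments`, the row format (one dictionary per record), and the example score function are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]          # (attribute, value)
Segment = FrozenSet[Item]       # conjunction of attribute=value conditions


def support(rows: List[Dict[str, str]], segment: Segment) -> int:
    """Count the rows that match every condition of the segment."""
    return sum(all(row.get(a) == v for a, v in segment) for row in rows)


def frequent_segments(rows: List[Dict[str, str]], min_count: int) -> Dict[Segment, int]:
    """Level-wise search for all segments matching at least min_count rows."""
    counts: Dict[Segment, int] = {}
    for row in rows:
        for item in row.items():
            seg = frozenset([item])
            counts[seg] = counts.get(seg, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(level)
    while level:
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue  # the two segments must differ in exactly one item
            if len({attr for attr, _ in union}) != len(union):
                continue  # at most one value per attribute
            # Apriori pruning: every sub-segment one item shorter must be frequent.
            if all(union - {item} in level for item in union):
                candidates.add(union)
        level = {}
        for cand in candidates:
            n = support(rows, cand)
            if n >= min_count:
                level[cand] = n
        result.update(level)
    return result


def rank_segments(
    frequent: Dict[Segment, int],
    total: int,
    score: Callable[[Segment, int, int], float],
    top_k: int,
) -> List[Tuple[Segment, int]]:
    """Rank frequent segments with a pluggable score and keep the best top_k."""
    ordered = sorted(
        frequent.items(),
        key=lambda kv: score(kv[0], kv[1], total),
        reverse=True,
    )
    return ordered[:top_k]


def example_score(segment: Segment, count: int, total: int) -> float:
    """Hypothetical score: coverage weighted by the number of conditions."""
    return (count / total) * len(segment)
```

Because `rank_segments` receives the score as a parameter, a different score variant can be swapped in without touching the enumeration step, which mirrors the degree of freedom described above.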
The main contribution of this paper is in defining
and solving the mining task, which helps close the gap
between the reality of increasing amounts of data be-
ing collected on the one hand, and the relative lack
of tools to automatically and efficiently mine it on
the other. The two methods demonstrate the tradeoff
between a fast heuristic search and a comprehensive
approach whose running time is potentially exponential
in the worst case. In practice, as shown in the experiments, both
methods are applicable to real-world telemetry mining
when combined with the right pre-processing.
Algorithms for Telemetry Data Mining using Discrete Attributes