Table 3: Average attribute dependency in the data size scenario.
Time span         | from, target ⇒ size | from, target, action ⇒ size | target ⇒ size
Full year         | 0,855               | 0,860                       | 0,621
February original | 0,904               | 0,908                       | 0,666
February modified | 0,928               | 0,930                       | 0,725
April original    | 0,830               | 0,835                       | 0,596
April modified    | 0,869               | 0,872                       | 0,683
combinations. We therefore conclude that, for all three investigated anomaly scenarios, removing attributes on the basis of the rough set analysis would not have interfered with the anomalies present in the system.
6 CONCLUSION AND FUTURE WORK
We present rough logs, a concept for reducing the log data (factor 1:10,000) used by large-scale anomaly detection facilities. Our exhaustive tests with a real-world data set showed that the proposed method gives a stable indication of which attributes in log data are most likely to be redundant. This holds even when the data starts to contain signs of a system anomaly. Given this stability, it seems safe to remove attributes from the high dependency class in online anomaly detection approaches, since they provide no benefit for the representation of system behavior.
The advantage of the proposed method lies in its generality. Due to its sound theoretical foundation, it can be applied to any kind of system event protocol that boils down to attributes and their symbolic values. The semantics of the system are relevant only once, for choosing the candidate attribute combinations to be tested for their dependency. After that, the whole approach is data-agnostic.
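The data-agnostic step can be illustrated with a minimal sketch. It assumes the standard rough set degree of dependency (the fraction of records whose condition-attribute values determine the decision-attribute values uniquely); the exact averaging procedure behind Table 3 is not reproduced here. The function name `dependency_degree` and the four sample records are hypothetical and only serve as illustration.

```python
from collections import defaultdict


def dependency_degree(records, condition, decision):
    """Rough set degree of dependency for condition => decision.

    Returns the share of records that lie in the positive region, i.e.
    records whose values on the condition attributes are always paired
    with the same values on the decision attributes.
    """
    if not records:
        raise ValueError("records must not be empty")

    decisions_seen = defaultdict(set)  # condition values -> decision values seen
    class_size = defaultdict(int)      # condition values -> number of records

    for record in records:
        key = tuple(record[a] for a in condition)
        decisions_seen[key].add(tuple(record[a] for a in decision))
        class_size[key] += 1

    # A class is consistent if it maps to exactly one decision value.
    positive = sum(
        size for key, size in class_size.items()
        if len(decisions_seen[key]) == 1
    )
    return positive / len(records)


# Hypothetical log records with symbolic attribute values.
log = [
    {"from": "a", "target": "x", "action": "get", "size": "small"},
    {"from": "a", "target": "x", "action": "put", "size": "large"},
    {"from": "b", "target": "y", "action": "get", "size": "small"},
    {"from": "b", "target": "z", "action": "get", "size": "large"},
]

print(dependency_degree(log, ["from", "target"], ["size"]))
print(dependency_degree(log, ["from", "target", "action"], ["size"]))
print(dependency_degree(log, ["target"], ["size"]))
```

Only the choice of the candidate combinations (here the three from Table 3) needs knowledge of the system; the computation itself treats attribute values as opaque symbols.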
The experimental results are likely to change when the amount and kind of original log data change. It would therefore be interesting to perform the same kind of use case study with other sets of log data.
Depending on the nature of the data, it may also be possible to use the rough logs idea as a direct anomaly detection approach. The underlying assumption here would be that attribute dependencies may change significantly when a system anomaly impacts the log data strongly enough. In this case, it would be possible to treat a change in attribute dependencies as a warning sign for structural problems in the system, possibly due to security or availability incidents. We plan to investigate this possibility in our future work by examining more anomaly scenarios and different log data sets.
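As a rough sketch of this future-work idea, and not of an evaluated method, one could compare the dependency of a candidate attribute combination in a reference period with that in a recent window. The sketch assumes the standard rough set degree of dependency; the helper names `dependency_shift` and `is_warning`, the threshold of 0.1, and the sample records are all hypothetical.

```python
from collections import defaultdict


def dependency_degree(records, condition, decision):
    """Share of records whose condition values fix the decision values."""
    if not records:
        raise ValueError("records must not be empty")
    decisions_seen = defaultdict(set)
    class_size = defaultdict(int)
    for record in records:
        key = tuple(record[a] for a in condition)
        decisions_seen[key].add(tuple(record[a] for a in decision))
        class_size[key] += 1
    positive = sum(
        size for key, size in class_size.items()
        if len(decisions_seen[key]) == 1
    )
    return positive / len(records)


def dependency_shift(baseline, window, condition, decision):
    """Signed change in dependency from the baseline to the window."""
    return (dependency_degree(window, condition, decision)
            - dependency_degree(baseline, condition, decision))


def is_warning(baseline, window, condition, decision, threshold=0.1):
    """Flag the window if the dependency moved by more than the threshold.

    The threshold is an arbitrary placeholder; a usable value would have
    to be calibrated on the log data at hand.
    """
    shift = dependency_shift(baseline, window, condition, decision)
    return abs(shift) > threshold


# Hypothetical reference period: target fully determines size.
baseline = [
    {"target": "x", "size": "small"},
    {"target": "y", "size": "large"},
    {"target": "x", "size": "small"},
    {"target": "y", "size": "large"},
]
# Hypothetical recent window: target "x" now occurs with two sizes.
window = [
    {"target": "x", "size": "small"},
    {"target": "x", "size": "large"},
    {"target": "y", "size": "large"},
    {"target": "y", "size": "large"},
]

print(dependency_shift(baseline, window, ["target"], ["size"]))
print(is_warning(baseline, window, ["target"], ["size"]))
```

The absolute value is used because a shift in either direction would count as a structural change; in Table 3, for instance, the modified months show slightly higher dependencies than the original ones.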
Rough Logs: A Data Reduction Approach for Log Files