Rules outperformed our method. Outstanding
performance in the “next action prediction” task
combined with average results in anomaly detection
means that the proposed method guesses the most
expected action very precisely, but estimates the set
of all expected actions less accurately than Seq-EM
and A-Rules do. This indicates that the probability
estimation mechanism used in the decision tree
algorithm (9) is not well suited to the anomaly
detection task. In future research we will evaluate
the anomaly detection ability of the proposed
approach with other probabilistic multi-class
classification algorithms, e.g. kernel methods
(Hastie, 2001), and we hope to obtain superior
results in this scenario as well.
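The difference between the two tasks can be illustrated with a
minimal sketch. It assumes that the model outputs an estimated
probability for every candidate next action; the function names,
the threshold value and the two example distributions are
hypothetical and are not taken from the experiments.

```python
def predict_next(probs):
    """Top-1 prediction: the action with the highest estimated probability."""
    return max(probs, key=probs.get)


def is_anomalous(probs, observed, threshold=0.05):
    """Flag an observed action whose estimated probability is below threshold.

    The threshold of 0.05 is an arbitrary illustrative value.
    """
    return probs.get(observed, 0.0) < threshold


# Two hypothetical estimates of P(next action | history) over symbols a, b, c.
sharp = {"a": 0.98, "b": 0.01, "c": 0.01}   # nearly all mass on one action
spread = {"a": 0.60, "b": 0.30, "c": 0.10}  # mass shared among several actions

# Both estimates agree on the single most expected action ...
same_top1 = predict_next(sharp) == predict_next(spread)

# ... but disagree on whether action "b" belongs to the expected set.
flag_sharp = is_anomalous(sharp, "b")
flag_spread = is_anomalous(spread, "b")
```

Both estimates give the same answer in the “next action
prediction” task, yet only the first one flags action "b" as an
anomaly, so the quality of the whole estimated distribution
matters for anomaly detection.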
5 CONCLUSIONS
The main contributions of this paper can be
summarized as follows:
1. A new type of data source for user behavior
modeling has been considered: the database
access log, which consists of traces of SQL queries
executed by users. It is a promising information
source because most modern software
systems use relational databases for information
storage, and usually all critical user actions leave a
trace in database access logs.
2. A simple but effective procedure for translating
SQL trace structures into a finite alphabet of
symbols has been proposed. It allows database
access log data to be analyzed with traditional data
mining techniques such as sequential mining and
association rule mining.
3. A novel method for mining probabilistic user
behavior models has been formulated. Unlike other
existing data mining methods, it incorporates a time
feature into the user model. An empirical feature
map, motivated by the theory of potential functions,
has been proposed for this purpose. Combining this
feature map with a decision tree algorithm, we obtain
a new method with the following advantages: it is
sufficiently precise; it takes into account the time
intervals between user actions; and it gives an
interpretation of the generated behavior models that
a human expert can understand, in the form of
“IF…THEN” rules.
4. An experimental performance evaluation on
real-world data has been conducted. It has
demonstrated that database access logs can be
successfully used for user behavior modeling and that
reliable models can be constructed. In these
experiments, the proposed method demonstrated
outstanding results in the “next action prediction”
scenario and competitive results in the “anomaly
detection” scenario.
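The idea behind the translation step in item 2 can be sketched as
follows. This is a minimal illustration under stated assumptions,
not the procedure used in this paper: it assumes that two queries
map to the same symbol when they differ only in literal values,
letter case or whitespace. The names sql_template, Alphabet and
encode are hypothetical.

```python
import re


def sql_template(query):
    """Reduce a SQL statement to a literal-free template (assumed rule)."""
    q = query.strip().rstrip(";")
    q = re.sub(r"'(?:[^']|'')*'", "?", q)     # replace string literals
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)  # replace numeric literals
    q = re.sub(r"\s+", " ", q)                # collapse whitespace
    return q.upper()                          # ignore letter case


class Alphabet:
    """Assigns one integer symbol to each distinct query template."""

    def __init__(self):
        self._ids = {}

    def symbol(self, query):
        template = sql_template(query)
        # A template seen for the first time gets the next free integer.
        return self._ids.setdefault(template, len(self._ids))


def encode(trace, alphabet):
    """Translate a trace of SQL queries into a sequence of symbols."""
    return [alphabet.symbol(q) for q in trace]


# Hypothetical trace of one user session.
trace = [
    "SELECT * FROM users WHERE id = 42",
    "select * from users where id = 7;",
    "UPDATE users SET name = 'Bob' WHERE id = 7",
]
symbols = encode(trace, Alphabet())
```

The resulting symbol sequence can then be passed to any
sequence-based mining technique; a real log would also need
rules for comments, bind variables and vendor-specific syntax.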
ACKNOWLEDGEMENTS
This research is supported by grant No. 05-01-00744
of the RFFI (Russian Foundation for Basic Research)
and by grant MK-2111.2005.9 of the President of the
Russian Federation.
REFERENCES
Aizerman, M.A., Braverman, E.M. & Rozonoer, L.I.
(1970). Method of Potential Functions in the Theory of
Learning Machines. Nauka, Moscow (in Russian).
Dan, P., Yu, S. & Chung, J.-Y. (1995). Characterization
of database access pattern for analytic prediction of
buffer hit probability. VLDB J., 4(1):127–154.
Debar, H., Becker, M. & Siboni, D. (1992). A neural
network component for an intrusion detection system.
In IEEE Symp. on Security and Privacy, pp. 240–250.
Ghosh, A., Schwartzbard, A. & Schatz, M. (1999).
Learning Program Behavior for Intrusion Detection. In
1st USENIX Workshop on Intrusion Detection and
Network Monitoring. Santa Clara, CA.
Hastie, T. (2001). The Elements of Statistical Learning,
Springer, New York.
Lee, W. & Stolfo, S. (1998). Data mining approaches for
intrusion detection. In 7th USENIX Security
Symposium (SECURITY'98).
Liu, B., Hsu, W. & Ma, Y. (1998). Integrating
classification and association rule mining. In 4th Int.
Conf. on KDD and Data Mining, pp. 80–96.
Manavoglu, E., Pavlov, D. & Giles, C. (2003).
Probabilistic User Behavior Models. In IEEE Int.
Conf. on Data Mining (ICDM-03). Melbourne, FL.
Maxion, R. & Roberts, R. (2004). Proper Use of ROC
Curves in Intrusion/Anomaly Detection, Tech. report
CS-TR-871, University of Newcastle upon Tyne.
Piatetsky-Shapiro, G., Fayyad, U., Smyth, P. &
Uthurusamy, R. (1996). Advances in Knowledge
Discovery and Data Mining, AAAI Press/MIT Press.
Quinlan, J. (1987). Generating production rules from
decision trees. In 10th International Joint Conference
on Artificial Intelligence, pp. 304–307.
Sarwar, B., Karypis, G., Konstan, J. & Riedl, J. (2001).
Item-based Collaborative Filtering Recommendation
Algorithms. In 10th International World Wide Web
Conference, pp. 285–295.
Tang, Z.-H. & MacLennan, J. (2005). Data Mining with
SQL Server 2005, Wiley Publishing.
Valeur, F., Mutz, D. & Vigna, G. (2005). A Learning-
Based Approach to the Detection of SQL Attacks. In
IEEE Conf. on Detection of Intrusions and Malware &
Vulnerability Assessment, pp. 123–140.
ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES