Figure 3: Number of disk pages read by each algorithm (ART, C4.5, CN2-STAR, CN2-DL, RIPPER, and Naive Bayes) for different page sizes. The page size indicates the number of training examples each page contains.
However, while ART's I/O cost is bounded by the complexity of the classifier it builds, the performance of CN2 and RIPPER is determined by the size of the search space they explore.
In summary, ART classifiers exhibit excellent scalability properties, which makes them suitable for data mining problems. They provide a well-behaved alternative to decision tree learners in situations where rule and decision list inducers do not work in practice.
5 CONCLUSIONS
In this paper, we have shown how AspectJ, an aspect-
oriented extension for the Java programming lan-
guage, can be used in real-world applications to pro-
vide fine-grained performance evaluation and moni-
toring capabilities. This unintrusive technique avoids
the inadvertent insertion of bugs into the system under
evaluation. It also frees developers from the burden
of introducing scattered code to do their performance
evaluation and monitoring work.
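As a rough illustration of this unintrusive style, an aspect like the following could count disk page reads without touching the monitored code. This is a hypothetical sketch in AspectJ syntax: the `PageFile.readPage` target is an assumed name standing in for whatever I/O routine the system under evaluation uses, not an API from the paper.

```aspectj
// Hypothetical monitoring aspect: counts page reads without
// modifying the system under evaluation.
public aspect IOMonitor {

    private long pagesRead = 0;

    // Pointcut: every call to the (assumed) page-reading routine.
    pointcut pageRead(): call(* PageFile.readPage(..));

    // Advice: increment the counter after each successful read.
    after() returning: pageRead() {
        pagesRead++;
    }

    public long getPagesRead() {
        return pagesRead;
    }
}
```

Because the pointcut is declared in one place, the measurement concern stays centralized instead of being scattered across every call site, which is precisely the benefit argued for above.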
Finally, we have described how our proposed approach can be employed to evaluate the I/O cost associated with several data mining techniques. In our experiments, we have observed that associative classifiers such as ART possess good scalability properties. In fact, the efficient association rule mining algorithms underlying ART make it orders of magnitude more efficient than alternative rule and decision list inducers, whose I/O requirements heavily constrain their use in real-world situations unless sampling is employed. Moreover, we have confirmed that the additional cost ART incurs, when compared to decision tree learners such as C4.5, is reasonable given the desirable properties of the classification models it helps us obtain, thus making associative classifiers a viable alternative to standard decision tree learners, the most common classifiers in data mining tools nowadays.
ACKNOWLEDGEMENTS
Work partially supported by research project
TIN2006-07262.
REFERENCES
Berzal, F., Cubero, J. C., Sánchez, D., and Serrano, J. M.
(2004). ART: A hybrid classification model. Machine
Learning, 54(1):67–92.
Blake, C. and Merz, C. (1998). UCI repository
of machine learning databases. Available at
http://www.ics.uci.edu/~mlearn/MLRepository.html.
Clark, P. and Boswell, R. (1991). Rule induction with CN2:
Some recent improvements. In EWSL, pages 151–163.
Cohen, W. W. (1995). Fast effective rule induction. In
Prieditis, A. and Russell, S., editors, Proc. of the
12th International Conference on Machine Learning,
pages 115–123, Tahoe City, CA. Morgan Kaufmann.
Fürnkranz, J. and Widmer, G. (1994). Incremental reduced
error pruning. In ICML, pages 70–77.
Gehrke, J., Ramakrishnan, R., and Ganti, V. (2000). RainForest - a framework for fast decision tree construction of large datasets. Data Mining and Knowledge
Discovery, 4(2/3):127–162.
Gradecki, J. D. and Lesiecki, N. (2003). Mastering AspectJ:
Aspect-Oriented Programming in Java. Wiley.
Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm,
J., and Griswold, W. G. (2001). Getting started with
AspectJ. Communications of the ACM, 44(10):59–65.
Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C.,
Lopes, C. V., Loingtier, J.-M., and Irwin, J. (1997).
Aspect-oriented programming. In ECOOP’97: 11th
European Conference on Object-Oriented Program-
ming, LNCS 1241, pages 220–242.
Laddad, R. (2003). AspectJ in Action: Practical Aspect-
Oriented Programming. Manning Publications.
Quinlan, J. R. (1986). Induction of decision trees. Machine
Learning, 1(1):81–106.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learn-
ing. Morgan Kaufmann.
ICSOFT 2008 - International Conference on Software and Data Technologies