A TRANSACTIONAL APPROACH TO ASSOCIATIVE XML CLASSIFICATION BY CONTENT AND STRUCTURE

Gianni Costa, Riccardo Ortale, Ettore Ritacco

Abstract

We propose XCCS (XML Classification by Content and Structure), a new approach for inducing intelligible classification models for XML data. Such models support more effective and efficient XML search, retrieval, and filtering. The idea behind XCCS is to represent each XML document as a transaction in a space of boolean features that are informative of its content and structure. Suitable algorithms are developed to learn associative classifiers from this transactional representation of the XML data. XCCS induces very compact classifiers that outperform several established competitors in effectiveness.
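
The transactional representation can be illustrated with a minimal sketch. The abstract does not define the feature space, so this example assumes that each boolean feature is a (tag path, term) pair, where the term occurs in the text of the element reached by that path. The function name `xml_to_transaction`, the tokenization, and the example rule are hypothetical and are not taken from XCCS itself.

```python
import re
import xml.etree.ElementTree as ET


def xml_to_transaction(xml_text):
    """Map one XML document to a set of (tag_path, term) items.

    Assumption: a feature is 'present' (true) when the term occurs in the
    text directly inside the element reached by tag_path. Text that follows
    a child element (the element's tail) is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    items = set()

    def visit(node, parent_path):
        path = parent_path + "/" + node.tag
        text = (node.text or "").lower()
        # Naive tokenization: runs of ASCII letters and digits.
        for term in re.findall(r"[a-z0-9]+", text):
            items.add((path, term))
        for child in node:
            visit(child, path)

    visit(root, "")
    return items


def rule_fires(antecedent, transaction):
    """A class association rule applies when its antecedent is a subset
    of the transaction (the usual definition for associative classifiers)."""
    return antecedent <= transaction


doc = (
    "<article><title>XML mining</title>"
    "<body><p>mining rules</p></body></article>"
)
transaction = xml_to_transaction(doc)

# Hypothetical rule: {(/article/title, mining)} -> class "data-mining"
antecedent = {("/article/title", "mining")}
predicted = "data-mining" if rule_fires(antecedent, transaction) else None
```

Representing a document as a set of items lets any standard association rule miner operate on it, and each learned rule stays readable as "these terms under these paths imply this class".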



Paper Citation


in Harvard Style

Costa G., Ortale R. and Ritacco E. (2011). A TRANSACTIONAL APPROACH TO ASSOCIATIVE XML CLASSIFICATION BY CONTENT AND STRUCTURE. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011), ISBN 978-989-8425-79-9, pages 104-113. DOI: 10.5220/0003662401040113


in Bibtex Style

@conference{kdir11,
author={Gianni Costa and Riccardo Ortale and Ettore Ritacco},
title={A TRANSACTIONAL APPROACH TO ASSOCIATIVE XML CLASSIFICATION BY CONTENT AND STRUCTURE},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={104-113},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003662401040113},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - A TRANSACTIONAL APPROACH TO ASSOCIATIVE XML CLASSIFICATION BY CONTENT AND STRUCTURE
SN - 978-989-8425-79-9
AU - Costa G.
AU - Ortale R.
AU - Ritacco E.
PY - 2011
SP - 104
EP - 113
DO - 10.5220/0003662401040113