A STUDY ON ALIGNING DOCUMENTS USING THE CIRCLE OF INTEREST TECHNIQUE

Daniel Joseph, César A. Marín

2010

Abstract

In this paper we present a study on applying a technique called Circle of Interest, together with Formal Concept Analysis and Rough Set Theory, to semantically align documents such as those found in a business domain. When companies engage in business, it becomes crucial to preserve the semantics of the information they exchange, usually in the form of a business document. Typical approaches are either impractical or costly to implement. In contrast, we consider the concepts and relationships discovered within an exchanged business document to automatically find an alignment to a local interpretation, known as a document type. We present experimental results on applying Formal Concept Analysis as the ontological representation of documents, the Circle of Interest for selecting the most relevant candidate document types, and Rough Set Theory for discerning among them. The results on a set of business documents show the feasibility of our approach and its direct application to a business domain.
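The abstract names two standard techniques without detailing them. As a rough illustration only, and not the authors' implementation, the following Python sketch computes the formal concepts of a small document-type × term context using the FCA derivation operators, and then Pawlak-style lower and upper approximations of a candidate set of document types. The context, the document-type and term names, and all function names are hypothetical assumptions. The Circle of Interest step is not modelled, because the abstract does not define it.

```python
from itertools import combinations

# Hypothetical toy formal context (not data from the paper): each document type
# (object) maps to the set of concepts/terms (attributes) it contains.
CONTEXT = {
    "invoice": {"buyer", "seller", "amount", "due_date"},
    "purchase_order": {"buyer", "seller", "item", "quantity"},
    "receipt": {"seller", "amount", "payment_date"},
}


def common_attributes(objects, context):
    """FCA derivation A -> A': the attributes shared by every object in A."""
    attrs = set().union(*context.values())  # an empty set of objects yields all attributes
    for obj in objects:
        attrs &= context[obj]
    return attrs


def common_objects(attributes, context):
    """FCA derivation B -> B': the objects that have every attribute in B."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Return every formal concept (extent, intent) of the context.

    Closes each subset A of objects (extent A'', intent A'), so it is
    exponential in the number of objects and only suitable for tiny examples.
    """
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def indiscernibility_classes(context, attributes):
    """Rough sets: group objects that agree on presence/absence of each attribute."""
    classes = {}
    for obj, attrs in context.items():
        key = tuple(a in attrs for a in attributes)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())


def approximations(target, classes):
    """Pawlak lower and upper approximations of the object set `target`."""
    lower = set().union(*(c for c in classes if c <= target))
    upper = set().union(*(c for c in classes if c & target))
    return lower, upper


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
    # Hypothetical candidate set of document types to discern among.
    candidates = {"invoice", "receipt"}
    print(approximations(candidates, indiscernibility_classes(CONTEXT, ["seller"])))
    print(approximations(candidates, indiscernibility_classes(CONTEXT, ["amount"])))
```

With the attribute `seller` all three document types are indiscernible, so the lower approximation of the candidates is empty and the upper approximation contains every type; with `amount` the candidates are separated exactly and both approximations coincide. This is the kind of discernment the abstract attributes to Rough Set Theory, shown here on invented data.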



Paper Citation


in Harvard Style

Joseph, D. and Marín, C. A. (2010). A STUDY ON ALIGNING DOCUMENTS USING THE CIRCLE OF INTEREST TECHNIQUE. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-8425-23-2, pages 374-383. DOI: 10.5220/0002965003740383


in Bibtex Style

@conference{icsoft10,
author={Daniel Joseph and César A. Marín},
title={A STUDY ON ALIGNING DOCUMENTS USING THE CIRCLE OF INTEREST TECHNIQUE},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT},
year={2010},
pages={374-383},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002965003740383},
isbn={978-989-8425-23-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI - A STUDY ON ALIGNING DOCUMENTS USING THE CIRCLE OF INTEREST TECHNIQUE
SN - 978-989-8425-23-2
AU - Joseph D.
AU - Marín C. A.
PY - 2010
SP - 374
EP - 383
DO - 10.5220/0002965003740383
ER -