DETECTION OF INCOHERENCES IN A TECHNICAL AND NORMATIVE DOCUMENT CORPUS

Susana Martin-Toral, Gregorio I. Sainz-Palmero, Yannis Dimitriadis

2008

Abstract

This paper focuses on the problems and effects that arise when a document corpus contains mistakes, content incoherences among its connected documents, and other errors. The problem is relevant in any area of human activity where such a corpus serves as the basis for relationships between company partners, legal support, and so on, as is the question of how these incoherences can be detected. The problems can appear in several ways, and their effects differ, but they are common wherever many linked documents must be generated, managed and updated by different authors. This paper describes some examples of the problem in a technical document corpus shared among partners, and the solution framework developed for that case. Several types of incoherence have been detected and formulated. They are connected with problems described in other research areas such as information extraction and retrieval, text mining and document interpretation, but all of them are bounded and presented here from the point of view of document incoherences and their effects, especially in a company context. Finally, the computational architecture and methodology used are described, and some initial results of incoherence detection are discussed.



Paper Citation


in Harvard Style

Martin-Toral S., Sainz-Palmero G. I. and Dimitriadis Y. (2008). DETECTION OF INCOHERENCES IN A TECHNICAL AND NORMATIVE DOCUMENT CORPUS. In Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8111-37-1, pages 282-287. DOI: 10.5220/0001699102820287


in Bibtex Style

@conference{iceis08,
author={Susana Martin-Toral and Gregorio I. Sainz-Palmero and Yannis Dimitriadis},
title={DETECTION OF INCOHERENCES IN A TECHNICAL AND NORMATIVE DOCUMENT CORPUS},
booktitle={Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2008},
pages={282-287},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001699102820287},
isbn={978-989-8111-37-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - DETECTION OF INCOHERENCES IN A TECHNICAL AND NORMATIVE DOCUMENT CORPUS
SN - 978-989-8111-37-1
AU - Martin-Toral S.
AU - Sainz-Palmero G. I.
AU - Dimitriadis Y.
PY - 2008
SP - 282
EP - 287
DO - 10.5220/0001699102820287