Based on the technical and normative documentation studied
in this work, a definition and classification of incoherence
has been proposed that could be used in most technical
contexts. Each incoherence type has then been connected with
several research areas (information extraction and retrieval,
document analysis, etc.) in order to find the best way to
detect it by information processing. The experimental results
have allowed us to discover several categories of
incoherences, including some unknown to the domain experts.
The study of documentation from new domains and sectors
could expand and improve the proposed incoherence
classification. Not all incoherences, however, have the same
relevance or importance for the affected sectors, so the
experiments in this work have applied the most suitable
techniques to detect those with the greatest impact in the
affected areas.
To achieve these objectives, different document
representations and comparison techniques have been applied.
In particular, a new pattern of relevant information that
recurs in technical documentation, the N-tuple, has been
used. It allows the detection of one of the most important
and harmful incoherence types found in technical domains:
numerical incoherences.
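As an illustration of how N-tuples can expose numerical incoherences, the following minimal sketch groups 4-tuples by what they describe and reports every group that carries more than one distinct value. The field layout (entity, attribute, value, unit), the function name, and the sample corpus are assumptions made for this example only; they are not taken from the experiments reported in this work.

```python
from collections import defaultdict
from typing import NamedTuple


class FourTuple(NamedTuple):
    # Hypothetical field layout, assumed for illustration only.
    entity: str
    attribute: str
    value: float
    unit: str


def numerical_incoherences(tuples_by_doc):
    """Map each (entity, attribute) that occurs with more than one distinct
    (value, unit) pair to {(value, unit): [document ids stating it]}."""
    seen = defaultdict(lambda: defaultdict(list))
    for doc_id, tuples in tuples_by_doc.items():
        for t in tuples:
            key = (t.entity.lower(), t.attribute.lower())
            seen[key][(t.value, t.unit.lower())].append(doc_id)
    # Keep only the keys for which the documents disagree.
    return {
        key: dict(variants)
        for key, variants in seen.items()
        if len(variants) > 1
    }


# Invented sample data: two norms disagree on the handrail height.
corpus = {
    "norm_A": [FourTuple("handrail", "height", 90.0, "cm")],
    "norm_B": [FourTuple("handrail", "height", 100.0, "cm"),
               FourTuple("door", "width", 80.0, "cm")],
    "norm_C": [FourTuple("door", "width", 80.0, "cm")],
}
conflicts = numerical_incoherences(corpus)
for key, variants in conflicts.items():
    print(key, variants)
```

The sketch compares values literally, so equivalent quantities in different units (for example 1 m and 100 cm) would be reported as conflicts; a unit normalisation step would be needed to avoid that.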
The results have been interpreted and evaluated in both
unsupervised and supervised ways, the latter with the help of
the domain expert. This evaluation has revealed different
levels of incoherence:
- Some experimental results have shown strange behaviours,
  which indicate the presence of potential incoherences. A
  deeper study may be needed to identify the specific
  incoherences. These results are obtained by classification
  or clustering methods.
- Other results have directly shown potential cases of
  incoherence, and the help of the domain expert is only
  needed to confirm that the problem exists. This is the case
  for wrongly coded and non-existent norms, for structural or
  content incoherences detected using the vector space model
  (VSM), and for incoherences in numerical measures and
  attributes detected using 4-tuples.
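The kind of VSM comparison mentioned above can be sketched as follows: documents are represented as term-frequency vectors and compared by cosine similarity, and pairs that were expected to be similar but score low become candidates for expert review. This is a minimal sketch under stated assumptions: the tokenisation, the raw term-frequency weighting, the threshold of 0.5, and the sample texts are all hypothetical choices, not the configuration used in the experiments.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Represent a document as a bag of lowercase alphanumeric terms."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def low_similarity_pairs(docs, threshold=0.5):
    """Return (name_a, name_b, similarity) for every pair below threshold."""
    vectors = {name: tf_vector(text) for name, text in docs.items()}
    names = sorted(vectors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim < threshold:
                flagged.append((a, b, sim))
    return flagged


# Invented sample texts: v3 shares no vocabulary with v1 and v2.
docs = {
    "v1": "The handrail height shall be 90 cm",
    "v2": "The handrail height shall be 90 cm",
    "v3": "Fire doors require annual inspection",
}
flagged = low_similarity_pairs(docs)
for a, b, sim in flagged:
    print(a, b, round(sim, 3))
```

A low score only marks a pair for inspection; as noted above, the domain expert still has to confirm whether an incoherence actually exists.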
Given the existence of incoherences and their negative
effects on both organizations and citizens, future work could
define a new methodology for generating documentation free of
incoherences, thereby avoiding the problem at its source.
ACKNOWLEDGEMENTS
This work has been supported in part by the Spanish
Industry, Tourism, and Commerce Ministry through
the project FIT-350100-2006-272.
REFERENCES
Arango, F. (2003). Gestión de inconsistencias en la evolución
e interoperación de los esquemas conceptuales OO, en el marco
formal de OASIS. PhD thesis.
Berry, M. W. (2004). Survey of Text Mining: Clustering,
Classification, and Retrieval. Springer.
CARTIF, F. (2006). Gestor documental de normativa (DOCNOR).
PROFIT project (PROgrama de Fomento de la Investigación
Tecnológica). Project reference: FIT-350100-2006-272.
Jain, A. K., Duin, R. P. W., and Mao, J. (2000). Statistical
pattern recognition: A review. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 22(1):4–37.
Krulwich, B. and Burkey, C. (1997). The infofinder agent:
Learning user interests through heuristic phrase extraction.
IEEE Expert: Intelligent Systems and Their Applications,
12(5):22–27.
Mani, I. and Bloedorn, E. (1999). Summarizing similarities
and differences among related documents. Information
Retrieval, 1(1-2):35–67.
Mannila, H., Toivonen, H., and Verkamo, A. I. (1997).
Discovery of frequent episodes in event sequences. Data Min.
Knowl. Discov., 1(3):259–289.
McCallum, A. K. (1996). Bow: A toolkit for statistical
language modeling, text retrieval, classification and
clustering. http://www.cs.cmu.edu/~mccallum/bow.
Mingshan, L. and Ching-to, A. M. (2002). Consistency in
performance evaluation reports and medical records. The
Journal of Mental Health Policy and Economics, 5(4):191–192.
Nahm, U. Y. (2004). Text mining with information extraction.
PhD thesis. Supervisor: Raymond J. Mooney.
OASIS (2007). OASIS. Organization for the Advancement of
Structured Information Standards. URL:
http://www.oasis-open.org. Last visit: February 2007.
O’Gorman, L. and Kasturi, R. (1995). Document Image Analysis.
IEEE Computer Society, Los Alamitos, California, USA.
Otterbacher, J., Radev, D., and Luo, A. (2002). Revisions
that improve cohesion in multi-document summaries: A
preliminary study. In Proc. of the Workshop on Automatic
Summarization (including DUC 2002), pages 27–36. Association
for Computational Linguistics.
Ruiz, M. (2002). Sistemas jurídicos y conflictos normativos.
Dykinson, Universidad Carlos III de Madrid, Instituto de
Derechos Humanos Bartolomé de las Casas.
Salton, G., Wong, A., and Yang, C. S. (1975). A vector space
model for automatic indexing. Communications of the ACM,
18(11):613–620.