tics, matching of document contents based on well-known
techniques such as the Levenshtein distance or the cosine
similarity, and a supervised learning procedure based on a
neuro-fuzzy system.
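As a reminder of what the two string-matching techniques named above compute, the following is a minimal sketch in Python. It is not the implementation used in this work: the function names, the normalization of the Levenshtein distance to the range [0, 1], and the whitespace tokenization with raw term counts in the cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance rescaled to [0, 1]; 1.0 means identical strings.
    (Assumed normalization: divide by the longer string's length.)"""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between term-count vectors of two texts.
    (Assumed tokenization: lowercase, split on whitespace.)"""
    terms_a = Counter(a.lower().split())
    terms_b = Counter(b.lower().split())
    dot = sum(terms_a[t] * terms_b[t] for t in terms_a.keys() & terms_b.keys())
    norm_a = math.sqrt(sum(c * c for c in terms_a.values()))
    norm_b = math.sqrt(sum(c * c for c in terms_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no terms with anything
    return dot / (norm_a * norm_b)
```

For instance, levenshtein("kitten", "sitting") returns 3, while cosine_similarity yields 0.0 for two texts that share no terms. Character-level and term-level measures of this kind are complementary: the former tolerates small spelling variations, the latter tolerates reordering of words.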
Synthetic and real documents, summarized as 4-tuples and
matched using the similarity criterion described in the
previous section, were used as inputs to the neuro-fuzzy
system for detecting incoherences.
The experiments have shown that the system is able to cope
with most cases of coherence and incoherence that can
feasibly occur within a document set, with a success rate
higher than 94% in most cases. Tests with both synthetically
created cases and real ones have shown that the system is
able to learn and detect incoherences by means of the
similarities between two 4-tuples holding numerical
information.
Work is currently underway to specialize the FasArt system
so that it not only detects whether an incoherence exists,
but also determines the incoherence category, using the
summarization by 4-tuples. In addition, this fuzzy approach
makes it possible to extract the learnt, subjective expert
knowledge from the neuro-fuzzy system as a set of fuzzy
rules that can support a decision-making system for this
complex and non-objective problem.
ACKNOWLEDGEMENTS
This work has been supported in part by the Spanish
Industry, Tourism and Commerce Ministry through
the project TSI-020302-2008-73.
HYBRID APPROACH FOR INCOHERENCE DETECTION BASED ON NEURO-FUZZY SYSTEMS AND EXPERT
KNOWLEDGE