Authors:
Arnaud Renard; Sylvie Calabretto and Béatrice Rumpler
Affiliation:
Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, France
Keyword(s):
Information retrieval, (semi-)Structured documents, XML, Fuzzy semantic matching, Semantic resource, Thesaurus, Ontology, Error correction, OCR.
Related Ontology Subjects/Areas/Topics:
Accessibility Issues and Technology; Internet Technology; Ontology and the Semantic Web; Web Information Systems and Technologies; Web Interfaces and Applications; XML and Data Management
Abstract:
Semantics is one of the greatest challenges in the evolution of IR systems, including the (semi-)structured IR systems considered here. Addressing this challenge usually requires an additional external semantic resource related to the document collection. Comparing concepts and, more broadly, working with semantic resources requires semantic similarity measures. These measures assume that the concepts related to the terms have been identified without ambiguity, so misspelled terms interfere with the term-to-concept matching process. Existing semantic-aware (semi-)structured IR systems rely on basic concept identification and do not account for uncertainty in term spelling. We choose to address this last aspect and propose a way to detect and correct misspelled terms through a fuzzy semantic weighting formula that can be integrated into an IR system. To evaluate the expected gains, we have developed a prototype whose first results on small datasets seem interesting.