Authors:
Daniel Osuna-Ontiveros; Ivan Lopez-Arevalo and Victor Sosa-Sosa
Affiliation:
CINVESTAV - IPN, Mexico
Keyword(s):
Indexing models, Information retrieval, Semantic clustering, Semantic search.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Process Mining; Symbolic Systems
Abstract:
Information retrieval (IR) models process documents to prepare them for search by humans or computers. In early models, the general idea was to apply lexico-syntactic processing to documents, so that the importance of a document retrieved by a query is based on the frequency of the query terms in that document. Another approach is to return predefined documents based on the type of query the user makes. Recently, some researchers have combined text mining techniques to enhance document retrieval. This paper proposes a semantic clustering approach that improves traditional information retrieval models by representing the topics associated with documents. The proposal combines text mining algorithms and natural language processing. The approach does not use a priori queries; instead, it clusters terms, where each cluster is a set of related words according to the content of the documents. As a result, a document-topic matrix representation is obtained that denotes the importance of topics within documents.
For query processing, each query is represented as a set of clusters based on its terms. A similarity measure (e.g., cosine similarity) can then be applied between this query array and the document matrix to retrieve the most relevant documents.
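The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the term clusters in `CLUSTERS` are hand-written and hypothetical (the paper derives them by clustering terms from document content), the topic weight is assumed to be the share of a text's tokens that fall in each cluster, and the names `topic_vector`, `cosine`, `rank`, and `DOCS` are introduced here for illustration only.

```python
import math
from collections import Counter

# Hypothetical term clusters (topics). In the paper these are obtained by
# clustering terms according to document content; here they are hard-coded.
CLUSTERS = {
    "retrieval": {"search", "query", "index", "retrieval"},
    "clustering": {"cluster", "kmeans", "centroid", "topic"},
    "language": {"parser", "grammar", "syntax", "token"},
}
TOPICS = list(CLUSTERS)  # fixed topic order for every vector


def topic_vector(text):
    """Map a text to per-topic weights (assumed: share of tokens per cluster)."""
    tokens = text.lower().split()
    counts = Counter()
    for tok in tokens:
        for topic, terms in CLUSTERS.items():
            if tok in terms:
                counts[topic] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[t] / total for t in TOPICS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Rank documents by cosine similarity between query and document topic vectors."""
    q = topic_vector(query)
    # Document-topic matrix: one row of topic weights per document.
    matrix = {name: topic_vector(text) for name, text in docs.items()}
    scored = [(name, cosine(q, row)) for name, row in matrix.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpus, invented for this example.
DOCS = {
    "d1": "query index search engine retrieval",
    "d2": "cluster centroid kmeans topic model",
    "d3": "grammar parser token syntax tree",
}
```

Calling `rank("search query", DOCS)` places `d1` first, because only its topic vector overlaps with the query's "retrieval" cluster. A real system would replace the hand-written clusters with learned ones and would likely use a weighting scheme other than raw token share.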