Fabio Clarizia
Francesco Colace
Massimo De Santo
Luca Greco
Paolo Napoletano
University of Salerno, Italy
Text retrieval, Query expansion, Term extraction, Probabilistic topic model, Relevance feedback.
Artificial Intelligence
Clustering and Classification Methods
Computational Intelligence
Concept Mining
Evolutionary Computing
Information Extraction
Interactive and Online Data Mining
Knowledge Discovery and Information Retrieval
Knowledge-Based Systems
Machine Learning
Mining Text and Semi-Structured Data
Soft Computing
Symbolic Systems
Web Mining
It is well known that one way to improve the accuracy of a text retrieval system is to expand the original query with additional knowledge encoded as topic-related terms. In an interactive environment, the expansion, usually represented as a list of words, is extracted from documents whose relevance is known from user feedback. In this paper we argue that the accuracy of a text retrieval system can be improved by employing a query expansion method based on a mixed Graph of Terms representation instead of a method based on a simple list of words. The graph, which is composed of a directed and an undirected subgraph, can be automatically extracted from a small set of relevant documents only (namely, the user feedback) using a term extraction method based on the probabilistic Topic Model. The proposed method has been evaluated by comparison with two less complex structures: one represented as a set of pairs of words and another that is a simple list of words.