essary that a document contains the terms of the query. In
terms of recall, the boolean representation obtains the better
result. This is because BR returned a large number of documents
(more than 300) for each query.
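The effect of a very large result set on recall can be illustrated with a minimal sketch. The document ids and set sizes below are hypothetical and are not taken from the experiments; the function `precision_recall` is an illustrative helper, not part of the proposal.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query with 10 relevant documents (ids 0..9).
relevant = range(10)

# A boolean-style result that returns 300 documents, including all relevant ones.
broad = range(300)

# A more selective result that returns 5 documents, 4 of them relevant.
narrow = [0, 1, 2, 3, 50]

broad_pr = precision_recall(broad, relevant)
narrow_pr = precision_recall(narrow, relevant)

print("broad  (precision, recall):", broad_pr)
print("narrow (precision, recall):", narrow_pr)
```

In this toy case the broad result reaches full recall only because it returns so many documents, while its precision is much lower than that of the selective result.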
An unsupervised approach for indexing documents is
proposed in this paper. The proposal combines text
mining and natural language processing to obtain a
document-topic matrix representation for a set of
documents. First, verb-noun relationships are obtained
by using a POS tagger. Then, the Clustering by
Committee algorithm is used to group terms according to
their verb-noun relations. After that, Latent Dirichlet
Allocation (LDA) is applied to obtain the most relevant
terms. The parameters for LDA are obtained without
human intervention. According to the experiments,
the proposal generally performs better than the
traditional models (boolean and vector space model)
in topic-based semantic searches. Future work
should include semantic processing for web search or
the analysis of tweets/posts.
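The first step of the pipeline summarized above, extracting verb-noun relationships from POS-tagged text, can be sketched as follows. This is only an illustration under stated assumptions: the input is already tagged with Penn Treebank style tags, a verb is paired with any noun found within a fixed window of following tokens, and the names `verb_noun_pairs`, `noun_context_vectors`, and `window` are hypothetical. The extraction rule actually used in the proposal may differ, and the clustering and LDA stages are not shown.

```python
from collections import Counter, defaultdict


def verb_noun_pairs(tagged, window=3):
    """Pair each verb with the nouns found in the next `window` tokens.

    `tagged` is a list of (word, tag) tuples; tags starting with "VB"
    are treated as verbs and tags starting with "NN" as nouns
    (an assumption of this sketch).
    """
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("VB"):
            for next_word, next_tag in tagged[i + 1 : i + 1 + window]:
                if next_tag.startswith("NN"):
                    pairs.append((word.lower(), next_word.lower()))
    return pairs


def noun_context_vectors(pairs):
    """Count, for each noun, the verbs it co-occurs with."""
    vectors = defaultdict(Counter)
    for verb, noun in pairs:
        vectors[noun][verb] += 1
    return vectors


# Hand-tagged toy sentence (hypothetical input, not from the corpus).
tagged = [
    ("Researchers", "NNS"), ("index", "VBP"), ("the", "DT"),
    ("documents", "NNS"), ("and", "CC"), ("cluster", "VBP"),
    ("related", "JJ"), ("terms", "NNS"), (".", "."),
]

pairs = verb_noun_pairs(tagged)
vectors = noun_context_vectors(pairs)
print(pairs)
```

Vectors of this kind describe each noun by the verbs it appears with, which is the sort of feature a term-clustering step could group on before topics are estimated.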
This research was partially funded by project number
165474 from “Fondo Mixto Conacyt-Gobierno del
Estado de Tamaulipas”.
