Authors:
Stephen Bradshaw ¹; Colm O'Riordan ¹ and Daragh Bradshaw ²
Affiliations:
¹ National University Ireland Galway, Ireland
² National University Limerick, Ireland
Keyword(s):
Document Clustering, Graph Theory, WordNet, Classification, Word Sense Disambiguation, Data Mining.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Context Discovery; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems
Abstract:
Clustering documents is a common task in a range of information retrieval systems and applications. Many approaches for improving the clustering process have been proposed. One approach is the use of an ontology to better inform the classifier of word context, by expanding the items to be clustered. WordNet is commonly cited as an appropriate source from which to draw the additional terms; however, it may not be sufficient to achieve strong performance. We have two aims in this paper. First, we show that the use of WordNet may lead to suboptimal performance. This problem may be accentuated when a document set has been drawn from comments made in social forums, owing to the unstructured nature of online conversations compared to standard document sets. Second, we propose a novel method which involves constructing a bespoke ontology that facilitates better clustering. We present a study of clustering applied to a sample of threads from a social forum and investigate the effectiveness of these methods.