Evaluating Better Document Representation in Clustering with Varying Complexity

Stephen Bradshaw; Colm O’Riordan

Topics: Clustering and Classification Methods; Context Discovery; Visual Data Mining and Data Visualization

In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR, pages 194-202, 2018, Seville, Spain

Authors: Stephen Bradshaw and Colm O’Riordan

Affiliation: National University of Ireland, Galway, Ireland

Keyword(s): Clustering and Classification Methods, Mining Text and Semi-structured Data, Context Discovery.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Clustering and Classification Methods; Context Discovery; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems; Visual Data Mining and Data Visualization

Abstract: Microblogging has become a very popular activity and the posts made by users can be a valuable source of information. Classifying this content accurately can be challenging because comments are typically short and, on their own, may lack context. Reddit is a very popular microblogging site whose popularity has seen a huge and consistent increase over the years. In this paper we propose using alternative but related Reddit threads to build language models that can be used to disambiguate the intended meaning of terms in a post. A related thread is one which is similar in content, often consisting of the same frequently occurring terms or phrases. We posit that threads of a similar nature use similar language and that the identification of related threads can be used as a source to add context to a post, enabling more accurate classification. In this paper, graphs are used to model the frequency and co-occurrence of terms. The terms of a document are mapped to nodes, and the co-occurrence of two terms is recorded as an edge weight. To show the robustness of our approach, we compare the performance of using related Reddit threads with that of using an external ontology, WordNet. We apply a number of evaluation metrics to the clusters created and show that in every instance, the use of alternative threads to improve document representations is better than the use of WordNet or standard augmented vector models. We apply this approach to increasingly hard environments to test its robustness. A tougher environment is one where the classifying algorithm has more than two categories to choose from when selecting the appropriate class.
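The graph model mentioned in the abstract (terms as nodes, co-occurrence as edge weights) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_cooccurrence_graph`, the whitespace tokenisation, the toy posts, and the choice to count co-occurrence once per post (rather than within a sliding window) are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(posts):
    """Build a term graph from tokenised posts (hypothetical helper).

    Returns two counters:
      node_counts  -- term -> total number of occurrences (node frequency)
      edge_weights -- (term_a, term_b) -> number of posts containing both
    """
    node_counts = Counter()
    edge_weights = Counter()
    for tokens in posts:
        node_counts.update(tokens)
        # Assumption: two terms co-occur if they appear in the same post.
        # Each unordered pair of distinct terms is counted once per post,
        # with the pair stored in sorted order so (a, b) == (b, a).
        for a, b in combinations(sorted(set(tokens)), 2):
            edge_weights[(a, b)] += 1
    return node_counts, edge_weights


# Toy data, invented for this example.
posts = [
    "apple releases new phone".split(),
    "apple pie recipe".split(),
    "new apple phone review".split(),
]
nodes, edges = build_cooccurrence_graph(posts)
print(nodes["apple"])             # frequency of the node "apple"
print(edges[("apple", "phone")])  # weight of the edge apple-phone
```

In a graph like this, an ambiguous term such as "apple" is linked more strongly to "phone" than to "pie", which is the kind of contextual signal the abstract describes drawing from related threads.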

CC BY-NC-ND 4.0

Paper citation in several formats:

Bradshaw, S. and O’Riordan, C. (2018). Evaluating Better Document Representation in Clustering with Varying Complexity. In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR; ISBN 978-989-758-330-8; ISSN 2184-3228, SciTePress, pages 194-202. DOI: 10.5220/0006930901940202

@conference{kdir18,
author={Stephen Bradshaw and Colm O’Riordan},
title={Evaluating Better Document Representation in Clustering with Varying Complexity},
booktitle={Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR},
year={2018},
pages={194-202},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006930901940202},
isbn={978-989-758-330-8},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR
TI - Evaluating Better Document Representation in Clustering with Varying Complexity
SN - 978-989-758-330-8
IS - 2184-3228
AU - Bradshaw, S.
AU - O’Riordan, C.
PY - 2018
SP - 194
EP - 202
DO - 10.5220/0006930901940202
PB - SciTePress