Authors: Elisabete Cunha 1 ; Álvaro Figueira 2 and Óscar Mealha 3

Affiliations: 1 Universidade do Porto, Universidade de Aveiro, ESE and IPVC, Portugal ; 2 Universidade do Porto, Portugal ; 3 CETAC.MEDIA, Portugal

Keyword(s): Semantic Document Classification, Clustering, Tagging, Seed Selection, k-means, k-C, Cosine Similarity.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Clustering and Classification Methods ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Symbolic Systems

Abstract: In this paper we analyze and discuss two methods that are based on the traditional k-means for document clustering and that integrate social tags into the process. The first integrates tags directly into a Vector Space Model, and the second uses tags to select the initial seeds. We created a predictive model for the impact of the tags' integration in both models, and compared the two methods using the traditional k-means++ and the novel k-C algorithm. To compare the results, we propose a new internal measure that allows the computation of cluster compactness. The experimental results indicate that the careful selection of seeds in the k-C algorithm yields better results than those obtained with k-means++, with and without integration of tags.
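
As a rough illustration of the first idea in the abstract (folding tags into a Vector Space Model and comparing documents by cosine similarity), the sketch below may help. It is not the authors' method: the term-frequency weighting, the `tag_weight` parameter, the helper names `vectorize` and `cosine`, and the sample documents are all assumptions made for this example.

```python
import math
from collections import Counter


def vectorize(text, tags=(), tag_weight=2.0):
    """Build a term-frequency vector; each tag adds tag_weight to its term.

    The additive tag weighting is an assumed scheme, chosen only for illustration.
    """
    vec = Counter(text.lower().split())
    for tag in tags:
        vec[tag.lower()] += tag_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents that share no words but carry the same tag.
doc_a = "k-means groups documents by similarity"
doc_b = "seed selection affects document clusters"

plain = cosine(vectorize(doc_a), vectorize(doc_b))
tagged = cosine(vectorize(doc_a, ["clustering"]), vectorize(doc_b, ["clustering"]))
```

In this toy case the two documents have no terms in common, so `plain` is zero, while the shared tag gives `tagged` a positive similarity. That is the kind of effect a tag-aware vector space is meant to capture before any clustering step runs.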

CC BY-NC-ND 4.0

Paper citation in several formats:
Cunha, E.; Figueira, Á. and Mealha, Ó. (2013). Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing (IC3K 2013) - KDIR; ISBN 978-989-8565-75-4; ISSN 2184-3228, SciTePress, pages 160-168. DOI: 10.5220/0004545201600168

@conference{kdir13,
author={Elisabete Cunha and Álvaro Figueira and Óscar Mealha},
title={Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing (IC3K 2013) - KDIR},
year={2013},
pages={160-168},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004545201600168},
isbn={978-989-8565-75-4},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing (IC3K 2013) - KDIR
TI - Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods
SN - 978-989-8565-75-4
IS - 2184-3228
AU - Cunha, E.
AU - Figueira, Á.
AU - Mealha, Ó.
PY - 2013
SP - 160
EP - 168
DO - 10.5220/0004545201600168
PB - SciTePress