EXPLOITING N-GRAM IMPORTANCE AND WIKIPEDIA BASED ADDITIONAL KNOWLEDGE FOR IMPROVEMENTS IN GAAC BASED DOCUMENT CLUSTERING

Niraj Kumar; Venkata Vinay Babu Vemula; Kannan Srinathan; Vasudeva Varma

Topics: Information Extraction

In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 0IC3K, 182-187, 2010, Valencia, Spain

Authors: Niraj Kumar; Venkata Vinay Babu Vemula; Kannan Srinathan and Vasudeva Varma

Affiliation: International Institute of Information Technology, India

Keyword(s): Document clustering, Group-average agglomerative clustering, Community detection, Similarity measure, N-gram, Wikipedia based additional knowledge.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems

Abstract: This paper provides a solution to the issue: “How can we use Wikipedia based concepts in document clustering with less human involvement, accompanied by effective improvements in results?” In the devised system, we propose a method to exploit the importance of N-grams in a document and use Wikipedia based additional knowledge for GAAC based document clustering. The importance of an N-gram in a document depends on many features including, but not limited to, its frequency, the position of its occurrence in a sentence, and the position in the document of the sentence in which it occurs. First, we introduce a new similarity measure that takes weighted N-gram importance into account when computing similarity during document clustering. As a result, the chances of topical similarity in clustering are improved. Second, we use Wikipedia as an additional knowledge base, both to remove noisy entries from the extracted N-grams and to reduce the information gap between conceptually related N-grams that do not match owing to differences in writing scheme or strategy. Our experimental results on a publicly available text dataset show that the devised system significantly outperforms bag-of-words based state-of-the-art systems in this area.
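
The two ingredients named in the abstract, a position- and frequency-weighted N-gram similarity and group-average agglomerative clustering (GAAC), can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual weighting formula, so the weight used in `ngram_weights` (each occurrence discounted by the square root of its sentence and token positions) is purely an assumption for illustration, as are the function names and the toy documents. The Wikipedia-based filtering step is not modeled here.

```python
from collections import defaultdict
from itertools import combinations
import math
import re


def ngram_weights(text, max_n=2):
    """Map each n-gram to a weight built from frequency and position.

    Hypothetical weighting (not the paper's formula): every occurrence adds
    more weight when its sentence is early in the document and the n-gram
    starts early in that sentence.
    """
    weights = defaultdict(float)
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    for s_idx, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        for n in range(1, max_n + 1):
            for t_idx in range(len(tokens) - n + 1):
                gram = " ".join(tokens[t_idx:t_idx + n])
                weights[gram] += 1.0 / math.sqrt((1 + s_idx) * (1 + t_idx))
    return dict(weights)


def cosine(a, b):
    """Cosine similarity between two sparse weighted n-gram vectors."""
    dot = sum(w * b[g] for g, w in a.items() if g in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gaac(docs, num_clusters):
    """Group-average agglomerative clustering over document indices."""
    vectors = [ngram_weights(d) for d in docs]
    n = len(docs)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def group_average(c1, c2):
        # Average similarity over all document pairs in the merged cluster,
        # including pairs that were already in the same cluster.
        pairs = list(combinations(c1 + c2, 2))
        return sum(sim[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > num_clusters:
        # Merge the pair of clusters with the highest group-average similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: group_average(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: a < b, so index a is unaffected
    return [sorted(c) for c in clusters]


docs = [
    "Cats chase mice. Cats sleep all day.",
    "Cats and kittens chase mice at night.",
    "Stock markets fell sharply. Investors sold stock.",
    "Investors watch stock markets closely.",
]
print(gaac(docs, 2))
```

In this toy input the two animal texts and the two finance texts share no N-grams across topics, so they end up in separate clusters. A real system would replace the hypothetical weighting with the paper's measure and add the Wikipedia-based step for removing noisy N-grams and bridging conceptually related ones.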

CC BY-NC-ND 4.0

Paper citation in several formats:

Kumar, N., Vinay Babu Vemula, V., Srinathan, K. and Varma, V. (2010). EXPLOITING N-GRAM IMPORTANCE AND WIKIPEDIA BASED ADDITIONAL KNOWLEDGE FOR IMPROVEMENTS IN GAAC BASED DOCUMENT CLUSTERING. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2010) - KDIR; ISBN 978-989-8425-28-7; ISSN 2184-3228, SciTePress, pages 182-187. DOI: 10.5220/0003081201820187

@conference{kdir10,
author={Niraj Kumar and Venkata {Vinay Babu Vemula} and Kannan Srinathan and Vasudeva Varma},
title={EXPLOITING N-GRAM IMPORTANCE AND WIKIPEDIA BASED ADDITIONAL KNOWLEDGE FOR IMPROVEMENTS IN GAAC BASED DOCUMENT CLUSTERING},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2010) - KDIR},
year={2010},
pages={182-187},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003081201820187},
isbn={978-989-8425-28-7},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2010) - KDIR
TI - EXPLOITING N-GRAM IMPORTANCE AND WIKIPEDIA BASED ADDITIONAL KNOWLEDGE FOR IMPROVEMENTS IN GAAC BASED DOCUMENT CLUSTERING
SN - 978-989-8425-28-7
IS - 2184-3228
AU - Kumar, N.
AU - Vinay Babu Vemula, V.
AU - Srinathan, K.
AU - Varma, V.
PY - 2010
SP - 182
EP - 187
DO - 10.5220/0003081201820187
PB - SciTePress