Informativeness-based Keyword Extraction from Short Documents

Mika Timonen; Timo Toivanen; Yue Teng; Chao Chen; Liang He

Topics: Context Discovery; Information Extraction; Machine Learning; Mining Text and Semi-Structured Data; User Profiling and Recommender Systems

In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, 411-421, 2012, Barcelona, Spain

Authors: Mika Timonen¹; Timo Toivanen²; Yue Teng³; Chao Chen³ and Liang He³

Affiliations: ¹ VTT Technical Research Centre of Finland and University of Helsinki, Finland; ² VTT Technical Research Centre of Finland, Finland; ³ East China Normal University, China

Keyword(s): Keyword Extraction, Machine Learning, Short Documents, Term Weighting, Text Mining.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Computational Intelligence ; Context Discovery ; Evolutionary Computing ; Information Extraction ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Machine Learning ; Mining Text and Semi-Structured Data ; Soft Computing ; Symbolic Systems ; User Profiling and Recommender Systems

Abstract: With the rise of user-created content on the Internet, the focus of text mining has shifted. Twitter messages and product descriptions are examples of new corpora available for text mining. Keyword extraction, user modeling and text categorization are all areas that focus on utilizing this new data. However, because the documents in these corpora are considerably shorter than traditional ones, such as news articles, they pose new challenges. In this paper, we focus on keyword extraction from documents such as event and product descriptions and movie plot lines, which often contain 30 to 60 words. We propose a novel unsupervised keyword extraction approach called Informativeness-based Keyword Extraction (IKE) that uses clustering and three levels of word evaluation to address the challenges of short documents. We evaluate the performance of our approach on manually tagged test sets and compare the results against other keyword extraction methods, such as CollabRank, KeyGraph, Chi-squared, and TF-IDF. We also evaluate the precision and effectiveness of the extracted keywords for user modeling and recommendation and report the results of all approaches. In all of the experiments, IKE outperforms the competition.
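
For orientation, the TF-IDF baseline named in the abstract can be sketched in a few lines of Python. This is not the IKE method and not the authors' code; the function name `tfidf_keywords`, the whitespace tokenizer, and the sample texts are assumptions made purely for illustration. It also hints at why short documents are hard: when nearly every word occurs once, term frequency barely discriminates and the ranking is driven almost entirely by document frequency.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k words per document, ranked by a plain TF-IDF score.

    Illustrative sketch only: lowercases and splits on whitespace,
    with no stemming or stop-word removal.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_k])
    return results


# Hypothetical short event descriptions (invented for this example).
docs = [
    "jazz concert in helsinki",
    "rock concert in barcelona",
    "film festival in helsinki",
]
print(tfidf_keywords(docs, top_k=2))
```

In the sample, every word has a term frequency of exactly one, so words that appear in only one document tie for the top score and the order among them is decided by the alphabetical tie-break alone.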

CC BY-NC-ND 4.0

Paper citation in several formats:

Timonen, M., Toivanen, T., Teng, Y., Chen, C. and He, L. (2012). Informativeness-based Keyword Extraction from Short Documents. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - SSTM; ISBN 978-989-8565-29-7; ISSN 2184-3228, SciTePress, pages 411-421. DOI: 10.5220/0004130704110421

@conference{sstm12,
author={Mika Timonen and Timo Toivanen and Yue Teng and Chao Chen and Liang He},
title={Informativeness-based Keyword Extraction from Short Documents},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - SSTM},
year={2012},
pages={411-421},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004130704110421},
isbn={978-989-8565-29-7},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - SSTM
TI - Informativeness-based Keyword Extraction from Short Documents
SN - 978-989-8565-29-7
IS - 2184-3228
AU - Timonen, M.
AU - Toivanen, T.
AU - Teng, Y.
AU - Chen, C.
AU - He, L.
PY - 2012
SP - 411
EP - 421
DO - 10.5220/0004130704110421
PB - SciTePress