Authors:
Dante Degl'Innocenti; Dario De Nart and Carlo Tasso
Affiliation:
University of Udine, Italy
Keyword(s):
Keyphrase Extraction, Information Extraction, Italian Language, Natural Language Processing, Text Analysis, Text Classification, Text Summarization.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Concept Mining; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems; Web Mining
Abstract:
Associating meaningful keyphrases with text documents and Web pages is an activity that can significantly increase
the accuracy of Information Retrieval, Personalization and Recommender systems, but the growing
amount of text data available is too large for an extensive manual annotation. On the other hand, automatic
keyphrase generation can significantly support this activity. This task is already performed with satisfactory
results by several systems proposed in the literature; however, most of them focus solely on the English language,
which accounts for more than 50% of Web content. Only a few other languages have been
investigated, and Italian, despite being the ninth most used language on the Web, is not among them. In order
to overcome this shortage, we propose a novel multi-language, unsupervised, knowledge-based approach
towards keyphrase generation. To support our claims, we developed DIKpE-G, a prototype system which integrates
several kinds of knowledge for selecting and evaluating meaningful keyphrases, ranging from linguistic
to statistical, meta/structural, social, and ontological knowledge. DIKpE-G performs well on English and
Italian texts.