Authors:
Dario De Nart and Carlo Tasso
Affiliation:
University of Udine, Italy
Keyword(s):
Keyphrase Extraction, Keyphrase Inference, Information Extraction, Text Classification, Text Summarization.
Related Ontology Subjects/Areas/Topics:
Metadata and Metamodeling; Ontology and the Semantic Web; Web Information Systems and Technologies; Web Interfaces and Applications
Abstract:
The annotation of documents and web pages with semantic metadata is an activity that can greatly increase the accuracy of Information Retrieval and Personalization systems, but the growing amount of available text data is too large for an extensive manual process. On the other hand, automatic keyphrase generation, a complex task involving Natural Language Processing and Knowledge Engineering, can significantly support this activity.
Several different strategies have been proposed over the years, but most of them require extensive training data, which are not always available, suffer from high ambiguity and differences in writing style, are highly domain-specific, and often rely on well-structured knowledge that is very hard to acquire and encode.
In order to overcome these limitations, we propose in this paper an innovative domain-independent approach that consists of an unsupervised keyphrase extraction phase and a subsequent keyphrase inference phase based on loosely structured, collaborative knowledge such as Wikipedia, Wordnik, and Urban Dictionary. This double-layered approach allows us to generate keyphrases that both describe and classify the text.
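The two-phase idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the frequency-based n-gram ranking, the stopword list, and the function names `extract_keyphrases` and `infer_keyphrases` are all assumptions made for illustration, and the `knowledge` dictionary is a hypothetical stand-in for lookups against a collaborative source such as Wikipedia.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
             "of", "on", "or", "that", "the", "this", "to", "with"}


def extract_keyphrases(text, top_k=5, max_len=3):
    """Phase 1 (unsupervised): rank n-grams found between stopwords
    and punctuation by how often they occur in the text."""
    tokens = re.findall(r"[a-z]+|[.,;:!?]", text.lower())
    counts = Counter()
    run = []
    for tok in tokens + ["."]:  # trailing sentinel flushes the last run
        if tok.isalpha() and tok not in STOPWORDS:
            run.append(tok)
            continue
        # A stopword or punctuation mark ends the current run of words.
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
        run = []
    # Most frequent first; among equally frequent phrases, longer first.
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split())))
    return ranked[:top_k]


def infer_keyphrases(extracted, knowledge):
    """Phase 2 (inference): collect concepts that the knowledge source
    relates to the extracted phrases but that are not in the text."""
    votes = Counter()
    for phrase in extracted:
        for concept in knowledge.get(phrase, []):
            if concept not in extracted:
                votes[concept] += 1
    return [concept for concept, _ in votes.most_common()]


# Hypothetical knowledge fragment: phrase -> related broader concepts.
knowledge = {
    "keyphrase extraction": ["information extraction"],
    "extraction": ["information extraction", "data mining"],
}
text = ("Keyphrase extraction supports semantic metadata. "
        "Keyphrase extraction is unsupervised.")
extracted = extract_keyphrases(text, top_k=3)
inferred = infer_keyphrases(extracted, knowledge)
print(extracted)
print(inferred)
```

In this sketch the extracted phrases describe the text, while the inferred ones, which never appear in it, play the classifying role mentioned in the abstract.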