Authors:
Simone D’Amico (1), Lorenzo Malandri (2, 3), Fabio Mercorio (2, 3) and Mario Mezzanzanica (2, 3)
Affiliations:
(1) Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy
(2) Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
(3) CRISP Research Centre, University of Milano-Bicocca, Milan, Italy
Keyword(s):
Keyphrase Extraction, Keyphrase Evaluation, Keyphrase Benchmark Evaluation, Word Embeddings, Natural Language Processing.
Abstract:
Keyphrase extraction is an area of NLP research that aims to identify the words and expressions that comprehensively represent the content of a text. In this study, we introduce a new approach called KRAKEN (Keyphrase extRAction maKing use of EmbeddiNgs). Our method uses widely adopted NLP techniques to extract keyphrases from a text in an unsupervised manner, and we evaluate its results on well-known benchmark datasets from the literature. The main contribution of this work is a novel approach to keyphrase extraction. Both natural language preprocessing techniques and distributional semantics techniques, such as word embeddings, are used to obtain a vector representation of the texts that preserves their semantic meaning. Through KRAKEN, we design a new method that exploits word embeddings to identify keyphrases while taking into account the relationships among the words in the text. To evaluate KRAKEN, we use benchmark datasets and compare our approach with state-of-the-art methods. A further contribution of this work is a metric for ranking the identified keyphrases that considers the relatedness both of the words within each phrase and of all the phrases extracted from the same text.
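The abstract does not spell out the ranking metric, so the following is only a hypothetical illustration of the general idea it describes, not KRAKEN's actual formula. It scores each candidate phrase by combining (a) intra-phrase cohesion, the mean pairwise cosine similarity of its word vectors, and (b) inter-phrase relatedness, the mean cosine similarity between the phrase vector and every other candidate from the same text. The toy embedding values, the averaging used to build phrase vectors, and the weight `alpha` are all assumptions made for this sketch; a real system would load trained word embeddings instead.

```python
import math
from itertools import combinations

# Hypothetical toy word vectors standing in for trained embeddings.
EMBEDDINGS = {
    "word": [0.9, 0.1, 0.0],
    "embedding": [0.8, 0.2, 0.1],
    "keyphrase": [0.7, 0.3, 0.2],
    "extraction": [0.6, 0.4, 0.1],
    "italy": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_vector(phrase):
    """Average the word vectors of a phrase (an assumed composition choice)."""
    vecs = [EMBEDDINGS[w] for w in phrase]
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]


def cohesion(phrase):
    """Mean pairwise cosine among the words of a phrase; single words get 1.0."""
    pairs = list(combinations(phrase, 2))
    if not pairs:
        return 1.0
    return sum(cosine(EMBEDDINGS[a], EMBEDDINGS[b]) for a, b in pairs) / len(pairs)


def relatedness(phrase, candidates):
    """Mean cosine between this phrase and every other candidate phrase."""
    others = [c for c in candidates if c != phrase]
    if not others:
        return 0.0
    v = phrase_vector(phrase)
    return sum(cosine(v, phrase_vector(o)) for o in others) / len(others)


def rank(candidates, alpha=0.5):
    """Rank phrases by alpha * cohesion + (1 - alpha) * relatedness (alpha is hypothetical)."""
    scored = [
        (alpha * cohesion(p) + (1 - alpha) * relatedness(p, candidates), p)
        for p in candidates
    ]
    return [" ".join(p) for _, p in sorted(scored, reverse=True)]


if __name__ == "__main__":
    candidates = [("word", "embedding"), ("keyphrase", "extraction"), ("italy",)]
    print(rank(candidates))  # -> ['keyphrase extraction', 'word embedding', 'italy']
```

The off-topic single word "italy" ends up last because it is weakly related to the other candidates, even though its cohesion is trivially 1.0; how single-word phrases should be weighted is a design choice this sketch does not settle.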