Authors:
Jingwen Wang; Changfeng Yu; Wenjing Yang and Jie Wang
Affiliation:
Department of Computer Science, University of Massachusetts, Lowell, MA, U.S.A.
Keyword(s):
TFIDF, TextRank, RAKE, Word2Vec, Minimum Edit Distance.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Symbolic Systems
Abstract:
We present Vocab Learn, a text mining system that helps users learn new words with respect to a knowledge base, where a knowledge base is a collection of written materials. Vocab Learn extracts words, excluding stop words, from a knowledge base and recommends new words to a user according to their importance and frequency. To reinforce learning and assess how well a word has been learned, Vocab Learn generates, for each recommended word, a number of semantically close words using word embeddings (Mikolov et al., 2013a), and a number of words with look-alike spellings/strokes but different meanings using Minimum Edit Distance (Levenshtein, 1966). Moreover, to help users learn how to use a new word, Vocab Learn links each word to its dictionary definitions and provides sample sentences, extracted from the knowledge base, that include the word. We carry out experiments to compare the word-ranking algorithms TFIDF (Salton and McGill, 1986), TextRank (Mihalcea and Tarau, 2004), and RAKE (Rose et al., 2010) on the Inspec dataset of abstracts from Computer Science and Information Technology journals, each with a set of keywords labeled by human editors. We show that TextRank is the best of the three for ranking words on this dataset. We also show that Vocab Learn generates reasonable words with similar meanings and words with similar spellings but different meanings.
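As a rough illustration of the look-alike step described in the abstract, the following Python sketch computes Minimum Edit Distance (Levenshtein distance with unit costs) and picks the vocabulary words closest in spelling to a target word. It is not the authors' implementation: the function names `edit_distance` and `look_alikes`, the `max_distance` and `limit` parameters, and the sample vocabulary are all assumptions made for this example, and the sketch does not check that the candidates differ in meaning.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def look_alikes(word, vocabulary, max_distance=2, limit=5):
    """Return up to `limit` vocabulary words within `max_distance` edits.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    scored = [(edit_distance(word, cand), cand)
              for cand in vocabulary if cand != word]
    close = sorted((d, cand) for d, cand in scored if d <= max_distance)
    return [cand for _, cand in close[:limit]]


if __name__ == "__main__":
    vocab = ["effect", "affect", "afford", "perfect", "banana"]
    print(edit_distance("kitten", "sitting"))  # 3
    print(look_alikes("affect", vocab))
```

Sorting by distance first and then alphabetically keeps the output deterministic; a real system would likely also filter candidates by meaning, for example with the word embeddings the abstract mentions.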