Keyword Extraction in German: Information-theory vs. Deep Learning

Max Kölbl, Yuki Kyogoku, J. Philipp, Michael Richter, Clemens Rietdorf, Tariq Yousef

2020

Abstract

This paper reports the results of a study on automatic keyword extraction in German. We employed two general types of methods: (A) an unsupervised method based on information theory (Shannon, 1948), for which we employed (i) a bigram model, (ii) a probabilistic parser model (Hale, 2001), and (iii) an innovative model which utilises topics as extra-sentential contexts for the calculation of the information content of words; and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we employed TextRank and the TF-IDF ranking function. The topic model (A)(iii) clearly outperformed all other models, including TextRank and TF-IDF. In contrast, the RNN performed poorly. We take the results as first evidence that (i) information content can be employed for keyword extraction tasks and thus has a clear correspondence to the semantics of natural language, and (ii) that, as a cognitive principle, the information content of words is determined from extra-sentential contexts, that is to say, from the discourse in which the words occur.
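To make the notion of information content in method (A)(i) concrete, the following is a minimal sketch of how a bigram model can assign each word a Shannon surprisal, -log2 P(word | previous word), and rank the words of a document by it. It is not the authors' implementation: the function names `train_bigram_surprisal` and `rank_keywords`, the add-alpha smoothing, the `<s>` sentence-start marker, and the choice to rank by mean surprisal per word type are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker (an assumption)

def train_bigram_surprisal(sentences, alpha=1.0):
    """Build a surprisal function from tokenised sentences.

    Returns surprisal(prev, word) = -log2 P(word | prev), where P is an
    add-alpha smoothed bigram estimate (the smoothing is an assumption).
    """
    context_counts = Counter()  # how often each token occurs as a left context
    bigram_counts = Counter()   # how often each (prev, word) pair occurs
    vocab = set()
    for tokens in sentences:
        vocab.update(tokens)
        padded = [START] + list(tokens)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def surprisal(prev, word):
        numerator = bigram_counts[(prev, word)] + alpha
        denominator = context_counts[prev] + alpha * vocab_size
        return -math.log2(numerator / denominator)

    return surprisal

def rank_keywords(tokens, surprisal, k=3):
    """Rank the word types of one document by mean surprisal, highest first."""
    totals = defaultdict(float)
    counts = Counter()
    prev = START
    for word in tokens:
        totals[word] += surprisal(prev, word)
        counts[word] += 1
        prev = word
    mean = {word: totals[word] / counts[word] for word in counts}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(mean, key=lambda word: (-mean[word], word))[:k]

# Toy corpus, invented purely for illustration.
corpus = [
    ["der", "hund", "bellt"],
    ["der", "hund", "schläft"],
    ["der", "hund", "bellt"],
]
surprisal = train_bigram_surprisal(corpus)
top = rank_keywords(["der", "hund", "schläft"], surprisal, k=1)
```

In this toy corpus the rarer continuation "schläft" receives a higher surprisal after "hund" than the frequent "bellt", so it is ranked first. The topic-based model (A)(iii) described in the abstract conditions on extra-sentential context rather than on the preceding word; that model is not reproduced here.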


Paper Citation


in Harvard Style

Kölbl M., Kyogoku Y., Philipp J., Richter M., Rietdorf C. and Yousef T. (2020). Keyword Extraction in German: Information-theory vs. Deep Learning. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI, ISBN 978-989-758-395-7, pages 459-464. DOI: 10.5220/0009374704590464


in Bibtex Style

@conference{nlpinai20,
author={Max Kölbl and Yuki Kyogoku and J. Philipp and Michael Richter and Clemens Rietdorf and Tariq Yousef},
title={Keyword Extraction in German: Information-theory vs. Deep Learning},
booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI},
year={2020},
pages={459-464},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009374704590464},
isbn={978-989-758-395-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI
TI - Keyword Extraction in German: Information-theory vs. Deep Learning
SN - 978-989-758-395-7
AU - Kölbl M.
AU - Kyogoku Y.
AU - Philipp J.
AU - Richter M.
AU - Rietdorf C.
AU - Yousef T.
PY - 2020
SP - 459
EP - 464
DO - 10.5220/0009374704590464