The GDR Through the Eyes of the Stasi - Data Mining on the Secret Reports of the State Security Service of the former German Democratic Republic
Christoph Kuras, Thomas Efer, Christian Adam, Gerhard Heyer
2014
Abstract
The conjunction of NLP and the humanities has been gaining importance in recent years. As part of this development, more and more historical documents are being digitized and can serve as input for established NLP methods. In this paper we present a corpus of texts from reports of the Ministry of State Security of the former GDR. Although the reports are written in a distinctive sublanguage, we show that traditional NLP methods can be applied with satisfactory results. We use these results as a basis for new ways of presenting and exploring the data, making it accessible to a wide spectrum of users.
Paper Citation
in Harvard Style
Kuras C., Efer T., Adam C. and Heyer G. (2014). The GDR Through the Eyes of the Stasi - Data Mining on the Secret Reports of the State Security Service of the former German Democratic Republic. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 360-365. DOI: 10.5220/0005136703600365
in Bibtex Style
@conference{kdir14,
author={Christoph Kuras and Thomas Efer and Christian Adam and Gerhard Heyer},
title={The GDR Through the Eyes of the Stasi - Data Mining on the Secret Reports of the State Security Service of the former German Democratic Republic},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={360-365},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005136703600365},
isbn={978-989-758-048-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - The GDR Through the Eyes of the Stasi - Data Mining on the Secret Reports of the State Security Service of the former German Democratic Republic
SN - 978-989-758-048-2
AU - Kuras C.
AU - Efer T.
AU - Adam C.
AU - Heyer G.
PY - 2014
SP - 360
EP - 365
DO - 10.5220/0005136703600365