Contextual Approaches for Identification of Toponyms in Ancient Documents

Hendrik Schöneberg, Frank Müller

Abstract

Performing Named Entity Recognition on ancient documents by hand is time-consuming, complex and error-prone. It is nevertheless a prerequisite for identifying related documents and correlating named entities across distinct sources, which in turn helps to reconstruct historic events precisely. Automated classification could reduce this manual effort, but classifying terms in ancient documents automatically is difficult because of the sources' challenging syntax and poor state of conservation. This paper introduces and evaluates two approaches that cope with complex syntactic environments by combining statistical information derived from a term's context with domain-specific heuristic knowledge to perform a classification. Furthermore, these approaches can easily be adapted to new domains.
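The general idea of combining context statistics with heuristic knowledge can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the cosine similarity measure, the suffix list, the weighting and all function names are assumptions made for illustration only.

```python
from collections import Counter
import math


def context_vector(tokens, index, window=2):
    """Count the lowercased words within `window` positions of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(word.lower() for word in left + right)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical heuristic knowledge: suffixes that often end German place names.
PLACE_SUFFIXES = ("burg", "berg", "furt", "heim")


def heuristic_score(token):
    """Return a score in [0, 1] based on capitalisation and suffix (assumed rules)."""
    score = 0.0
    if token[:1].isupper():
        score += 0.5
    if token.lower().endswith(PLACE_SUFFIXES):
        score += 0.5
    return score


def toponym_score(tokens, index, profile, weight=0.7):
    """Blend context similarity to a toponym profile with the heuristic score."""
    statistical = cosine(context_vector(tokens, index), profile)
    return weight * statistical + (1 - weight) * heuristic_score(tokens[index])


# Assumed profile: context counts collected around already labelled toponyms.
profile = Counter({"to": 3, "in": 2, "near": 1})
tokens = "he rode to Wirzeburg in the spring".split()
scores = {token: toponym_score(tokens, i, profile) for i, token in enumerate(tokens)}
best = max(scores, key=scores.get)
```

In this toy sentence the candidate place name receives the highest score, because its neighbours overlap with the profile and the heuristic rules also fire. Adapting the sketch to another domain would mean swapping the profile and the heuristic rules, while the scoring function stays the same.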


Paper Citation


in Harvard Style

Schöneberg H. and Müller F. (2012). Contextual Approaches for Identification of Toponyms in Ancient Documents. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 163-168. DOI: 10.5220/0004110701630168


in Bibtex Style

@conference{kdir12,
author={Hendrik Schöneberg and Frank Müller},
title={Contextual Approaches for Identification of Toponyms in Ancient Documents},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={163-168},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004110701630168},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI - Contextual Approaches for Identification of Toponyms in Ancient Documents
SN - 978-989-8565-29-7
AU - Schöneberg H.
AU - Müller F.
PY - 2012
SP - 163
EP - 168
DO - 10.5220/0004110701630168