Authors:
Hendrik Schöneberg and Frank Müller
Affiliation:
University of Würzburg, Germany
Keyword(s):
Named Entity Recognition, Information Retrieval, Contextual Approach, Feature Vector, Conditional Random Field, Classification Framework, Digitization.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Context Discovery; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Structured Data Analysis and Statistical Methods; Symbolic Systems
Abstract:
Performing Named Entity Recognition on ancient documents is a time-consuming, complex and error-prone
manual task. It is nevertheless a prerequisite for identifying related documents and correlating
named entities across distinct sources, which helps to reconstruct historic events precisely. Automated
classification approaches could reduce this manual effort, but classifying terms in ancient documents
automatically is difficult because of the sources’ challenging syntax and poor state of conservation.
This paper introduces and evaluates two approaches that can cope with complex syntactic environments by
using statistical information derived from a term’s context and combining it with domain-specific heuristic
knowledge to perform a classification. Furthermore, these approaches can easily be adapted to new domains.