Authors:
Sebastian Lindner and Winfried Höhn
Affiliation:
University of Würzburg, Germany
Keyword(s):
References Parsing, Bibliography, Conditional Random Fields (CRFs), Constraint-based Learning, Information Extraction, Information Retrieval, Machine Learning, Sequence Labeling, Semi-supervised Learning.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Data Reduction and Quality Assessment; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
This paper presents key components of our workflow for processing bibliographic information. We compare several approaches for parsing bibliographic references using conditional random fields (CRFs), concentrating on cases where only a few labeled training instances are available. To improve labeling results, prior knowledge about the bibliography domain is incorporated into CRF training through different constraint models. We show that our labeling approach achieves results comparable to, and in some cases better than, other state-of-the-art approaches. Finally, we show how, for about half of our reference strings, a correlation between journal title, volume, and publication year can be used to identify the correct journal even when the journal title abbreviation is ambiguous.
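The journal disambiguation idea in the last sentence can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, for illustration only, that each candidate journal publishes a fixed number of volumes per year starting from a known first year, so that a (volume, year) pair is consistent with at most a few candidates. The class `JournalProfile`, the function `disambiguate`, and the journal names and years are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalProfile:
    """Hypothetical per-journal record; all values here are illustrative."""
    title: str
    abbreviation: str
    first_year: int            # year in which volume 1 appeared (assumed known)
    volumes_per_year: int = 1  # assumed constant over the journal's lifetime

    def volumes_in(self, year: int) -> range:
        """Volume numbers this journal would publish in `year` under the linear assumption."""
        if year < self.first_year:
            return range(0)
        start = (year - self.first_year) * self.volumes_per_year + 1
        return range(start, start + self.volumes_per_year)


def disambiguate(abbreviation: str, volume: int, year: int, profiles) -> list:
    """Return titles sharing the abbreviation whose volume/year pattern fits the reference."""
    return [
        p.title
        for p in profiles
        if p.abbreviation == abbreviation and volume in p.volumes_in(year)
    ]


# Two made-up journals that share the same ambiguous abbreviation.
PROFILES = [
    JournalProfile("Example Journal of Physics", "Ex. J.", first_year=1950),
    JournalProfile("Example Journal of Chemistry", "Ex. J.", first_year=1980, volumes_per_year=2),
]

if __name__ == "__main__":
    print(disambiguate("Ex. J.", 41, 2000, PROFILES))
    print(disambiguate("Ex. J.", 51, 2000, PROFILES))
```

In this sketch an empty result means that no candidate is consistent with the volume and year, and a result with several titles means the correlation alone does not resolve the ambiguity. That matches the abstract's observation that the correlation helps for only part of the reference strings.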