Authors: Quang-Hong Vuong (1) and Takasu Atsuhiro (2)
Affiliations: (1) Hanoi University of Science and Technology, Vietnam; (2) National Institute of Informatics, Japan
Keyword(s):
Transfer Learning, Bibliographic Information Extraction, Conditional Random Fields, Page Layout Analysis, Digital Libraries.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Bioinformatics and Systems Biology; Instance-Based Learning; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Learning in Process Automation; Natural Language Processing; Pattern Recognition; Software Engineering; Symbolic Systems; Theory and Methods
Abstract:
This paper discusses the problems of analyzing title page layouts and extracting bibliographic information
from academic papers. Information extraction is an important task for making digital libraries easy to use. Sequence
analyzers are usually used to extract information from pages. Because new layouts arrive often and existing
layouts change frequently, a mechanism for self-training a new analyzer is needed to achieve
good extraction accuracy. Such a mechanism also makes management easier. For example, when a new layout
is input, the problem is how to learn automatically and efficiently to create a new analyzer. This
paper focuses on learning a new sequence analyzer automatically by using a transfer learning approach. We
evaluated the method's efficiency by testing it on three academic journals. The results show that the proposed method is
effective for self-training a new sequence analyzer.