Authors: Atsuhiro Takasu 1 and Manabu Ohta 2

Affiliations: 1 National Institute of Informatics, Japan ; 2 Okayama University, Japan

Keyword(s): Digital Library, Document Understanding, Information Extraction, CRF.

Related Ontology Subjects/Areas/Topics: Applications ; Document Analysis and Understanding ; Pattern Recognition ; Software Engineering

Abstract: This paper discusses the problem of managing rules for page layout analysis and information extraction. We have been developing a system to extract information from academic papers that exploits both page layout and textual information. For this purpose, a conditional random field (CRF) analyzer is designed according to the layout of the object pages. Because various layouts are used in academic papers, we must prepare a set of rules for each type of layout to achieve high extraction accuracy. As the number of papers in a system grows, rule management becomes a big problem. For example, when should we make a new set of rules, and how can we acquire them efficiently while receiving new articles? This paper examines two scores to measure the fitness of rules and the applicability of rules learned for another type of layout. We evaluate the scores for bibliographic information extraction from title pages of academic papers and show that they are effective for measuring the fitness. We also examine the sampling of training data when learning a new set of rules.

CC BY-NC-ND 4.0

Paper citation in several formats:
Takasu, A. and Ohta, M. (2014). Rule Management for Information Extraction from Title Pages of Academic Papers. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-018-5; ISSN 2184-4313, SciTePress, pages 438-444. DOI: 10.5220/0004827204380444

@conference{icpram14,
author={Atsuhiro Takasu and Manabu Ohta},
title={Rule Management for Information Extraction from Title Pages of Academic Papers},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2014},
pages={438-444},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004827204380444},
isbn={978-989-758-018-5},
issn={2184-4313},
}

TY  - CONF
JO  - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI  - Rule Management for Information Extraction from Title Pages of Academic Papers
SN  - 978-989-758-018-5
IS  - 2184-4313
AU  - Takasu, A.
AU  - Ohta, M.
PY  - 2014
SP  - 438
EP  - 444
DO  - 10.5220/0004827204380444
PB  - SciTePress