Transfer Learning for Bibliographic Information Extraction

Quang-Hong Vuong, Takasu Atsuhiro

Abstract

This paper addresses the problems of analyzing title page layouts and extracting bibliographic information from academic papers. Information extraction is an important task for making digital libraries easy to use. Sequence analyzers are commonly used to extract information from pages. Because new layouts arrive often and existing layouts frequently change, a mechanism for self-training a new analyzer is needed to achieve good extraction accuracy. Such a mechanism also makes management easier. For example, when a new layout is input, the problem is how to learn automatically and efficiently to create a new analyzer. This paper focuses on learning a new sequence analyzer automatically by using a transfer learning approach. We evaluated its efficiency by testing on three academic journals. The results show that the proposed method is effective for self-training a new sequence analyzer.

Paper Citation


in Harvard Style

Vuong Q. and Atsuhiro T. (2015). Transfer Learning for Bibliographic Information Extraction. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 374-379. DOI: 10.5220/0005283003740379


in Bibtex Style

@conference{icpram15,
author={Quang-Hong Vuong and Takasu Atsuhiro},
title={Transfer Learning for Bibliographic Information Extraction},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={374-379},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005283003740379},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Transfer Learning for Bibliographic Information Extraction
SN - 978-989-758-076-5
AU - Vuong Q.
AU - Atsuhiro T.
PY - 2015
SP - 374
EP - 379
DO - 10.5220/0005283003740379