TWO METHODS FOR FILLED-IN DOCUMENT IMAGE IDENTIFICATION USING LOCAL FEATURES

Diego Carrion-Robles, Vicent Castello-Fos, Juan-Carlos Perez-Cortes, Joaquim Arlandis

Abstract

This work addresses document image classification, particularly for pre-printed forms, where a large part of the document can be filled in, so that two instances of the same form may produce very different images. A method for selecting discriminative local features is presented and tested with two different classification algorithms. The first is an incremental version of the method proposed in (Arlandis et al., 2009), based on similarity searching around a set of anchor points; the second is based on a direct voting scheme (Arlandis et al., 2011). Experiments are presented on a database of real office documents with very high variability, as well as on the NIST SD6 database. A confidence measure intended to reject unknown documents (those whose class has not been indexed in advance) is also proposed and tested.
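
The abstract does not give the details of either classifier, so the following is only a generic sketch of what a direct voting scheme with a reject option might look like, not the authors' implementation. It assumes each indexed local feature is a numeric vector tagged with its document class, that each query feature votes for the class of its nearest indexed feature, and that confidence is the vote margin between the two best classes. The names `classify_by_voting`, `squared_distance`, and `reject_threshold`, as well as the margin-based confidence, are hypothetical choices made for illustration.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_by_voting(query_features, index, reject_threshold=0.5):
    """Classify a document by letting each local feature cast one vote.

    index: list of (feature_vector, class_label) pairs built in advance.
    Returns (label, confidence); label is None when the document is
    rejected as unknown. The confidence used here (vote margin divided
    by the number of query features) is an assumption, not the measure
    proposed in the paper.
    """
    if not query_features or not index:
        return None, 0.0

    votes = Counter()
    for q in query_features:
        # Brute-force nearest neighbour; a real system would use an index.
        _, label = min(index, key=lambda item: squared_distance(q, item[0]))
        votes[label] += 1

    ranked = votes.most_common(2)
    best_label, best_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
    confidence = (best_votes - runner_up_votes) / len(query_features)

    if confidence < reject_threshold:
        return None, confidence  # treated as an unknown document
    return best_label, confidence
```

With a small margin the document is rejected rather than forced into the best-scoring class, which is the role the abstract assigns to the confidence measure; the threshold would have to be tuned on held-out data.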

References

  1. Arlandis, J., Castello-Fos, V., and Pérez-Cortes, J. C. (2011). Filled-in document identification using local features and a direct voting scheme. In Vitrià, J., Sanches, J. M. R., and Hernández, M., editors, IbPRIA, volume 6669 of Lecture Notes in Computer Science, pages 548-555. Springer.
  2. Arlandis, J., Perez-Cortes, J.-C., and Ungria, E. (2009). Identification of very similar filled-in forms with a reject option. In Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on, pages 246-250.
  3. Dimmick, D. L. and Garris, M. D. (1992). Structured forms database 2, NIST special database 6.
  4. Haralick, R. M., Shanmugam, K., and Dinstein, I. (1973). Textural features for image classification. Systems, Man and Cybernetics, IEEE Transactions on, 3(6):610-621.
  5. Heroux, P., Diana, S., Ribert, A., and Trupin, E. (1998). Classification method study for automatic form class identification. In Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on, volume 1, pages 926-928.
  6. Kittler, J., Hatef, M., Duin, R., and Matas, J. (1998). On combining classifiers. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20(3):226-239.
  7. Mohr, R., Picard, S., and Schmid, C. (1997). Bayesian decision versus voting for image retrieval. In Proc. of the CAIP-97, pages 376-383.
  8. Nagasaki, T., Marukawa, K., Kagehiro, T., and Sako, H. (2006). A coupon classification method based on adaptive image vector matching. In Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, volume 3, pages 280-283.
  9. Ogata, H., Watanabe, S., Imaizumi, A., Yasue, T., Furukawa, N., Sako, H., and Fujisawa, H. (2003). Formtype identification for banking applications and its implementation issues. In DRR'03, pages 208-218.
  10. Parker, C. (2010). Anchor point selection by KL-divergence. In Image Processing Workshop (WNYIPW), 2010 Western New York, pages 42-45.
  11. Sako, H., Seki, M., Furukawa, N., Ikeda, H., and Imaizumi, A. (2003). Form reading based on form-type identification and form-data recognition. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2, ICDAR '03, pages 926-, Washington, DC, USA. IEEE Computer Society.
  12. Sarkar, P. (2006). Image classification: Classifying distributions of visual features. In Proceedings of the 18th International Conference on Pattern Recognition - Volume 02, ICPR '06, pages 472-475, Washington, DC, USA. IEEE Computer Society.
  13. Sarkar, P. (2010). Learning image anchor templates for document classification and data extraction. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 3428-3431.
  14. Ting, A. and Leung, M. (1996). Business form classification using strings. In Pattern Recognition, 1996., Proceedings of the 13th International Conference on, volume 2, pages 690-694.
  15. Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 1:511-518.

Paper Citation


in Harvard Style

Carrion-Robles D., Castello-Fos V., Perez-Cortes J. and Arlandis J. (2012). TWO METHODS FOR FILLED-IN DOCUMENT IMAGE IDENTIFICATION USING LOCAL FEATURES. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: IATMLRP, (ICPRAM 2012), ISBN 978-989-8425-98-0, pages 481-487. DOI: 10.5220/0003884004810487


in Bibtex Style

@conference{iatmlrp12,
author={Diego Carrion-Robles and Vicent Castello-Fos and Juan-Carlos Perez-Cortes and Joaquim Arlandis},
title={TWO METHODS FOR FILLED-IN DOCUMENT IMAGE IDENTIFICATION USING LOCAL FEATURES},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: IATMLRP, (ICPRAM 2012)},
year={2012},
pages={481-487},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003884004810487},
isbn={978-989-8425-98-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: IATMLRP, (ICPRAM 2012)
TI - TWO METHODS FOR FILLED-IN DOCUMENT IMAGE IDENTIFICATION USING LOCAL FEATURES
SN - 978-989-8425-98-0
AU - Carrion-Robles D.
AU - Castello-Fos V.
AU - Perez-Cortes J.
AU - Arlandis J.
PY - 2012
SP - 481
EP - 487
DO - 10.5220/0003884004810487