Peter Kluegl, Martin Toepfer, Florian Lemmerich, Andreas Hotho, Frank Puppe


Conditional Random Fields (CRFs) are popular methods for labeling unstructured or textual data. Like many machine learning approaches, these undirected graphical models assume that instances are independently distributed. In real-world applications, however, data is grouped in a natural way, e.g., by its creation context, and the instances in each group often share additional consistencies in the structure of their information. This paper proposes a domain-independent method for exploiting these consistencies by combining two CRFs in a stacked learning framework. The approach comprises three successive steps of inference. First, an initial CRF processes single instances as usual. Next, rule learning is applied collectively to all labeled outputs of one context to acquire descriptions of its specific properties. Finally, these descriptions serve as dynamic, high-quality features in an additional (stacked) CRF. The approach is evaluated on a real-world dataset for the segmentation of references and achieves a significant reduction of the labeling error.
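The data flow of the three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first-stage labels are hypothetical hand-written values standing in for the output of the initial CRF, the "rule learner" is a crude majority vote over token shapes rather than a real rule-learning algorithm, and no second CRF is trained. The code only shows how context-level descriptions could be turned into extra features for a stacked model. All function names, thresholds, and feature names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def shape(token):
    """Collapse a token to a coarse shape, e.g. '(2008).' becomes '(dddd).'."""
    return "".join("d" if c.isdigit() else "a" if c.isalpha() else c for c in token)


def learn_context_rules(labeled_instances, min_support=2, min_confidence=0.6):
    """Crude stand-in for rule learning (assumption): within one context,
    map each token shape to its dominant first-stage label."""
    counts = defaultdict(Counter)
    for instance in labeled_instances:
        for token, label in instance:
            counts[shape(token)][label] += 1

    rules = {}
    for token_shape, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        total = sum(label_counts.values())
        if hits >= min_support and hits / total >= min_confidence:
            rules[token_shape] = label
    return rules


def stacked_features(labeled_instances, rules):
    """Build per-token feature dicts that a second (stacked) model could consume."""
    features = []
    for instance in labeled_instances:
        rows = []
        for token, label in instance:
            row = {"token": token.lower(), "first_stage": label}
            hint = rules.get(shape(token))
            if hint is not None:
                # Context-level description added as a dynamic feature.
                row["context_rule"] = hint
            rows.append(row)
        features.append(rows)
    return features


# Hypothetical first-stage output for one context (one reference section).
# The date in the third reference is deliberately mislabeled as TITLE.
context = [
    [("Smith,", "AUTHOR"), ("(2008).", "DATE"), ("Parsing.", "TITLE")],
    [("Jones,", "AUTHOR"), ("(2010).", "DATE"), ("Tagging.", "TITLE")],
    [("Brown,", "AUTHOR"), ("(2011).", "TITLE"), ("Mining.", "TITLE")],
]

rules = learn_context_rules(context)
features = stacked_features(context, rules)
```

In this toy context, the mislabeled token `(2011).` keeps its first-stage label but also receives a `context_rule` feature pointing to `DATE`, which is the kind of signal a second-stage model could learn to trust.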



Paper Citation

in Harvard Style

Kluegl P., Toepfer M., Lemmerich F., Hotho A. and Puppe F. (2012). Stacked Conditional Random Fields Exploiting Structural Consistencies. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 240-249. DOI: 10.5220/0003706602400249
