ISSUES WITH PARTIALLY MATCHING FEATURE FUNCTIONS IN CONDITIONAL EXPONENTIAL MODELS

Carsten Elfers, Hartmut Messerschmidt, Otthein Herzog

2012

Abstract

Conditional Exponential Models (CEMs) are used effectively in several machine learning approaches, e.g., in Conditional Random Fields. Their feature functions are typically binary, i.e., either satisfied or not. This paper presents a way to use partially matching feature functions, which are satisfied to some degree, and discusses the issues that arise when training with them. Using partially matching feature functions improves inference accuracy in domains with sparse reference data and avoids overfitting. However, the commonly used Maximum Likelihood training raises several difficulties for such feature functions. In this context, three problems with Improved Iterative Scaling (a popular training algorithm for Conditional Exponential Models) in combination with such feature functions are identified and solved: inequality of influence, unlimited weight boundaries, and local optima in the parameter space.
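For orientation, the following is a minimal sketch of the standard conditional exponential model form that the abstract refers to; the notation (weights \lambda_i, feature functions f_i, normalizer Z(x)) is generic and not taken from the paper itself:

p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big(\sum_i \lambda_i f_i(x, y)\Big), \qquad Z(x) = \sum_{y'} \exp\!\Big(\sum_i \lambda_i f_i(x, y')\Big)

In the usual setting the feature functions are indicators, f_i(x, y) \in \{0, 1\}; a partially matching feature function instead takes values f_i(x, y) \in [0, 1], expressing the degree to which the feature is satisfied, while the weights \lambda_i are still learned by Maximum Likelihood training, e.g., with Improved Iterative Scaling.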



Paper Citation


in Harvard Style

Elfers C., Messerschmidt H. and Herzog O. (2012). ISSUES WITH PARTIALLY MATCHING FEATURE FUNCTIONS IN CONDITIONAL EXPONENTIAL MODELS. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: SSML, (ICAART 2012) ISBN 978-989-8425-95-9, pages 571-578. DOI: 10.5220/0003855205710578


in Bibtex Style

@conference{ssml12,
author={Carsten Elfers and Hartmut Messerschmidt and Otthein Herzog},
title={ISSUES WITH PARTIALLY MATCHING FEATURE FUNCTIONS IN CONDITIONAL EXPONENTIAL MODELS},
booktitle={Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: SSML, (ICAART 2012)},
year={2012},
pages={571-578},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003855205710578},
isbn={978-989-8425-95-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: SSML, (ICAART 2012)
TI - ISSUES WITH PARTIALLY MATCHING FEATURE FUNCTIONS IN CONDITIONAL EXPONENTIAL MODELS
SN - 978-989-8425-95-9
AU - Elfers C.
AU - Messerschmidt H.
AU - Herzog O.
PY - 2012
SP - 571
EP - 578
DO - 10.5220/0003855205710578