Do for all λ_i:
(3) Update λ_i ← λ_i + δ_i
(4) Ensure that λ_i is in a given weight interval
Step (1) and the splitting of the convergence loop
into (2) and (3) solve Problem 3; step (4) solves
Problem 1 and Problem 2.
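Steps (3) and (4) can be illustrated with a minimal sketch. It assumes that the update values δ_i have already been computed in the preceding step and that "ensuring the interval" is done by clipping each weight to the bounds. Clipping is only one possible choice, and the names `update_weights`, `lower`, and `upper` are hypothetical.

```python
def update_weights(weights, deltas, lower, upper):
    """Apply step (3) and step (4) to every weight lambda_i."""
    updated = []
    for lam, delta in zip(weights, deltas):
        lam = lam + delta                   # step (3): lambda_i <- lambda_i + delta_i
        lam = min(max(lam, lower), upper)   # step (4): clipping is an assumption
        updated.append(lam)
    return updated


# Hypothetical weights and updates; the interval [-1, 2] is chosen for illustration.
new_weights = update_weights([0.5, 1.5, -0.25], [0.25, 1.0, -1.5],
                             lower=-1.0, upper=2.0)
print(new_weights)
```

In this sketch the second and third weights would leave the interval after the update and are therefore set to the nearest bound.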
5 CONCLUSIONS AND OUTLOOK
In this paper, the Conditional Exponential Model
(which is used in Maximum Entropy Markov Models
and Conditional Random Fields) has been extended
to support partially matching feature functions.
This makes it possible to use partially matching feature
functions with Conditional Exponential Models
and Improved Iterative Scaling (IIS) in a well-defined way
to overcome the problem of missing features. It has
been shown that the influence of partially matching
feature functions on the posterior probability changes
in the correct direction (i.e., monotonicity). Furthermore,
the impact of the weights has been analyzed. Problems
regarding IIS have been identified, and a modified
algorithm has been developed as a solution. Additionally,
the problem of overfitting is addressed by allowing
potentially all feature functions to be satisfied
to some degree of matching, which smooths the
posterior distribution. In future work, we are going to
show how partially matching feature functions may
be defined in a semantically intuitive way and present
empirical results of such a combined method. First
steps have already been taken in the domain of
intrusion detection.
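The monotonicity property summarized above can be illustrated with a small sketch. It assumes the standard form of a conditional exponential model, p(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), and a feature function that returns a degree of matching in [0, 1] instead of a binary value. The labels, the single feature, and the weight are hypothetical and are not taken from the paper's experiments.

```python
import math


def posterior(weights, features, x, labels):
    """Conditional exponential model: p(y|x) proportional to exp(sum_i w_i * f_i(x, y))."""
    scores = {y: sum(w * f(x, y) for w, f in zip(weights, features))
              for y in labels}
    z = sum(math.exp(s) for s in scores.values())   # normalization Z(x)
    return {y: math.exp(s) / z for y, s in scores.items()}


def make_feature(target_label):
    """Hypothetical partially matching feature: returns a degree in [0, 1]."""
    def feature(x, y):
        return x["degree"] if y == target_label else 0.0
    return feature


labels = ["attack", "normal"]          # hypothetical label set
features = [make_feature("attack")]
weights = [2.0]                        # a positive weight, chosen for illustration

# Posterior of "attack" for an increasing degree of matching.
probs = [posterior(weights, features, {"degree": d}, labels)["attack"]
         for d in (0.0, 0.5, 1.0)]
print(probs)
```

With a positive weight, the posterior of the matching label grows as the degree of matching grows; with a degree of 0 the feature has no influence and both labels are equally likely in this toy setting.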
ACKNOWLEDGEMENTS
This work was supported by the German Federal Ministry
of Education and Research (BMBF) under the
grant 01IS08022A.
ICAART 2012 - International Conference on Agents and Artificial Intelligence