computed using the currently acquired models. For example, in CRF-3 the first, second, and third models build the queue. Along the queue, the evaluations improve steadily. In CRF-1, the second column is higher than the fourth because of overfitting: the model performs better on the training set than on the test set.
In the example, if X cannot be explained with a probability higher than the threshold $t_m$ by any of the models in the queue, CRF-5 should be chosen to explain X, because its second column is the highest among the second columns of all the models. To summarize, the third columns of the first four models and the second column of the fifth model are chosen to explain X. Their overall performance, the fourth column in CRF-5, is lower than any of these columns because of overfitting. Put differently, the values of the chosen columns are based on the training data, whereas the overall estimation of the queue is the evaluation on the test set.
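This selection rule can be sketched in a few lines of Python. The sketch is ours, not the paper's code: it assumes each model exposes a hypothetical best_labeling(X) method that returns its most probable labeling together with that labeling's probability, and that each queue entry carries the model's training-set score (its "second column").

```python
def explain_with_queue(X, queue, t_m):
    """Label sequence X with a CRF queue.

    queue : list of (model, train_score) pairs in induction order,
            where train_score is the model's training-set evaluation.
    t_m   : the confidence threshold from the text.
    Each model is assumed to expose best_labeling(X) -> (labels, prob).
    """
    best = None  # fallback: the model with the highest training score
    for model, train_score in queue:
        labels, prob = model.best_labeling(X)
        if prob > t_m:  # X is "explained" by this model
            return labels
        if best is None or train_score > best[1]:
            best = (model, train_score)
    # No model is confident enough: fall back to the model whose
    # training-set score is highest (CRF-5 in the running example).
    labels, _ = best[0].best_labeling(X)
    return labels
```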
The fourth experiment was designed to evaluate the CRF queue. The algorithm shown in Table 2 was run on all data sets. The average results over 5 configurations are shown as the third columns in Figure 7. The CRF queue outperformed the single-model approaches by about 4% on average in all configurations. We show the number of models in the queue in the lower part of Figure 8. The results are averaged over the 100 sets in each configuration. The number of models is above 3, which suggests that the queue works well in most cases. {S1, S3} has a shorter queue because the single-model approaches perform better in these configurations, as shown in Figure 7. The CRF queue is shorter when the single-model approaches work better.
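Table 2 itself is not reproduced in this excerpt, but the construction loop can be sketched under one assumption drawn from Section 5 (each round filters out the training sequences the current model already explains). The helpers train_crf and explains are injected callables, not functions from the paper:

```python
def build_crf_queue(train_set, train_crf, explains, t_m, max_models=10):
    """Sketch of CRF-queue construction under the filtering assumption.

    train_crf(seqs)         -> a trained CRF model (feature induction
                               plus parameter estimation).
    explains(model, x, t_m) -> True if model labels x with probability
                               above the threshold t_m.
    """
    queue, remaining = [], list(train_set)
    while remaining and len(queue) < max_models:
        model = train_crf(remaining)
        queue.append(model)
        # Keep only the sequences this model cannot yet explain;
        # they become the training set for the next model.
        remaining = [x for x in remaining if not explains(model, x, t_m)]
    return queue
```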
5 CONCLUSIONS
In this paper, we constructed a simulation framework to investigate the issues of inducing features of linear-chain CRFs. The simulation provides a new perspective from which to compare the simulated CRFs with the induced CRFs. We conducted a large number of experiments to explore the properties of the learned CRFs. Moreover, we developed a feature reduction method that can be integrated into the induction process, and showed that a queue of CRF models can be constructed that yields better performance. The CRF queue guarantees accuracy no worse than that of the single-model approaches.
We did not use an existing open-source CRF toolkit and have not yet experimented on standard benchmarks. In the future, we will adapt our code to process the benchmark data. The simulation framework lays a basis for interesting research on CRFs in several directions. It would be interesting to explore bootstrap issues. In the CRF queue, we defined one method to filter the training set. An alternative would be to first classify the training set with a decision tree, then use the data in each class to induce CRFs, as sketched below.
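A minimal sketch of that alternative, again with hypothetical callables (classify might wrap a decision tree over sequence-level features, train_crf the induction procedure of this paper):

```python
from collections import defaultdict

def induce_per_class_crfs(train_set, classify, train_crf):
    """Partition the training sequences by a classifier, then induce
    one CRF per class; at test time a sequence would be routed to
    the CRF of its predicted class."""
    by_class = defaultdict(list)
    for x in train_set:
        by_class[classify(x)].append(x)
    return {label: train_crf(seqs) for label, seqs in by_class.items()}
```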