5 CONCLUSIONS AND PERSPECTIVES
The main objective of this work is to propose and implement a Czech NER system that facilitates searching the ČTK news text databases. We presented a CRF-based method for automatic named entity recognition, with a particular focus on NE feature selection. We have shown that the choice of features plays a crucial role in NE recognition, and that language-independent features are much more important than language-dependent ones. The best F-measure obtained is 58.4%, an absolute increase of 14.4% over the baseline.
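To make the reported gain concrete, the following minimal sketch separates the absolute from the relative improvement. Only the two figures 58.4% and 14.4% come from the text. The baseline of 44.0% is inferred by subtraction, the balanced (beta = 1) form of the F-measure is an assumption, and the function names and the precision/recall values in the usage line are hypothetical.

```python
from typing import Tuple


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta = 1 gives F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def implied_baseline(best: float, absolute_gain: float) -> Tuple[float, float]:
    """Return (baseline score, relative gain) implied by a best score
    and its absolute gain over the baseline."""
    baseline = best - absolute_gain
    return baseline, absolute_gain / baseline


# Figures reported in the text (in percent).
BEST_F = 58.4
ABS_GAIN = 14.4

# Inferred, not stated in the text: baseline = 58.4 - 14.4.
baseline_f, relative_gain = implied_baseline(BEST_F, ABS_GAIN)
print(f"baseline F-measure: {baseline_f:.1f}%")
print(f"relative gain: {100 * relative_gain:.1f}%")

# Hypothetical precision/recall pair, only to show the formula in use.
print(f"example F1: {f_measure(0.6, 0.4):.2f}")
```

Under these assumptions, a gain of 14.4 points on a baseline of about 44 corresponds to a relative improvement of roughly one third, which is why the text specifies that the increase is given in absolute value.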
The first perspective is to evaluate the system on the ČTK data. This requires annotating the data, because the current text data is not annotated with named entities. We further propose to integrate into the system new information from other knowledge sources, in particular from a syntactic parsing module. Detailed syntactic features are most often underexploited, although they clearly bring additional information to NE recognition.