the dataset. Further performance improvement was achieved by balancing the class distribution through undersampling of the majority-class instances. These techniques cope well with the large number of class labels, the low level of preprocessing, and the complex nature of the corpus.
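As an illustration of the balancing step, the following is a minimal Python sketch of random undersampling. It assumes the training data is a list of instances with a parallel list of class labels. The helper name `undersample_majority`, the default target size (the size of the second-largest class) and the fixed random seed are hypothetical choices that do not come from the paper. The undersampling methods of Kubat and Matwin [5] and Laurikkala [6] choose which majority instances to remove instead of sampling them at random, so this sketch is not necessarily the procedure used here.

```python
import random
from collections import Counter


def undersample_majority(instances, labels, target_size=None, seed=0):
    """Hypothetical helper: plain random undersampling of the majority class.

    By default the majority class is cut down to the size of the
    second-largest class; every instance of the other classes is kept.
    This is an assumed illustration, not the paper's exact procedure.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        return list(instances), list(labels)  # a single class: nothing to balance
    (majority, _), (_, runner_up) = counts.most_common(2)
    if target_size is None:
        target_size = runner_up
    # Indices of majority-class instances; randomly choose the ones to keep.
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    rng = random.Random(seed)
    keep = set(rng.sample(majority_idx, min(target_size, len(majority_idx))))
    # Preserve the original order; keep all non-majority instances.
    kept = [i for i, y in enumerate(labels) if y != majority or i in keep]
    return [instances[i] for i in kept], [labels[i] for i in kept]


X = list(range(10))
y = ["O"] * 7 + ["PER", "PER", "ORG"]
X_bal, y_bal = undersample_majority(X, y)
print(Counter(y_bal))
```

With seven `O` labels, two `PER` and one `ORG`, the call keeps two randomly chosen `O` instances plus all three entity instances, in their original order. Random removal is the simplest choice; selective schemes such as those in [5] and [6] try to discard only redundant or noisy majority instances.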
Acknowledgements
We thank the European Social Fund (ESF), Operational Program for Educational and
Vocational Training II (EPEAEK II), and particularly the Program PYTHAGORAS
II, for funding the above work.
References
1. Ciaramita, M., Altun, Y.: Named Entity Recognition in Novel Domains with External
Lexical Knowledge. In Workshop on Advances in Structured Learning for Text and Speech
Processing (NIPS) (2005)
2. Daelemans, W., van den Bosch, A., Zavrel, J.: Forgetting Exceptions is Harmful in Language Learning. Machine Learning, Vol. 34 (1999) 11-41
3. Hendrickx, I., van den Bosch, A.: Memory-based One-step Named-entity Recognition: Effects of Seed List Features, Classifier Stacking and Unannotated Data. In Proceedings of the 7th Conference on Computational Natural Language Learning (CoNLL), Edmonton, Canada (2003)
4. Kermanidis, K., Fakotakis, N., Kokkinakis, G.: DELOS: An Automatically Tagged Economic Corpus for Modern Greek. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC), Las Palmas de Gran Canaria (2002) 93-100
5. Kubat, M., Matwin, S.: Addressing the Curse of Imbalanced Training Sets. In Proceedings of the International Conference on Machine Learning (ICML) (1997) 179-186
6. Laurikkala, J.: Improving Identification of Difficult Small Classes by Balancing Class Distribution. In Proceedings of the 8th Conference on Artificial Intelligence in Medicine in Europe, Cascais, Portugal (2001) 63-66
7. Florian, R., Ittycheriah, A., Jing, H., Zhang, T.: Named Entity Recognition through Classifier Combination. In Proceedings of the 7th Conference on Computational Natural Language Learning (CoNLL), Edmonton, Canada (2003) 168-171
8. Sgarbas, K., Fakotakis, N., Kokkinakis, G.: A Straightforward Approach to Morphological Analysis and Synthesis. In Proceedings of the Workshop on Computational Lexicography and Multimedia Dictionaries (COMLEX), Kato Achaia, Greece (2000) 31-34
9. Sporleder, C., van Erp, M., Porcelijn, T., van den Bosch, A., Arntzen, P.: Identifying Named Entities in Text Databases from the Natural History Domain. In Proceedings of the 5th International Conference on Language Resources and Evaluation (2006)
10. Tsukamoto, K., Mitsuishi, Y., Sassano, M.: Learning with Multiple Stacking for Named Entity Recognition. In Proceedings of the 6th Conference on Natural Language Learning, Taipei, Taiwan (2002) 1-4
11. Wu, C., Jan, S., Tsai, T., Hsu, W.: On Using Ensemble Methods for Chinese Named Entity Recognition. In Proceedings of the 5th SIGHAN Workshop on Chinese Language Processing, Sydney, Australia (2006) 142-145