Identifying Boundaries and Semantic Labels of Economic Entities Using Stacking and Re-sampling

Katia Lida Kermanidis

Abstract

Semantic entities of the economic domain are detected and labeled in free Modern Greek text. Instance-based learning is applied in two phases (stacking), which forces the classifier to learn from its mistakes, and the majority class is randomly undersampled to improve classification accuracy on instances of the minority classes. Without any external resources (gazetteers, etc.) and with only limited linguistic information for pre-processing, a mean F-score of 73.3% is achieved for the minority classes.
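
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy tokens, the feature tuples, the labels (O, ORG, CUR), the overlap distance, the leave-one-out scheme used to produce first-phase predictions, and all function names are assumptions made for illustration only.

```python
import random
from collections import Counter

def undersample_majority(X, y, majority_label, n_keep, seed=0):
    """Keep all minority instances and a random subset of n_keep majority instances."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    kept = sorted(minority + rng.sample(majority, min(n_keep, len(majority))))
    return [X[i] for i in kept], [y[i] for i in kept]

def overlap_distance(a, b):
    """Number of feature positions whose symbolic values differ."""
    return sum(u != v for u, v in zip(a, b))

def knn_predict(train_X, train_y, x, k=1):
    """Instance-based prediction: majority label among the k nearest stored instances."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: overlap_distance(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def add_stacked_feature(X, y, k=1):
    """Phase 1: append a leave-one-out prediction to every training instance."""
    stacked = []
    for i, x in enumerate(X):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        stacked.append(tuple(x) + (knn_predict(rest_X, rest_y, x, k),))
    return stacked

def stacked_predict(X, y, x, k=1):
    """Phase 2: classify again, with the phase-1 guess as an extra feature."""
    first_guess = knn_predict(X, y, x, k)
    return knn_predict(add_stacked_feature(X, y, k), y, tuple(x) + (first_guess,), k)

# Hypothetical toy data: (token, part-of-speech) features, "O" is the majority class.
X = [("the", "det"), ("bank", "noun"), ("rose", "verb"), ("of", "prep"),
     ("euro", "noun"), ("and", "conj"), ("fell", "verb"), ("a", "det")]
y = ["O", "ORG", "O", "O", "CUR", "O", "O", "O"]

X_bal, y_bal = undersample_majority(X, y, "O", n_keep=3)
prediction = stacked_predict(X_bal, y_bal, ("bank", "noun"))
```

In this sketch the second-phase learner sees where the first phase erred on held-out training instances, which is one common way to let a stacked classifier correct its own mistakes; the paper may build its second phase differently.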


Paper Citation


in Harvard Style

Kermanidis, K. L. (2007). Identifying Boundaries and Semantic Labels of Economic Entities Using Stacking and Re-sampling. In Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007) ISBN 978-972-8865-97-9, pages 149-158. DOI: 10.5220/0002414101490158


in Bibtex Style

@conference{nlpcs07,
author={Katia Lida Kermanidis},
title={Identifying Boundaries and Semantic Labels of Economic Entities Using Stacking and Re-sampling},
booktitle={Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)},
year={2007},
pages={149--158},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002414101490158},
isbn={978-972-8865-97-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)
TI - Identifying Boundaries and Semantic Labels of Economic Entities Using Stacking and Re-sampling
SN - 978-972-8865-97-9
AU - Kermanidis, Katia Lida
PY - 2007
SP - 149
EP - 158
DO - 10.5220/0002414101490158