Tagging with Disambiguation Rules - A New Evolutionary Approach to the Part-of-Speech Tagging Problem

Ana Paula Silva, Arlindo Silva, Irene Rodrigues

2012

Abstract

In this paper we present an evolutionary approach to the part-of-speech tagging problem. The goal of part-of-speech tagging is to assign to each word of a text its part-of-speech. The task is not straightforward, because a large percentage of words have more than one possible part-of-speech, and the right choice is determined by the parts-of-speech of the surrounding words. To solve this problem, we therefore need a method that disambiguates each word’s set of possible tags. Traditionally, two groups of methods have been used to tackle this task. The first group is based on statistical data about the possible contexts of a word, while the second group is based on rules, normally designed by human experts, that capture the properties of the language. In this work we present a solution that tries to incorporate both approaches. The proposed system is divided into two components. First, we use an evolutionary algorithm that, for each part-of-speech tag of the training corpus, evolves a set of disambiguation rules. We then use a second evolutionary algorithm, guided by the rules found earlier, to solve the tagging problem. The results obtained on two different corpora are amongst the best published for those corpora.
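The abstract describes the second component only at a high level. The following is a minimal sketch of that idea under our own assumptions: a genetic algorithm whose individuals are candidate tag sequences for a sentence, and whose fitness is computed from a set of disambiguation rules. The rule encoding (a hypothetical `Rule` that rewards a tag when the word at a given offset carries a given tag, weighted by a fixed quality score standing in for the rules the first algorithm would learn), the helper names `fitness` and `evolve_tagging`, the toy lexicon, and all parameter values are illustrative inventions, not the authors' representation or implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical disambiguation rule (not the paper's encoding):
    `tag` is supported when the word at `offset` carries `context_tag`."""
    tag: str
    offset: int        # -1 = previous word, +1 = next word
    context_tag: str
    quality: float     # stands in for a score learned by the rule-evolving EA


def fitness(tags, rules):
    """Sum the quality of every rule that supports the tag assigned at each position."""
    total = 0.0
    for i, tag in enumerate(tags):
        for r in rules:
            j = i + r.offset
            if r.tag == tag and 0 <= j < len(tags) and tags[j] == r.context_tag:
                total += r.quality
    return total


def evolve_tagging(words, lexicon, rules, pop_size=20, generations=50, p_mut=0.3, seed=0):
    """Simple generational GA over tag sequences; each gene is restricted to
    the candidate tags the (hypothetical) lexicon lists for that word."""
    rng = random.Random(seed)
    options = [lexicon[w] for w in words]

    def random_individual():
        return [rng.choice(opts) for opts in options]

    def mutate(ind):
        # Re-draw the tag of one randomly chosen word from its candidate set.
        child = list(ind)
        i = rng.randrange(len(child))
        child[i] = rng.choice(options[i])
        return child

    def crossover(a, b):
        # One-point crossover; both parents respect the lexicon, so the child does too.
        if len(a) < 2:
            return list(a)
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:]

    def tournament(pop):
        # Binary tournament selection on rule-based fitness.
        a, b = rng.sample(pop, 2)
        return a if fitness(a, rules) >= fitness(b, rules) else b

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=lambda ind: fitness(ind, rules))
        offspring = [best]  # elitism: always keep the current best tagging
        while len(offspring) < pop_size:
            child = crossover(tournament(population), tournament(population))
            if rng.random() < p_mut:
                child = mutate(child)
            offspring.append(child)
        population = offspring
    return max(population, key=lambda ind: fitness(ind, rules))


# Toy example: "can" is ambiguous between modal, noun, and verb.
lexicon = {"the": ["DET"], "can": ["MD", "NN", "VB"], "rusted": ["VBD"]}
rules = [
    Rule("NN", -1, "DET", 2.0),   # a noun often follows a determiner
    Rule("VBD", -1, "NN", 1.0),   # a past-tense verb often follows a noun
    Rule("MD", +1, "VB", 1.5),    # a modal is often followed by a base-form verb
]
print(evolve_tagging(["the", "can", "rusted"], lexicon, rules))
```

On this toy sentence only the tagging DET NN VBD satisfies any rules (fitness 3.0), so the search should settle on it. Restricting every gene to the word's lexicon entries keeps all individuals valid, so crossover and mutation never need a repair step; how the real system encodes rules and weighs them in the fitness function is described in the paper itself.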



Paper Citation


in Harvard Style

Silva A. P., Silva A. and Rodrigues I. (2012). Tagging with Disambiguation Rules - A New Evolutionary Approach to the Part-of-Speech Tagging Problem. In Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2012), ISBN 978-989-8565-33-4, pages 5-14. DOI: 10.5220/0004112000050014


in Bibtex Style

@conference{ecta12,
author={Ana Paula Silva and Arlindo Silva and Irene Rodrigues},
title={Tagging with Disambiguation Rules - A New Evolutionary Approach to the Part-of-Speech Tagging Problem},
booktitle={Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2012)},
year={2012},
pages={5-14},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004112000050014},
isbn={978-989-8565-33-4},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2012)
TI  - Tagging with Disambiguation Rules - A New Evolutionary Approach to the Part-of-Speech Tagging Problem
SN  - 978-989-8565-33-4
AU  - Silva A. P.
AU  - Silva A.
AU  - Rodrigues I.
PY  - 2012
SP  - 5
EP  - 14
DO  - 10.5220/0004112000050014
ER  - 