7.1 Perspective & Future Work
One of today’s most challenging issues is the correct handling of free natural
language. The presented system contributes to this effort by handling many errors
and making reasonable guesses about words in a ‘dirty’ and ‘misspelled’ environment.
The current target is Human-Computer Interface (HCI) applications for dialog, chat,
email, and blog text processing.
We developed a dictionary editor to handle rules and semantic tags. With this tool
we were able to add over 300 Greek, Latin, and German affixes, including a large
amount of ‘human-like’ semantic data.
An interesting field of application is the recognition of parasynthetic words in
biomedical records and scientific text. The Espasa medical dictionary [21] has only
23k words, while Snomed-CT [20] has more than 600k terms, most of them OOV
multi-word terms.
We think that semantic extraction based on morphology may prove useful for
further NLP processing such as Word Sense Disambiguation (WSD).
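As a rough illustration of what morphology-based semantic extraction could look
like, the following minimal sketch strips one prefix and one suffix from an
unknown word and returns the semantic tags attached to them. The affix tables,
the tag names, and the function semantic_tags are hypothetical and chosen only
for this example; they are not the rules or the data format of the presented
system.

```python
# Hypothetical affix tables: entries and tags are illustrative only,
# not the rules stored in the presented system's dictionary.
PREFIXES = {
    "card": "heart",
    "hepat": "liver",
    "gastr": "stomach",
    "neur": "nerve",
}
SUFFIXES = {
    "itis": "inflammation",
    "algia": "pain",
    "oma": "tumor",
}


def semantic_tags(word):
    """Return the semantic tags of the longest matching prefix and suffix."""
    rest = word.lower()
    tags = []

    # Pick the longest prefix that matches the start of the word, if any.
    prefix = max((p for p in PREFIXES if rest.startswith(p)), key=len, default=None)
    if prefix is not None:
        tags.append(PREFIXES[prefix])
        rest = rest[len(prefix):]

    # Pick the longest suffix that matches the end of what remains, if any.
    suffix = max((s for s in SUFFIXES if rest.endswith(s)), key=len, default=None)
    if suffix is not None:
        tags.append(SUFFIXES[suffix])

    return tags


if __name__ == "__main__":
    for w in ("hepatitis", "neuralgia", "gastritis", "mesa"):
        print(w, semantic_tags(w))
```

Under these assumed tables, an OOV word such as "hepatitis" would yield the tags
"liver" and "inflammation", which a later stage such as WSD could use as hints.
A real rule set would also need to handle linking vowels, stem changes, and
false matches on ordinary words.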
Indeed, any human-machine dialog system may benefit from a fast, robust, and
compact tagger like this.
References
1 Hohendahl, Andrés T. & Zelasco, José F. 2006. Algoritmos eficientes para detección
temprana de errores y clasificación idiomática para uso en procesamiento de lenguaje
natural y texto, WICC2006 - ISBN 950-9474-35-5
2 Diccionarios españoles: http://www3.unileon.es/dp/dfh/jmr/dicci/012.htm
3 Pedro Luis Díez Orzas 1999. Estudios de Lingüística Española LA RELACIÓN DE
MERONIMIA EN LOS SUSTANTIVOS DEL LÉXICO ESPAÑOL: CONTRIBUCIÓN A
LA SEMÁNTICA COMPUTACIONAL Volumen 2 (1999) ISSN: 1139-8736
4 Shannon, Huffman compression: http://www.cbloom.com/algs/statisti.html
5 FreeLing: http://www.lsi.upc.es/~nlp/freeling/
6 FLANOM: Flexionador y lematizador automático de formas nominales. Santana, O.; Pérez,
J.; Carreras, F.; Duque, J.; Hernández, Z.; Rodríguez, G. Lingüística Española Actual XXI,
2, 1999. Ed. Arco/Libros, S.L. 253/297
7 FLAVER: Flexionador y lematizador automático de formas verbales. Santana, O.; Pérez, J.;
Hernández, Z.; Carreras, F.; Rodríguez, G. Lingüística Española Actual XIX, 2, 1997. Ed.
Arco/Libros, S.L. 229/282
8 ASPELL Affix compression: http://aspell.sourceforge.net/man-html/Affix-Compression.html
9 Expresiones Regulares: http://www.regular-expressions.info/
10 DRAE Diccionario de la Real Academia Española http://buscon.rae.es/diccionario/drae.htm
11 ISPELL www.gnu.org/software/ispell/ispell.html
12 NetSpell http://sourceforge.net/projects/netspell/
13 TRIE http://www.cs.bu.edu/teaching/c/tree/trie/
14 TST -Ternary Search Tree: www.nist.gov/dads/HTML/ternarySearchTree.html
15 Open Office Dictionaries: http://lingucomponent.openoffice.org/spell_dic.html
16 Relaciones morfoléxicas prefijales del español. Santana, O.; Carreras, F.; Pérez, J.;
Rodríguez, G. Boletín de Lingüística, Vol. 22. ISSN: 0798-9709. Jul/Dic, 2004. 79/123.
17 J Bentley & R - Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1997
18 Mehlhorn, K. Dynamic Binary Search. SIAM Journal on Computing 8, 2 (May 1979),
175-198.