Natural Language Processing System Applied in Public Health for Assessment of an Automatic Analysis of Patterns Generator

Anabel Fraga, Juan Llorens, Valeria Rodríguez, Valentin Moreno

Abstract

Scientific articles are now published on virtually every topic, including medicine, technology, economics, and finance. These articles, better known as papers, evaluate and interpret different arguments and present results of scientific interest. Most are eventually published in magazines, books, journals, and similar venues. Because papers are produced at an increasing rate, it is feasible to analyse how people write within the same domain. At the structural level, and with the help of graphs, such an analysis can reveal the groups of words used (to determine whether they come from a specific vocabulary), the most common grammatical categories, the most repeated words in a domain, the patterns found, and the frequency of those patterns. This research was carried out to meet these needs. The selected domain is public health; the corpus comprises 800 papers on genetics topics such as mutations, genetic deafness, DNA, trinucleotide, and suppressor genes, among others, and an ontology of public health provides the basis of the study.
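As an illustration only, the kinds of counts listed in the abstract (most repeated words, most common grammatical categories, and pattern frequencies) can be sketched in a few lines of Python. This is not the authors' system: the toy lexicon, the function names, and the two-sentence corpus are all hypothetical, and a real analysis would use a trained part-of-speech tagger and the public health ontology rather than a hand-written dictionary. Here a "pattern" is assumed to be a sequence of n consecutive grammatical categories.

```python
from collections import Counter
import re

# Hypothetical toy lexicon mapping words to grammatical categories.
# A real system would use a trained part-of-speech tagger instead.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "gene": "NOUN", "mutation": "NOUN", "deafness": "NOUN",
    "causes": "VERB", "suppresses": "VERB",
    "genetic": "ADJ",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tag(tokens):
    """Look up each token in the toy lexicon; unknown words get 'UNK'."""
    return [TOY_LEXICON.get(tok, "UNK") for tok in tokens]

def ngrams(items, n):
    """Return all contiguous windows of n items as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

def analyse(documents, n=2):
    """Count words, categories, and category n-gram patterns over a corpus."""
    words, categories, patterns = Counter(), Counter(), Counter()
    for doc in documents:
        tokens = tokenize(doc)
        tags = tag(tokens)
        words.update(tokens)
        categories.update(tags)
        patterns.update(ngrams(tags, n))
    return words, categories, patterns

# Hypothetical two-sentence corpus standing in for the 800 papers.
corpus = [
    "The mutation causes genetic deafness.",
    "A gene suppresses the mutation.",
]
words, categories, patterns = analyse(corpus)
print(words.most_common(3))
print(categories.most_common(2))
print(patterns.most_common(2))
```

Keeping the three counters separate mirrors the separate results named in the abstract; changing `n` would count longer category sequences as patterns.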


Paper Citation


in Harvard Style

Fraga A., Llorens J., Rodríguez V. and Moreno V. (2015). Natural Language Processing System Applied in Public Health for Assessment of an Automatic Analysis of Patterns Generator. In Proceedings of the 6th International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2015) ISBN 978-989-758-162-5, pages 27-34. DOI: 10.5220/0005646400270034


in Bibtex Style

@conference{sky15,
author={Anabel Fraga and Juan Llorens and Valeria Rodríguez and Valentin Moreno},
title={Natural Language Processing System Applied in Public Health for Assessment of an Automatic Analysis of Patterns Generator},
booktitle={Proceedings of the 6th International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2015)},
year={2015},
pages={27-34},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005646400270034},
isbn={978-989-758-162-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2015)
TI - Natural Language Processing System Applied in Public Health for Assessment of an Automatic Analysis of Patterns Generator
SN - 978-989-758-162-5
AU - Fraga A.
AU - Llorens J.
AU - Rodríguez V.
AU - Moreno V.
PY - 2015
SP - 27
EP - 34
DO - 10.5220/0005646400270034