Grammar and Dictionary based Named-entity Linking for Knowledge Extraction of Evidence-based Dietary Recommendations

Tome Eftimov, Barbara Koroušić Seljak, Peter Korošec

Abstract

To help people keep up with the new knowledge about healthy diets that appears daily in newly published scientific reports, we present a grammar- and dictionary-based named-entity linking method for knowledge extraction from evidence-based dietary recommendations. The method consists of two phases. The first combines entity detection with the determination of a set of candidates for each entity; the second is candidate selection. We evaluate the method on a corpus of one-sentence dietary recommendations provided by the World Health Organization and the U.S. National Library of Medicine. The corpus consists of 50 dietary recommendations and 10 sentences that are not related to dietary recommendations. For 47 of the 50 dietary recommendations the proposed method extracts all the useful knowledge, and for the remaining 3 only the information for one entity is missing. For the 10 sentences that are not dietary recommendations, the method does not extract any entities, as expected.
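
The two-phase structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the quantity rule, the type labels, and the names `DICTIONARY`, `UNITS`, `PREFERRED_TYPES`, `detect_candidates`, and `select_candidates` are all assumptions made up for illustration. Phase 1 detects mentions with a toy grammar rule and a dictionary lookup and attaches a candidate set to each mention; phase 2 selects one candidate per mention.

```python
import re

# Hypothetical toy dictionary: surface form -> list of (concept, type) candidates.
DICTIONARY = {
    "salt": [("salt (chemistry)", "Chemical"), ("table salt", "Food")],
    "adults": [("adult population", "Group")],
}

# Hypothetical unit lexicon used by the grammar rule below.
UNITS = {"g": "gram", "mg": "milligram", "kg": "kilogram"}

# Hypothetical grammar rule: a number followed by a unit forms a quantity mention.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|kg|g)\b")

# Assumed selection preference for this toy domain.
PREFERRED_TYPES = ("Food", "Group", "Quantity")


def detect_candidates(sentence):
    """Phase 1: detect entity mentions and collect a candidate set for each."""
    mentions = []
    for match in QUANTITY.finditer(sentence):
        value, unit = match.groups()
        mentions.append((match.group(0), [(f"{value} {UNITS[unit]}", "Quantity")]))
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in DICTIONARY:
            mentions.append((token, DICTIONARY[token]))
    return mentions


def select_candidates(mentions):
    """Phase 2: keep one candidate per mention, preferring the assumed types."""
    selected = {}
    for mention, candidates in mentions:
        preferred = [c for c in candidates if c[1] in PREFERRED_TYPES]
        selected[mention] = preferred[0] if preferred else candidates[0]
    return selected


def link(sentence):
    """Run both phases on a single sentence."""
    return select_candidates(detect_candidates(sentence))


if __name__ == "__main__":
    print(link("Adults should consume less than 5 g of salt per day."))
    print(link("The meeting starts at noon."))
```

In this sketch a sentence with no dictionary or grammar match yields an empty result, which mirrors the behaviour the abstract reports for the 10 sentences that are not dietary recommendations. A real system would need far richer grammar rules and dictionaries than the toy ones assumed here.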



Paper Citation


in Harvard Style

Eftimov T., Koroušić Seljak B. and Korošec P. (2016). Grammar and Dictionary based Named-entity Linking for Knowledge Extraction of Evidence-based Dietary Recommendations. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016), ISBN 978-989-758-203-5, pages 150-157. DOI: 10.5220/0006032401500157


in Bibtex Style

@conference{kdir16,
author={Tome Eftimov and Barbara Koroušić Seljak and Peter Korošec},
title={Grammar and Dictionary based Named-entity Linking for Knowledge Extraction of Evidence-based Dietary Recommendations},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={150-157},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006032401500157},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Grammar and Dictionary based Named-entity Linking for Knowledge Extraction of Evidence-based Dietary Recommendations
SN - 978-989-758-203-5
AU - Eftimov T.
AU - Koroušić Seljak B.
AU - Korošec P.
PY - 2016
SP - 150
EP - 157
DO - 10.5220/0006032401500157