POS Tagging-probability Weighted Method for Matching the Internet Recipe Ingredients with Food Composition Data

Tome Eftimov, Barbara Koroušić Seljak

Abstract

In this paper, we present a new method for matching recipe ingredients extracted from the Internet with nutritional data from food composition databases (FCDBs). The method uses part-of-speech (POS) tagging to capture information from the names of the ingredients and from the names of the foods analysed in FCDBs. A probability-weighted model then uses this POS-tagging information to assign a weight to each candidate match; the match with the highest weight is taken as the most relevant one and can be used for further analyses. We evaluated the method on a collection of 721 lunch recipes, from which we extracted 1,615 different ingredients; our method matched 91.82% of these ingredients with the FCDB.
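The abstract does not give the weighting formula, so the following Python sketch only illustrates the general shape of a POS-aware match score: each name is tagged and lemmatized, lemmas are grouped by part of speech, and every candidate FCDB name receives a weight in [0, 1], with the highest-weighted candidate chosen. Everything specific here is a hypothetical assumption rather than the authors' probability-weighted model: the toy lexicon `TOY_LEXICON` (standing in for a real POS tagger and lemmatizer), the class weights in `POS_WEIGHTS`, the per-class Jaccard overlap, and the rule that a zero weight means "unmatched".

```python
from typing import Dict, List, Optional, Set, Tuple

# HYPOTHETICAL stand-in for a POS tagger plus lemmatizer: maps a word to
# (lemma, coarse POS class). A real pipeline would use a trained tagger.
TOY_LEXICON: Dict[str, Tuple[str, str]] = {
    "chopped": ("chop", "VB"),
    "boiled": ("boil", "VB"),
    "fresh": ("fresh", "JJ"),
    "raw": ("raw", "JJ"),
    "red": ("red", "JJ"),
    "green": ("green", "JJ"),
    "onion": ("onion", "NN"),
    "onions": ("onion", "NN"),
    "tomato": ("tomato", "NN"),
    "tomatoes": ("tomato", "NN"),
}

# HYPOTHETICAL class weights (sum to 1): nouns name the food itself, while
# adjectives and verb forms mostly describe variety or preparation.
POS_WEIGHTS: Dict[str, float] = {"NN": 0.6, "JJ": 0.25, "VB": 0.15}


def group_by_pos(name: str) -> Dict[str, Set[str]]:
    """Tag and lemmatize a name, then group its lemmas by POS class."""
    groups: Dict[str, Set[str]] = {}
    for token in name.lower().replace(",", " ").split():
        # Unknown words default to nouns in this toy tagger.
        lemma, pos = TOY_LEXICON.get(token, (token, "NN"))
        groups.setdefault(pos, set()).add(lemma)
    return groups


def match_weight(ingredient: str, food: str) -> float:
    """Weighted average of per-class Jaccard overlaps, in [0, 1]."""
    a, b = group_by_pos(ingredient), group_by_pos(food)
    total, norm = 0.0, 0.0
    for pos, weight in POS_WEIGHTS.items():
        x, y = a.get(pos, set()), b.get(pos, set())
        if not x and not y:
            continue  # class absent from both names: it carries no evidence
        norm += weight
        total += weight * len(x & y) / len(x | y)
    return total / norm if norm else 0.0


def best_match(ingredient: str, foods: List[str]) -> Optional[Tuple[str, float]]:
    """Return (FCDB name, weight) with the highest weight, or None."""
    if not foods:
        return None
    scored = [(food, match_weight(ingredient, food)) for food in foods]
    food, weight = max(scored, key=lambda pair: pair[1])
    # No shared lemma in any class: treat the ingredient as unmatched.
    return (food, weight) if weight > 0 else None


if __name__ == "__main__":
    fcdb = ["Tomato, red, raw", "Tomato, green, raw", "Onion, raw"]
    print(best_match("fresh red tomatoes", fcdb))
```

Running it should print the red-tomato entry with a weight of about 0.80, because that entry shares both the noun lemma and one adjective with the ingredient, while the green-tomato entry shares only the noun. An ingredient with no overlap in any class, such as `butter` against this list, returns `None`, which is one simple way to represent ingredients left unmatched.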



Paper Citation


in Harvard Style

Eftimov T. and Koroušić Seljak B. (2015). POS Tagging-probability Weighted Method for Matching the Internet Recipe Ingredients with Food Composition Data. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015), ISBN 978-989-758-158-8, pages 330-336. DOI: 10.5220/0005612303300336


in BibTeX Style

@conference{kdir15,
author={Tome Eftimov and Korou{\v{s}}i{\'{c}} Seljak, Barbara},
title={POS Tagging-probability Weighted Method for Matching the Internet Recipe Ingredients with Food Composition Data},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={330-336},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005612303300336},
isbn={978-989-758-158-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI  - POS Tagging-probability Weighted Method for Matching the Internet Recipe Ingredients with Food Composition Data
SN  - 978-989-758-158-8
AU  - Eftimov T.
AU  - Koroušić Seljak B.
PY  - 2015
SP  - 330
EP  - 336
DO  - 10.5220/0005612303300336
ER  - 