Effect of the Named Entity Recognition and Sliding Window on the HONcode Automated Detection of HONcode Criteria for Mass Health Online Content
Celia Boyer, Ljiljana Dolamic, Patrick Ruch, Gilles Falquet
2016
Abstract
The Health On the Net Foundation (HON) Code of Conduct, HONcode, is the oldest and most widely used code of ethics and trustworthiness for medical and health-related information available on the Internet. Until recently, websites voluntarily applying for the HONcode seal were evaluated manually by an expert medical team according to 8 principles, referred to as criteria, and the associated published guidelines. Within the scope of the European project Kconnect, HON is developing an automated system to identify the 8 HONcode criteria within health webpages. When research on such a system moved from simple algorithmic testing to a real full-content setting, it revealed a number of issues. In the preceding study, a set of 27 health-related websites was assessed for compliance with each of the 8 HONcode criteria, first manually by senior HONcode experts and then by the automated system using supervised machine learning. The results showed discrepancies mainly for two criteria: “submerged content” under the Complementarity criterion and “extremely low recall” under the Date Attribution criterion. In this article, the authors investigate different approaches to solving the problems related to each of these criteria, namely a customized Named Entity Recognition model instead of a machine learning component for Date Attribution, and a sliding window instead of the whole document as the unit of detection for Complementarity. The results show that the newly adapted automated system greatly improves accuracy: 74% vs. 41% for the Date Attribution criterion and 74% vs. 22% for the Complementarity criterion.
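The "submerged content" problem and the sliding-window remedy can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the window size, the step, the unit of windowing, or how window scores are combined, so token-based windows, the `size`, `step`, and `threshold` values, and the rule "the criterion is detected if any window scores above the threshold" are all assumptions. `toy_score` is a hypothetical stand-in for a trained classifier.

```python
from typing import Callable, List, Tuple


def sliding_windows(tokens: List[str], size: int, step: int) -> List[Tuple[int, List[str]]]:
    """Return (start_index, window) pairs that together cover the whole token list."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(tokens) <= size:
        return [(0, tokens)]
    starts = list(range(0, len(tokens) - size + 1, step))
    # Add a final window so the tail of the document is never skipped.
    if starts[-1] + size < len(tokens):
        starts.append(len(tokens) - size)
    return [(s, tokens[s:s + size]) for s in starts]


def toy_score(window: List[str]) -> float:
    """Hypothetical scorer: fraction of tokens that are disclaimer cue words."""
    cues = {"replace", "substitute", "doctor", "physician", "advice"}
    if not window:
        return 0.0
    hits = sum(1 for t in window if t.lower().strip(".,;:") in cues)
    return hits / len(window)


def detect_criterion(
    text: str,
    score_window: Callable[[List[str]], float],
    size: int = 50,        # assumed window length in tokens
    step: int = 25,        # assumed stride (50% overlap)
    threshold: float = 0.5,  # assumed decision threshold
) -> bool:
    """Flag the criterion if the best-scoring window reaches the threshold."""
    tokens = text.split()
    best = max(score_window(w) for _, w in sliding_windows(tokens, size, step))
    return best >= threshold


# A short disclaimer buried at the end of a long page.
disclaimer = "This information does not replace the advice of your doctor."
page = " ".join(["lorem"] * 300) + " " + disclaimer

whole_document_score = toy_score(page.split())
found_with_windows = detect_criterion(page, toy_score, size=20, step=10, threshold=0.1)
```

In this toy setting the three cue words are diluted across the whole page, so the document-level score stays below the threshold, while the window that contains the disclaimer scores well above it. That is the intuition behind using a window rather than the whole document as the unit of detection.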
Paper Citation
in Harvard Style
Boyer C., Dolamic L., Ruch P. and Falquet G. (2016). Effect of the Named Entity Recognition and Sliding Window on the HONcode Automated Detection of HONcode Criteria for Mass Health Online Content. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016), ISBN 978-989-758-170-0, pages 151-158. DOI: 10.5220/0005644301510158
in Bibtex Style
@conference{healthinf16,
author={Celia Boyer and Ljiljana Dolamic and Patrick Ruch and Gilles Falquet},
title={Effect of the Named Entity Recognition and Sliding Window on the HONcode Automated Detection of HONcode Criteria for Mass Health Online Content},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)},
year={2016},
pages={151-158},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005644301510158},
isbn={978-989-758-170-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)
TI - Effect of the Named Entity Recognition and Sliding Window on the HONcode Automated Detection of HONcode Criteria for Mass Health Online Content
SN - 978-989-758-170-0
AU - Boyer C.
AU - Dolamic L.
AU - Ruch P.
AU - Falquet G.
PY - 2016
SP - 151
EP - 158
DO - 10.5220/0005644301510158