Table 1: Sensitivity (SE), specificity (SP), and area
under the ROC curve (ROC) for the nearest-neighbor
clustering (NNC), artificial neural network (ANN), and
logistic regression (LR) classification algorithms, tested in
the OHealth filter.
       SE     SP     ROC
NNC    0.92   1.00   0.98
ANN    0.80   0.88   0.91
LR     0.92   1.00   0.98
At the current stage of the study, NNC showed
the best performance as an aid algorithm for the
automated classification of web health content, with
the best sensitivity and specificity and the largest
area under the ROC curve (0.92, 1.00, and 0.98,
respectively).
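To make the three measures in Table 1 concrete, the following is a minimal sketch of how they are conventionally computed. It is not the OHealth implementation. The function names are hypothetical, and the counts and scores in the usage note are illustrative values, not the study's data. Which class counts as "positive" is an assumption the caller must fix in advance.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives that were classified as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that were classified as negative."""
    return tn / (tn + fp)


def roc_area(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive scores higher than a randomly chosen
    negative (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, with hypothetical counts of 46 true positives, 4 false negatives, 50 true negatives, and 0 false positives, `sensitivity(46, 4)` gives 0.92 and `specificity(50, 0)` gives 1.0. These counts were chosen only so that the arithmetic matches the NNC row; the actual counts are not reported in this section.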
4 DISCUSSION
In the present study, nearest-neighbor clustering
(NNC) recognized all Merck health web pages
(specificity 1.00) and filtered out 92% of web pages
with content not related to health issues
(sensitivity 0.92).
Although OHealth showed satisfactory
performance in classifying health web pages, the
health pages tested came from a single editor (Merck).
Further tests should be run including health web pages
from different sources and with different editorial
characteristics.
Another point requiring further investigation is
classification time performance. With the present
design, OHealth took approximately 7 minutes to
carry out the classifying calculations for each web
page. Other approaches to assessing similarity, such
as white lists and memory hash calculation, have
been proposed.
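One possible reading of the white-list and hash ideas mentioned above is sketched below. It is not part of OHealth, and every name in it (`CachedFilter`, `is_health_page`, `classifier_calls`) is hypothetical. Pages from white-listed hosts skip the classifier entirely, and the result for any other page is stored under a hash of its content, so the slow classification runs at most once per distinct page.

```python
import hashlib
from typing import Callable, Dict, Set
from urllib.parse import urlparse


class CachedFilter:
    """Wraps a slow page classifier with a white list and a hash cache.

    `classify` stands in for the expensive classification step; it is an
    assumption of this sketch, not the OHealth API.
    """

    def __init__(self, classify: Callable[[str], bool], white_list: Set[str]):
        self._classify = classify
        self._white_list = {host.lower() for host in white_list}
        self._cache: Dict[str, bool] = {}
        self.classifier_calls = 0  # counts how often the slow path runs

    def is_health_page(self, url: str, html: str) -> bool:
        # White list: trusted hosts are accepted without classification.
        host = (urlparse(url).hostname or "").lower()
        if host in self._white_list:
            return True
        # Hash cache: identical content is classified only once.
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.classifier_calls += 1
            self._cache[key] = self._classify(html)
        return self._cache[key]
```

An exact content hash only catches byte-identical pages. Detecting near-duplicates would need a different technique, which this sketch does not attempt.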
Other software components of the InHealth
portal system, such as the Lepidus adaptation and
the GUI improvements, are still in the design and
testing stages.
The authors also plan to implement a
questionnaire-based assessment tool, to be administered
to general and specialist web users, in order to
compare the performance of InHealth with that of
general-purpose search engines such as Google.
5 CONCLUSIONS
The usefulness of a search portal for web pages with
health-related content is potentially enormous, and
the challenge of its implementation is motivating.
The preliminary results of the present study show
that web mining techniques can improve the
specificity of searches for health information on the
World Wide Web.
REFERENCES
Abraham, J., & Reddy, M. (2007). Quality of Healthcare
Websites: A Comparison of a General-Purpose vs.
Domain-Specific Search Engine. AMIA Symposium
Proceedings, (p. 858).
Ajax. (1 January 2008). Ajax. Accessed 11 July 2008,
available from Ajax:
http://www.w3schools.com/Ajax/Default.Asp
Berkow, R., Beers, M., Bogin, R., & Fletcher, A.
(1 January 2003). Manual Merck de Informação Médica:
Saúde para a Família. Accessed 8 July 2008,
available from Merck:
http://www.msd-brazil.com/msdbrazil/patients/manual_Merck/prefacio.html
Bireme. (1 January 2008). VHL. Accessed 11 July 2008,
available from VHL:
http://www.bireme.br/php/index.php?lang=en
Bishop, C. (2007). Pattern Recognition and Machine
Learning. New Jersey: Springer.
Burnham, K., & Anderson, D. (2004). Model Selection
and Multi-Model Inference. Berlin: Springer.
CETIC. (1 November 2007). TIC Domicílios e usuários
2007. Accessed 7 July 2008, available from CETIC:
http://www.cetic.br/usuarios/tic/2007/rel-int-10.htm
Chang, P., Hou, I., Hsu, C., & HF, L. (2006). Are Google
or Yahoo a good portal for getting quality healthcare
web information. AMIA Annu Symp Proc, (p. 878).
DeCS. (1 January 2008). DeCS - Health Sciences
Descriptors. Accessed 7 July 2008, available from
http://decs.bvs.br/I/homepagei.htm
Duda, R., Hart, P., & Stork, D. (2000). Pattern
Classification. New York: Wiley-Interscience.
Dunford II, T. (2008). Advanced Search Engine
Optimization: A Logical Approach. Maui: American
Creations of Maui.
Erl, T. (2007). SOA Principles of Service Design.
New York: Prentice Hall.
Falcão, A. E. J. (2008). HealthRank: Construção e
Avaliação de um Software para Medir Adequação à
Códigos de Ética e Relevância de Websites em Saúde
Utilizando Métodos de Mídia Social e Indicadores
Automatizados. Master's thesis, Federal University of
São Paulo.
Haykin, S. (1999). Neural Networks: a Comprehensive
Foundation. New Jersey: Prentice-Hall.
Hersh, W. (2003). Information Retrieval: A Health and
Biomedical Perspective. New York: Springer.
BRAZILIAN HEALTH-RELATED CONTENT WEB SEARCH PORTAL - Presentation on a Method for its Development
and Preliminary Results