Table 4: The sliding window makes it possible to zoom in on the significant content for the Complementarity criterion. A few sample windows from the page of Figure 3.
No.  Content of the window
3    Content The information on this Site has been included in good faith but is for general informational purposes only. It should not be relied on for any specific purpose and no representation or warranty is given as regards its accuracy or completeness. No information on this Site shall constitute an invitation to invest in the Company, nor should it be used as the basis for any investment decision. The Content of this Site is not intended as medical advice, nor is it recommended as a
4    Completeness. No information on this Site shall constitute an invitation to invest in the Company, nor should it be used as the basis for any investment decision. The Content of this Site is not intended as medical advice, nor is it recommended as a substitute for medical advice. You should always seek the advice of your doctor or health care professional regarding any medical condition or treatment. Neither the Company, its affiliates, nor their respective directors, officers, employees, agents
5    Substitute for medical advice. You should always seek the advice of your doctor or health care professional regarding any medical condition or treatment. Neither the Company, its affiliates, nor their respective directors, officers, employees, agents, or representatives are engaged in rendering medical advice. Alere reserves the right to make any changes and corrections to this Site and its Content as and when we consider it appropriate and without notice. Privacy Policy Alere™ Privacy
losing track of context. The size of the window was established empirically. We chose a window of at most 500 characters, cut at word boundaries, and slid it forward 250 characters at a time.
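The windowing step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sliding_windows`, the whitespace normalization and the exact way the start and end of a window are snapped to word boundaries are assumptions made for illustration. Only the 500-character maximum and the 250-character step come from the text.

```python
def sliding_windows(text, size=500, step=250):
    """Return windows of at most `size` characters, advancing `step` characters.

    Assumption: whitespace is collapsed to single spaces, and a window that
    would start or end inside a word is shrunk so it only holds whole words.
    """
    text = " ".join(text.split())  # normalize whitespace (assumption)
    n = len(text)
    windows = []
    start = 0
    while start < n:
        s = start
        # If the nominal start falls inside a word, move to the next word.
        if s > 0 and text[s - 1] != " ":
            nxt = text.find(" ", s)
            if nxt == -1:
                break
            s = nxt + 1
        end = min(s + size, n)
        # If the nominal end cuts a word, back up to the previous space.
        if end < n and text[end] != " ":
            cut = text.rfind(" ", s, end)
            if cut > s:
                end = cut
        windows.append(text[s:end])
        if end >= n:
            break
        start += step
    return windows
```

Because the step is half the window size, consecutive windows overlap by roughly half, so a statement that straddles the end of one window is likely to appear whole in the next one.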
Table 4 gives examples of the windows produced for the passage highlighted by the blue rectangle on the webpage displayed in Figure 3. For a page to be marked as respecting the criterion, the system needs to detect the criterion in at least one of the windows created in this way for that page. Information related to the Complementarity criterion is spread across rows 3, 4 and 5. This information is in italics.
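The page-level rule (a page respects the criterion if at least one of its windows triggers a detection) can be sketched as follows. The names `page_respects_criterion` and `keyword_detector` are hypothetical, and the keyword check is only a toy stand-in for the trained classifier, which is not reproduced here.

```python
from typing import Callable, Iterable


def page_respects_criterion(windows: Iterable[str],
                            detects: Callable[[str], bool]) -> bool:
    """Mark the page as respecting the criterion if any window is positive."""
    return any(detects(window) for window in windows)


def keyword_detector(window: str) -> bool:
    """Toy stand-in for the real classifier (assumption, illustration only)."""
    return "substitute for medical advice" in window.lower()
```

For example, `page_respects_criterion(windows, keyword_detector)` returns `True` as soon as one window contains the phrase, even if every other window is negative.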
To test the effectiveness of the methods described above for both the Date Attribution and Complementarity criteria, we used the same set of 27 websites (more than 10,000 webpages) as in the previous experiments (Boyer and Dolamic, 2015). The main motivation for using the same collection and setup (i.e. a Naive Bayes classifier in combination with word tokenization) was to allow a direct before/after comparison of the system once the new approaches customized for the two criteria were integrated. The convenience sample of 27 health websites was selected to broadly cover potential and actual HONcode sites, as follows:
• 9 new, potentially certifiable websites. HONcode experts estimated that these websites do conform to HONcode, but are not yet certified;
• 9 likely non-certifiable websites. The HONcode experts estimated that these websites would not conform to HONcode principles when fully analysed;
• 4 newly certified websites. These websites were recently certified for the first time;
• 5 previously certified HONcode sites chosen because they were awaiting annual reassessment.
In order to perform different research experiments over a few years, we decided to retrieve the websites locally using the HON crawler. The crawling was conducted in April 2014. This approach proved worthwhile, as several websites no longer exist (10% of the selected websites have closed down).
We present the results using several measurements. In addition to the standard classification measurements, precision (P), recall (R) and accuracy (A), we report the contingency table values: False Negatives (FN), False Positives (FP), True Negatives (TN) and True Positives (TP).
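As a reminder of how these measurements relate, the following Python sketch tallies the contingency values from paired decisions and derives P, R and A from them. It assumes the manual review serves as the reference label; the function names and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def contingency(pairs):
    """Count TP, FP, TN, FN from (manual, automated) boolean pairs."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for manual, automated in pairs:
        if manual and automated:
            counts["TP"] += 1      # detected by both
        elif manual:
            counts["FN"] += 1      # manual only
        elif automated:
            counts["FP"] += 1      # automated only
        else:
            counts["TN"] += 1      # detected by neither
    return counts


def scores(counts):
    """Derive precision, recall and accuracy from the contingency counts."""
    tp, fp, tn, fn = (counts[k] for k in ("TP", "FP", "TN", "FN"))
    total = tp + fp + tn + fn
    return {
        # 0.0 on an empty denominator is a convention chosen here (assumption).
        "P": tp / (tp + fp) if tp + fp else 0.0,
        "R": tp / (tp + fn) if tp + fn else 0.0,
        "A": (tp + tn) / total if total else 0.0,
    }
```

Calling `scores(contingency(pairs))` on one pair per website gives the three measurements in a single step.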
4 RESULTS
Table 5 gives the results for the detection of the Date Attribution criterion. Manual review detected it for 21 out of 27 websites in the test set (column Manual +). When neither the automated system nor the manual review found evidence supporting the criterion, the website was considered a true negative (TN), while detection by both the manual review and the automated system was considered a true positive (TP). Websites for which the criterion was detected in the manual review but not by the automated system were considered false negatives (FN), while those detected by the automated system but not in the manual review were considered false positives (FP). In the experiments that yielded the results presented in this table, we compared the results obtained by a machine