The percentage of goal articles found per recommendation within the (at most) 1000 results, as well as the total number of search results returned, is shown in table 6 for techniques 1 to 3 and in table 7 for techniques 4 to 6.
Overall, 20 out of 71 goal articles were found.
Looking at the results more closely, we can note the
following findings:
1. For 5 out of 16 recommendations, the system was unable to find any of the goal articles with any of the techniques (recommendations 3, 5, 12, 13 and 15).
2. Technique 2 found the most articles overall, closely followed by technique 3. Both techniques use the query construction method that combines sets of terms.
3. Techniques 4 to 6, which construct queries by selecting terms, perform much worse.
4. The query construction method that combines sets of terms (techniques 1-3) performs better overall, but yields many more search results. This indicates that these queries capture the recommendations' meaning best, but are not very specific in doing so.
To explain finding 1, we have to take a closer look at recommendations 3, 5, 12, 13 and 15. For recommendations 3, 5 and 15, the failure can be explained by the small number of evidence articles for the recommendation (1, 2 and 1, respectively). What is notable about recommendations 12 and 13 is that their updates seem to radically change the recommendation itself. This could be why the system was unable to find any goal articles: the goal articles are simply too different from the original evidence.
To elaborate on finding 2, we have to look more closely at the techniques used. Both are based on the query construction method that combines sets of terms, and both use the primary MeSH terms extracted from the evidence articles as their input (for technique 3, these are augmented with the MeSH terms of the recommendation). In practice, most queries that reach the threshold of 15 articles are of level 1: they are a disjunction of all primary terms, as explained in section 3.2.3. This explains why the number of search results is so large, as disjunctions are very weak restrictions on the set of articles. The fact that a significant number of goal articles are still found indicates that the PubMed search engine is quite good at sorting articles by relevance to the search terms, since only the first 1000 results were used.
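As an illustration, such a level-1 query can be sketched as a plain disjunction of the primary terms. This is a hypothetical sketch: the term list and the `[MeSH Terms]` field tag are illustrative, and the paper's exact query syntax is not reproduced here.

```python
# Sketch of a "level 1" query: a disjunction (OR) of all primary MeSH
# terms, which is a weak restriction and therefore matches many articles.
# The [MeSH Terms] field tag is an assumption, not taken from the paper.

def build_level1_query(primary_terms):
    """Combine all primary MeSH terms into a single OR-query string."""
    clauses = ['"{}"[MeSH Terms]'.format(term) for term in primary_terms]
    return " OR ".join(clauses)

# Hypothetical primary terms extracted from evidence articles:
terms = ["Atrial Fibrillation", "Anticoagulants", "Stroke"]
print(build_level1_query(terms))
```

Because every clause is joined with OR, adding a term can only grow the result set, which matches the large result counts observed for techniques 1 to 3.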
We can also see that constructing queries by selecting terms (techniques 4-6) achieves one of the goals for which it was designed: decreasing the search space. This is apparent from the number of search results, which is much lower on average than for the method that combines sets of terms (techniques 1-3). This approach is, however, much worse at finding the goal articles. This indicates that removing search terms in order to broaden the query can lead to a loss of meaning, causing worse results.
The large number of results returned by the
queries indicates how volatile queries can be. Even
though our approach offers a lot of variation between
broad and specific queries, small changes such as re-
moving a term or switching from conjunction to dis-
junction can result in an explosion in the number of
results obtained by the query. This is a difficult prob-
lem to solve due to the size of the database, and re-
quires further research.
4.4 Results for Ranking
Now that we have an indication of how well our queries perform, we will examine how well the ranking algorithm performs in determining their relevance. To do this, we take the best performing technique (1-6) for each recommendation, and measure the percentage of goal articles that are ranked in the top 25 most relevant. We chose the number 25, as this is a reasonable amount that a person can process in approximately an hour. Table 8 lists, for each recommendation: the best technique (1-6), the recall over the (at most) 1000 results (the percentage of goal articles in the first 1000 results), and the top-25 recall. Recommendations for which we found no goal articles (3, 5, 12, 13 and 15) are omitted.
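The two recall measures described above can be sketched as follows. The article identifiers and goal sets here are hypothetical; only the computation itself is taken from the text.

```python
# Recall over a ranked result list, optionally truncated at a cutoff.
# recall(..., cutoff=None) gives recall over all retrieved results;
# recall(..., cutoff=25) gives the top-25 recall used in table 8.

def recall(ranked_ids, goal_ids, cutoff=None):
    """Fraction of goal articles appearing in the (truncated) ranking."""
    considered = ranked_ids if cutoff is None else ranked_ids[:cutoff]
    found = goal_ids & set(considered)
    return len(found) / len(goal_ids)

ranked = [101, 202, 303, 404, 505]   # ranked search results, best first
goals = {202, 505, 999}              # goal articles for one recommendation

overall = recall(ranked, goals)            # 2 of 3 goals retrieved at all
top_25 = recall(ranked, goals, cutoff=25)  # same here, list is short
```

With long result lists the two values diverge: a goal article retrieved at rank 800 counts toward the overall recall but not toward the top-25 recall.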
From these results, we can immediately see the importance of keeping the number of search results low. In the cases where there are many results, the goal articles have a very high chance of falling outside the top of the ranking. This reinforces the findings of Iruetaguena et al. (2013), who noted similar results. It indicates that the combination of the Rosenfeld-Shiffman filter and tf-idf is perhaps not a suitable way to process large numbers of articles, as the resulting ratings are very close to each other for many articles. For smaller sets of articles, for instance recommendations 4 and 8, the algorithm seems to have performed very well.
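A minimal tf-idf sketch (not the authors' implementation) illustrates why ratings can end up close together: articles that match the same few query terms with the same frequencies receive identical scores, so a large result set produces many ties near the top.

```python
# Minimal tf-idf scoring sketch. The documents, tokens and query are
# hypothetical; the point is that documents "a" and "b" match the query
# identically and therefore tie, which is hard to rank apart.

import math
from collections import Counter

def tfidf_score(query_terms, doc_tokens, df, n_docs):
    """Sum of tf * idf over the query terms for one document."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        if df.get(term, 0) == 0:
            continue  # term occurs in no document: no contribution
        score += tf[term] * math.log(n_docs / df[term])
    return score

docs = {
    "a": ["stroke", "risk", "therapy"],
    "b": ["stroke", "risk", "outcome"],
    "c": ["diet", "exercise"],
}
df = Counter()
for tokens in docs.values():
    df.update(set(tokens))  # document frequency per term

query = ["stroke", "risk"]
scores = {d: tfidf_score(query, t, df, len(docs)) for d, t in docs.items()}
```

Here `scores["a"] == scores["b"]`: with only two matching terms shared by both documents, tf-idf alone cannot separate them, mirroring the near-identical ratings observed for large result sets.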
The addition of the MeSH distance to these ratings made little difference. This can have multiple reasons:
• Not all articles are sufficiently annotated with
MeSH terms. If an article is not annotated, the
MeSH distance will always be 0, resulting in an
HEALTHINF 2015 - International Conference on Health Informatics