also closer to the original user query are ranked higher than those that are further away.

Table 1: Search metrics.

Metric                    Without expansion   With expansion   % improvement
Mean Average Precision          0.63               0.71              12
Precision @ Rank 10             0.49               0.53               8
Recall @ Rank 10                0.62               0.67               8
NDCG                            0.79               0.83               5
3 RESULTS AND DISCUSSION
The proposed technique was tested in IBM Policy
Insights, an application that helps Medicaid Fraud,
Waste and Abuse (FWA) investigators find policy in-
formation relevant to their investigations.
For ground truth, three subject matter expert (SME) investigators identified a corpus of 345 Medicaid policy documents. They also defined 80 queries representative of those typically used in their investigations, ranging from single words to phrase queries. Every document in the corpus was labelled as either relevant or not relevant for each query. The corpus, queries, and relevance labels were used as ground truth for measuring document retrieval performance. Performance was evaluated using the following metrics: Precision, Recall, Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), and Mean Reciprocal Rank (MRR) (Büttcher et al., 2016).
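For illustration, the minimal sketch below computes these metrics for a single query, given a ranked result list and the set of relevant document IDs; MAP and MRR are then the means of average precision and reciprocal rank over all 80 queries. The sketch assumes binary relevance labels, and the function names are ours, not from the evaluation code used in the experiments.

import math

def precision_at_k(ranked_ids, relevant, k=10):
    # Fraction of the top-k results that are relevant.
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k

def recall_at_k(ranked_ids, relevant, k=10):
    # Fraction of all relevant documents found in the top k.
    hits = sum(1 for d in ranked_ids[:k] if d in relevant)
    return hits / len(relevant) if relevant else 0.0

def average_precision(ranked_ids, relevant):
    # Precision averaged over the ranks of the relevant documents.
    hits, total = 0, 0.0
    for i, d in enumerate(ranked_ids, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def ndcg(ranked_ids, relevant):
    # Binary-relevance NDCG: gain 1 for relevant documents, 0 otherwise.
    dcg = sum(1.0 / math.log2(i + 1)
              for i, d in enumerate(ranked_ids, start=1) if d in relevant)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, len(relevant) + 1))
    return dcg / idcg if idcg else 0.0

def reciprocal_rank(ranked_ids, relevant):
    # Inverse rank of the first relevant result (0 if none is found).
    return next((1.0 / i for i, d in enumerate(ranked_ids, start=1)
                 if d in relevant), 0.0)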
The underlying search engine used was Watson Discovery Service (WDS) (Turner, 2016), using only its out-of-the-box Elasticsearch-based capabilities. A WDS collection was created for the corpus of 345 policy documents, and UMLS was used as the thesaurus. A word embeddings model was trained on this corpus using word2vec (Le and Mikolov, 2014). To establish baseline document retrieval performance, all 80 queries were run against WDS' out-of-the-box Natural Language Query (NLQ) API, and the above performance metrics were calculated for the returned documents.
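Such an embedding model can be trained with gensim's word2vec implementation, as in the minimal sketch below; the hyperparameter values shown are illustrative assumptions, since the settings used in the experiments are not reported here.

from gensim.models import Word2Vec

# `tokenized_docs` is assumed to hold the 345 policy documents,
# each tokenized into a list of lowercased word tokens.
model = Word2Vec(
    sentences=tokenized_docs,
    vector_size=100,  # embedding dimensionality (assumed)
    window=5,         # context window size (assumed)
    min_count=2,      # ignore very rare terms (assumed)
    workers=4,
)
model.save("policy_word2vec.model")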
For our experiment, the same setup was used, with the 80 queries passed through our proposed query expansion pipeline before being sent to the NLQ search engine. The same performance metrics were again calculated for the returned documents and compared against the baseline.
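A hedged sketch of such an expansion step is given below: candidate terms are gathered from a thesaurus lookup and from embedding nearest neighbours, then ranked by similarity to the original query term, echoing the ranking described above. The `umls_synonyms` mapping stands in for a hypothetical local UMLS index, not a real UMLS API, and the pipeline details are our assumptions rather than the exact implementation.

def expand_query(query_terms, model, umls_synonyms, top_n=3):
    expanded = list(query_terms)
    for term in query_terms:
        # Candidates from the thesaurus and from embedding neighbours.
        candidates = set(umls_synonyms.get(term, []))
        if term in model.wv:
            candidates.update(
                w for w, _ in model.wv.most_similar(term, topn=top_n))
        # Rank candidates by closeness to the original term; candidates
        # absent from the embedding vocabulary are dropped.
        scored = sorted(((model.wv.similarity(term, c), c)
                         for c in candidates if c in model.wv and c != term),
                        reverse=True)
        expanded.extend(c for _, c in scored[:top_n])
    return " ".join(expanded)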
Table 1 summarizes the comparison of key metrics. Averaged over the 80 queries, both recall and precision are improved by our proposed query expansion pipeline. We also observed that queries containing healthcare concepts matched by our thesaurus (as described in Section 2.2) showed larger improvements than those that did not. This is expected given the nature of our corpus and the available thesauri: UMLS contains high-quality, human-curated knowledge. We conclude that combining domain knowledge from human-curated thesauri with automatically learned word embeddings enhances document retrieval performance on a corpus containing a mix of terminology from several domains.
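The per-query breakdown behind this observation can be computed as in the short sketch below; here `per_query_ap` (mapping each query to its baseline and expanded average precision) and `matched` (the set of queries with at least one thesaurus hit) are assumed data structures, not artifacts of the experiments.

def mean_gain(queries, per_query_ap):
    # Mean improvement in average precision over a set of queries.
    gains = [per_query_ap[q][1] - per_query_ap[q][0] for q in queries]
    return sum(gains) / len(gains) if gains else 0.0

matched_gain = mean_gain([q for q in per_query_ap if q in matched],
                         per_query_ap)
unmatched_gain = mean_gain([q for q in per_query_ap if q not in matched],
                           per_query_ap)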
4 FUTURE WORK
The proposed technique was tested on a corpus of healthcare policy documents. Further testing is needed on other domains to assess how well the technique generalizes. In particular, the approach should be evaluated on domains with less detailed human-curated thesauri, such as the insurance sector. Furthermore, metadata of detected concepts, such as semantic groups, will be exploited in future iterations for concept disambiguation.
REFERENCES

Bodenreider, O. (2004). The unified medical language system: integrating biomedical terminology. In Nucleic acids research, volume 32, pages D267–D270. Oxford University Press.

Büttcher, S., Clarke, C. L., and Cormack, G. V. (2016). Information retrieval: Implementing and evaluating search engines. MIT Press.

Carpineto, C. and Romano, G. (2012). A survey of automatic query expansion in information retrieval. In ACM Comput. Surv., volume 44, pages 1:1–1:50, New York, NY, USA. ACM.

Crimp, R. and Trotman, A. (2018). Refining query expansion terms using query context. In Proceedings of the 23rd Australasian Document Computing Symposium, ADCS '18, pages 12:1–12:4, New York, NY, USA. ACM.

Kuzi, S., Shtok, A., and Kurland, O. (2016). Query expansion using word embeddings. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM '16, pages 1929–1932, New York, NY, USA. ACM.
Le, Q. and Mikolov, T. (2014). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, ICML '14, pages 1188–1196.