demonstrated a large semantic and clinical variability
of criteria across the trials. They argue that the
majority of criteria present challenges for automatic
evaluation because of semantic connectors that are hard
to express with current representation languages,
temporal constraints, the need for clinical judgment, or
the lack of expected data in the patient record.
6 CONCLUSIONS
The work described in this paper is part of our
research aimed at supporting patient recruitment and
trial study feasibility. It focuses on analyzing the
semantics of eligibility criteria and on detecting the
parts of medical ontologies relevant to a particular
disease.
First, we investigated which annotation tool,
MetaMap or the NCBO Annotator, is more appropriate
for our task. We compared the overlap of concepts
detected by both tools in the eligibility criteria of
2135 breast cancer trials. The results show that the
intersection accounts for only 59% of the entire set.
Because MetaMap detected a larger number of concepts,
we decided to use it for further experiments. In future
work it could be interesting to define a voting
algorithm that takes into account the precision and
recall of both tools for particular types of criteria
or semantic types. Second, we analyzed
the source and semantic types of detected concepts.
The findings indicate that the large majority of concepts
(88%) are defined by more than one ontology covered by
UMLS, most of them by MTH, CHV, NCI and SNOMED CT. The
highest number of unique contributions is provided by
NCI, SNOMED CT and CHV. We chose SNOMED CT for the next
experiments because of its wide usage in clinical
settings and its good scores in the comparison. It
should be noted that in
32% of criteria phrases MetaMap did not detect any
concept, which indicates that additional processing is
needed to recognize the context in which recognized
terms occur. Only approximately 35% of the phrases
annotated with UMLS obtained the maximal mapping score,
and 48% when only SNOMED CT was used.
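The overlap comparison between the two annotators can be illustrated with a small sketch. The concept identifiers below are hypothetical, and taking the union of both result sets as the "entire set" is an assumption; the normalization behind the 59% figure is not restated here.

```python
def overlap_share(concepts_a, concepts_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of concept identifiers.

    Assumption: the "entire set" is the union of the concepts found by
    either tool. An empty union yields 0.0 to avoid division by zero.
    """
    a, b = set(concepts_a), set(concepts_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical concept identifiers, not taken from the study.
metamap_concepts = {"C001", "C002", "C003", "C004"}
ncbo_concepts = {"C003", "C004", "C005"}

share = overlap_share(metamap_concepts, ncbo_concepts)
print(f"shared concepts: {share:.0%} of the union")
```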
The analysis of the distribution of the detected
concepts over various semantic types and their fre-
quency revealed that the mapping effort will need to
be spread over many types. Furthermore, we analyzed
the stability of the obtained concept set by studying
its growth as new trials were added. While some
stabilization of the growth curve can be observed,
especially for some semantic types, we cannot expect the
obtained annotation set to be complete. Extending the
solution to other trials will involve creating more
mappings.
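The growth analysis described above can be sketched as a cumulative count of distinct concepts while trials are added one by one. This is a minimal illustration with hypothetical data; the function name and the trial ordering are assumptions, not the procedure used in the study.

```python
def concept_growth(trials):
    """Return the number of distinct concepts seen after each added trial.

    `trials` is an ordered iterable in which each item is the collection
    of concept identifiers detected in one trial.
    """
    seen = set()
    curve = []
    for concepts in trials:
        seen.update(concepts)
        curve.append(len(seen))
    return curve


# Hypothetical trials; each set holds the concepts annotated in one trial.
trials = [{"a", "b"}, {"b", "c"}, {"c"}, {"d"}]
curve = concept_growth(trials)
print(curve)  # [2, 3, 3, 4]
```

A flattening curve suggests that new trials contribute few unseen concepts, whereas a curve that keeps rising, as in the last step of this toy example, indicates that the set is not yet complete.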
Finally, we put the semantics of breast cancer trials
into the broader perspective of over 38,000 clinical
trials studying other diseases. We used the tf-idf
measure to find concepts that are specific to breast
cancer, and to cancer in general, and used the results
to prioritize them. We also verified the overlap between
the 2000 top-ranked concepts for breast cancer and the
concepts occurring in the eligibility criteria of trials
on other diseases, and found that a substantial part is
repeated: more than 1100 concepts in all cases, and more
than 1300 for other cancer types.
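As a rough illustration of the tf-idf ranking and the top-k overlap check, the sketch below treats all criteria of one disease as a single document. That grouping, the unsmoothed idf variant, the function names and the toy data are all assumptions made for the example; the exact weighting used in the study may differ.

```python
import math
from collections import Counter


def tfidf_ranking(target, corpora):
    """Rank the concepts of corpora[target] by tf-idf, highest first.

    `corpora` maps a disease name to the list of concept occurrences in
    its eligibility criteria. tf is the relative frequency within the
    target disease; idf is log(N / df) over the N disease groups.
    """
    n_docs = len(corpora)
    df = Counter()
    for concepts in corpora.values():
        df.update(set(concepts))  # count each concept once per disease
    counts = Counter(corpora[target])
    total = sum(counts.values())
    scores = {
        c: (n / total) * math.log(n_docs / df[c]) for c, n in counts.items()
    }
    # Ties are broken alphabetically to keep the ranking deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


def top_k_overlap(ranking, other_concepts, k):
    """Count how many of the k top-ranked concepts also occur elsewhere."""
    return len(set(ranking[:k]) & set(other_concepts))


# Hypothetical toy data, not taken from the study.
corpora = {
    "breast cancer": ["HER2", "HER2", "mastectomy", "pregnancy"],
    "lung cancer": ["smoking", "pregnancy"],
    "diabetes": ["HbA1c", "pregnancy"],
}
ranking = tfidf_ranking("breast cancer", corpora)
shared = top_k_overlap(ranking, corpora["lung cancer"], k=3)
print(ranking, shared)
```

In this toy corpus a concept that occurs in every disease group receives an idf of zero and therefore falls to the bottom of the ranking, which is the behavior that makes the measure useful for finding disease-specific concepts.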
We believe that this analysis provides insights
into the semantics of eligibility criteria that can be
used to prioritize the mapping of eligibility criteria
to the patient record and to support building a
recruitment support tool. The approach was demonstrated
on the breast cancer domain, but it can be easily reused
for other diseases.
REFERENCES
Aronson, A. R. and Lang, F.-M. (2010). An overview of
MetaMap: historical perspective and recent advances.
Journal of the American Medical Informatics
Association, 17(3):229–236.
Clark, K. and Parsia, B. (2008). Modularity and OWL.
Literature survey.
Jones, K. S. (1972). A statistical interpretation of term
specificity and its application in retrieval. Journal of
Documentation, 28:11–21.
Milian, K., Aleksovski, Z., Vdovjak, R., ten Teije, A.,
and van Harmelen, F. (2009). Identifying
disease-centric subdomains in very large medical
ontologies, a case-study on breast-cancer concepts in
SNOMED. In Knowledge Representation for Healthcare
(KR4HC09), LNCS.
Milian, K., Bucur, A., and ten Teije, A. (2012).
Formalization of clinical trial eligibility criteria:
Evaluation of a pattern-based approach. In 2012 IEEE
International Conference on Bioinformatics and
Biomedicine.
Musen, M., Shah, N., Noy, N., Dai, B., Dorf, M., Griffith,
N. B., Buntrock, J., Jonquet, C., Montegut, M., and
Rubin, D. (2008). BioPortal: Ontologies and data
resources with the click of a mouse. In AMIA Annual
Symposium, pages 1223–1224.
Ross, J., Tu, S. W., Carini, S., and Sim, I. (2010).
Analysis of eligibility criteria complexity in clinical
trials. AMIA Summits on Translational Science
Proceedings, pages 46–50.
Shah, N. H., Bhatia, N., Jonquet, C., Rubin, D. L.,
Chiang, A. P., and Musen, M. A. (2009). Comparison of
concept recognizers for building the Open Biomedical
Annotator. BMC Bioinformatics, 10(S-9):14.
HEALTHINF 2013 - International Conference on Health Informatics