Clinical Ontology Mapping

Toward Automatic Care Plan Recommendation

Khai Nguyen, Kaisei Reio and Ryutaro Ichise

National Institute of Informatics, Tokyo, Japan

Metaps Inc., Tokyo, Japan

Keywords:

Clinical Ontology, Ontology Mapping, Care Plan Recommendation, MDS, ICF.

Abstract:

In this paper, we sketch an automatic care plan recommendation system for Japan. We then describe our proposed method for, and experience with, its first step: clinical ontology mapping. We discuss the difficulties, the method, and the preliminary results of a case study that finds corresponding mappings between two ontologies, the Minimum Data Set 3.0 (MDS) and the International Classification of Functioning, Disability and Health (ICF).

1 INTRODUCTION

Personalized health care is becoming a focus of modern healthcare systems. The motivation is the superiority of care solutions that meet the specific needs of an individual. For years in Japan, personalized care plans have helped numerous people, mainly patients suffering from diseases, the elderly, and people with disabilities. These plans include short-term and long-term support for persons who have difficulty in daily activities. One existing problem is that care plan personalization is a complicated process because of the wide range of factors involved, from personal screening results to interpersonal relationships, together with environments, technological resources, etc. Therefore, making a care plan manually costs a lot of effort and time.

1.1 Automatic Care Plan Recommendation

For the above reasons, developing automatic methods to support care plan making is an important research topic. With an automatic system, the analyzer needs less time to design or determine a care plan. Currently, an assessment sheet is one of the required inputs when a care plan is created. This sheet includes health screening results (e.g., body functions, disease diagnoses, medical history), personal situations (e.g., age, working status, financial capability), relationships (e.g., family, friends), environmental factors (e.g., living conditions, local policies), etc. A care plan analyzer checks the assessment sheet to identify the capabilities, difficulties, and needs of the person. A care plan is then suggested, taking into account the available supportive resources (e.g., technology, devices, therapists, nursing). We focus on automatic care plan recommendation from assessment sheets and supportive resources. Although the current research is set in the context of Japan, our long-term goal is a system widely applicable to different societies.

MDS: https://www.cms.gov/
ICF: http://www.who.int/classifications/icf/en/

In developing the care plan recommendation system, we focus on methods whose models and results are interpretable. This is the conventional approach for tasks in which interpretability of results is an essential requirement. One challenge of care plan recommendation is that the assessment sheets are not written in a well-structured format (e.g., they are semi-structured or free descriptions). They are written using a collection of vocabularies derived from the Minimum Data Set (MDS). Although MDS is also organized as an ontology, its structure is shallow. Therefore, it is difficult to extract information from those materials and thus difficult to build a care plan recommendation system. Recent achievements in machine learning have demonstrated the capability of generalizing knowledge from shallow-structured and unstructured data (LeCun et al., 2015; Deng et al., 2014). Applying this approach to our problem might be feasible. However, interpretability is a current drawback of many advanced machine learning methods.

Nguyen K., Reio K. and Ichise R.
Clinical Ontology Mapping - Toward Automatic Care Plan Recommendation.
DOI: 10.5220/0006753107220726
In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (HEALTHINF 2018), pages 722-726
ISBN: 978-989-758-281-3

To achieve interpretability, we divide our study into two phases. In the first phase, we transform assessment sheets into a better-structured format. In the second phase, we build an interpretable care plan recommender, such as a reasoner, over the transformed data. Among various structured representations, an ontology-based format is the best candidate because of its knowledge representation capability and the availability of many clinical ontologies. For example, most up-to-date ontologies are accessible via BioPortal and the UMLS Metathesaurus. Before assessment sheets can be transformed, mappings between the ontology currently in use and the higher-quality target ontologies are necessary. Therefore, ontology mapping is the very first step. The remainder of this paper describes the details and related issues of clinical ontology mapping.

1.2 Clinical Ontology Mapping

Ontology mapping is a well-studied problem: because many ontologies have been created independently, unifying them has become a need. Various ontology mapping systems have been proposed and have obtained certain achievements (Shvaiko and Euzenat, 2013). However, it is not trivial to apply an existing ontology mapping system to domain-specific ontologies, such as clinical ontologies. General ontologies are conceptualized, but clinical ontologies are not. That is, an item (i.e., class) in a general ontology usually represents a concept (e.g., ‘animal’ and ‘bird’). An item in a clinical ontology, in contrast, represents a clinical term, which may not be a single concept. For example, an item could be a functioning (e.g., ‘maintaining a sitting position’) or even a question (e.g., ‘does the resident walk?’). Therefore, mapping clinical ontologies demands specific techniques.

Years of effort in automatic clinical ontology mapping have produced only a modest number of studies and left many challenges open. Previous studies focused on recommending mappings rather than on a fully automatic system. One reason why no fully automatic solution exists is that the domain requires high-quality mappings. That is also why existing mappings between clinical ontologies have been created manually or semi-automatically.

Fung and Bodenreider (2005) attempted to use synonyms and mapping relations of UMLS concepts to map SNOMED-CT and ICD9CM. 86% of SNOMED-CT terms were mapped, with 42% recall and 20% precision compared to the gold standard. Lomax and McCray (2004) tried to map the Gene Ontology to UMLS concepts using an automatic method together with manual curation. The automatic method could suggest mappings for 25% of Gene Ontology terms.

BioPortal: https://bioportal.bioontology.org/
UMLS: https://www.nlm.nih.gov/research/umls/

A common issue of previous studies is that they used simple lexical string matching or synonym vocabularies. Such approaches do not capture the semantic similarity of terms that differ in complex ways. Moreover, they fail to suggest mappings for items that neither lexically match any term in the target ontology nor exist in the vocabularies. We aim to develop a mapping method capable of generating high-quality mapping candidates for any input.

1.3 Candidate Ontologies

As mentioned earlier, we currently use vocabularies derived from MDS for assessment sheets. However, the structural information of MDS is limited. Therefore, we aim to map the MDS ontology to a better-structured ontology. In addition to structure, the coverage and recognition of a target ontology are important selection criteria. Among available ontologies, there are two promising candidates: the International Classification of Functioning, Disability and Health (ICF) and SNOMED Clinical Terms (SNOMED-CT).

One advantage of ICF is that it contains terminology for components similar to the items in current assessment sheets. Those components are body function, body structure, environment, and activity capability. Furthermore, ICF is recognized by 191 member states of the World Health Organization. Therefore, using ICF makes it easier to apply our solution in other countries in the future. Meanwhile, SNOMED-CT is considered the most comprehensive terminology to date. SNOMED-CT contains more than 320,000 concepts and has better coverage in specific areas than ICF (Tu et al., 2015). This ontology is used in more than 50 countries.

Some mappings already exist between SNOMED-CT and MDS (Tuttle et al., 2007) and between SNOMED-CT and ICF. However, the mappings between MDS and ICF that can be derived from those data are limited to 9 pairs. Therefore, it is currently necessary to find more mappings between MDS and ICF.

SNOMED-CT: http://www.snomed.org/snomed-ct/
ICD9CM: https://www.cdc.gov/nchs/icd/icd9cm.htm
Gene Ontology: http://www.geneontology.org/
SNOMED-CT and ICF mappings: https://bioportal.bioontology.org/ontologies/ICF/?p=mappings

Although it is feasible to map medium-sized ontologies such as MDS and ICF manually, a support tool will be helpful in future mapping tasks, for example, when checking and implementing undiscovered mappings between MDS and SNOMED-CT or other ontologies, or when dealing with systems that use other terminologies. The details of our mapping method are described in the next section.

2 MAPPING MDS AND ICF - A CASE STUDY

2.1 MDS 3.0 Ontology

MDS is an ontology designed for nursing home

screening. It is structured into 25 sections, of which

three (Calc, Control, Filter) are for information re-

lated to the screening system. The 22 remaining ones

are assessment sections. Each section differs from

others by its unique screening purpose. For example,

section ‘C’ is about ‘cognitive patterns’, section ‘D’

is about ‘mood’. Each section contains a ﬂat list of

assessment items. That is, the structural information

is limited to only sibling and section-item relations.

In total, there are 1038 assessment items. Each item comes with a string title, and only a few contain a short description. A string title could be a concept (e.g., disease names), a sentence (e.g., activity descriptions), or even a screening question (i.e., predefined interview questions). There are six types of items: text (e.g., social security number), date (e.g., birthdate), ICD code (International Classification of Diseases), number (e.g., height), checklist (yes/no), and code (e.g., code ‘1’ refers to male, code ‘2’ refers to female).

2.2 ICF Ontology

Unlike MDS, the ICF ontology is much better structured. From the top level to the bottom, a path starts at a component, goes through a chapter, a block, and possibly intermediate categories, and ends at a category. That is, each category may contain sub-categories, and this relation is recursively defined. Among a total of 1530 items, there are four components (i.e., body functions, body structures, activities and participation, and environmental factors), 30 chapters, 37 blocks, and 1442 categories. Every item comes with a string title and a short description. A string title could be a concept (e.g., ‘mobility’ and ‘light sensitivity’) or a phrase (e.g., ‘Lifting and carrying objects’).

When mapping to ICF, only the 846 MDS items of code and checklist type are used because ICF does not contain items of other types. However, all 1530 ICF items are used because every one of them can be matched.

ICD: http://www.who.int/classifications/icd/

2.3 Mapping Method

Given an MDS item x, we first estimate the similarity sim(x, y) between x and every ICF item y. After that, we compute the matching score score(x, y) and apply a ranking technique based on those scores to identify the most promising mapping candidates. To map all 846 MDS items, the basic idea is to perform the above procedure for each item. In other words, all pairwise mappings are examined. Because the number of candidate mappings is about 1.3 million, which is moderate, it is possible to check all of them without applying advanced candidate generation techniques. The details of similarity, matching score, and ranking are described in the following subsections.
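As a quick check of the figure of about 1.3 million, the number of examined pairs is simply the product of the two item counts given in Section 2.2 (846 MDS items and 1530 ICF items):

```latex
$$846 \times 1530 = 1{,}294{,}380 \approx 1.3 \times 10^{6}$$
```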

2.3.1 Similarity

The similarity of two items is estimated by comparing their string titles. Previous clinical ontology mapping systems had difficulty estimating similarity for strings that are not lexically related. This issue can now be addressed thanks to word embeddings (Mikolov et al., 2013), which have been empirically shown to be effective for many NLP tasks, including string matching. With word embeddings, each string token is represented by a numerical vector, and instead of comparing word surfaces or using synonym dictionaries, string matching is done by comparing the vectors.

There have been some efforts on clinical word embeddings (Choi et al., 2016; Pyysalo et al., 2013). Among them, Pyysalo et al. (2013) provide vectors trained on millions of abstracts and articles from PubMed and PMC. We use this resource to extract the word vectors.

Given an item z, we first extract all tokens in the title of z (with stopword removal) and look up their word vectors. Then, we calculate the average vector v(z) to represent the title of z. Using the average vectors, we estimate the similarity of two items x and y with the following equation.

$$\mathrm{sim}(x, y) = \frac{1}{\exp[\mathrm{EuclideanDistance}(v(x), v(y))]} \tag{1}$$
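The title similarity can be sketched in Python as follows. This is a minimal illustration, not the authors' Java implementation. It assumes that the similarity decays with distance, i.e., sim = 1/exp(d) = exp(-d), so that identical titles score 1. The toy vectors, the stopword list, the tokenizer, and the choice to return 0 when no token has a vector are all assumptions made for the example.

```python
import math
import re

# Toy 3-dimensional vectors, invented for illustration only. The paper uses
# vectors trained on PubMed and PMC text (Pyysalo et al., 2013).
TOY_VECTORS = {
    "bowel": [0.9, 0.1, 0.0],
    "faecal": [0.8, 0.2, 0.0],
    "continence": [0.1, 0.9, 0.1],
    "weight": [0.0, 0.1, 0.9],
    "loss": [0.2, 0.0, 0.7],
}

# Illustrative stopword list (an assumption; the paper does not list its own).
STOPWORDS = {"a", "and", "in", "of", "the"}


def tokenize(title):
    """Lowercase the title, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


def average_vector(title, vectors):
    """Return the mean vector of the known tokens, or None if none is known."""
    known = [vectors[t] for t in tokenize(title) if t in vectors]
    if not known:
        return None
    return [sum(component) / len(known) for component in zip(*known)]


def sim(title_x, title_y, vectors):
    """Similarity as exp(-Euclidean distance) between average title vectors."""
    vx = average_vector(title_x, vectors)
    vy = average_vector(title_y, vectors)
    if vx is None or vy is None:
        return 0.0  # assumption: no usable vector means no evidence of similarity
    return math.exp(-math.dist(vx, vy))
```

With these toy vectors, `sim("Bowel continence", "Faecal continence", TOY_VECTORS)` is larger than `sim("Bowel continence", "Weight loss", TOY_VECTORS)`, although "bowel" and "faecal" share no characters.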

PubMed: https://www.ncbi.nlm.nih.gov/pubmed/
PMC: https://www.ncbi.nlm.nih.gov/pmc/

Table 1: Examples of relevant mappings.

MDS item                                                 ICF item
B0100: Comatose                                          b1101: Continuity of consciousness
C0700: Staff asmt mental status: short-term memory OK    b1440: Short-term memory
G0300C: Balance: turning around while walking            d450-d469: Walking and moving
H0400: Bowel continence                                  b5253: Faecal continence
K0300: Weight loss                                       b530: Weight maintenance functions

2.3.2 Matching score

Based on the estimated similarities and structural in-

formation of ICF, we deﬁne the matching score as fol-

lows.

$$\mathrm{score}(x, y) = \max\Big[\mathrm{sim}(x, y),\ \max_{z \in C(y)} \mathrm{score}(x, z)\Big] \tag{2}$$

where C(y) is the set of children of y. In ICF, an item may contain children, and this relation is recursively defined. Therefore, the above equation means that the matching score of items x and y is determined not only by their own similarity or by the similarity of x and the children of y, but by the similarity of x and y together with all descendants of y. This definition comes from the assumption that if x matches y, it also matches all ancestors of y. As a result, the matching score of a parent item is never less than that of any of its children.
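The recursion in equation (2) can be sketched in Python as follows. The function name, the `children` dictionary, and the toy similarity values are hypothetical and serve only to show how a high similarity at a deep item propagates up to its ancestors.

```python
def matching_score(x, y, sim, children):
    """Equation (2): best similarity between x and y or any descendant of y.

    sim      -- function (x, y) -> float, the title similarity
    children -- dict mapping an item to the list of its direct children
    """
    best = sim(x, y)
    for z in children.get(y, ()):
        best = max(best, matching_score(x, z, sim, children))
    return best


# Hypothetical fragment of a hierarchy, identified by codes only.
TOY_CHILDREN = {"b1": ["b110", "b114"], "b110": ["b1101"]}

# Made-up similarities between one MDS item and each item above.
TOY_SIM = {"b1": 0.1, "b110": 0.3, "b1101": 0.8, "b114": 0.2}


def toy_sim(x, y):
    """Look up the made-up similarity; x is ignored in this toy example."""
    return TOY_SIM[y]
```

Here `matching_score("B0100", "b1", toy_sim, TOY_CHILDREN)` returns the similarity of the deepest item `b1101`, because that is the maximum over `b1` and all of its descendants. For a large hierarchy, memoizing the per-item results would avoid recomputing the same subtrees.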

2.3.3 Ranking

A simple approach is to return the list of ICF items ranked by their matching scores. The problem with this approach is that the top-ranked items can be very similar to each other. In the context of mapping suggestion, in which experts have to check the results manually, it is important to present the mappings selectively. Instead of prioritizing only the mappings with the highest scores, we also reduce the similarity among the top-ranked items. To this end, we implement a well-known re-ranking method from information retrieval, Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998). MMR provides a mechanism to balance the relevance and the diversity of the retrieval results. In the context of our problem, MMR is defined as follows.

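$$\mathrm{MMR} \overset{\mathrm{def}}{=} \operatorname*{arg\,max}_{y \in \mathrm{ICF} \setminus S} \Big[ \lambda \, \mathrm{score}(x, y) - (1 - \lambda) \max_{z \in S} \mathrm{strc}(y, z) \Big] \tag{3}$$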

where S is the set of ICF items already selected, strc is the structural similarity between ICF items, and λ ∈ [0, 1] is a control factor (i.e., if λ = 1, only the matching score is considered). By repeatedly applying equation (3), we collect K mapping candidates for each item x.
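The repeated application of equation (3) amounts to a greedy selection loop, sketched below in Python. The function name and the toy data are hypothetical. Treating the redundancy term as 0 while S is still empty is an assumption, since the equation leaves that case open.

```python
def mmr_select(candidates, score, strc, k=5, lam=0.9):
    """Greedily pick k items, trading matching score against redundancy.

    candidates -- iterable of target (ICF) items
    score      -- function y -> matching score of the fixed source item x and y
    strc       -- function (y, z) -> structural similarity of two target items
    lam        -- the control factor lambda in [0, 1]
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def marginal(y):
            # Largest structural similarity to anything already selected;
            # taken as 0.0 for the first pick (assumption).
            redundancy = max((strc(y, z) for z in selected), default=0.0)
            return lam * score(y) - (1.0 - lam) * redundancy

        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up scores: "a" and "b" are near-duplicates, "c" is less relevant.
TOY_SCORE = {"a": 0.9, "b": 0.88, "c": 0.5}
TOY_PAIR_SIM = {frozenset(("a", "b")): 1.0}


def toy_strc(y, z):
    """Symmetric lookup of the made-up structural similarity."""
    return TOY_PAIR_SIM.get(frozenset((y, z)), 0.0)
```

With `lam=1.0` the call `mmr_select("abc", TOY_SCORE.get, toy_strc, k=2, lam=1.0)` keeps the two highest scores, whereas a smaller value such as `lam=0.5` replaces the near-duplicate `b` with the more diverse `c`.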

The function strc(y, z) measures the structural similarity between ICF items and is defined as follows.

$$\mathrm{strc}(y, z) = \frac{2 \times \mathrm{depth}(\mathrm{fcs}(y, z))}{\mathrm{depth}(y) + \mathrm{depth}(z)} \tag{4}$$

where depth(t) is the length of the path from the root item to t and fcs(y, z) is the first common successor of y and z (the first item shared by the paths from y and from z up to the root). Here we reuse the idea of Wu and Palmer (1994) because it captures structural relevance.
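Equation (4) can be sketched in Python over a child-to-parent map. The single artificial root `"ICF"` above the components, the toy hierarchy, and the value returned when both items are the root are assumptions made for this example. The helper names are hypothetical.

```python
def path_to_root(node, parent):
    """Return [node, parent of node, ..., root]."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def depth(node, parent):
    """Number of edges on the path from the root to node (root has depth 0)."""
    return len(path_to_root(node, parent)) - 1


def first_common(y, z, parent):
    """First item shared by the upward paths of y and z (deepest shared item)."""
    on_y_path = set(path_to_root(y, parent))
    for node in path_to_root(z, parent):
        if node in on_y_path:
            return node
    return None  # only possible if the items lie in disconnected trees


def strc(y, z, parent):
    """Equation (4): 2 * depth(fcs(y, z)) / (depth(y) + depth(z))."""
    total = depth(y, parent) + depth(z, parent)
    if total == 0:
        return 1.0  # assumption: both items are the root itself
    shared = first_common(y, z, parent)
    if shared is None:
        return 0.0
    return 2.0 * depth(shared, parent) / total


# Hypothetical fragment: an artificial root, two components, a few items.
TOY_PARENT = {
    "ICF": None,
    "b": "ICF", "d": "ICF",
    "b1": "b", "b110": "b1", "b114": "b1",
    "d4": "d",
}
```

In this toy hierarchy the siblings `b110` and `b114` share the parent `b1` at depth 2, so their structural similarity is 2 × 2 / (3 + 3) = 2/3, while items from different components share only the root and score 0.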

2.4 Result

We implemented our mapping module in Java and tested it on a desktop with a Core i7 7700K CPU. We set λ to 0.9 after preliminary checks of different values on five MDS items. We set K = 5 for MMR ranking because it is a reasonable number for manual checking. Mapping the 846 MDS items to ICF is fast, taking 127 seconds.

We randomly picked five MDS sections for checking, resulting in 548 mappings of 97 items. Among them, 43 mappings (9%) of 33 items (34%) are relevant. Table 1 lists random examples of relevant mappings from each selected section. As the table shows, our method can detect the semantic equivalence between items expressed in different terms.

3 DISCUSSION

Many aspects still need investigation to improve effectiveness. The first issue we observed is the discrimination between matched and non-matched mappings. The current matching score is insufficient to differentiate such cases: in many cases, the matching scores of some relevant mappings are lower than those of non-relevant ones. We intend to use multiple features to describe the similarity of items. For example, in addition to sim over full titles, we can include the similarity of words with the same part of speech, weighting schemes (e.g., TF-IDF, Google distance), or advanced measures on word vectors (Kenter and De Rijke, 2015). By using multiple features, we expect to better discriminate between relevant and non-relevant cases.

Making good use of existing mappings in the clinical domain (e.g., between SNOMED-CT and other ontologies) is also important. The benefits of existing mappings include detailed evaluation and characterization of mappings. In other words, it is feasible to generalize mapping knowledge from a gold standard and apply that knowledge to different mapping tasks.

It is possible to compare all items of MDS and ICF exhaustively. However, for larger ontologies such as SNOMED-CT, such an approach will be obstructed by computational issues. Candidate generation for clinical ontology mapping therefore also needs investigation. It is also worthwhile to find a mapping method that leverages structural information when both ontologies are well-structured.

As clinical ontology mapping requires curation by domain experts, feeding their checking results back into the system through live-feedback mechanisms is useful. A semi-supervised mapping method will help to improve the mapping quality.

Lastly, using multiple ontologies is preferable because it increases the coverage of assessment sheets. However, this advantage comes with challenges, including ontology merging and deduplication. Furthermore, even with more ontologies, 100% coverage is not guaranteed because assessment sheets are customized for a local society. Such problems also arise when we work on other health care systems. Therefore, it is important to study how to define new items with sufficient logical relations.

4 CONCLUSION

We described an ongoing research project on a care plan recommendation system in Japan and its preliminary step, the mapping of clinical ontologies. We proposed a straightforward yet reasonable method for mapping the MDS and ICF ontologies. Although many challenges remain, as discussed, we are optimistic about this direction. We hope our study can contribute to improving the quality of healthcare in different societies.

ACKNOWLEDGMENTS

This research is funded by Welmo inc.

REFERENCES

Carbonell, J. and Goldstein, J. (1998). The Use of MMR,

Diversity-based Reranking for Reordering Documents

and Producing Summaries. In Annual International

ACM SIGIR Conference on Research and Develop-

ment in Information Retrieval, pages 335–336.

Choi, Y., Chiu, C. Y.-I., and Sontag, D. (2016). Learning

Low-Dimensional Representations of Medical Con-

cepts. AMIA Summits on Translational Science,

2016:41–50.

Deng, L., Yu, D., et al. (2014). Deep learning: Methods and applications. Foundations and Trends in Signal Processing, 7(3–4):197–387.

Fung, K. W. and Bodenreider, O. (2005). Utilizing the

UMLS for Semantic Mapping Between Terminolo-

gies. In AMIA Annual Symposium, volume 2005,

pages 266–270.

Kenter, T. and De Rijke, M. (2015). Short Text Similar-

ity with Word Embeddings. In ACM International on

Conference on Information and Knowledge Manage-

ment, pages 1411–1420.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep Learn-

ing. Nature, 521(7553):436–444.

Lomax, J. and McCray, A. T. (2004). Mapping the Gene

Ontology into the Uniﬁed Medical Language System.

Comparative and Functional Genomics, 5(4):354–

361.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Ef-

ﬁcient Estimation of Word Representations in Vector

Space. arXiv preprint arXiv:1301.3781.

Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., and Anani-

adou, S. (2013). Distributional Semantics Resources

for Biomedical Text Processing. In International Sym-

posium on Languages in Biology and Medicine, pages

39–43.

Shvaiko, P. and Euzenat, J. (2013). Ontology Match-

ing: State of the art and Future Challenges. IEEE

Transactions on Knowledge and Data Engineering,

25(1):158–176.

Tu, S. W., Nyulas, C. I., Tudorache, T., and Musen, M. A.

(2015). A Method to Compare ICF and SNOMED CT

for Coverage of US Social Security Administrations

Disability Listing Criteria. In AMIA Annual Sympo-

sium, volume 2015, pages 1224–1233.

Tuttle, M. S., Weida, T., White, T., and Harvell, J. (2007). Standardizing the MDS with LOINC and vocabulary matches. Report, at U.S. Department of Health and Human Services.

Wu, Z. and Palmer, M. (1994). Verbs Semantics and Lex-

ical Selection. In Annual meeting on Association for

Computational Linguistics, pages 133–138.