Measuring and Avoiding Information Loss During Concept Import from a Source to a Target Ontology

James Geller (1, a), Shmuel T. Klein (2, b) and Vipina Kuttichi Keloth (1, c)

(1) Dept. of Computer Science, New Jersey Institute of Technology, U.S.A.

(2) Dept. of Computer Science, Bar Ilan University, Ramat Gan 52900, Israel

Keywords:

Biomedical Ontologies, Concept Import, Information Content, Information Loss.

Abstract:

Comparing pairs of ontologies in the same biomedical content domain often uncovers surprising differences.

In many cases these differences can be characterized as “density differences,” where one ontology describes

the content domain with more concepts in a more detailed manner. Using the Uniﬁed Medical Language

System across pairs of ontologies contained in it, these differences can be precisely observed and used as the

basis for importing concepts from the ontology of higher density into the ontology of lower density. However,

such an import can lead to an intuitive loss of information that is hard to formalize. This paper proposes an

approach based on information theory that mathematically distinguishes between different methods of concept

import and measures the associated avoidance of information loss.

1 INTRODUCTION

The ﬁeld of Medical Informatics has developed a

rich ecosystem for research, development, and appli-

cations of biomedical terminologies and ontologies.

The NCBO BioPortal (NCBO, 2019) provides access

to over 772 such resources, containing, as of May 20,

2019, over 9.4 million classes (which would be called

“concepts” in other repositories). BioPortal keeps a

rich set of statistics about the upload and use of on-

tologies. These statistics allow the analysis of the

quality of ontology maintenance by the curators of in-

dividual BioPortal entries (Geller et al., 2018).

BioPortal takes a “big tent” inclusive approach toward the question of what qualifies as a biomedical ontology. This is reflected in both the content and the structure of some of the resources accessible through BioPortal. For example, the Statistics Ontology (Stato, 2019) is a general-purpose statistics ontology that is not specific to medicine. MeSH, the Medical Subject Headings (MeSH, 2019), is contained in BioPortal, although it is widely acknowledged that it is structurally not an ontology at all.

Another major resource for biomedical ontologies is the Unified Medical Language System (UMLS) (UMLS, 2019), developed by the National Library of Medicine (NLM), an institute under the US Government National Institutes of Health (NIH). The most important component of the UMLS is the Metathesaurus (Meta, 2019). A new version of the Metathesaurus is released twice a year, and over a long period of time (“decades”) every new release has expanded on the previous version. According to the most recent release notes (Metanotes, 2019), the UMLS contains 3,848,696 concepts and 12,362,080 concept names from 210 distinct terminology sources. The staff of the NLM integrates the different terminologies such that each group of terms with identical meaning is tied together as a single concept and assigned a Concept Unique Identifier (CUI). However, individual terms are maintained with their source information.

a https://orcid.org/0000-0002-9120-525X

b https://orcid.org/0000-0002-9478-3303

c https://orcid.org/0000-0001-6919-1122

1.1 Concept Import

The unique richness of the UMLS makes it possible to compare its subterminologies on a concept basis. Researchers have observed (He et al., 2014) that paths between pairs of concepts that are identical by their CUIs may be different in two different terminologies. Specifically, if a pair of concepts (A, B) exists in both terminologies $T_1$ and $T_2$, such that there is a path from A to B consisting of one or more IS-A links (similar to subclass links), then the following situations can arise.


Geller, J., Klein, S. and Keloth, V.

Measuring and Avoiding Information Loss During Concept Import from a Source to a Target Ontology.

DOI: 10.5220/0008354904420449

In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019), pages 442-449

ISBN: 978-989-758-382-7

• There may be direct IS-A links from A to B in $T_1$ and $T_2$, with no intervening concepts. This expresses that A is a more specific concept than B.

• There may be paths of IS-A links and intervening concepts in $T_1$ and $T_2$ that are identical. Thus there may be a concept Z between A and B in both $T_1$ and $T_2$, such that (A IS-A Z) and (Z IS-A B). This would be a path of two IS-A links.

• There may be paths of IS-A links and intervening concepts such that the intervening concepts in $T_1$ are different from those in $T_2$; furthermore, the paths may be of different lengths, including a length of one in either $T_1$ or $T_2$.

The two concepts A and B are called anchor concepts. Following the literature (Rector et al., 2006), a difference in the lengths of the paths between them expresses a density difference. Because IS-A paths are conventionally drawn vertically, these have been called vertical density differences.
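The comparison of IS-A paths between two anchor concepts can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not taken from the paper: each terminology is given as a hypothetical dictionary mapping a concept to its single IS-A parent (so each terminology is treated as a tree), and the function names are our own.

```python
def isa_path(parent_of, start, goal):
    """Follow IS-A links upward from `start` until `goal` is reached.

    `parent_of` maps each concept to its single IS-A parent (tree assumption).
    Returns the concepts from `start` to `goal`, or None if `goal` is not an
    ancestor of `start`.
    """
    path = [start]
    seen = {start}
    while path[-1] != goal:
        parent = parent_of.get(path[-1])
        if parent is None or parent in seen:
            return None
        path.append(parent)
        seen.add(parent)
    return path


def vertical_density_difference(parent_t1, parent_t2, a, b):
    """Compare the IS-A paths between anchor concepts a and b in T1 and T2."""
    p1 = isa_path(parent_t1, a, b)
    p2 = isa_path(parent_t2, a, b)
    if p1 is None or p2 is None:
        return None
    return {
        "links_t1": len(p1) - 1,
        "links_t2": len(p2) - 1,
        # Intervening concepts on one path that are absent from the other path.
        "only_in_t1": [c for c in p1[1:-1] if c not in p2],
        "only_in_t2": [c for c in p2[1:-1] if c not in p1],
    }


t1 = {"A": "Z", "Z": "B"}  # A IS-A Z IS-A B (two links)
t2 = {"A": "B"}            # A IS-A B directly (one link)
result = vertical_density_difference(t1, t2, "A", "B")
print(result)
```

Note that the sketch only checks whether an intervening concept is missing from the other path, not whether it is missing from the other terminology altogether, which would require a lookup over all of its concepts.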

The above observations raised the following intriguing question. If one designates $T_1$ as a source terminology and $T_2$ as a target ontology, and the path between the anchor concepts in $T_1$ is longer than that in $T_2$, does this mean that the intervening concepts from $T_1$ that are missing in $T_2$ can or even should be imported into $T_2$? In consultation with ontology curators and medical experts, it was determined that the results of an algorithm comparing paths in two ontologies may not be used for automated import from a source ontology into a target ontology. However, these results could be presented to the curator(s) of the target ontology, who would decide whether an import would be useful for improving it. It was furthermore observed that even with the help of such an algorithm, the actual work of the target ontology curator remains formidable (He and Geller, 2016). The reasons given by ontology curators for not importing valid concepts include that they do not want to clutter their ontology with concepts for which no use case exists or which no user has ever requested (Curators, 2019).

In contrast to research on vertical density differ-

ences, this paper reports on work on horizontal den-

sity differences (Keloth et al., 2018; Keloth et al.,

2019). Importing concepts from a source into a tar-

get ontology, based on horizontal density differences,

would lead to a loss of information, and in this paper

we analyze how to quantify and avoid this loss.

1.2 Relationship to Data and Ontology Integration

A rich literature exists on ontology alignment, matching and integration. The extensive work of Shvaiko, Euzenat, et al. (Shvaiko et al., 2018) provides an excellent entry point into this field. Synonym substitution is one tool that can be used for the purpose of integration; this was proposed by Huang et al. (Huang et al., 2007; Huang et al., 2009), using WordNet (WordNet, 2019) as an additional resource besides the UMLS.

Our goals in this paper are more limited in that we

are not attempting automated integration and are also

limiting ourselves to a form of local “point wise” im-

port. On the other hand, we are addressing the ques-

tion of what loss of information occurs and how to

avoid it, if the human curator agrees to an import.

Ontologies can function as tools in (database) schema

integration (Geller et al., 1992; Rahm, 2016).

1.3 Horizontal Density Differences

Figure 1 shows a bare-bones example of a horizontal density difference. Terminology 1 (the source) contains a concept A that also exists in Terminology 2 (the target). Because both terminologies are used in the version provided by the UMLS and A has the same CUI in both, we may assert that it is the same concept (unless the team of the NLM made a mistake during their integration). Furthermore, we observe that (X IS-A A) (i.e., X is a subclass or subconcept of A) in both the source and the target ontology, and again the identity of X is assured by its having the same CUI. The same applies to Y and Z. However, there is a density difference: in Terminology 1, A has an additional child concept W that is not a child of A in Terminology 2. We also assume that W does not exist anywhere in Terminology 2.

After importing W into Terminology 2, A has the

same children in both terminologies. However, at this

point, the information that X, Y and Z were originally

in Terminology 2 and that W is “a recent addition,” is

completely lost. We note that the situation described

in Figure 1 is not “theoretical.”
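The situation of Figure 1 can be detected mechanically once the children of a shared parent are known in both terminologies. The following Python sketch is an illustration only: the dictionaries mapping a parent to its list of children, and both function names, are assumptions made for this example, and plain letters stand in for CUIs.

```python
def horizontal_density_difference(children_source, children_target, parent):
    """Split the children of `parent` into common and terminology-specific sets."""
    source_kids = set(children_source.get(parent, ()))
    target_kids = set(children_target.get(parent, ()))
    return {
        "common": sorted(source_kids & target_kids),
        "extra_in_source": sorted(source_kids - target_kids),
        "extra_in_target": sorted(target_kids - source_kids),
    }


def import_candidates(children_source, children_target, parent):
    """Extra source children that do not occur anywhere in the target."""
    known_in_target = set(children_target)
    for kids in children_target.values():
        known_in_target.update(kids)
    diff = horizontal_density_difference(children_source, children_target, parent)
    return [c for c in diff["extra_in_source"] if c not in known_in_target]


terminology_1 = {"A": ["X", "Y", "Z", "W"]}  # source
terminology_2 = {"A": ["X", "Y", "Z"]}       # target
diff = horizontal_density_difference(terminology_1, terminology_2, "A")
print(diff["common"], diff["extra_in_source"])
print(import_candidates(terminology_1, terminology_2, "A"))
```

The second function reflects the assumption stated above that W must not exist anywhere in Terminology 2; the result is a list of candidates for a curator to review, not an automated import.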

In a recent paper (Keloth et al., 2019), we showed that there are many instances of horizontal density differences. This study was based on two popular medical terminologies, MEDCIN and the National Cancer Institute thesaurus (NCIt). It was shown that identical concepts with different sets of children in NCIt and MEDCIN appear 1966 times. More interestingly, 1049 of these concepts do not have any children in common in NCIt and MEDCIN. Table 1 shows an example parent concept that appears in NCIt and MEDCIN and has four common children. The right column shows an example additional child concept in MEDCIN. Table 2 shows seven more examples, listing only the number of common children instead of showing them, in the column with the header C#.

[Figure 1: Horizontal Density Difference. Panel "Terminology 1": children X, Y, Z, W. Panel "Terminology 2": children X, Y, Z.]

In (Keloth et al., 2019) the authors examined dif-

ferent approaches of how to use this insight for im-

porting concepts from MEDCIN into NCIt. However,

they did not deal with the issue of potential loss of

information during an import.

Figure 2(a) shows the original situation. Figure 2(b) shows a naive import of D and E from the source ontology into the target ontology. In this case a user of the target ontology cannot tell that there is a “historical” difference between the concepts X, Y, and Z and the concepts D and E. This is a form of information loss.

In Figure 2(c) we attempt to avoid this loss of information by creating an artificial intermediate node Inter1, which maintains a memory of the fact that D and E were imported. However, this leads to an illogical imbalance in the structure, because in the source ontology X, Y, Z, D, and E are all at the same level. As level is commonly used to imply generality (in tree-structured ontologies), placing two groups of concepts that were originally at the same level in the source ontology on two different levels in the target ontology mutates the structure of the target ontology in an undesirable way. Figure 2(d) shows an alternative solution with two new intermediate nodes. Now the concepts X, Y, Z, D, and E are back at the same hierarchical level, while full information about the provenance of both the imported and the original concepts is maintained. However, in solution 2(d) we pay the price of having to introduce two artificial nodes.
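The three import variants of Figure 2(b), 2(c) and 2(d) can be written down as small tree transformations. The Python sketch below is illustrative only: the tree is represented as a hypothetical dictionary from a parent to its list of children, the function names are our own, and the node names Inter1 and Inter2 follow the figure (Inter1 grouping the imported concepts, Inter2 the original ones).

```python
import copy


def naive_import(target, parent, new_children):
    """Figure 2(b): append the imported concepts directly under the parent."""
    tree = copy.deepcopy(target)
    tree[parent] = tree[parent] + list(new_children)
    return tree


def import_with_one_intermediate(target, parent, new_children, inter="Inter1"):
    """Figure 2(c): group the imported concepts under one artificial node."""
    tree = copy.deepcopy(target)
    tree[parent] = tree[parent] + [inter]
    tree[inter] = list(new_children)
    return tree


def import_with_two_intermediates(target, parent, new_children,
                                  inter_new="Inter1", inter_old="Inter2"):
    """Figure 2(d): group imported and original concepts under separate nodes."""
    tree = copy.deepcopy(target)
    tree[inter_old] = tree[parent]       # original children keep their own group
    tree[inter_new] = list(new_children)  # imported children form a second group
    tree[parent] = [inter_new, inter_old]
    return tree


ncit = {"A": ["X", "Y", "Z"]}  # target before the import, as in Figure 2(a)
tree_b = naive_import(ncit, "A", ["D", "E"])
tree_c = import_with_one_intermediate(ncit, "A", ["D", "E"])
tree_d = import_with_two_intermediates(ncit, "A", ["D", "E"])
print(tree_b)
print(tree_c)
print(tree_d)
```

Each function returns a new tree and leaves the input untouched, so the variants can be compared side by side.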

The idea of introducing intermediate structuring nodes that have little meaning in the ontology might be objected to. However, it is not totally unprecedented. In the NDF-RT (National Drug File - Reference Terminology) (NDF-RT, 2019), groups of drug concepts are combined by similar intermediate concepts (which, however, have chemical justifications).

2 MEASURING INFORMATION

We shall try to quantify the rather fuzzy concepts ex-

posed above. The big difﬁculty is that even if a sin-

gle concept is imported into an ontology of thousands

of concepts, “every concept is now suspect.” In other

words, it is impossible to tell by looking at a concept

whether it was originally in the ontology, or whether

it was imported. Thus, we will use a “backwards ap-

proach,” focusing on the gain of information achieved

by making the structural changes during import that

avoid the original “global” loss.

Measuring information is the main objective of Information Theory and has been well understood since Shannon's pioneering work in 1948 (Shannon, 1948). The typical scenario is that of a discrete random variable X, taking on a finite number of possible values $x_1, \ldots, x_n$. One also generally assumes that there is a given probability distribution, assigning the probability $p_i$ to the event $X = x_i$, for $1 \le i \le n$. The average amount of information conveyed by the random variable X, called its entropy, is then defined as

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i,$$

and is measured in bits. Resnik (Resnik, 1995) has used entropy to measure semantic similarity between concepts within one single ontology, which is a different problem from the one posed here.

The first obstacle to overcome when trying to extend the notion of entropy to the concepts of an ontology such as those in Figures 1 and 2 is that there is no underlying probability distribution. One way to circumvent this is to assume uniform probabilities, that is, $p_i = \frac{1}{n}$ for all i, in which case $H(X) = \log_2 n$. Indeed, the information amount given in the left hand side ontologies shown in Figure 2 is $\log_2 5 = 2.32$ bits, which is the number of bits necessary to encode a possible choice among the five alternatives X, Y, Z, D and E.
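The uniform case is easy to check numerically. The following minimal Python sketch, in which the function name `entropy` is our own choice, evaluates the entropy formula for five equally likely alternatives and compares it with the closed form.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


n = 5  # the five alternatives X, Y, Z, D and E
uniform = [1 / n] * n
print(round(entropy(uniform), 2))  # 2.32
print(round(math.log2(n), 2))      # 2.32
```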

For the purposes of this analysis we will ignore any connections between concepts that do not implement the concept taxonomy, i.e., we will not consider any lateral/semantic relationships such as “location.” Nevertheless, a second complication arises from the fact that the structures of many important real-life ontologies, such as SNOMED CT (SNOMED CT, 2019), considered the most important clinical ontology, and the NCIt (NCIt, 2019) mentioned earlier, are more generally directed acyclic graphs (DAGs) rather than the over-simplified tree-like hierarchies dealt with in the example figures above. We shall, however, restrict this preliminary investigation to trees, and tackle the general problem in future work.

KEOD 2019 - 11th International Conference on Knowledge Engineering and Ontology Development

Table 1: Example of Common Children and Extra Child.

Parent: C0149516 : Chronic sinusitis

Common Children                          | Example Extra Child in MEDCIN
C0008712 : Chronic sphenoidal sinusitis  | C0155827 : Chronic pansinusitis
C0008683 : Chronic frontal sinusitis     |
C0008698 : Chronic maxillary sinusitis   |
C0008681 : Chronic ethmoidal sinusitis   |

Table 2: Examples of Extra Children (C# = number of common children).

Parent                    | C# | Example Extra Child in MEDCIN
Anti-Arrhythmia Agents    | 13 | pilsicainide hydrochloride
Testosterone              | 4  | testosterone methyl
Loop Diuretics            | 5  | Xipamide
Cranial Nerve Neoplasms   | 15 | overlapping neoplasm of cranial nerve
Glycogen Storage Disease  | 7  | GLYCOGEN STORAGE DISEASE Ic
Mastectomy                | 6  | Bilateral mastectomy
Retinoids                 | 4  | aliretinoin

In previous work on structural families of ontolo-

gies in BioPortal (Ochs et al., 2016) over 140 tree-

shaped biomedical ontologies were observed. Ex-

amples include the Healthcare Common Procedure

Coding System (HCPCS) and the Drug Ontology

(DRON).

To deal with the general case, we shall derive our suggested measure inductively. Assume a tree structure with r + 1 levels indexed 0 (for the level of the root) to r, and with $n_i$ nodes on level i, for $0 \le i \le r$. Denote the number of nodes on level i that are further sub-partitioned as $m_i$, so that $n_i - m_i$ is the number of leaves at level i. These $m_i$ nodes have, respectively, $n_{i+1,1}, n_{i+1,2}, \ldots, n_{i+1,m_i}$ children on level i + 1, so that

$$\sum_{j=1}^{m_i} n_{i+1,j} = n_{i+1} \quad \text{for } 0 \le i < r.$$

For the example tree displayed in Figure 3, r = 3, $(n_0, \ldots, n_3) = (1, 4, 8, 7)$, $(m_0, \ldots, m_3) = (1, 3, 2, 0)$ and $(n_{1,1}; n_{2,1}, n_{2,2}, n_{2,3}; n_{3,1}, n_{3,2}) = (4; 3, 2, 3; 4, 3)$.

We define the information content I of the tree T by summing, within each level and over all the levels, the logarithm of the branching multiplicity of the nodes, taking a weighted average for the nodes within each level. Formally,

$$I(T) = \sum_{i=1}^{r} \frac{1}{n_i} \sum_{j=1}^{m_{i-1}} n_{i,j} \log_2 (n_{i,j}). \qquad (1)$$

Returning to the example of Figure 3, we get

$$I(T) = \log_2 4 + \left( \frac{3}{8} \log_2 3 + \frac{2}{8} \log_2 2 + \frac{3}{8} \log_2 3 \right) + \left( \frac{4}{7} \log_2 4 + \frac{3}{7} \log_2 3 \right) = 5.26 \text{ bits}.$$

In particular, for a simple ontology with n concepts, represented by a tree of depth 1, that is, with just one level of n leaf nodes, the information content will be $\log_2 n$.
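The value reported for Figure 3 and the depth-1 special case can be reproduced with a short Python sketch. It rests on one reading of eq. (1), namely that each node with $n_{i,j}$ children on level i contributes $\log_2 n_{i,j}$ weighted by $n_{i,j} / n_i$; the input format (one list of child counts per level) and the function name are assumptions made for this illustration.

```python
import math


def information_content(child_counts_per_level):
    """Information content of a tree in bits, following one reading of eq. (1).

    child_counts_per_level[i - 1] lists, for every non-leaf node on level
    i - 1, how many children it has on level i.
    """
    total = 0.0
    for counts in child_counts_per_level:
        n_i = sum(counts)  # number of nodes on level i
        total += sum(c * math.log2(c) for c in counts) / n_i
    return total


# Child counts of the example tree of Figure 3: (4; 3, 2, 3; 4, 3).
figure_3 = [[4], [3, 2, 3], [4, 3]]
print(round(information_content(figure_3), 2))  # 5.26

# Depth-1 tree with five leaves: the measure reduces to log2(5).
print(round(information_content([[5]]), 2))     # 2.32
```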

Suppose then that we are given an ontology, which is conveniently represented as a tree structure, and that we want to refine it by introducing an intermediate node R. Assume that this intermediate node is added between level i − 1 and level i of the tree for some i > 0, that there are $n_i$ nodes on level i and that k of them should now be connected to the new node R. The passage from the right hand tree in Figure 2(b) to that of Figure 2(c) is the special case for which i = 1, $n_1 = 5$ and k = 2. The general scenario is depicted in Figure 4. Note that for convenience, we assume that the k nodes of level i which are connected to R are siblings, in the sense that they were originally children of the same node on level i − 1.

[Figure 2: Different Approaches to Importing Concepts. Panels are labeled MEDCIN (source) and NCIt (target). (a) Initial Situation. (b) Import of D and E with complete loss of information. (c) Panel with intermediate node Inter1; the beginning of its caption is missing in the source text, which ends with: have a common "provenance" from MEDCIN. (d) Import of D and E with restructuring, making the provenance of all children of A explicit in the target ontology.]

[Figure 3: Example tree hierarchy.]

[Figure 4: Schematic view of the inductive step in the definition of the information measure.]

Since the modified tree structure has obviously added some information, we define the additional amount of information that has been added by inserting the intermediate node R as follows. A new choice among k elements has been adjoined, which should add another $\log_2 k$ bits, but only k of the $n_i$ nodes are affected, so we define the added information amount as

$$\frac{k}{n_i} \log_2 k, \qquad (2)$$

thereby extending the definition in eq. (1). Returning to the example of Figure 2(c), we get that the information at this stage is

$$\log_2 5 + \frac{2}{5} \log_2 2 = 2.72 \text{ bits}. \qquad (3)$$

The addition of another intermediate node, as in the passage from the right hand tree in Figure 2(c) to that of Figure 2(d), is yet another example of the same generalization principle, so we get as information content of the structure with both intermediate nodes:

$$\log_2 5 + \frac{2}{5} \log_2 2 + \frac{3}{5} \log_2 3 = 3.67 \text{ bits}. \qquad (4)$$
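The arithmetic behind eqs. (3) and (4) can be verified with a few lines of Python. The sketch assumes that the added information of eq. (2) is k/n_i times the base-2 logarithm of k; the function and variable names are our own.

```python
import math


def added_information(k, n_i):
    """Bits added by an intermediate node grouping k of the n_i nodes on a level."""
    return (k / n_i) * math.log2(k)


base = math.log2(5)                               # naive import, Figure 2(b)
with_inter1 = base + added_information(2, 5)      # Figure 2(c): Inter1 groups D, E
with_both = with_inter1 + added_information(3, 5)  # Figure 2(d): Inter2 groups X, Y, Z

print(round(base, 2))         # 2.32
print(round(with_inter1, 2))  # 2.72
print(round(with_both, 2))    # 3.67
```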

A technical problem arises from the fact that the definition of the additional information relies on having the levels of the tree well defined. However, the newly inserted intermediate nodes may disrupt the level numbering if one considers these nodes as equivalent to the original nodes in the tree. For example, the nodes D and E in the right hand side tree of Figure 2(c) would then be on level 2, while their former siblings X, Y and Z remain on level 1, as in the left hand side of the figure. As a result, the information added by the Inter2 node would then be biased, because it would consider all the (remaining) nodes of level 1 to be affected, and not only 3 of the 5 nodes that were originally on level 1.

Since a model in which the level of a node is not

inﬂuenced by the possible insertion of intermediate

nodes seems more reasonable and closer to the real

life scenario we wish to simulate, we shall ignore in-

termediate nodes for the calculation of the level of a

node. This is consistent with the fact that these nodes

have been artiﬁcial additions in the ﬁrst place, that

they are not content-bearing and are used only for

technical convenience. Applying this new convention

then yields, for our running example, the information

amounts reported in eqs. (3) and (4).
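One way to put this convention into practice is to skip over artificial nodes when determining levels, and to add the term of eq. (2) for every artificial node. The Python sketch below is our own interpretation and not an algorithm given in the paper; the dictionary-based tree, the `artificial` set and the function name are assumptions made for illustration.

```python
import math


def information_with_intermediates(children, root, artificial):
    """Information content in bits; artificial nodes do not count as a level."""

    def real_children(node):
        # Children of `node` after skipping over artificial nodes.
        found = []
        for child in children.get(node, []):
            if child in artificial:
                found.extend(real_children(child))
            else:
                found.append(child)
        return found

    def artificial_below(node):
        # Artificial nodes hanging directly (or via other artificial nodes) under `node`.
        found = []
        for child in children.get(node, []):
            if child in artificial:
                found.append(child)
                found.extend(artificial_below(child))
        return found

    total = 0.0
    level = [root]
    while level:
        groups = [g for g in (real_children(n) for n in level) if g]
        n_i = sum(len(g) for g in groups)  # real nodes on the next level
        if n_i == 0:
            break
        # Eq. (1): weighted branching information of this level.
        total += sum(len(g) * math.log2(len(g)) for g in groups) / n_i
        # Eq. (2): contribution of each artificial node grouping k of the n_i nodes.
        for node in level:
            for inter in artificial_below(node):
                k = len(real_children(inter))
                if k > 0:
                    total += (k / n_i) * math.log2(k)
        level = [c for g in groups for c in g]
    return total


tree_c = {"A": ["X", "Y", "Z", "Inter1"], "Inter1": ["D", "E"]}
tree_d = {"A": ["Inter1", "Inter2"], "Inter1": ["D", "E"], "Inter2": ["X", "Y", "Z"]}
print(round(information_with_intermediates(tree_c, "A", {"Inter1"}), 2))            # 2.72
print(round(information_with_intermediates(tree_d, "A", {"Inter1", "Inter2"}), 2))  # 3.67
```

Because D and E are counted on level 1 together with X, Y and Z, both intermediate nodes are weighted against the same five nodes, which reproduces the amounts of eqs. (3) and (4).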

One of the advantages of this approach of defining the information content as given in eq. (2) is that, by definition, this information measure is an increasing function of the complexity of the hierarchical structure: every newly introduced branching conveys additional information, and accordingly, adds a non-negative amount to the previously defined information estimate.

It follows that we may deﬁne the information loss

mentioned in the title of this work by taking the dif-

ference between the information contents of the hier-

archical structure after and before the import of the

concepts from the source to the target ontology. This

is precisely the amount given in eq. (2), so it is de-

ﬁned as the number of bits lost by not including the

additional intermediate node(s).

Another advantage is that the speciﬁc deﬁnition

as a precise number of information bits to be derived

from the structure can have a logical interpretation:

the given number of bits is the minimal one, from the

compression point of view, needed to communicate

all the information displayed by the hierarchy, see

any textbook dealing with coding, e.g., (Klein, 2016,

Chapter 11).

As we have argued, the introduction of the inter-

mediate nodes counteracts a hard to quantify global

information loss. Thus, using one or two intermediate

nodes avoids the issue of information loss by main-

taining a record of the ontology from which the im-

ported concepts are taken.

3 EXPERIMENTAL SETUP

The idea of trying to quantify a semantic concept by assigning it a measure that can be efficiently and precisely calculated is not new and has been applied in various fields. One example is the attraction factor defined in (Choueka et al., 1983), which allows sorting the terms of an ontology according to the strength by which they “attract” the term(s) following them; thus “once upon a” has a high factor, being practically always followed by “time,” whereas “and” has a low factor: even though “and the” is very frequent, there are many other combinations starting with “and.” Another example is (Geller et al., 2015), in which a measure is derived that helps to identify term pairs with strong semantic correlation.

Devising a convincing experimental setup to eval-

uate the usefulness of a proposed measure does not

seem to be a trivial task. The intuition of most read-

ers will hardly differentiate between a structure that

has been assigned, say, 4.8 bits, and one with only 3.6

bits; and it will be even harder to convince ourselves

why the increase should be by precisely 33%.

A reasonable, yet very resource intensive, ap-

proach would be to make use of human informants.

One could then prepare a large set of examples and

ask the informants to classify them according to what

they “feel” their information content should be. In

a second stage, the results, averaged over all infor-

mants, could be compared with what would be ob-

tained by classifying the examples according to the

information measure proposed herein. A high corre-

lation would then be supportive of the usefulness of

our suggestions. The current paper, however, is only

meant to present the basic ideas, and we leave their

evaluation for future work.

4 CONCLUSIONS AND FUTURE WORK

The UMLS mapping of concepts from different on-

tologies makes it possible to observe potentially miss-

ing concepts by comparing pairs of ontologies. A do-

main expert can then decide whether such concepts

should be imported or not. Many opportunities for

such imports exist. However, when a concept is im-

ported naively, the information that it was not origi-

nally in the target ontology is lost. Quantifying this

loss is difﬁcult, because it affects the whole target on-

tology. We have presented an approach to quantifying

the loss of information by measuring the gain that is

achieved by maintaining the source information dur-

ing import, with the aid of “artiﬁcial” parent nodes.

In future work, we plan to extend the presented

model from trees to Directed Acyclic Graphs (DAGs),

which covers a much larger set of biomedical ontolo-

gies. We will also attempt to perform a user study

with human informants. An algorithm for automat-

ically generating intermediate nodes during import

will also be provided.

REFERENCES

Choueka, Y., Klein, S. T., and Neuvitz, E. (1983). Auto-

matic retrieval of frequent idiomatic and collocational

expressions in a large corpus. Journal Association Lit-

erary and Linguistic Computing, 4:34–38.

Curators (2019). Personal communication with NCIt and SNOMED curators.

Geller, J., Keloth, V. K., and Musen, M. A. (2018). How

sustainable are biomedical ontologies? In AMIA

2018, American Medical Informatics Association An-

nual Symposium, San Francisco, CA, November 3-7,

2018.

Geller, J., Klein, S. T., and Polyakov, Y. (2015). Identifying pairs of terms with strong semantic connections in a textbook index. In KEOD 2015 - Proceedings of the International Conference on Knowledge Engineering and Ontology Development, Volume 2, Lisbon, Portugal, November 12-14, 2015, pages 307–315.

Geller, J., Perl, Y., Neuhold, E., and Sheth, A. (1992). Struc-

tural schema integration with full and partial corre-

spondence using the dual model. Information Systems,

17(6):443 – 464.

He, Z. and Geller, J. (2016). Preliminary analysis of difﬁ-

culty of importing pattern-based concepts into the na-

tional cancer institute thesaurus. In Exploring Com-

plexity in Health: An Interdisciplinary Systems Ap-

proach - Proceedings of MIE2016 at HEC2016, Mu-

nich, Germany, 28 August - 2 September 2016., pages

389–393.

He, Z., Geller, J., and Elhanan, G. (2014). Categorizing the

relationships between structurally congruent concepts

from pairs of terminologies for semantic harmoniza-

tion. In AMIA Joint Summits on Translational Science

proceedings, pages 48–53.

Huang, K., Geller, J., Halper, M., and Cimino, J. J. (2007).

Piecewise synonyms for enhanced UMLS source ter-

minology integration. In AMIA 2007, American

Medical Informatics Association Annual Symposium,

Chicago, IL, USA, November 10-14, 2007.

Huang, K., Geller, J., Halper, M., Perl, Y., and Xu, J.

(2009). Using wordnet synonym substitution to en-

hance UMLS source integration. Artiﬁcial Intelli-

gence in Medicine, 46(2):97–109.

Keloth, V. K., He, Z., Chen, Y., and Geller, J. (2018). Lever-

aging horizontal density differences between ontolo-

gies to identify missing child concepts: A proof of

concept. In AMIA 2018, American Medical Informat-

ics Association Annual Symposium, San Francisco,

CA, November 3-7, 2018.

Keloth, V. K., He, Z., Elhanan, G., and Geller, J. (2019). Al-

ternative classiﬁcation of identical concepts in differ-

ent terminologies: Different ways to view the world.

Journal of Biomedical Informatics, in press.

Klein, S. T. (2016). Basic Concepts in Data Structures.

Cambridge University Press.

MeSH (2019). Medical Subject Headings, https://bioportal.bioontology.org/ontologies/mesh.

Meta (2019). The UMLS Metathesaurus, https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/.

Metanotes (2019). Metathesaurus Release Notes, https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/notes.htm.

NCBO (2019). https://bioportal.bioontology.org/.

NCIt (2019). The National Cancer Institute thesaurus, https://ncithesaurus-stage.nci.nih.gov/ncitbrowser/.

NDF-RT (2019). National Drug File Reference Terminology, https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/ndfrt/.

Ochs, C., He, Z., Zheng, L., Geller, J., Perl, Y., Hripcsak,

G., and Musen, M. A. (2016). Utilizing a structural

meta-ontology for family-based quality assurance of

the bioportal ontologies. Journal of Biomedical Infor-

matics, 61:63–76.

Rahm, E. (2016). The case for holistic data integra-

tion. In Advances in Databases and Information Sys-

tems - 20th East European Conference, ADBIS 2016,

Prague, Czech Republic, August 28-31, 2016, Pro-

ceedings, pages 11–27.

Rector, A. L., Rogers, J., and Bittner, T. (2006). Granularity,

scale and collectivity: When size does and does not

matter. Journal of Biomedical Informatics, 39(3):333–

349.

Resnik, P. (1995). Using information content to evalu-

ate semantic similarity in a taxonomy. In IJCAI’95

Proceedings of the 14th international joint conference

on Artiﬁcial intelligence - Volume 1, Montreal, Que-

bec, Canada, August 20 - 25, 1995, pages 448–453.

Morgan Kaufmann Publishers Inc. San Francisco, CA,

USA.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423.

Shvaiko, P., Euzenat, J., Jiménez-Ruiz, E., Cheatham, M., and Hassanzadeh, O., editors (2018). Proceedings of the 13th International Workshop on Ontology Matching co-located with the 17th International Semantic Web Conference, OM@ISWC 2018, Monterey, CA, USA, October 8, 2018, volume 2288 of CEUR Workshop Proceedings. CEUR-WS.org.

SNOMED CT (2019). https://www.snomed.org/.

Stato (2019). Statistics Ontology, https://bioportal.bioontology.org/ontologies/stato.

UMLS (2019). The Uniﬁed Medical Language System,

https://www.nlm.nih.gov/research/umls/.

WordNet (2019). https://wordnet.princeton.edu/.
