EXPLICIT CONCEPTUALIZATIONS FOR KNOWLEDGE
MAPPING
Willem-Olaf Huijsen
Telematica Instituut, P.O. Box 589, NL-7500 AN, Enschede, The Netherlands
Samuël J. Driessen
Océ Technologies, P.O. Box 101, NL-5900 MA, Venlo, The Netherlands
Jan W.M. Jacobs
Océ Technologies, P.O. Box 101, NL-5900 MA, Venlo, The Netherlands
Keywords: information classification, con
ceptualizations, knowledge mapping, ontologies, semantic networks, thesauri.
Abstract: Knowledge mapping supports members of an organization in finding knowledge available within the
organization, and in developing insights into corporate expertise. An essential prerequisite is an explicit
conceptualization of the subject domain to enable the classification of knowledge resources. Many tools
exist to create explicit conceptualizations. This paper establishes a set of requirements for conceptualization
tools from the perspective of knowledge mapping. Next, a number of tools are reviewed – thesauri,
ontologies, and semantic networks – using the following criteria: the complexity, the effort required, and the
degree to which it is possible to integrate it into the overall knowledge mapping system.
1 INTRODUCTION
Knowledge mapping is about making the knowledge
that is available within an organization transparent,
and is about providing insights into its qualities.
Generally, when employees look for knowledge,
they draw from three sources: other employees,
documents of various types, and information
systems. First, other employees typically include
close colleagues and other colleagues one knows to
have relevant expertise. The search is typically
limited to only a few people. Still, there may be
others with high-quality knowledge one misses
because one does not know them or does not know
all of the expertise one’s colleagues have. Second,
documents often come in large numbers, and with a
poor structure to them, making a quick and effective
search very difficult. Third, information systems
tend to be numerous too, and each system has a
different interface and internal structure, so that
finding knowledge and piecing together information
from across a number of systems is a lot to ask. The
distributed nature of organizational knowledge
makes it very hard to get a clear, complete overview,
and to draw conclusions.
Knowledge-mapping systems (KMSs) provide
su
pport for addressing these issues, collecting data
on the corporate knowledge from various
information systems.
The knowledge-mapping process can be said to
co
nsist of the following steps. First, raw data is
acquired from one or more sources. This typically
involves some basic processing such as filtering or
keyword extraction. The resulting first-order data is
stored in the knowledge-mapping database
(KMDB). In order to obtain more meaningful
information, it may be further analyzed, aggregated,
and contextualized, resulting in higher-order data.
By visualizing the first-order and higher-order data
in specific ways, and taking into account user
preferences, knowledge maps can be produced that
provide insights into corporate knowledge.
2 CONCEPTUALIZATIONS
To talk about a domain of knowledge, one needs a
way to label pieces of knowledge and the
relationships between them. This is done by using
conceptualizations. To establish a common frame of
reference, we review some basic notions. A concept
can be defined as any unit of thought, any idea that
forms in our mind [Gertner, 1978]. Often, nouns are
used to refer to concepts [Roche 2002]. Relations
231
Huijsen W., J. Driessen S. and W.M. Jacobs J. (2004).
EXPLICIT CONCEPTUALIZATIONS FOR KNOWLEDGE MAPPING.
In Proceedings of the Sixth International Conference on Enterprise Information Systems, pages 231-236
DOI: 10.5220/0002634202310236
Copyright
c
SciTePress
form a special class of concepts [Sowa, 1984]: they
describe connections between other concepts.
Modifiers (or attributes) may be attached to concepts
and relations to restrict or clarify their scope. One of
the most important relations between concepts is the
hierarchical relation (subsumption), in which one
concept (superconcept) is more general than another
concept (subconcept). Instances are concrete objects
that may be examples of the more abstract concepts.
A conceptualization is “the objects, concepts, and
other entities that are assumed to exist in some area
of interest and the relationships that hold among
them” [Genesereth & Nilsson, 1987]. A
conceptualization can be divided into a conceptual
model and an instance model. The conceptual model
includes all concepts of interest and the relations
between them. The instance model includes all
instances of interest and the relations that link these
instances to each other and to the concepts.
The conceptual model states that the individuals of
interest to the organization are its employees and the
external contacts. In terms of the knowledge-
mapping process the instance model data is derived
from the external information systems such as
document management systems and organizational
databases. This data is stored in the KMDB. The
conceptual model typically is not available in any
information system, and must be developed as part
of the KMS. The KMDB invariably requires a large
number of specific relations corresponding to the
information presented by the various knowledge
maps. Since a thesaurus can only accommodate 3–5
relations, it is not an option to integrate the KMDB
and the thesaurus (Fig. 1a). However, in the case of
ontologies and semantic networks, any number of
specific relations can be incorporated.
3 CRITERIA
A knowledge-mapping tool uses the conceptualiza-
tion in several situations: the indexing of knowledge
sources, concept naming for user interaction,
browsing the concept space, and relevance ranking.
Indexing determines for a given knowledge source
which concepts it is concerned with. Essentially,
indexing consists of examining the document and
establishing its subject content; identifying the
principle concepts present in the subject; and
expressing these concepts in the indexing language
[ISO 1985]. Therefore, indexing requires the
conceptualization to include, for each concept of
interest, all terms that describe it: synonyms and
spelling variants. If a term is ambiguous, then
hyperonyms and hyponyms, and other related
concepts may be used for disambiguation.
The conceptualization is also used to name relevant
concepts for communication with the user. The
following must be addressed: technical terms may
have any number of synonyms and spelling variants,
and they may be ambiguous. Generally, this is dealt
with by selecting, for every concept, from the set of
synonyms and spelling variants, a single,
unambiguous term as the preferred term, while the
others are the non-preferred terms. This also ensures
consistent naming.
An important feature of a knowledge-mapping tool
is browsing the system’s conceptual space. This
means that the system must contain descriptions of
the relations between the concepts. The number of
relations depends in part on the application’s
requirements and on the number of instances of the
relations. We find that the minimum is two: the
subsumption relation and the association relation.
The intended application may require certain
distinctions between types of relations. A practical
consideration is that if there are too few relation
types, then a concept may have very many relation
instances with the same name. Then it may be too
difficult to find the information one is looking for.
Given a query, a knowledge-mapping tool will try
and find relevant knowledge items. In presenting the
knowledge items, the results are ranked according to
their relevance to the query. A knowledge item may
also refer to the concept in question using synonyms
Conceptual
model
(= thesaurus)
Instance
model
(= KMDB)
Conceptual
model
+
instance
model
Conceptualization
(Thesaurus Type)
Conceptualization
(Ontology/
Semantic network
Type)
(a)
Aircraft
Passenger
Aircraft
Aircraft
Military
Aircraft
Freight
Aircraft
Civil
Aircraft
Payload User type
Passenger
Aircraft
Military
Aircraft
Freight
Aircraft
Civil
Aircraft
(b)
Figure 1: (a) Two conceptualization types, (b) Facets
ICEIS 2004 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION
232
and other related terms. Thus, relevance ranking can
also benefit from knowledge about relations between
terms.
Conceptualizations consist of a set of concepts on
the one hand and a set of relations between those
concepts on the other. From the perspective of the
above-mentioned applications, we observe that
having terms for the set of concepts is sufficient for
indexing and concept naming. In addition,
information on the relations can improve the
indexing process. However, for concept browsing,
both the set of concepts and the relations between
them is necessary. Thus, a term set provides a
minimal basis for knowledge mapping, but
additional information on the relations between
concepts is necessary for concept browsing.
We now summarize the requirements on a
knowledge classification system. First, it must
include, for every concept, all terms that describe the
concept: synonyms and spelling variants, and related
concepts (hyperonyms, hyponyms, etc.). Second, it
must include a set of different relation types. At least
subsumption and association are required. More
relations are needed if the application requires so, or
if the number of relation instances with the same
name becomes impractical.
4 TOOLS
For knowledge mapping we consider the following
three conceptualization types: thesauri, ontologies,
and semantic networks. Other, simpler structures
such as keyword lists and taxonomies are not taken
into account as they fail to meet basic requirements.
After reviewing these tools, section 5 evaluates them
with respect to the criteria established above.
4.1 Thesauri
Thesauri are classification systems that combine a
set of technical terms with a few basic relations. Our
discussion is inspired by the ISO and ANSI/NISO
standards [ISO 1986, NISO 1993, cf. Wielenga
2001]. The latter defines a thesaurus as “a controlled
vocabulary arranged in a known order and structured
so that equivalence, homographic, hierarchical, and
associative relations among terms are displayed
clearly and identified by standardized relationship
indicators that are employed reciprocally”.
Thesauri divide the technical terms into sets of
synonyms. From each set, one preferred term is
chosen to represent the underlying concept; the
others are non-preferred terms. The included
relation types are synonymy, subsumption, and
association.
A clear advantage of thesauri is their simplicity,
which allows for an easier implementation. The
downside is that since only three basic relations are
distinguished one may feel that many different
relations get mixed up.
Of course, depending on one’s application, one may
vary the number and nature of the basic relations
used in the thesaurus. As the hierarchical relation is
the primary structuring principle, we focus on two
variations that improve the clarity of the hierarchy.
An important and generally applicable variation
explicitly distinguishes different types of
hierarchical relations: the generic relation, the
whole-part relation, and the instance relation. With
only one hierarchical relation, in browsing concepts,
these different types of subordinate concepts will not
be readily distinguishable. Alternatively, using three
relations, the presentation for concept browsing can
be made much clearer.
A second variation is based on the subdivision of
concepts. One can use different characteristics of a
concept as the criterion for subdividing it (facets)
(Fig. 1b).
4.2 Ontologies
Generally, an ontology is an explicit specification of
a shared conceptualization. An ontology consists of
concepts, terms, and relations between the concepts
and terms [Huijsen & Driessen 2003].
An important observation about the ontology is that
it defines terms in the domain and relationships
between these terms in a formal way, but it does not
specify the meaning of the terms. Therefore, the
definition of an ontology can be focused to a formal
specification of a part of a conceptualization. The
names of the concepts and the description of the
ontology in natural language are therefore important
for the ability to understand and use an ontology.
In practical terms, developing an ontology includes
defining concepts in the ontology, arranging the
concepts in a subsumption hierarchy, defining facets
and describing allowed values for these facets, and
filling in the values for facets for instances [Noy &
McGuinnes 2001]. In contrast to thesauri, an explicit
difference is made between concepts and instances.
Furthermore, while subsumption is the main relation
in ontologies, there is no limit to the number of
relation types that can be used in ontologies.
4.3 Semantic Networks
A semantic network is a graph of the structure of
meaning. A semantic network represents knowledge
as a graph. An idea, event, situation or object
invariably has a composite structure; this is
EXPLICIT CONCEPTUALIZATIONS FOR KNOWLEDGE MAPPING
233
represented in a semantic network by a
corresponding structure of nodes representing
conceptual units, and directed links representing
relations between units. It becomes semantic when
you assign a meaning to each node and link
[Lehmann 1992]. Each concept is defined by its
links to other concepts [Sowa 1984]. Concept
meaning is extended by tracing outward from a
particular concept to all the concepts associated with
that particular concept. Semantic networks aim to
represent any kind of knowledge which can be
described in natural language. A semantic network
system includes not only the explicitly stored net
structure but also methods for automatically deriving
a much larger body of implied knowledge. The
essential idea of semantic networks is that the graph-
theoretic structure of relations and abstractions can
be used for inference as well as understanding
[Lehmann 1992].
Semantic networks do not distinguish between
concepts and instances, as ontologies do. As with
ontologies, there is no limit to the number of relation
types used in ontologies. The relations between the
concepts/instances are bidirectional.
In an ontology the subsumption relation is the basic
structure, and there typically is a strict distinction
between concepts and instances. By contrast, in a
semantic network, all concepts, instances and all
relations have an equal status. The subsumption
relation is just one of the relations, and no
distinction is made between concepts and instances.
This is an advantage because this makes abstraction
and inference over the concepts easier. The number
of relations in thesauri is fixed. In ontologies this
relation is not as strict by distinguishing between
concepts and instances. Semantic networks have rich
relations between the concepts. This means that
there are many more relations between the concepts
(than in ontologies), the (semantic) distance between
the concepts can be given, and finding related terms
is not limited by the hierarchy (as in thesauri and
ontologies).
A conceptualization in a knowledge-mapping tool is
used to describe the content of knowledge sources.
The results of the indexing of knowledge sources
and other information are stored in the KMDB. Note
that there is a commonality between the
conceptualization and the KMDB: both define
relations between concepts. If one allows for a large
enough number of concepts and relations in the
conceptualization, then these systems can be
integrated. This reduces complexity and improves
performance. Such a combination has a number of
implications. First, since a database typically
includes many relations, the conceptualization must
allow for a large number of relations. Second, the
conceptualization must also include the concepts of
the database, such as the knowledge sources
themselves.
5 EVALUATION
From the viewpoint of conceptualizations, what are
the fundamental differences between the models of
thesauri, ontologies, and semantic networks.
Thesauri focus primarily on the textual appearance
of concepts, namely technical terms. They only
indirectly recognize the existence of concepts by
grouping terms into synonym sets that each have one
preferred term to represent the corresponding
concept. Thesaurus standards limit the power to
express relations between concepts in that they
define only a handful of relations.
In contrast, the ontology model focuses on the
notions underlying the terms: concepts and
instances. The technical terms themselves are of
Country
The
Netherlands
Continent
consists of/part of
America
associates with
part of/consists of
Population
consists of/part of
State
consists of/part of
(a)
Country
The
Netherlands
Province
State
is a/instance of
is a/instance of
America
part of/consists of
part of/consists of
associates with
part of/consists of
Holland
also called
not the same as
(b)
Figure 2: (a) Sample ontology (b) Sample semantic network
ICEIS 2004 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION
234
lesser importance. Issues like synonymy are not
central, and can be dealt with in terms of the
multifunctional attributes of concepts and instances.
In terms of relations, the ontology model still places
a focus on the hierarchical subsumption relation, but
allows for arbitrarily many other relations. The
ontology model thus takes a more appropriate focus
and enhances expressive power by not limiting the
number of relations, while still being able to deal
with the issues central to the thesaurus model.
Finally, semantic networks subscribe to the more
generic view that concepts and instances have so
much in common that it warrants having only one
type of object, also termed “concept”. The
distinction between the concepts and instances of the
ontology model can still be made using attributes.
The semantic network model also liberalizes the role
of the relations. Relations are all given equal status:
there is no a priori focus on the subsumption
relation. Like the ontology model, the semantic
network model does not limit the number of
relations. In comparison to the ontology model, the
semantic network model therefore is more generic: it
retains the expressive power, and makes it much
easier to handle concepts and instances as the same
type of object where appropriate.
In summary, moving from the thesaurus model via
the ontology model to the semantic network model,
we see that each step corresponds to important
additional insights, and each successive model is
more capable to appropriately serve as a model for
conceptualizations. The expressive power of each
previous model is retained, not by extending the
previous model, but by replacing the modeling
primitives with more generic ones. As a conse-
quence, necessary features must be softcoded in
terms of the modeling primitives instead of being
hardcoded as modeling primitives themselves. This
means that in addition to the model itself it must be
specified how to use modeling primitives such as
attributes to implement the required features.
The structure of conceptualizations helps build the
conceptualization. As we have seen the thesaurus is
simple to build, while the ontology and semantic
network call for more effort. Furthermore the
structure of the conceptualization also limits or
extends its inference capabilities. As for the
thesaurus and the ontology, we saw that they were
limited by the top-down tree structure. The semantic
network is not limited by its structure.
The number of relations in a thesaurus is limited.
This results in a simple conceptualization that can be
built more easily. The relations in an ontology and
semantic network are not limited. This implies a
complex structure, especially for a semantic
network. Being able to use lots of relations results in
powerful inference capabilities.
Thesauri do not distinguish between concepts and
instances. They use preferred terms to represent the
concept. Ontologies do distinguish between concepts
and instances, while semantic networks do not. The
distinction between preferred terms and other terms
is simple and straightforward. Distinguishing
between concepts and instances, as in ontologies, is
a more complex approach. In semantic networks no
distinction is made between concepts and instances.
This results in a complex network, but also in the
advantage of being able to abstract from a concept
more easily.
Higher-order processing for knowledge mapping
involves some reasoning. In the thesaurus
configuration, inference is hardcoded into the
software and is limited by the structure of the
conceptualization. Alternatively, in the ontology and
semantic network configurations, we are not limited
by the structure of the network. Inference typically
makes use of the specific relations in the KMDB.
Thus, inference capabilities of the
conceptualizations only make sense if the KMDB is
integrated with the conceptualization, because this
enables the explicit coding of the inference rules.
Now we will judge the conceptualization based on
the criteria listed in section 3. All conceptualizations
can include all terms that describe the concept, e.g.
synonyms and spelling variants. Thesauri do both
explicitly. In ontologies and semantic networks the
spelling variants are treated as synonyms. Thesauri
define a preferred term, ontologies and semantic
networks do not. The conceptualizations all define
other relations. Thesauri usually consist of three or a
limited number of relations. Ontologies and
semantic networks usually also have a predefined
number of relations, but theoretically can account
for an unlimited number of relations.
In our view there are two viable conceptualizations
for knowledge mapping: the thesaurus and the
semantic network. The thesaurus variant is geared
towards simplicity, ease of implementation, and
reduction of work. The semantic network variant is
geared towards maximum expressiveness. It has a
more complex structure, which makes it harder to
implement. It also allows for integration with the
KMDB.
Table 1: Comparison of conceptualization Tools
Thesaurus Ontology Semantic
Net
Complexity low medium medium
Labour
intensity
medium high high
Integrated – + +
EXPLICIT CONCEPTUALIZATIONS FOR KNOWLEDGE MAPPING
235
6 CONCLUSION
Which conceptualizations should be recommended
to extend a knowledge mapping system?
Conclusions can be drawn from the evaluation of the
conceptualizations themselves and in the light of the
knowledge mapping criteria. From a technical point
of view – does the conceptualization account for
fully integrated conceptualization? Ontologies and
semantic network satisfy these criteria, while
thesauri do not. From a user perspective – how
easily can a conceptualization be built? – the
thesauri is the simplest and most straight-forward to
build (and maintain). Ontologies and semantic
networks require much more effort. However, the
complexity of the semantic network is higher than
that of the ontology due to the fact that no
distinction is made between concepts and instances,
and the possible relations are endless.
Thus, when we have to make a choice between these
conceptualizations we advise to choose between the
simple approach by using the thesaurus for the
knowledge mapping systems or the complex
approach, by choosing for the semantic network.
When you want to have more complex capabilities
than thesauri have, building an ontology is useless.
We advise to use the semantic network approach
because of the extra advantages the structure of a
semantic network has and the fact that semantic
networks do not differentiate between concepts and
instances, which makes abstracting easier.
REFERENCES
Genesereth, M.R. and Nilsson, N.J., 1987. Logical
Foundations of Artificial Intelligence, San Mateo, CA:
Morgan Kaufmann Publishers.
Gertner, D., 1978. “On Relational Meaning: The
Acquisition of Verb Meaning”. Child Dev.ment, 49,
pp.988-998.
Huijsen, W., and Driessen, S., 2003. “How to Build
Ontologies for the Metis Pilots”, Telematica Instituut,
internal technical report.
ISO 5963, 1985. “Documentation – Methods for Examing
Documents, Determining Their Subjects, and
Selecting Indexing Terms”, 1st edition, ISO.
ISO 2788, 1986. “Documentation – Guidelines for the
Establishment and Development of Monolingual
Thesauri”, 2nd edition, ISO.
Lehmann, F., 1992. “Semantic Networks”. In Computers
& Mathematics with Applications, vol. 23, nr. 2-5.
NISO. “Guidelines for the Construction, Format, and
Management of Monolingual Thesauri”, NISO Press,
Betheseda, MD.
Noy, N.F. and McGuinness, D.L., 2001. “Ontology
Development 101: A Guide to Creating Your First
Ontology”, Stanford Knowledge Systems Laboratory
Technical Report KSL-01-05.
Roche, C., 2002. “Form Information Society to
Knowledge Society: the Ontology Issue”. In:
Computing Anticipatory Systems: CASYS 2001 – Fifth
Int. Conf., ed. D.M. Dubois.
Sowa, J.F., 1984. Conceptual Structures: Information
Processing in Mind and Machine, Menlo Park, CA:
Addison-Wesley.
Wielenga, B.J., Schreiber, A.Th., Wielemaker, J., and
Sandberg, J.A.C., 2001. “From Thesaurus to
Ontology”,. In First Int. Conf. on Knowledge Capture
(K-CAP 2001), Victoria, British Columbia, Canada.
ICEIS 2004 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION
236