ONTOLOGY-BASED SIMILARITY OF SOFTWARE CASE
Applying Ontology Reasoning to Software Retrieval
Lothar Hotz, Katharina Wolter, Stephanie Knab and Arved Solth
HITeC e.V., Department of Computer Science, University of Hamburg, Germany
Keywords:
Ontology engineering, Metamodelling, Description Logic, Taxonomical Similarity.
Abstract:
In this paper, we use Description Logic based classification and taxonomical similarity computations for facili-
tating software reuse. For this purpose we map a metamodelling-based software representation to an ontology.
The ontology is classified by a Description Logic reasoner, which makes implicit taxonomical relations ex-
plicit. This classification is the basis for the computation of taxonomical similarity. The approach is tested
with several industrial software applications.
1 INTRODUCTION
Reusing software is still an open problem in current
software practice. Former approaches to solve this
problem concentrated on the code level. However, the
impact of reuse can be increased when integrated ear-
lier in the development process, e.g. on the level of
requirements. Thus, a goal is to design an effective
reuse process based on requirements.
In this paper, we describe one step towards this
goal. The general reuse process of our approach looks
as follows: Starting with an initial requirements spec-
ification a repository is searched for similar specifica-
tions. This repository contains former software devel-
opment projects stored in the form of software cases.
A software case comprises a problem (requirements)
and a solution (architecture, design, and implementa-
tion), similar to case-based approaches, for example
described in (Bergmann et al., 1999). Each require-
ments specification is mapped to appropriate elements
of the solution.
1
The retrieved case is intended to be
reused by modifying those parts that need rework and
keeping those parts that can be reused without modi-
fication.
Retrieval of similar requirements specifications
from a repository is a key prerequisite for this reuse
process. In the following, we explain how this can
be achieved by an ontology-based similarity measure.
In order to enable reuse on the basis of meaning, the
specifications need to provide more than meaningless
1
For a complete description of a software case’s internal
structure we refer to (
´
Smiałek, 2006).
strings. The words in one requirements specification
need to be linked to an ontology, where the seman-
tics of the words are defined. Our approach depends
on requirements specifications in a machine process-
able form as provided by the requirements specifica-
tion language RSL (
´
Smiałek et al., 2007), which we
use or ATTEMPO (Fuchs et al., 2005).
In this paper, this link is realized by a new combi-
nation of formal requirements specifications provided
by the requirements specification language RSL with
ontology reasoning provided by Description Logics
(DL) (Baader et al., 2003). The basic idea is to cover
all former software cases in an ontology (DL-model)
and use the reasoning facilities of a Description Logic
reasoner (DL-reasoner) to classify the software cases,
i.e. their deeply structured elements. When classified,
the software cases have certain taxonomical distances
between each other. As shown for example in (Sa-
lotti and Ventos, 1998) the taxonomical distance can
be used for computing the similarity between classes
of a taxonomy, i.e. in our case between elements of
software cases. Computing the similarity in this way
takes into consideration the semantics of used terms.
The remainder of the paper is structured as fol-
lows: we start with the description of the require-
ments specification language RSL (see Section 2).
This language links the words used in the require-
ments specification of one software case to WordNet
(Fellbaum, 1998) as a basic ontology. This provides
the basis for our semantic-based similarity measure.
However, for applying an DL-reasoner we had to map
the requirements specifications to a DL-model (see
Section 3). How a DL-reasoner can be used for clas-
183
Hotz L., Wolter K., Knab S. and Solth A. (2009).
ONTOLOGY-BASED SIMILARITY OF SOFTWARE CASES - Applying Ontology Reasoning to Software Retrieval.
In Proceedings of the International Conference on Knowledge Engineering and Ontology Development, pages 183-191
DOI: 10.5220/0002298401830191
Copyright
c
SciTePress
sifying a set of requirements specifications and thus,
provides new taxonomic relations is described in Sec-
tion 4. Our similarity measure compares the soft-
ware cases taking into account the new taxonomical
relationships between the cases, the meaning of used
terms, and the structure of sentences (Section 5). We
implemented the developed approach and evaluated it
with industrial software cases taken from software de-
velopment organizations. These results are reported
in Section 6. The approach is discussed in Section 7
and finally the paper is summarized in Section 8.
2 FORMALIZING
REQUIREMENTS
Most requirements specifications are still written in
natural language. However, natural language is poten-
tially ambiguous, which complicates automatic pro-
cessing such as similarity estimation. Therefore, the
new requirements specification language RSL was de-
fined, which enables precise requirements specifica-
tions and is comprehensible by humans at the same
time. RSL is based on the same metamodelling ap-
proach as used for specifying the Unified Modeling
Language (UML, see (OMG, 2007)), i.e. the Meta
Object Facility (MOF, see (OMG, 2006)). The RSL
metamodel specifies the structure of valid require-
ments specifications and is integrated in a prototyp-
ical tool. The tool supports the user writing specifica-
tions and ensures that each specification is an instance
of the metamodel. In the following, we describe the
RSL elements relevant for the scope of this paper.
Each Requirements Specification
2
is part of a Software
Case, which combines all artifacts developed in one
software development project (e.g. Architectural Model,
Detailed Design Model, and Source Code). However, our
similarity measure compares the requirements speci-
fications only. This is based on the assumption that
if two software cases have similar requirements their
other artifacts are similar too and thus, the reuse po-
tential is high.
RSL allows to write requirements specifications
in form of less formal NaturalLanguageHypertextSen-
tences or more formal ConstrainedLanguageSentences.
For comparing the natural language parts of RSL
requirements specifications an Information Retrieval
approach is most appropriate (see (Wolter et al.,
2008)). The constrained form of sentences has the
advantage of being syntactically unambiguous and se-
mantically rich. For this reason, our approach focuses
2
This font denotes elements defined in the RSL meta-
model.
on this part of the specifications.
One type of ConstrainedLanguageSentences pro-
vided by RSL are Subject-Verb-Object (SVO) sentences.
They can be used to write SentenceLists and Con-
strainedLanguageScenarios in order to define Require-
ments in detail. Such ConstrainedLanguageScenarios
can for example contain sentences like: The customer
changes the order. The system confirms the changes.
etc.
In general, the metamodel defines classes, their
sub- and superclasses as well as their associations.
Figure 1 shows the metamodel of SVOSentences in
RSL. Each SVOSentence has a Subject and a Predicate.
The Subject has a NounPhrase, which links to a Deter-
miner, a Modifier and a Noun via DeterminerLink, Modifier-
Link and NounLink, respectively. The Predicate links to
a VerbPhrase being either a SimpleVerbPhrase for sen-
tences with one object or a ComplexVerbPhrase for sen-
tences containing two objects. In addition, the Simple-
VerbPhrase links a Verb via a PhraseVerbLink. While the
phrases enable different sentence structures, the dif-
ferent links contain the information about the words’
inflection within a particular sentence. Finally, the
links point to Terms, e.g. Nouns, Verbs etc. All Terms of
one specification are contained in a Terminology which
provides a software case-specific word list.
class Example: SVO-Sentence
ConstrainedLanguageSentence
SVOSentence
PhraseHyperlink
Subject
PhraseHyperl i nk
Predicate
TermHyperlink
DeterminerLink
TermHyperlink
PhraseVerbLink
SimpleVerbPhrase
TermHyperlink
NounLink
Phrase
NounPhrase
Term
Noun
Term
Determiner
Phrase
«abstract»
Ve rbPhra se
ComplexVerbPhrase
Term
Verb
Term
Modifier
TermHyperlink
ModifierLink
verbLink
0..*
target
1
predi cate*
target 1
determi nerLink 0..*
target 1
nounLink 0..*
target 1
verbPhrase
0..1
object
0..1
subject
*
target 1
noun
1
sou rce
0..1
determi ner
0..1
source 0..1
verb 1
source 1
complexVerbPhrase
0..1
s impl eVerbPhrase
1
predi cate
1
source1
sub j ect
1
source
1
modi fi erLink 0..*
target 1
modifier 0..1
source
Figure 1: The structure of SVO-Sentences in RSL.
In RSL, the meaning of Terms is defined by
linking them to WordNet
3
. WordNet is a seman-
3
see: wordnet.princeton.edu/
KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development
184
tic lexicon that was developed at the Cognitive Sci-
ence Laboratory at Princeton University (Fellbaum,
1998). It groups synonymic words (synonyms) of the
English language in synonym sets (called synsets).
Amongst others, the following semantic relations
connect synsets: hypernyms / hyponyms (is-a, is-a in-
vers) and holonym / meronym (part-of, part-of invers).
For each synset, WordNet provides definitions or ex-
ample sentences. Words with different meanings par-
ticipate in distinct synsets.
Figure 2 illustrates the general structure of re-
quirements documents defined with RSL and how
ambiguity problems of natural language are solved.
Terminology#1, for example, contains the term Cus-
tomer while Terminology#2 and #3 both contain the
term Client. However, Customer of Terminology#1
and Client of Terminology#2 both link to the same
synset, i.e. they are synonyms while the Client of Ter-
minology#3 links to another synset, i.e. the two terms
Client have a different meaning, they are homonyms.
Terminology #1 Terminology #2 Terminology #3
Requirements Specification #1
Client
Customer
Client
WordNet
Client, Customer: someone
who pays for goods or services
Client: a person who seeks
the advice of a lawyer
Requ. Spec. #2 Requ. Spec. #3
1. Client wants to sign up for show.
2. System checks availability of show.
3. System shows time schedule.
4. Client chooses time.
5. System shows sign-up dialog.
6. Client cancels sign-up for show.
1. Client wants to sign up for show.
2. System checks availability of show.
3. System shows time schedule.
4. Client chooses time.
5. System shows sign-up dialog.
6. Client cancels sign-up for show.
1.
2. The Customer changes the order
of items.
3.
1.
2. The Customer changes the order
of items.
3.
Person […]: (a human being) […]
Figure 2: Relations between Requirements Specification,
Terminology and WordNet.
3 ONTOLOGY CONSTRUCTION
In order to use the classification facilities as they are
described in the next section, we had to represent the
requirements specifications and their links to Word-
Net as a DL-model, using ontology languages like
OWL (OWL, 2004) or KRSS (Patel-Schneider, 1993).
Such languages provide facilities to define concepts
and roles (or sometimes called properties) between
two concepts. A concept (given with a unique name)
indicates the set of individuals that belongs to it, and a
role indicates a relationship between concepts. Con-
cepts and roles are combined via constructors. Typi-
cal constructors are those of ALCH I F , i.e. Attribu-
tive Language with universal restrictions, existential
qualification, concept intersection (ALC ), role hier-
archy (H ), inverse roles (I ), and functional properties
(F ). As we will see in the next section, we use ALC
for gaining a tractable description logic.
With these constructors, we can define a special-
ization relation between concepts, which provides su-
perconcepts and subconcepts in a taxonomy. Fur-
thermore, AL allows to distinguish between prim-
itive (i.e. necessary) and defined (i.e. necessary
and sufficient) concepts, indicated with implies and
equivalent, respectively.
The question is now, how to use these ontology
representation facilities in order to represent require-
ments specifications such that similar specifications
have short taxonomical distances after classification.
For this task, we examine particular aspects of re-
quirements specifications in the following and de-
scribe how they are represented.
Mapping the RSL Metamodel. The RSL meta-
model consists of 127 classes, sub- and superclasses
and several associations between them. For example,
the class SVOSentence has a generalization relation to
ConstrainedLanguageSentence and the association sub-
ject to class Subject and predicate to class Predicate (see
Figure 1). The mapping is straightforward: classes of
the metamodel are represented with concepts, gener-
alization relations are represented with specialization
relations, and associations are represented with roles.
Furthermore, the associations of a class in the meta-
model define the class, i.e. if an object with such re-
lations exists, then it belongs to that class and if an
object belongs to a class it has the relations of that
class (see Figure 3, upper part).
Although navigation across associations is bidi-
rectional in the RSL metamodel, we do not use in-
verse roles in the DL-model since the similarity com-
putation only requires roles directing down from the
concept representing the requirements specification to
the synsets of WordNet. Likewise, we can avoid a role
hierarchy because only a flat hierarchy of associations
is given in the metamodel. Finally, functional prop-
erties are not needed for defining concepts because
they relate a class to a primitive type not to another
concept. Thus, we can use the description logic lan-
guage ALC instead of the more complex language
ALC H I F .
The mapping described above provides concepts
and roles, which have direct correspondence to
the classes and relations of the RSL metamodel.
The concepts representing the classes of the RSL
metamodel form the upper model of our ontology.
The concepts representing the elements of specific
requirements specifications are modeled as special-
izations of these concepts, thus, they form the lower
part of our ontology.
ONTOLOGY-BASED SIMILARITY OF SOFTWARE CASES - Applying Ontology Reasoning to Software Retrieval
185
Definition of an upper-model concept:
(equivalent SVOSentence
(and ConstrainedLanguageSentence
(some subject Subject)
(some predicate Predicate)))
Parts of the definition of the sentence:
“Client Opens a PC Window”
(equivalent SVOSentenceC2ClientOpensWindow
(and SVOSentence
(some subject SubjectC2Client)
(some predicate PredicateC2OpenWindow)))
(equivalent PredicateC2OpensWindow
(and Predicate
(some verbPhrase SimpleVerbPhraseC2Open)))
(equivalent SimpleVerbPhraseC2Open
(and SimpleVerbPhrase
(some verb PhraseVerbLinkC2Open)
(some object NounPhraseC2Window)))
(equivalent PhraseVerbLinkC2Open
(and PhraseVerbLink
(some linkedVerb VerbC2Open)))
(equivalent VerbC2Open
(and Verb
(some termLinksToWordnetEntry OpenSynset)))
(equivalent NounPhraseC2Window
(and NounPhrase
(some noun NounLinkC2Window)))
(equivalent NounLinkC2Window
(and NounLink
(some linkedNoun NounC2Window)))
(equivalent NounC2Window
(and Noun
(some termLinksToWordnetEntry WindowPCSynset)))
Parts of the definition of the sentence:
“Fireman opens a Building Window”
(equivalent SimpleVerbPhraseC7Open
(and SimpleVerbPhrase
(some verb PhraseVerbLinkC7Open)
(some object NounPhraseC7Window)))
(equivalent NounPhraseC7Window
(and NounPhrase
(some noun NounLinkC7Window)))
(equivalent NounLinkC7Window
(and NounLink
(some linkedNoun NounC7Window)))
(equivalent NounC7Window
(and Noun
(some termLinksToWordnetEntry WindowBuildingSynset)))
Definition of some synset concepts:
(implies OpenSynset VerbSynset)
(implies WindowBuildingSynset NounSynset)
(implies WindowPCSynset NounSynset)
(implies PersonSynset NounSynset)
(implies AdministratorSynset PersonSynset)
(implies ClientSynset PersonSynset)
(implies FiremanSynset PersonSynset)
Figure 3: Examples for mappings.
Mapping Requirements Specifications. Require-
ments specifications of software cases are represented
through instances of classes of the RSL metamodel.
For example, each SVO sentence defined in a soft-
ware case is an instance of the class SVOSentence
of the metamodel and is related to instances of the
classes Subject and Predicate. We map these instances
to further concepts in the ontology. Please note that
we map the elements of requirements specifications to
concepts and not to individuals. This is necessary be-
cause taxonomical relationships cannot be computed
between individuals. However, in our approach it
is a key to compute taxonomical relationships be-
tween the elements of requirements specifications.
For this reason, we represent the elements of a re-
quirements specification as subconcepts of the cor-
responding concept of the upper model. Thus, each
SVO sentence of a specific software case is mapped
to a concept with a unique id as name and the concept
SVOSentence
4
as superconcept. The concepts repre-
senting the elements of requirements specifications
are related through roles defined in the upper model.
Furthermore, the roles specified in the upper model
define the software case concepts (see Figure 3, mid-
dle part). Figure 4 shows a part of a DL-model, which
we created manually for illustration purposes. SVO
sentences are coded by a software case id starting with
C and a readable string reflecting the text of the sen-
tence (e.g. SVOSentenceC2ClientOpensWindow).
This general mapping can be used to convert all
elements of a requirements specification to a DL-
model. However, this is not necessary since RSL
specifications contain elements that are not relevant
for our semantic-based similarity measure such as
NaturalLanguageHypertextSentences, which only con-
tain meaningless strings. A further example are all
TermHyperlinks (e.g. NounLink). They are used to spec-
ify the inflection of a word within a particular sen-
tence, which is of minor relevance for the similarity
of sentences. The same holds for function words like
Determiner. They constitute a significant number of
sentence element’s but have little semantic content on
their own. Mapping each determiner into a concept
within the DL-model would require the DL-reasoner
to identify the equivalence of most of the determin-
ers. This, however, would cost significant amount of
computing time and would not improve the similarity
measure. For this reason, we only map those RSL el-
ements that are relevant for our similarity calculation.
TermHyperlinks e.g. can easily be skipped by relating
Phrases directly with corresponding Terms (see Figure
1).
Mapping WordNet Elements. The WordNet meta-
model consists of several classes like Synonym, Synset,
Wordform etc. Synsets are related via the relation
hyponym and hypernym in a synset taxonomy. For
the classification, only this taxonomical relation and
the fact that synsets are distinct are needed. Thus,
4
This font denotes elements defined in the DL-model.
KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development
186
Figure 4: Asserted hierarchy.
we only map synsets from WordNet. They are rep-
resented with concepts and the synset taxonomy is
represented with the subconcept relation. The role
termLinksToWordNetEntry relates each term of a re-
quirements specification to one synset (see Figure 3).
However, it is not necessary to map all synsets of
WordNet (i.e. almost 117.000) into the ontology. For
our purposes, it is only necessary to map the synsets
that have relations to a term of a requirements specifi-
cation and all the predecessors of these synsets in the
synset taxonomy.
4 CLASSIFYING THE
ONTOLOGY
Through the mapping described in Section 3, require-
ments specifications of software cases are represented
with concepts and roles in a DL-model. Since the re-
quirements specifications contain no information that
directly relates the diverse concepts of distinct spec-
ifications, all concepts are direct subconcepts of up-
per model concepts (see Figure 4). Figure 5 shows
the same DL-model as Figure 4 but after classify-
ing. In Figure 4, the diverse subjects (SubjectC1-
Client, SubjectC2Administrator etc.) are direct sub-
classes of the upper model concept Subject. In this
DL-model, only the synset taxonomy provides a hi-
erarchical structure. For example Administrator-
Synset and ClientSynset are both of type Person-
Synset (see Figure 4). The synset taxonomy provided
by WordNet and the fact that the concepts represent-
ing a requirements specification are defined concepts
and thus, strongly related between each other allow
for the classification described in the following.
This modeling enables a DL-reasoner to use the
synset taxonomy in order to classify the concepts
of the requirements specifications, i.e. to compute,
which concepts are equivalent (see Figure 5, upper
part in red) or, which concepts can be taxonomically
structured (see Figure 5, lower part in blue). In Fig-
ure 5, relations of the synset taxonomy are propagated
one by one to the subjects of requirements specifica-
tions of C1, C2, and C7. Namely, Sub jectC1Client,
Sub jectC2Administrator and Sub jectC7Fireman are
subconcepts of the concept Sub jectC8Person due
to the fact that ClientSynset, AdministratorSynset
and FiremanSynset are subconcepts of PersonSynset.
Thus, after classifying the ontology, the taxonomy
defined in WordNet is also reflected in the concepts
representing the specific requirements specifications.
Even more interesting is the impact on structural con-
cepts like SVOSentence. In Figure 5 for example, the
SVO sentences of case C1 and C2 are classified as
subconcept of the SVO sentence of case C8. This is
due to the fact, that in both sentences a specific type
of person (being either an administrator or a client)
opens a window of an operating system while the
fireman (of case C7) opens the window of a build-
ing. Please note that Figure 5 shows only parts of this
information. This distinction between the different
meanings of window is given through the links to dis-
tinct synsets (window: a framework of wood or metal
that contains a glass windowpane [...] or window:
((computer science) a rectangular part of a computer
screen [...]).
This kind of classification takes the structure, i.e.
the roles between the concepts, into account. Through
this classification, a different taxonomical distance
between the concepts of requirements specifications
is introduced and will lead to different similarities.
By using the WordNet link and the provided mapping
our similarity measure considers the meaning of the
requirements specifications.
ONTOLOGY-BASED SIMILARITY OF SOFTWARE CASES - Applying Ontology Reasoning to Software Retrieval
187
Figure 5: Cutout of inferred hierarchy.
5 SIMILARITY COMPUTATION
The goal of similarity computation is to get a value
between 0 and 1 that indicates an similarity between
a query and a software cases; 0 denoting no similarity
one denoting that the query is completely contained
in the software case. Please note, that this measure
is asymmetric: if a query element is not found in a
software case this results in a lower similarity value.
In contrast, if a software case contains elements that
are not part of the query, this has no negative impact
on the similarity value. This is due to the fact that
additional elements in the software case are potential
for reuse and thus wanted.
The assumption made here is that the taxonomi-
cal distance between two concepts can be the basis
of such a similarity value. By using the classification
facilities of a DL-reasoner, implicit taxonomical rela-
tions are made explicit.
Our similarity measure compares pairs of con-
cepts in the classified taxonomy. This comparison
includes both the concept placement in the taxonomi-
cal hierarchy (distance-based similarity) and the roles
and role fillers that the concepts define (role-based
similarity) (see (Gonz
´
alez-Calero et al., 1999) for a
similar approach).
For computing the distance-based similarity be-
tween two concepts, the distance of both concepts to
the least common subsumer (LCS) is computed, both
values are added and a value describing that distance-
based similarity is returned: 1 means both concepts
are in fact the same concept, and the smaller the value,
the further away the concepts are from each other.
The basic idea of role-based similarity is that the
similarity of two concepts depends on the similarity
of their subgraphs. When comparing roles, their most
important aspects are their fillers: concepts or con-
crete domains that specify the value of a role. When
comparing two concepts, the function of role-based
similarity is recursively applied. The recursion termi-
nates when two concepts without roles are compared;
their similarity is given by the distance-based similar-
ity function.
The roles of both concepts are recursively com-
pared and similarities are summed up for every con-
cept. A value describing role-based similarity is re-
turned: 1 means both concepts have the same roles,
and the smaller the value, the less their roles have in
common.
Both aspects can be computed independently from
each other. Therefore, we defined two algorithms,
one computing the distance-based similarity between
two concepts and another one comparing the common
roles of concepts. The similarity between two concept
definitions is the sum of the distance-based similarity
and the role-based similarity divided by two.
6 EXPERIMENTS
In the previous sections, we used a small running ex-
ample for demonstrating our approach to ontology-
based requirements comparison. In the following, we
give an overview of the experiments we are currently
running for applying the approach to larger examples
and to industrial software cases. First, we sketch the
technical setting.
Technical Setting. The metamodel of RSL is cre-
ated with a standard UML tool (Enterprise Archi-
tect) and an appropriate profile containing the meta-
model. For creating requirements specifications ac-
cording to the RSL metamodel, a specific software de-
velopment tool is under development. This tool was
used by some industrial partners and us for creating
formalized requirement specifications. Internally, the
tool uses a graph representation (see (Dahm and Wid-
mann, 2003) for details), which is also based on the
RSL metamodel. This way, a defined representation
for requirements specifications is available for our ex-
periments.
A JAVA-based converter maps the graph repre-
sentation of requirements specifications to an OWL
knowledge base by using the mapping described in
this paper. For debugging the OWL knowledge base,
Prot
´
eg
´
e is used. Prot
´
eg
´
e also provides interfaces to
the DL-reasoners Pellet and Racer. For debugging in-
KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development
188
ference behavior, Pellint is applied.
5
The similarity computation is a further JAVA-
component developed by us. It traverses the classi-
fied knowledge base for computing similarities as pre-
sented in Section 5. Together with the converter, this
component will be integrated in the above mentioned
software development tool together with other sim-
ilarity measures based on text and graph structures.
The tool is intended to provide sophisticated reuse
mechanisms in combination with software transfor-
mations (
´
Smiałek, 2006).
Industrial Experiments. We first created examples
to test the similarity measure by slightly varying re-
quirements specifications, and manually judge the
similarity of software cases. These experiments show
that the mapping described in Section 3 provides
coherent DL-models. Furthermore, we could show
that the taxonomical similarity measure computes the
same similarity ranking for these test cases as consid-
ered plausible by us beforehand. For applying the ap-
proach in a broader scale, 16 industrial software cases
have been created by international software organiza-
tions using the mentioned tool. The application do-
mains are in the area of internet banking, investment
funds management, emergency systems, forestry sys-
tems, financial contract systems, and funding sys-
tems. They include 261 requirements written in RSL
in total, which map to more than 30.000 defined con-
cepts. Those concepts are reduced to relevant con-
cepts for similarity computations as described in Sec-
tion 3 (see Table 1). Current experiments still use
subsets of the industrial software cases for classifica-
tion due to the complexity of the resulting ontologies.
However, the general approach could be verified and
further work will extend the number of tractable con-
cepts.
Table 1: Sizes of industrial software cases.
Upper model defined concepts 34
Software cases 16
Requirements 261
Defined concepts (complete mapping) 30184
Relevant defined concepts 8325
Relevant roles 36
7 DISCUSSION
In this paper, a new approach in the direction of
reusing software on the basis of its semantical de-
5
protege.stanford.edu, clarkparsia.com/pellet,
www.racer-systems.com, pellet.owldl.com/pellint
scription is presented. For this task, a link between a
formal requirements specification and an ontology is
established. This link and the mapping of the formal
requirements specifications to Description Logic con-
cepts enable inferences that classify existing require-
ments specifications. This classification can be done
offline before the reuse activities start. For reusing
parts of former software cases, similarity computa-
tions on the basis of classified concepts are applied.
When using ontologies in industrial settings the
ontology construction should be considered as an au-
tomatic process unless a knowledge engineer is con-
tinuously available within the organization. Auto-
matic ontology construction can be achieved through
learning ontologies (which is not considered in this
paper), or by constructing ontologies programmati-
cally from existing data and knowledge sources. In
this paper, we examine the formal notation of re-
quirements specifications as such a data source and
developed an automatic mapping from RSL require-
ments specifications to an ontology (here called DL-
model). By doing so, the modeling effort typically
needed when ontology-based approaches are applied
is reduced to linking words to WordNet elements in
order to specify the intended meaning. This can be
done by the requirements engineers who specify the
requirements, i.e. no knowledge engineer is needed.
Structural relations are implicit defined by using the
tool, which automatically creates requirements speci-
fications relying on the given metamodel.
The use of a given ontology like WordNet supplies
an easy-to-use starting point. Although WordNet con-
tains more than 155.000 words and about 117.000
synsets leading to almost 207.000 word-sense pairs,
it does not contain all needed terms for any domain
at hand. This holds especially for highly domain-
specific terms which typically can be found in spec-
ifications. The requirements engineers define these
terms as WordNet extensions, i.e. they specify a tax-
onomical relation. Linking words to WordNet ele-
ments and specifing extensions if necessary is the only
knowledge modeling effort needed in our approach.
This however, is essential for our approach since it
provides the basis for the comparison of requirements
specifications created for different domains, by differ-
ent persons or even in different organizations. With-
out such a link, an alignment or comparison of differ-
ent software cases would be difficult, one could only
apply natural language processing in order to guess
the intended meanings.
The approach of retrieving cases on the basis of
a similarity measure originates in case-based reason-
ing (CBR) (Aamodt and Plaza, 1994). Bergmann
has described how to set up taxonomy similarities
ONTOLOGY-BASED SIMILARITY OF SOFTWARE CASES - Applying Ontology Reasoning to Software Retrieval
189
for various relationship types (Bergmann, 1998). In
(Gonz
´
alez-Calero et al., 1999) and (Gomes et al.,
2004) further combination of CBR and Description
Logic are presented. In the paper at hand, we ap-
ply such taxonomical similarity measures to a tax-
onomy, which has been constructed by first mapping
metamodelling-based requirements specifications to a
DL-Model and then classifying this DL-model by a
Description Logic reasoner.
As a further aspect of CBR, Memory Organization
Packages (MOPs) (Schank, 1982) are used for orga-
nizing cases into significant portions for allowing par-
tial matching of cases. Also specializations are used
for structuring those parts. In this respect, our ap-
proach is similar to MOPs, however, the structure we
use is based on formal concepts of a description logic,
which enable classification services provided by DL-
reasoners, whereas MOPs use a frame-based strict hi-
erarchy.
Through the mapping given in this paper, highly
structured defined concepts are automatically created
from RSL requirements specifications. Description
Logic inferences provide the classification of these
highly structured concepts. By this means, the given
taxonomy of WordNet’s synset is used for classify-
ing all structured elements of requirements specifica-
tions (e.g. SVOSentences) and implicitly existing tax-
onomical relations even between different software
cases are made explicit. We can not somehow relax
the relatively strong modeling with defined concepts,
because moving from defined to primitive concepts
would not allow this classification of structured con-
cepts.
(Borgida et al., 2005) describes similarity mea-
sures for Description Logics, where diverse types of
concept definitions like nested roles are taken into ac-
count. But composite concepts consisting of multi-
ple roles (like SVOSentences) are not yet considered.
However, examining concept definitions in such detail
could be used for improving our similarity measure.
The current mapping of the metamodel takes gen-
eralization relations and associations into account.
Attributes to datatypes like strings or numbers are not
considered. Including those would need Description
Logics that handle concrete domains, and would need
a similarity measure for comparing strings (e.g. based
on information retrieval). Since the tool under de-
velopment already provides such mechanisms, which
will be combined with our ontology-based similarity
measure, we will not go into that direction.
Compared to other specification technologies as
e.g. provided by the specification language Z
(Woodcock and Davies, 1996), our approach al-
lows specifications on the much higher requirements
level, instead of specifiying computer programs it-
self. However, our requirements specification lan-
guage RSL is also used for creating architecture mod-
els and detailed design models through transforma-
tions (Kalnins et al., 2005). Furthermore, the link
of the specification to ontologies enables the support
through reasoners, which is not given in other specifi-
cation languages.
Finally, our approach extends simple WordNet-
based similarity measures as described e.g. in (Ped-
ersen et al., 2004). These measures provide similar-
ity values for synset pairs only and cannot compare
structured elements like SVO sentences or whole sce-
narios. However, this is essential in our approach.
8 SUMMARY
In this paper, a new combination of formal require-
ments specifications and description-logic based in-
ferences is used for facilitating software retrieval. A
formal mapping of requirements specifications repre-
sented by means of a metamodel to a highly differen-
tiated DL-model is given. This model enables a rea-
soner to classify former requirements specifications.
A taxonomy-based similarity computation uses the
classified taxonomy as basis for comparing require-
ments specifications. Our experience so far showed
that the approach works. We were able to map several
industrial software cases into one ontology. Further
application to industrial environments is currently un-
der development.
ACKNOWLEDGEMENTS
This work is partially funded by the EU:
Requirements-driven Software Development System
(ReDSeeDS) (contract no. IST-2006-33596). The
project is coordinated by Infovide, Poland with
technical lead of Wasaw University of Technology
and with University of Koblenz-Landau, Vienna Uni-
versity of Technology, Fraunhofer IESE, University
of Latvia, HITeC e.V. c/o University of Hamburg,
Heriot-Watt University, PRO DV, Cybersoft and
Algoritmu Sistemos.
REFERENCES
Aamodt, A. and Plaza, E. (1994). Case-based reasoning:
Foundational issues, methodological variations, and
system approaches. Artificial Intelligence Communi-
cations, 7(1):39–59.
KEOD 2009 - International Conference on Knowledge Engineering and Ontology Development
190
Baader, F., Calvanese, D., McGuinness, D., Nardi, D., and
Patel-Schneider, P. (2003). The Description Logic
Handbook. Cambridge University Press.
Bergmann, R. (1998). On the use of taxonomies for repre-
senting case features and local similarity measures. In
6th German Workshop on CBR.
Bergmann, R., Breen, S., Manago, M., Wess, S., G
¨
oker, M.,
Althoff, K.-D., and Traph
¨
oner, R. (1999). Developing
Industrial Case-Based Reasoning Applications - The
INRECA Methodology, volume 1612 of Lecture Notes
in Artificial Intelligence. Springer.
Borgida, A., Walsh, Thomas, J., and Hirsh, H. (2005). To-
wards measuring similarity in description logics. In
International Workshop on Description Logics, Edin-
burgh, Scotland. DL2005.
Dahm, P. and Widmann, F. (2003). GraLab - Das Graphen-
labor. Projektbericht 4.3.0, University of Koblenz-
Landau, Institute for Software Technology.
Fellbaum, C., editor (1998). WordNet: An Electronic Lexi-
cal Database. MIT Press.
Fuchs, N. E., H
¨
ofler, S., Kaljurand, K., Rinaldi, F., and
Schneider, G. (2005). Attempto controlled english:
A knowledge representation language readable by hu-
mans and machines. Lecture Notes in Computer Sci-
ence, 3564:213–250.
Gomes, P., Pereira, F. C., Paiva, P., Seco, N., Carreiro, P.,
Ferreira, J. L., and Bento, C. (2004). Using wordnet
for case-based retrieval of uml models. AI Commun.,
17:13–23.
Gonz
´
alez-Calero, P. A., G
´
omez-Albarran, M., and D
´
ıaz-
Agudo, B. (1999). Applying DLs for retrieval in case-
based reasoning. In Proc. of the 1999 International
Workshop on Description Logics (DL’99).
Kalnins, A., Celms, E., and Sostaks, A. (2005). Model
transformation approach based on MOLA. In Work-
shop on Model Transformations in Practice (MTIP),
MoDELS/UML ’2005.
OMG (2006). Meta Object Facility Core Specification,
version 2.0, formal/2006-01-01. Object Management
Group.
OMG (2007). Unified Modeling Language: Superstructure,
version 2.1.1 (non-change bar), formal/07-02-05. Ob-
ject Management Group.
OWL (2004). OWL Web Ontology Language Overview.
W3C. http://www.w3.org/TR/owl-features.
Patel-Schneider, P. F. (1993). Description-logic knowledge
representation system specification from the KRSS
group of the ARPA knowledge sharing effort. Tech-
nical report, DARPA Knowledge Representation Sys-
tem Specification (KRSS) Group of the Knowledge
Sharing Initiative.
Pedersen, T., Patwardhan, S., and Michelizzi, J. (2004).
WordNet::similarity - measuring the relatedness of
concepts. In Proc. 19th National Conference on Ar-
tificial Intelligence (AAAI-04), page 3p.
Salotti, S. and Ventos, V. (1998). Study and formalization
of a case-based reasoning system using a description
logic. In EWCBR, pages 286–297.
Schank, R. C. (1982). Dynamic Memory: A Theory of
Learning in Computers and People. Cambridge Uni-
versity Press.
´
Smiałek, M. (2006). Mechanisms for requirements based
model reuse. In International Workshop on Model
Reuse Strategies - MoRSe, pages 17–20.
´
Smiałek, M., Ambroziewicz, A., Bojarski, J., Nowakowski,
W., and Straszak, T. (2007). Introducing a unified
requirements specification language. In Madeyski,
L., Ochodek, M., Weiss, D., and Zendulka, J., edi-
tors, Proc. CEE-SET’2007, Software Engineering in
Progress, pages 172–183. Nakom.
Wolter, K., Krebs, T., and Hotz, L. (2008). Combined sim-
ilarity measure for determining similarity of model-
based and descriptive requirements. In Artificial In-
telligence Techniques in Software Engineering (ECAI
2008 Workshop), pages 11–15.
Woodcock, J. and Davies, J. (1996). Using Z: Specification,
Refinement, and Proof. Prentice Hall International.
ONTOLOGY-BASED SIMILARITY OF SOFTWARE CASES - Applying Ontology Reasoning to Software Retrieval
191