ONTOLOGY-BASED TEST DATA GENERATION
USING METAHEURISTICS
Zolt´an Szatm´ari, J´anos Ol´ah and Istv´an Majzik
Department of Measurement and Information Systems, Budapest University of Technology and Economics
H-1117, Magyar Tud´osok krt. 2, Budapest, Hungary
Keywords:
Ontologies, Autonomous agents, Optimization with metaheuristics, Test data generation.
Abstract:
Software testing is an expensive, yet essential stage in all software development models, thus there is a great
effort from the research community to facilitate or even automate this step. Although much of the testing
process is automated by modern software development environments (e.g., test execution, monitoring), the
selection of test data remains generally a manual process.
In this paper we present a novel approach for test data generation in case of testing data dependent behaviour
of autonomous software agents. The proposed method uses the metamodel of the agent’s environment derived
from the context ontology, and utilizes the input specifications to formulate the goal of testing. Our approach
suggests the use of metaheuristic search techniques for the generation of optimal test data, usually referred to
as search-based software test data generation.
1 INTRODUCTION
Software testing is the process of evaluating the qual-
ity of the software under test (SUT) by controlled
execution, usually with the primary aim to reveal in-
adequate behavior or performance problems. During
testing a set of test cases is executed to verify the ex-
pected behaviour. A test case consists of input data,
precondition, expected output and postcondition.
Testing is an essential step of all software devel-
opment models. However, writing test cases is ex-
pensive, labor-intensive and time consuming, thus fa-
cilitation or automation of the testing process is de-
sired. The main challenge in test generation is to
avoid ad-hoc testing and support test case generation
using measurable coverage metrics and well-defined
method.
One of the most important tasks in automated test
generation is test data generation, which is the pro-
cess of identifying input data that satisfy certain cri-
teria (test goals). A typical test goal is the verifica-
tion of the behaviour in selected (often all) states of
the SUT. Considering this goal, the automated gene-
ration of realistic and feasible input data is usually
difficult, because of the large state space of the SUT.
However, in certain cases, goals of testing can be ex-
pressed solely by referring to the input domain of the
SUT, without considering its internal states.
Such cases include testing of autonomous soft-
ware agents. A formal definition of autonomous
agents is given in (Franklin and Graesser, 1996),
which states the following: An autonomous agent is
a system situated within and a part of an environment
that senses that environment and acts on it, over time,
in pursuit of its own agenda and so as to effect what
it senses in the future. Thus the goal of testing au-
tonomous agents can be expressed as testing the be-
haviour in case of various configurations of the envi-
ronment (context).
Construction of efficient test data (that cover all
valid configurations using a minimal set of test cases)
is still a difficult problem. Application of determinis-
tic test generation algorithms is often impractical, due
to the high number of potential configurations and the
related semantic constraints that determine the fea-
sible and valid configurations and influence the effi-
ciency of testing in a nontrivial way.
In this paper we propose a novel automatic test
data generation approach, which utilizes the context
model of the agent and applies metaheuristics in or-
der to generate efficient test data.
First, we propose an ontology based construction
of the context model (Section 3). This way the hi-
erarchy and relations of the elements (objects and
217
Szatmári Z., Oláh J. and Majzik I..
ONTOLOGY-BASED TEST DATA GENERATION USING METAHEURISTICS.
DOI: 10.5220/0003533902170222
In Proceedings of the 8th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2011), pages 217-222
ISBN: 978-989-8425-75-1
Copyright
c
2011 SCITEPRESS (Science and Technology Publications, Lda.)
changes) in the environment can be precisely formu-
lated, which can be directly utilized when defining
and computingcontext coverage (as an important cov-
erage metric during testing).
Second, on the basis of the context model, we ex-
press the semantic constraints that are included in the
functional specification or generally characterize the
domain (determining the valid context configurations,
e.g., the arrangement of objects or timing of changes)
in the form of model patterns (Section 4). Usually,
these patterns are overlapping and constructed at diff-
erent levels of the hierarchy of the context model,
which makes the search for test data difficult. This is
why we propose search-based software test data gene-
ration (Section 5): instead of deterministic (in worst
case exhaustive) search in the space of valid context
models, we rely on an iterativeimprovement of an ini-
tial test set by modifying configurations (adding new
elements from the context model) and measuring the
quality of the resulting test data. Measuring requires
the construction of a so-called fitness function that in-
corporates a refined coverage metric with respect to
the model patterns.
Finally, we propose implementation technology in
the form of a model manipulation framework that al-
lows efficient representation of model patterns and
manipulation of test data (Section 6).
2 REFERENCE ARCHITECTURE
An agent can be described with an agent function in
abstract mathematical form. The implementation of
the function is called agent program. The environ-
ment in which the agent operates is usually referred
to as its context.
The reference architecture is shown in Figure 1.
This set up is very similar to the arrangement that au-
thors use in (Russell and Norvig, 2003), when defin-
ing the connection between an agent and its context.
Figure 1: Architecture of an autonomous agent.
The agent program utilizes an internal represen-
tation of the context, that stores the knowledge of
the agent about its environment. This representation
should describe all the things and events that are re-
levant for the behaviour (control algorithms) of the
agent.
The input of the agent program is provided by the
perception module, that identifies the current situation
and the changes of the context. Based on the per-
ception information changes are applied on the inter-
nal context representation. The control of the agent
may include reasoning, learning and adaptation to the
evolving context. Based on the internal context repre-
sentation, the internal rules and goals, the agent prog-
ram makes a decision and generates the input for the
actuators. In this paper we focus on the testing of the
agent program, and we will not deal with the testing
of perception and action execution.
Considering our test goal, testing is implemented
through the generation and manipulation of the
agent’s context (this way the input data for the agent
program). Specific configurations and changes in the
context are considered as test data. To be able to gen-
erate these test data in an automated way, a flexible
but expressive representation of the context is neces-
sary. For this purpose we propose an ontology based
modeling approach: A context ontology is defined that
supports the description of the context elements.
3 CONTEXT ONTOLOGY
Ontologies expressed in description logic formalism
(Bechhofer, 2004) are commonly used to represent
knowledge base in a well-structured and expressive
way. Domain experts can easily use this modelling
approach since it is close to the human thinking and
supports rapid development of domain specific lan-
guages.
Ontologies consist of terminologies (TBox) and
model instances (ABox). A TBox describes the con-
cepts, its relationships and properties. An ABox col-
lects the model elements that are TBox-compliant in-
stances. In other words terminology is a metamodel-
like “dictionary” to define a model while model
instances store the knowledge about the modelled
things.
The context ontology is a domain-specific descrip-
tion of the objects and events in the agent’s context
that are relevant for its behaviour:
The static objects that can be found in the context
are modelled using an ontology concept hierarchy
(in other words a dictionary based taxonomy).
The relations between concepts are also modelled.
ICINCO 2011 - 8th International Conference on Informatics in Control, Automation and Robotics
218
Every object could have some properties (e.g., lo-
cation). Properties can be modelled using Data
properties or Object properties, that are base ele-
ments in an ontology TBox model.
Dynamic changes in the environment should also
be modelled using this ontology. We included this
dynamic aspect in the context ontology by defin-
ing the concept of changes with regard to objects
(i.e., an object appears, disappears), their proper-
ties (e.g., a property changes) and relations. Us-
ing these concepts a dynamic context can be de-
scribed.
We also defined context patterns. Each context
pattern represents a fragment of a context model as a
specific arrangement of elements within the context.
These patterns are originated from the specification
(use cases) of the SUT.
We propose the usage of the context ontology
during the requirement specification and test defini-
tion phases due to its expressiveness and the tool-
supported consistency checking facility. In the later
development and test generation phases the context
ontology can be mapped to a domain specific meta-
model and model, while the context patterns can be
mapped to model patterns. Axioms in the ontology
can be mapped to well-formedness constraints (with
regard to the metamodel) and model patterns (that re-
quire a desired configuration of model elements).
4 TESTING CONCEPT
Through the last decades, several approaches have ap-
peared for automatic software test data generation,
however most approaches concentrate on structural
testing. In (Ferguson and Korel, 1996) the authors di-
vided these methods into three classes.
Random methods obviously select input data by
random selection. Path-oriented methods reduce the
test data generation to a path problem, where a path
in the control flow is selected (usually to trigger a
selected program statement), and then the task is to
generate input data to execute that path. In the goal-
oriented approach, the path selection is eliminated,
thus the goal is to find particular input data which trig-
ger a selected statement in the program code. Meth-
ods using this approach monitor the program execu-
tion with the current input data, and classify branches
according to their influence on execution of the de-
sired branch.
The weakness of these approaches in case of agent
testing is the program analysis stage. Both path- and
goal-oriented approaches require the analysis of prog-
ram code, which can be complicated in case of large
programs, partly implemented programs with com-
ponent stubs, or legacy code. Furthermore, all in-
troduced approaches handle complex data structures
with difficulty, though most modern software appli-
cations deal with large and complex input data.
In this paper we focus on functional testing. Our
approach is based on the high-level behaviour speci-
fications of the agent program, that are relations be-
tween the program input and output. For example a
specification states that if a particular configuration
of objects is present in the context, the agent executes
an associated action.
We utilize these specifications to formulate re-
quirements regarding the context of the SUT. These
requirements include the presence of particular con-
figuration of objects stated in the input specification.
These configurations can be represented by context
patterns.
The goal of test data generation is to cover all con-
text patterns. Hence a generated suite of test data
is perfect for our goal, if it covers all required con-
text patterns, this way it is appropriate to determine
whether the SUT behaves as specified. We will refer
to this as sound suite of test data.
The generated test data is an instance model that
conforms to the metamodel constructed from the con-
text ontology, thus it shall fulfill the well-formedness
constraints defined by the metamodel. Additionally,
the test data shall conform with the context patterns
that also originate from the context ontology (for ex-
ample, they require the presence of certain objects
when another object is already present in the instance
model). We will refer to these restrictions as semantic
constraints. We will refer to test data that is well-
formed and satisfies the semantic constraints as valid
test data.
Since the generated test data is an instance model,
manipulation of this model during the execution of
the test data generation algorithm can be implemented
in the form of a model transformation (MT). These
transformations take an input model and produce an
output model by the application of a transformation
rule. In our case the metamodel of the input and the
output instance models are the same, thus we apply
endogenous MTs.
Previously we have stated that a sound and valid
suite of test data covers all required context patterns
and satisfies the semantic constraints. Finding the op-
timal set of test data is a non-trivialproblem due to the
large number of patterns, the hierarchy of objects and
relations included in these patterns and the overlap-
ping nature of patterns and semantic constraints (e.g.,
patterns may contain configurations of elements from
other patterns, constraints may complementary etc.).
ONTOLOGY-BASED TEST DATA GENERATION USING METAHEURISTICS
219
According to these problems, the issue of test data
generation can be formulated as an optimization prob-
lem. A fitness function assigns a real value to each
suite of generated test data. This value indicates how
well a particular candidate fulfills the criteria aggre-
gated in the fitness function. One such criterion is the
coverage of the patterns, i.e., the number of patterns
that is included directly or indirectly (taking into ac-
count the hierarchy and overlapping of patterns) in the
set of generated test data. Another criterion that can
be taken into account is the size of the set of test data
(that shall be kept low to reduce the cost of testing).
Our task is to locate the global maxima of the fitness
function, this way to find the optimal set of test data.
5 SEARCH-BASED TEST DATA
GENERATION
Search-based software engineering (SBSE) is the
use of search-based optimization algorithms (usually
metaheuristic search techniques) to software engi-
neering problems. SBSE is an approach with increas-
ing relevance, since search techniques were success-
fully applied to a number of software engineering
problems throughoutthe whole software development
life-cycle (Harman, 2007). Software testing is proba-
bly the most important application domain of SBSE.
Furthermore the amount of research in search-based
software test data generation alone is so significant
that it led to a survey by McMinn (McMinn, 2004).
Metaheuristics are the primary subfield of stochas-
tic optimization applied for a very wide range of prob-
lems. Metaheuristics can be divided into single-state
methods (i.e., hill-climbing, simulated annealing or
tabu search) and population methods (i.e., genetic al-
gorithms and evolution strategy from the field of evo-
lutionary computation, or particle swarm optimiza-
tion from the class of swarm intelligence methods).
An exhaustive and up to date description of meta-
heuristics is presented by Luke (Luke, 2009).
Metaheuristics are advantageous in problems de-
scribed as “I know when I see it”. In our case,
for example, the formulation of a deterministic algo-
rithm would be impractical taking into accountthe de-
pendency and hierarchy between semantic constraints
and context patterns to cover, though we are able to
score the quality of a candidate solution (test suite)
and decide whether it is optimal.
The key ingredients for the application of search-
based optimization to test data generation is the
choice of representation of the solutions and the defi-
nition of the fitness function. In order to successfully
apply metaheuristic search techniques, a good repre-
sentation should fulfill the heuristic belief about the
space of candidate solutions. This means that similar
solutions behave similarly, thus small changes in pa-
rameters will result in small changes in the quality of
the current solution.
As we already mentioned, the task of test data ge-
neration can be interpreted as generation of instance
models, thus in this case the candidates are repre-
sented as model instances. We call the generated test
data sound according to the fitness function, when all
required objects are covered according to the estab-
lished goals (i.e., there are matches of the context pat-
terns within the model). The fitness function formu-
lated to guide the test data generation should reward
model instances that contain the context patterns with
higher scores. The computation of the coverage of
model patterns, that is the core of the fitness function,
should well handle the introduced hierarchy and de-
pendency problems.
Additionally, the formulationof operators is a fun-
damental question when metaheuristic algorithms are
applied for optimization. These operators define how
the candidate solution(s) can be updated in each iter-
ation. In traditional problems, the candidate solutions
are represented as vectors, thus updating is executed
by the manipulation of values in the vectors.
Since our candidate solutions are represented by
instance models, updating of a candidate can be exe-
cuted by the introduced model transformations. Pos-
sible transformations of candidate solutions are de-
fined by a set of model transformation rules prior to
the execution of the test data generation algorithm. In
every iteration of the applied metaheuristic algorithm,
an arbitrary number of rules are selected and exe-
cuted. Obviously, these rules do not violate the well-
formedness constraints provided by the metamodel.
Figure 2 presents the entire workflow of the pro-
posed test data generation algorithm.
6 IMPLEMENTATION
To implement the proposed test data generation ap-
proach a model manipulation framework is needed,
that supports metamodel based model manipulation
tasks to generate instance models conforming to the
domain specific metamodel.
The set of initial instance models, which forms the
input for the test data generation algorithm, should be
constructed based on the context metamodel.
The test data generation algorithm utilizes the fol-
lowing functions of the model manipulation frame-
work:
Since the conformance to the metamodel is the
ICINCO 2011 - 8th International Conference on Informatics in Control, Automation and Robotics
220
Figure 2: Workflow of the proposed approach.
primary requirement in the case of the generated
instance models, the framework should support
efficient metamodel conformance checking.
Application specific requirements are represented
using content patterns, so checking the coverage
of model patterns should be supported.
Since the operators in the test data generation
algorithm are defined as model transformation
rules, the efficient execution of such rules is re-
quired.
Several model transformation frameworks exist
that support these functions. Graph transforma-
tion based frameworks (e.g., VIATRA2
1
, AGG
2
) ap-
ply graph pattern matching and graph transformation
rules for checking and manipulating model instances.
Another solution could be the application of a rule en-
gine (e.g., Drools
3
), which supports the construction
of domain specific rules that can be used for pattern
checking and model transformation.
In certain cases (e.g., in case of autonomous
robots) the context model describes a real world that
consists of 3D objects and dynamic behaviourof these
objects. For demonstration and visualization purposes
the context model can be transformed into a visualiza-
tion language (e.g., X3D is an open-standard format,
that is able to describe 3D scenes and objects).
7 AN EXAMPLE
We demonstrate our proposed approach on the ex-
ample of a simplified version of the Wumpus World.
This world is a popular demonstrating environ-
ment for intelligent agents, thoroughly discussed in
(Russell and Norvig, 2003).
1
See http://www.eclipse.org/gmt/VIATRA2/ for details.
2
See http://user.cs.tu-berlin.de/˜gragra/agg/ for details.
3
See http://www.jboss.org/drools for details.
Wumpus World is a cave with a number of rooms.
Each room is represented with a square. The neigh-
borhood of a room consists of four rooms (north,
south, east and west).
In our simplified example only one Wumpus and
one treasure is present on arbitrary squares. If the
Wumpus is at a square, then there is stench on that
square and all on its neighboring squares. If the trea-
sure is at a square, then there is glitter on that square.
The agent in this world perceives the current
square where it is located. According to the two per-
ceivable elements (i.e., stench and glittering), these
perceptions are represented by two element vectors.
The agent may turn 90
left or right, and go for-
ward. In our case if it goes to a wall nothing happens,
and if the agent advances on a square where a Wum-
pus is waiting, the agent is destroyed. The agent may
decide to leave the cave, if it can not safely determine
the treasure.
Let us consider the following input specification.
The agent operating in an arbitrary Wumpus World
that conforms to the rules introduced above, it is able
to avoid the Wumpus, and find the treasure or quit,
when no further safe move is possible. We may apply
our proposed approach in order to generate test data
(i.e., Wumpus world) to test whether the agent fulfills
this specification.
The context ontology created for the Wumpus
World contains all the elements mentioned above,
with proper relations. The derived metamodel from
this ontology is presented on Figure 3.
According to the ontology axioms defined for the
Wumpus World, we may derive one semantic con-
straint. If the Wumpus is at a square, then there is
stench at all neighboring squares. The model pattern
expressing this constraint is shown on Figure 4. This
is a negative pattern: if it has a match in the candidate
model, then a square exists near a Wumpus without
stench, i.e., the model is not valid for testing.
ONTOLOGY-BASED TEST DATA GENERATION USING METAHEURISTICS
221
Figure 3: Metamodel of the simplified Wumpus World.
Figure 4: Semantic constraint for the test generation.
A Wumpus World generated to test the agent is
sound if it contains all possible elements (Wumpus,
treasure and start square). Context patterns express-
ing these requirements are presented on Figure 5. The
fitness function which measures the quality of a can-
didate Wumpus World counts the patterns that are
covered. For example, if the three model fragments
has exactly one occurrences in the model (i.e., there is
one Wumpus, one treasure and one start square), the
fitness function is maximal. The proper appearance
of squares with stench is guaranteed by the semantic
constraint.
Figure 5: Context patterns that shall be covered.
Since an initial model can be an empty n×n cave,
the applied model transformations may add a Wum-
pus, a treasure or stench to a square, or transform
it to a start square. During the iteration of the se-
lected metaheuristic, a candidate solution is checked
whether it fulfills the well-formedness rules and the
semantic constraints, and then the fitness function es-
timates its quality by checking whether there are oc-
currences of each context pattern in the candidate.
In this example, let us denote the number of oc-
currences of a context pattern i in a given candidate
model with k
i
. Let x
i
= 0, if k
i
= 0 || k
i
> 1, and x
i
= 1
if k
i
= 1, where i S and S is the set of all context pat-
terns. Then the fitness function may assign
iS
x
i
to
the candidate. If the selected threshold is three, the
generated test data covers all context patterns once.
8 CONCLUSIONS AND FUTURE
WORK
Verification of autonomous software agents is a diffi-
cult task, which requires the generation of valid and
sound test data according to the system specifications.
In this paper we introduced an approach, which
uses the context ontology to determine validity of the
generated test data through the derived metamodel
and semantic constraints, and measures the sound-
ness of test data with context patterns derived from
the system specification(s). Furthermore we proposed
the use of search-based test data generation to deter-
mine optimal test data. The implementation of the
proposed approach is currently under development.
The generated test data have to be sound accord-
ing to various, often conflicting context patterns si-
multaneously. An optimal solution is as sound as pos-
sible, while it remains valid, thus it is usually a trade-
off between the individual test goals. This type of
problem is usually referred to as multiobjective opti-
mization problem. In the future we plan to investigate
this problem and apply hierarchical decomposition on
the basis of the hierarchy of the input metamodel.
REFERENCES
Bechhofer, S. (2004). OWL web ontology lan-
guage reference. W3C recommendation.
http://www.w3.org/TR/owl-ref/.
Ferguson, R. and Korel, B. (1996). The chaining approach
for software test data generation. ACM Trans. Softw.
Eng. Methodol., 5:63–86.
Franklin, S. and Graesser, A. (1996). Is it an agent, or just
a program?: A taxonomy for autonomous agents. In
Proc. of the Third International Workshop on Agent
Theories, Architectures, and Languages.
Harman, M. (2007). The current state and future of search
based software engineering. In 2007 Future of Soft-
ware Engineering, FOSE ’07, pages 342–357, Wash-
ington, DC, USA. IEEE Computer Society.
Luke, S. (2009). Essentials of Metaheuristics. Available on-
line. (http://cs.gmu.edu/sean/book/metaheuristics).
McMinn, P. (2004). Search-based software test data gene-
ration: A survey. Software Testing, Verification and
Reliability, 14:105–156.
Russell, S. and Norvig, P. (2003). Artifical Intelligence. A
Modern Approach. Pearson Education Inc., second
edition.
ICINCO 2011 - 8th International Conference on Informatics in Control, Automation and Robotics
222