APPLICABILITY OF BET TO ELUSIVE BUGS IN
DIVERSE APPLICATION AREAS
M. Chaudhary, B. Chen, P. Desai, R. Hemmatti, F. Lionetti, T. Hachisuka and W. Howden
CSE, UCSD, La Jolla, CA 92075, U.S.A.
Keywords: Testing, Elusive Bugs, Defects, Bounded Exhaustive Testing, BET, Frameworks, Fault Model, Failure
Model, Test Oracle, Inverse Oracle.
Abstract: The basic principles of Bounded Exhaustive Testing (BET) are reviewed, as well as the concept of an
Elusive Bug (EB). Initial work on the application of BET to EBs had indicated that it provides a new
and promising approach to this problem. A four-part BET/EB-oriented test framework is introduced, involving fault
model development, BET test generation design, failure model identification, and automated oracle design.
The framework provides a systematic approach to BET/EB. It was applied to three very
different application areas. The research indicated the general applicability of BET and of the BET/EB
framework. It yielded new insight into BET/EB, including the development of new techniques such
as the BET/EB "inverse oracle". The research illustrated how fault models can be used to put BET
application bounding on a systematic basis. It also illustrated how failure models can be used to facilitate
the development of automated oracles, and how they can be used, along with fault models, to systematically
define the effectiveness scope of a BET testing strategy.
1 INTRODUCTION
1.1 Exhaustive Testing
BET is based on the observation that the set of
possible inputs to a program can be classified
according to the "size" of the application instance,
and that many defects will manifest in small versions
in the same way that they do in larger versions.
For example, a sort routine can accept input arrays of
different sizes, but many failures are as likely to
occur for small arrays as they are for larger
arrays. This leads to the idea of exhaustively
testing over "bounded" versions of an application.
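As an illustration only, the bounded-exhaustive idea for the sort example can be sketched in Python. The names `bounded_inputs`, `one_pass_sort` and `first_failure` are hypothetical, the faulty sort is an invented example rather than one from the paper, and Python's built-in `sorted` serves as the reference result.

```python
from itertools import product


def bounded_inputs(max_len, values):
    """Yield every list of length 0..max_len whose elements are drawn from `values`."""
    for n in range(max_len + 1):
        for combo in product(values, repeat=n):
            yield list(combo)


def one_pass_sort(xs):
    """Hypothetical faulty sort: performs only a single bubble-sort pass."""
    ys = list(xs)
    for i in range(len(ys) - 1):
        if ys[i] > ys[i + 1]:
            ys[i], ys[i + 1] = ys[i + 1], ys[i]
    return ys


def first_failure(sort_fn, max_len, values):
    """Return the first bounded input on which sort_fn disagrees with sorted(), or None."""
    for xs in bounded_inputs(max_len, values):
        if sort_fn(xs) != sorted(xs):
            return xs
    return None
```

With `max_len=2` and values `(0, 1)` no failure is found, while raising the bound to 3 exposes the fault on `[1, 1, 0]`. The number of tests is the sum of k^i for i = 0..n (40 inputs for length at most 3 over three values), which is why the choice of bound matters.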
Similar finite testing ideas also occur in model
analysis and testing, in which abstraction may be
used to create a finite version of a program or
system, e.g. (Artho, 2008).
The BET approach was recently popularized as
an automated class testing method, e.g. (Sullivan,
2004). More recent work has generalized BET, with
more flexible test data generation mechanisms, e.g.
(Howden, 2007), and the identification of
BET-oriented test oracles that may not be "complete"
but are more powerful than simple crash detection
(Howden, 2008). In this paper we are specifically
concerned with the use of BET as a testing method
for attacking the Elusive Bug problem.
1.2 Elusive Bugs
An Elusive Bug has been characterized as one whose
manifestation depends on a combination of
conditions (Howden, 2008). Elusive bugs often
involve conditions that are individually
meaningful, but whose combination may have
no meaning other than the post facto discovery that
it causes a failure. Instead of trying to second-guess
which combinations might be useful, or how to
define and select them, the idea in BET is to test
them all, within application instance bounds.
1.3 Fault and Failure Models
In order to use BET to systematically test for elusive
bugs we need a general framework for its
application. It should be general enough to be useful
in diverse application areas and at all levels of
testing. The generic approach that we followed has
four parts: fault model identification, BET test
generator design, failure model identification, and
automated oracle design. We investigated this
approach to BET in three application areas.
Chaudhary M., Chen B., Desai P., Hemmatti R., Lionetti F., Hachisuka T. and Howden W. (2009).
APPLICABILITY OF BET TO ELUSIVE BUGS IN DIVERSE APPLICATION AREAS.
In Proceedings of the 4th International Conference on Software and Data Technologies, pages 277-282
DOI: 10.5220/0002213602770282
Copyright © SciTePress
Fault models for software describe software
defects, as opposed to failure models, which describe
invalid behavior. Fault models can be categorized as
white or black box. White box fault models, as in the
case of white box testing, are defined directly in
terms of program constructs. Mutation testing is
based on white box fault models, in which program
faults are defined in terms of perturbations of
program statements. Black box fault models, as in
the case of black box testing, define faults indirectly
in terms of classes of inputs.
In general, EB-oriented fault models associate
faults with combinations of behavior-affecting
conditions. The conditions may originate from black
or white box views of the software. One purpose
of developing fault models is to
characterize the defects for which a related testing
method is effective. Another, for fault models
associated with BET, is to help identify the size of
the application instance over which BET must test
to achieve fault model coverage.
Since BET can generate a large number of tests,
it is desirable to develop an automated oracle for
checking its results. It is not always possible to
construct a "complete" oracle. An oracle is complete
if it can compute a necessary and sufficient relation
for the validity of all observed input-output
behavior. An incomplete oracle computes a relation
that is necessary or sufficient but not both. The
simplest incomplete oracle is a robustness checker,
e.g. (Miller, 2006), which determines whether or not
a program has crashed or raised an unexpected
exception during a test. Although it may not be possible
to develop a complete automated oracle, it is often
possible to devise a necessity oracle that will do
better than simple robustness checking. The first
step in designing an oracle is to develop a failure
model which characterizes the class of invalid
behavior that can be detected. A framework for
defining and developing incomplete oracles and
failure models was previously described in
(Howden, 2008).
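For the sort routine of Section 1.1, the three levels of oracle might be sketched as follows. The function names and the duplicate-dropping fault are illustrative assumptions. Each weaker oracle accepts every output that a stronger one accepts, and more.

```python
from collections import Counter


def robustness_oracle(sort_fn, xs):
    """Weakest necessity check: passes unless sort_fn raises an exception."""
    try:
        sort_fn(xs)
    except Exception:
        return False
    return True


def is_ordered(ys):
    """True if ys is in non-decreasing order."""
    return all(a <= b for a, b in zip(ys, ys[1:]))


def necessity_oracle(sort_fn, xs):
    """Incomplete oracle: ordered output is necessary, but not sufficient, for validity.
    Assumes sort_fn returns normally."""
    return is_ordered(sort_fn(xs))


def complete_oracle(sort_fn, xs):
    """Complete oracle: output must be ordered AND a permutation of the input."""
    ys = sort_fn(xs)
    return is_ordered(ys) and Counter(ys) == Counter(xs)


def dedup_sort(xs):
    """Hypothetical faulty sort that silently drops duplicate elements."""
    return sorted(set(xs))
```

On the input `[2, 1, 2]`, `dedup_sort` passes both the robustness and the necessity oracle but is rejected by the complete oracle. This is the gap between a necessity oracle and a complete one.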
When a BET testing strategy is designed, its
effectiveness is circumscribed by the fault and
failure model(s) that characterize the kinds of defects
it will be able to detect. We note that the four parts
of the approach are not always applied in order. For
example, the availability of a certain kind of automated oracle
may determine the failure model, and in some cases the fault model
and the BET test generator may be identical.
1.4 Diverse Application Areas
Programs from three application areas were
considered: graphics, numerical simulation, and
distributed systems. For each of these we describe
the application of the four-part BET procedure and
consider its applicability.
2 GRAPHICS
One of the more important contemporary graphics
applications involves rendering. Rendering provides
a visual representation of simulated objects
illuminated by simulated lighting. Part of the
rendering process involves the determination of light
ray intersections with an object surface (Kensler,
2006).
One common way of representing surfaces is to
use contiguous triangles. Both the triangle surfaces
and a light ray can be considered to exist in a 3D
grid, whose points represent triangle vertices and ray
endpoints. A variety of ray-triangle intersection algorithms
have been developed. The testing problem is difficult because
there has been no programmatic way of
determining whether the results of a test are correct, i.e. no
automated oracle. Previous practice relied on visual
examination of the output from a rendering program:
a failure to detect an intersection point may show up
as a black dot on the screen (Woo, 1996).
The BET approach to this application area led to
a way of automatically testing intersection programs
using an automated oracle.
2.1 Fault Model
In this application, we considered both white and
black box fault models. The first was used to
motivate the second.
a) White box. Three kinds of faults can occur: in
computational expressions, in precision, and in
program logic. The first kind can be covered by testing
the computational expressions in isolation, and the other
two by considering their role in the generation of
black box faults.
b) Black box. The program input consists of
surfaces and light rays. Round-off errors may result
in failures in which an intersection at an exact vertex
or surface boundary is missed. If a boundary divides
two surfaces, then a ray passing through the common
boundary may be missed during the calculations made
for one surface but not the other, so a correct overall
result could still be obtained. If the boundary is an external
surface edge, then the same calculations could produce an
incorrect result. This indicates that an input-based fault
model should include examples in which surface
boundaries are both internal and external.
Precision faults may also occur when a ray
passes very close to a surface without intersecting it.
ICSOFT 2009 - 4th International Conference on Software and Data Technologies
Intersection computations for the case where a ray
intersects a surface at its interior may be less likely
to fail due to precision problems, and more likely to
fail due to program logic errors that do not show up
elsewhere.
These considerations lead to a fault model in
which differently directed rays are combined with
one or more surfaces. The elusive bug (EB) fault
model, which consists of all combinations of
relevant behavioral conditions, will include
combinations such as a ray passing at a sharp angle
through a three-way vertex. This is typical of the
kinds of combinations that black box testing may not
test for, since there is no obvious cause-effect
relationship here and the combination is not
semantically meaningful. Yet it is typical of the
unexpected combinations that can cause problems.
2.2 BET Test Data Generation
The above fault model can be covered with a BET
testing approach having the following properties.
First, construct a cube of candidate intersection
and vertex points that is large enough to
include representative surfaces constructed from
three contiguous triangles. The cube also includes
rays that begin at locations in the cube and are
directed at different possible angles. Ray generation
can be simplified by considering the object surface
triangle to be a subsurface of a larger super-surface.
Rays are constructed that connect each possible
originating location with a location on the super-surface.
If that location is in the object subsurface, the
algorithm should determine that an intersection
exists. If it is outside the object subsurface, no object
subsurface intersection should be identified.
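A minimal sketch of such a generator follows. It rests on simplifying assumptions that are not from the paper: a single object triangle with vertices (0, 0), (4, 0) and (0, 4) lying in the super-surface plane z = 0 (rather than three contiguous triangles), integer grid coordinates 0..4, and ray origins at heights z = 1 and z = 2. Each target point is also classified as a vertex, edge, interior or outside point, so one can check that the generated rays reach the boundary cases the fault model calls for.

```python
from collections import Counter
from itertools import product

# Hypothetical test geometry (assumptions, not the paper's exact setup).
TRIANGLE = ((0, 0), (4, 0), (0, 4))   # object subsurface, lying in the plane z = 0
GRID = range(5)                        # integer coordinates 0..4 along each axis
HEIGHTS = (1, 2)                       # z-coordinates of ray origins above the plane


def orient(o, a, b):
    """z-component of (a - o) x (b - o): > 0 left turn, < 0 right turn, 0 collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def classify(p, tri=TRIANGLE):
    """Classify a super-surface point as 'vertex', 'edge', 'interior' or 'outside'."""
    if p in tri:
        return "vertex"
    a, b, c = tri
    signs = (orient(a, b, p), orient(b, c, p), orient(c, a, p))
    if any(s < 0 for s in signs) and any(s > 0 for s in signs):
        return "outside"
    return "edge" if 0 in signs else "interior"


def generate_rays():
    """Yield (origin, direction, expected_hit, target_class) for every origin/target pair.

    Each ray is aimed from a grid origin above the plane at a grid point on the
    super-surface, so whether it should hit the object triangle is known in advance."""
    for ox, oy, oz in product(GRID, GRID, HEIGHTS):
        for tx, ty in product(GRID, GRID):
            cls = classify((tx, ty))
            direction = (tx - ox, ty - oy, -oz)
            yield (ox, oy, oz), direction, cls != "outside", cls


def coverage():
    """Count the generated rays per target class."""
    return Counter(cls for _, _, _, cls in generate_rays())
```

Under these assumptions the generator produces 1250 rays (50 origins × 25 targets), of which 600 are aimed at vertices or edges, the cases where round-off and boundary logic faults are expected.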
2.3 Failure Model
In this example, we have a simple failure model. The
program fails if it identifies a ray-surface
intersection when none exists, or fails to identify an
intersection that does exist.
2.4 Automated Oracle
One of the more interesting general concepts that we
discovered during the investigation of this example
was that of an inverse oracle. In this approach, the
test generator starts with an output result and then
constructs all inputs that should lead to that result. In
this way it knows what the output should be for any
test. In the surface intersection example, output can
be True (intersection) or False (no intersection): a ray
aimed at a super-surface point inside the object
subsurface must yield True, and one aimed at a point
outside it must yield False.
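The inverse oracle can be sketched by pairing a generator like the one in Section 2.2 with an intersection routine under test. Here that routine is a textbook Möller-Trumbore ray-triangle test, chosen only for illustration and evaluated in exact rational arithmetic via `fractions.Fraction`. Its `strict` flag injects a hypothetical logic fault that rejects hits on edges and vertices. The geometry (one triangle with vertices (0, 0, 0), (4, 0, 0), (0, 4, 0), an integer grid 0..4, and origin heights 1 and 2) is an assumption.

```python
from fractions import Fraction
from itertools import product

# Hypothetical object triangle lying in the plane z = 0.
V0, V1, V2 = (0, 0, 0), (4, 0, 0), (0, 4, 0)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersects(orig, d, strict=False):
    """Moller-Trumbore ray/triangle test in exact rational arithmetic.

    strict=True injects a hypothetical logic fault: hits exactly on an edge
    or vertex are rejected."""
    e1, e2 = sub(V1, V0), sub(V2, V0)
    p = cross(d, e2)
    det = dot(e1, p)
    if det == 0:
        return False                      # ray parallel to the triangle's plane
    s = sub(orig, V0)
    q = cross(s, e1)
    u = Fraction(dot(s, p), det)          # barycentric coordinates of the hit point
    v = Fraction(dot(d, q), det)
    t = Fraction(dot(e2, q), det)         # ray parameter of the hit point
    if strict:
        inside = u > 0 and v > 0 and u + v < 1
    else:
        inside = u >= 0 and v >= 0 and u + v <= 1
    return inside and t >= 0


def inverse_oracle_tests():
    """Start from the desired answer: aim each ray at a chosen point on z = 0,
    so the expected result is known before the routine under test runs."""
    for tx, ty in product(range(5), range(5)):
        expected = tx + ty <= 4           # target in the closed triangle (tx, ty >= 0 here)
        for ox, oy, oz in product(range(5), range(5), (1, 2)):
            yield (ox, oy, oz), (tx - ox, ty - oy, -oz), expected


def count_failures(strict=False):
    """Number of tests on which the routine disagrees with the constructed answer."""
    return sum(intersects(o, d, strict) != expected
               for o, d, expected in inverse_oracle_tests())
```

Because every expected value comes from how the ray was constructed, no reference renderer or visual inspection is needed. The exact routine produces no mismatches, while the strict variant fails on all 600 rays aimed at the triangle's boundary.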
2.5 Evaluation
Previous experience with this problem domain
provided examples of incorrect rendering in which
an intersection was not observed by the graphics
rendering tool. Standard test practice, which was
both awkward and unreliable, was to choose random
input data and to visually examine the output for
problems. Failed intersection detections showed up
as a black spot in the display. A sample BET testing
system was constructed using the design described
above and run on the examples defined by the EB
fault model. The use of the inverse oracle made it
possible to reliably and automatically detect
intersection defects.
3 FINITE ELEMENT SIMULATION
A common method of simulating physical objects is
to construct a grid or mesh. Each cell in the grid is
associated with an equation that describes the
properties of the cell. The cell takes inputs from one
or more neighboring cells, and then produces
outputs that become inputs to its neighbors. The
properties of the corresponding physical entity are
simulated over a sequence of steps: at the beginning,
inputs are applied at one or more "input"
cells, and at the end, outputs are measured at one or
more "output" cells.
The Continuity simulation program (Bioeng.,
2008) can be used to simulate the electrophysiology of
living structures, and in particular to determine the
transmitted effects of a stimulus. For example, a
voltage could be supplied to one surface, and the
model used to measure the transmitted effects
arriving at some other location or surface.
3.1 Fault Model
This application involves a variety of conditions,
corresponding to different kinds of inputs that will
affect behavior. The EB-oriented fault model is
associated with combinations of these conditions.
The Continuity program can accommodate a
wide variety of problem applications. For our
experiment, we used a simple generic version, in
which a voltage is transmitted by "pure diffusion"
through a cell, rather than through more
complex equations. The input conditions that
could vary were: mesh shape, stimulus location,
mesh dimensionality, basis functions, derivative
type, solution steps, rendering, and boundary
condition constraints.
3.2 BET Test Generation
As in other BET applications, a central concern is
the application bounding issue. For some of the
conditions the values are binary, so both values can be
included. For others, the possibilities are many or
unbounded. Examples of the latter are model size,
number of simulation steps, and stimulus location.
The mesh size is related to the problem of
solution convergence. The idea is to try models with
different numbers of mesh elements and check whether
important phenomena are observed. If not, then the
model is too coarse. The size is increased until that
behavior is observed; once the behavior stabilizes,
the model size is fixed. In our experiments, a
mesh with a stack of four elements was found to be
sufficient.
The number of steps is also related to behavior
convergence. If the behavior converges once a certain
number of simulation cycles have been performed,
that number is considered adequate. In our
experiments, between 6 and 10 steps were found to be
sufficient, so these two values were chosen for the
tests.
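The convergence-based choice of bounds described above can be sketched as a search for the first bound at which results stop changing. Here `run`, the tolerance, and the toy convergence curve in the usage note are hypothetical stand-ins for a Continuity run and its measured output.

```python
def smallest_stable_bound(run, bounds, tol):
    """Return the first bound whose result is within tol of the previous bound's result.

    run    -- callable mapping a bound (mesh size, step count, ...) to a scalar output
    bounds -- increasing candidate bounds
    Returns None if the output never stabilizes within the candidates."""
    previous = None
    for bound in bounds:
        result = run(bound)
        if previous is not None and abs(result - previous) <= tol:
            return bound
        previous = result
    return None
```

For example, with the toy curve `run = lambda n: 1 - 0.5 ** n`, `smallest_stable_bound(run, range(1, 10), 0.1)` returns 4, because the change from 3 to 4 (0.0625) is the first one within the tolerance. A real application would compare whole output fields rather than a single scalar.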
Finally, the stimulus could be placed at any
location in the model. We chose the following set of
six representative possibilities: top line, middle line,
top left point, top middle point, point at the center of
the mesh, and simultaneously at all points
throughout the mesh.
The above choices, with a simple diffusion
model, resulted in a set of 1536 tests.
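Once each condition has a finite set of values, the BET test set is their Cartesian product. The sketch below uses the two step counts and six stimulus locations given in the text. The other value sets are hypothetical placeholders, so the resulting count is not meant to reproduce the 1536 tests reported above.

```python
from itertools import product

# Step counts and stimulus locations are from the text; the rest are placeholders.
CONDITIONS = {
    "solution_steps": [6, 10],
    "stimulus_location": ["top line", "middle line", "top left point",
                          "top middle point", "mesh center point", "all points"],
    "mesh_dimensionality": ["2D", "3D"],          # hypothetical values
    "boundary_constraint": ["none", "fixed"],     # hypothetical values
}


def bet_configurations(conditions):
    """Yield one dict per combination of condition values (all combinations)."""
    names = list(conditions)
    for values in product(*(conditions[name] for name in names)):
        yield dict(zip(names, values))
```

This placeholder set yields 2 × 6 × 2 × 2 = 48 configurations. Each additional binary condition doubles the count, which is why bounding the large or unbounded conditions comes first.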
3.3 Failure Model
Our failure models included a robustness model. An
output value is referred to as NaN (not a number) if
it is not numeric. The occurrence of this output
indicates failure, since numeric output is a necessary
condition for validity.
We were also interested in detecting functional
failures. Even an oracle based on an incomplete,
necessity property would be stronger than
simple robustness checking. To do this, we considered
output relationships that must hold for
different stimulus conditions and mesh geometries. As
an example of this, consider a simple 4-by-4, 2D
mesh, with a stimulus applied simultaneously
across the middle of the mesh. By symmetry, the "output"
values at the middle of the top edge and the middle of the
bottom edge must be the same at the end of the
simulation. These necessary
relationships, together with the NaN robustness
requirement, constituted our failure model.
3.4 Automated Oracle
Our test oracles paralleled the above failure models:
a robustness oracle determined if outputs were NaN,
and our (incomplete) functional oracle measured
required relationships between output values.
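The two oracles can be illustrated on a toy stand-in for the pure-diffusion model: an explicit finite-difference update on the 5 × 5 nodes of a 4-by-4 element mesh, with reflecting boundaries and a stimulus applied along the middle row. The solver, its parameters and the injected boundary fault are assumptions for illustration. They are not Continuity's code or API.

```python
import math


def diffuse(n=5, steps=6, alpha=0.2, buggy=False):
    """Toy explicit 2D diffusion on an n x n grid of nodes (a 4-by-4 element mesh
    has 5 x 5 nodes). Boundaries are reflecting (zero flux); alpha <= 0.25 keeps
    the explicit scheme stable. buggy=True injects a hypothetical fault that
    clamps the bottom row to zero, i.e. a wrong boundary condition."""
    u = [[0.0] * n for _ in range(n)]
    u[n // 2] = [1.0] * n                 # stimulus applied across the middle row
    for _ in range(steps):
        new = [row[:] for row in u]
        for r in range(n):
            for c in range(n):
                # Mirror the interior neighbor at the boundary (zero-flux ghost node).
                up = u[r - 1][c] if r > 0 else u[r + 1][c]
                down = u[r + 1][c] if r < n - 1 else u[r - 1][c]
                left = u[r][c - 1] if c > 0 else u[r][c + 1]
                right = u[r][c + 1] if c < n - 1 else u[r][c - 1]
                new[r][c] = u[r][c] + alpha * (up + down + left + right - 4 * u[r][c])
        if buggy:
            new[n - 1] = [0.0] * n
        u = new
    return u


def robustness_ok(u):
    """Robustness oracle: every output value must be numeric (no NaN)."""
    return not any(math.isnan(x) for row in u for x in row)


def symmetry_ok(u, tol=1e-12):
    """Functional necessity oracle: for a stimulus symmetric about the middle row,
    the outputs at the middle of the top and bottom rows must agree."""
    n = len(u)
    return math.isclose(u[0][n // 2], u[n - 1][n // 2], rel_tol=0.0, abs_tol=tol)


def check(u):
    """Apply both oracles; a False entry signals a failure."""
    return robustness_ok(u), symmetry_ok(u)
```

For both the 6- and 10-step bounds, `check(diffuse(steps=s))` passes for the correct solver. The variant with a clamped bottom row never produces NaN but violates the symmetry relation, so only the functional oracle catches it.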
Previous automated oracles for testing this
application involved the use of regression tests.
A relation-based failure model like the one we used
depends on a class of inputs, so it requires more
initial test planning than a simple
regression failure-model/oracle, but it is less fragile
and more reliable in the following sense. If a set of
regression tests fails, then new results may have to be
established as the gold standard, and it may not be clear
which set of results is the valid one. A necessity
relationship, in contrast, is always valid.
3.5 Evaluation
For this example, BET tests based on our failure
model were run against the Continuity system. They
revealed a known existing bug as well as many
additional problems. The existing bug was of the EB
type, in the sense that it occurs only when one of the
combinations suggested by our fault model is used.
Other defects that were discovered included
memory leaks and problems with certain kinds of
boundary constraints. They also included a classic
elusive bug in which a functionally meaningless
combination of input types caused a design and
coding defect to surface. BET provided the first
systematic, rigorous approach to testing that has been
available for this application.
4 DISTRIBUTED SYSTEMS
We viewed distributed systems as those in which a
set of processes communicate with one another to
perform some function. In our investigation of
distributed systems, instead of looking at a particular
application we considered previous approaches to
the testing problem, to see where BET might fit in,
and whether or not it could be an improvement.
The first approach (Boy, 2004), which we will
refer to as "Random", uses randomized testing. A set
of clients is simulated, and for each client a set of
random requests is sent to the server. The test driver
keeps track of the sequence of calls that are made
from the simulated clients, along with the received
responses. Failures correspond to invalid sequences.
A set of invariants is created that all sequences
must satisfy. In the vocabulary of this paper, the test
oracle is a necessity oracle.
There are several possible problems with this
kind of approach. The first is characteristic of all
automated testing methods that depend on random
inputs. If there is a large number of possible
sequences, random selection may easily miss the
small number of them that cause failures. Also, if the
selection is truly random, then arbitrarily long
sequences must be included, which will exacerbate
this problem. Another problem, not connected to the
randomness of test input, is a lack of control over the
communications medium. Some faults may involve
network transmission race conditions, which cannot
be directly explored.
The first problem, random selection, is avoidable
with BET, provided the bounding constraint does
not exclude a fault. (Boy, 2004) describes a bug that
was found by random testing, but whose discovery by
that method appears to have depended on luck.
The same defect would have been found every time
using a straightforward application of BET.
The second drawback to Random, the inability to
explore network induced failures, can be overcome
using a test strategy in which the network is
simulated, as in Cesium. In this approach, the
application processes run in the same network
environment as a test driver. This enables the driver,
at each moment of simulation time, to check process
queues for required consistency relationships and
other properties. It can also manipulate the queues to
simulate failures such as lost messages.
A drawback to Cesium is that the tester is still
responsible for constructing all test scenarios. There
may be combinations of events that could cause a
failure that the test designer did not think of. The
advantage of Random is that there is always some
probability of these occurring, even if it is very
small. BET combines the potential inclusiveness of
Random with the guaranteed, repeatable execution of Cesium.
4.1 Fault Model
We considered faults that correspond to sequences
of interactions between the processes in the system.
For example, a race condition caused by a long
delay in a message making its way through the
system corresponds to a sequence of
communications in which a request and its receipt
are separated by a sequence of events that
was made possible by that delay and that leads
to invalid behavior.
4.2 BET Test Generation
There are two principal issues here: test generation
and the bounded application specification.
In a BET-oriented approach, a simulated
environment like that used in Cesium would be
employed, but with a generic test driver and network
simulator that implement an all-combinations
approach to test construction.
The first bounding consideration is the number
of processes that we included in our test sequences.
If our fault model is concerned with faults resulting
from the simultaneous use of shared resources, then
BET will have to generate tests in which up to three
processes are involved.
A second consideration is a bound on "amount of
time" (i.e. number of simulation cycles), for which a
message to another process can be stalled inside the
simulated network before delivery. In (Guillarmo,
2000), the authors incorporate lower and upper
bounds on communication delays. These could be
incorporated into the fault model. Communications
that take longer than this are treated as network
failures, and the corresponding messages left
undelivered. This aspect of a fault model guides the
construction of the corresponding aspect of the BET
test generator.
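The delay bound can be turned into an exhaustive schedule generator. Each in-flight message is assigned every delay from 0 to a maximum number of simulation cycles, or is dropped, which models a delay beyond the bound as a network failure. The names and the tie-breaking rule are assumptions. A fuller generator would also enumerate the orderings of messages that arrive in the same cycle.

```python
from itertools import product

DROPPED = None  # a delay beyond the bound: treated as a network failure (message lost)


def delay_schedules(num_messages, max_delay):
    """Yield every assignment of a delay in 0..max_delay cycles, or DROPPED, to each message."""
    choices = list(range(max_delay + 1)) + [DROPPED]
    yield from product(choices, repeat=num_messages)


def delivery_order(send_times, schedule):
    """Indices of the delivered messages in arrival order for one schedule.

    Dropped messages are omitted; messages arriving in the same cycle are ordered
    by index here, although a fuller generator would enumerate those orderings too."""
    arrivals = [(send_times[i] + d, i) for i, d in enumerate(schedule) if d is not DROPPED]
    return [i for _, i in sorted(arrivals)]
```

With two messages and a bound of two cycles there are (2 + 2)^2 = 16 schedules. For a message sent at cycle 0 and another sent at cycle 1, the schedule `(2, 0)` delivers the second one first, which is the kind of reordering that race-condition faults depend on.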
Assuming that we can make the tests long
enough to cover race condition message sequences,
the length of a test is still an open issue. In the case
where there are finite resources such as message
queues, a state can be reached in which system
behavior is altered due to the exhaustion of this
resource. If we can test all combinations of messages
that are long enough to cause this situation, then we
would cover this aspect of the fault model. But if
this occurs only after message sequences so long
that considering all combinations is
impractical, then tests are needed that first
drive the system to the necessary intermediate state
before the exhaustive, all-combinations part of the
test generation process begins.
4.3 Failure Model
Generic failure conditions for distributed systems
include: failure to send a message and failure to
receive a message. Automated oracles need to cover
this simple failure model.
Failure models may also be more specific to the
application such as the assignment of the same lock
to two processes by a lock server. This is the kind of
failure described below in the Evaluation Section.
4.4 Automated Oracle
The approach suggested in (Boy, 2004) involves the
detection of illegal patterns in sequences of
messages and responses. For example, certain
requests will require a response, and a missing
response failure will show up in the invoking
sequence as an illegal pattern. This was the approach
we followed.
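A pattern check of this kind might look as follows. It assumes the test driver records a global trace of `(kind, client)` events. The event names mirror the lock example in Section 4.5, but the trace format and the two patterns chosen are hypothetical. A request counts as answered by a grant to that client, or as withdrawn by that client's own release.

```python
def illegal_patterns(trace):
    """Return (pattern, client) violations found in a trace of (kind, client) events."""
    outstanding = set()
    violations = []
    for kind, client in trace:
        if kind == "request":
            outstanding.add(client)
        elif kind == "grant":
            if client not in outstanding:
                violations.append(("grant without outstanding request", client))
            outstanding.discard(client)
        elif kind == "release":
            outstanding.discard(client)   # a release withdraws the client's request
    # Any request still outstanding at the end of the trace never got a response.
    violations.extend(("request never answered", c) for c in sorted(outstanding))
    return violations
```

Like the invariants of (Boy, 2004), this is a necessity oracle: a non-empty result proves a failure, but an empty one does not prove correctness.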
4.5 Evaluation
The BET approach to this problem was analyzed
with respect to the sample defect described in (Boy,
2004). In this example there are 2 or more clients, A
and B, and a lock server. The code was apparently
written so that if a client does not receive a
requested lock after a certain amount of time, then it
sends a release() for the lock to the server. Suppose
that the grant() message had been issued by the
server but not yet received by process A when it
sends the release(). The logic written into the server
is such that when it gets the release() it assumes the
lock is free, so that it can allocate it to a subsequent
request() from process B. As soon as A receives the
tardy grant(), we will have a situation in which both
A and B hold the same lock.
The random testing experiments described in
(Boy, 2004) resulted in the detection of this bug.
However, a different set of tests, just as likely to be
chosen, would have missed it. On the other hand, the
more precise approach described in the Cesium
paper would have repeatably and reliably found the
defect, provided the tester had thought to construct
such a test. BET would reliably generate a defect-
revealing sequence every time.
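The scenario can be reproduced with a small bounded-exhaustive explorer. The sketch below is a hypothetical model written from the description above, not the code from (Boy, 2004). Two clients each request the lock once, a waiting client may time out and send release(), and the network may deliver in-flight messages in any order. A depth-first search visits every reachable state and checks that no two clients hold the lock at once. Setting `tardy_grant_accepted=True` models the faulty client that accepts a grant() after giving up.

```python
CLIENTS = ("A", "B")  # bound: two clients, each requesting the lock once


def _with(statuses, i, value):
    """Copy of the statuses tuple with entry i replaced."""
    return statuses[:i] + (value,) + statuses[i + 1:]


def successors(state, tardy_grant_accepted):
    """Yield every state reachable in one step (a client action or one delivery).

    A state is (client statuses, lock holder, server queue, in-flight messages)."""
    statuses, holder, queue, net = state
    for i, c in enumerate(CLIENTS):
        if statuses[i] == "idle":          # client sends request()
            yield _with(statuses, i, "waiting"), holder, queue, net | {("request", c)}
        elif statuses[i] == "waiting":     # timeout: client gives up, sends release()
            yield _with(statuses, i, "gave_up"), holder, queue, net | {("release", c)}
    for msg in net:                        # any in-flight message may be delivered next
        kind, c = msg
        rest = net - {msg}
        i = CLIENTS.index(c)
        if kind == "request":              # server: grant if free, else queue
            if holder is None:
                yield statuses, c, queue, rest | {("grant", c)}
            else:
                yield statuses, holder, queue + (c,), rest
        elif kind == "release":            # server: assumes the lock is now free
            remaining = tuple(q for q in queue if q != c)
            if holder == c and remaining:  # hand the lock to the next queued client
                yield statuses, remaining[0], remaining[1:], rest | {("grant", remaining[0])}
            elif holder == c:
                yield statuses, None, remaining, rest
            else:
                yield statuses, holder, remaining, rest
        else:                              # kind == "grant": client receives grant()
            if statuses[i] == "waiting" or (tardy_grant_accepted and statuses[i] == "gave_up"):
                yield _with(statuses, i, "holding"), holder, queue, rest
            else:
                yield statuses, holder, queue, rest   # stale grant() ignored


def find_violations(tardy_grant_accepted=True):
    """Explore all interleavings; return states in which two clients hold the lock."""
    start = (("idle",) * len(CLIENTS), None, (), frozenset())
    seen, stack, violations = {start}, [start], []
    while stack:
        state = stack.pop()
        if state[0].count("holding") > 1:
            violations.append(state)
        for nxt in successors(state, tardy_grant_accepted):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return violations
```

Within this bound the search finds double-holding states every time it runs, each involving a client that accepted a grant() after giving up. When the client is changed to ignore such a tardy grant(), the same search finds none.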
5 CONCLUSIONS
The results of our investigation were positive in two
ways. The four-part framework was an effective,
generic approach to the application of BET, in which
the essential concerns are identified and separated.
In addition, BET was found to be an effective defect
detection technique across the wide range of
examples that were considered.
Fault models can be used to specifically
document the defects for which a set of tests will be
effective. They proved to be a convenient way to
consider the application-bounding aspect of BET, in
order to systematically define minimal bounds for
reliable fault detection. In all three application
examples, the requirement that the BET test
generator "cover" the fault model facilitated the
identification of necessary lower bounds on the
"size" of the bounded application to be used in test
construction.
Failure models were found to be useful in
identifying a class of failures that can be observed.
Taken together, the fault and failure models
circumscribe the defects for which a BET testing
effort is guaranteed to be effective.
Our BET-oriented test procedure led to
significant paradigm shifts in the way that two of our
applications could be tested. For the graphics
application it led to the creation of an inverse oracle.
For the finite element simulation, it led to a failure
model that was stronger than simple robustness and
more consistent than regression testing. Both of
these novelties stemmed from the very essence of BET:
restricting the complete problem domain to
meaningful, bounded subdomains defined by a few central
characteristics. By considering these central
characteristics, stronger correctness oracles naturally
become apparent. This is not likely to occur during
simple random testing in which the test domain does
not have the sort of structure that lends itself to the
discovery of stronger failure models and automated
oracles. While this kind of paradigm shift may not
be unique to our testing methodology, and may not
occur for every application, our experience indicated
that our systematic testing framework facilitates this
kind of change in thinking.
REFERENCES
Artho, C., Leungwattanakit, W., Hagiya, M., and Tanabe, Y., Tools and Techniques for Model Checking Networked Programs, SNPD, 2008.
Bioeng. Dept., UCSD, http://www.continuity.ucsd.edu, 2008.
Boy, N., Casper, J., Pacheco, C., and Williams, A., Automated Testing of Distributed Systems, Final project report, MIT 6.824, May 2004.
Guillarmo A. A., and Cristian, F., Simulation-based Testing of Computer Protocols for Dependable Embedded Systems, The Journal of Supercomputing, 16(1-2), 2000.
Howden, W. E., and Rhyne, C., Test Frameworks for Elusive Bug Testing, ICSOFT, Barcelona, Spain, 2007.
Howden, W. E., Elusive Bugs, Exhaustive Testing and Incomplete Oracles, ICSOFT, Porto, Portugal, 2008.
Kensler, A., and Shirley, P., Optimizing Ray-Triangle Intersection via Automated Search, IEEE Symposium on Interactive Ray Tracing 2006, September 2006.
Miller, B. P., Cooksey, G., and Moore, F., An Empirical Study of the Robustness of MacOS Applications Using Random Testing, Proceedings of the 1st International Workshop on Random Testing, 2006.
Sullivan, K. J., Yang, J., Coppit, D., Khurshid, S., and Jackson, D., Software Assurance by Bounded Exhaustive Testing, Proc. ISSTA, 2004.
Woo, A., Pearce, A., and Ouellette, M., It's Really Not a Rendering Bug, You See..., IEEE Computer Graphics & Applications, 16(5), September 1996.