Specification Based Testing of Object Detection for Automated Driving Systems via BBSL

Kento Tanaka, Toshiaki Aoki, Tatsuji Kawai, Takashi Tomita, Daisuke Kawakami and Nobuo Chida

Japan Advanced Institute of Science and Technology, 1-1, Asahi-dai, Nomi, Ishikawa, 923-1292, Japan

Advanced Technology R&D Center, Mitsubishi Electric Corporation,

8-1-1, Tsukaguchi-Honmachi, Amagasaki, Hyogo, 661-8661, Japan

Keywords:

Automated Driving, Machine Learning, Deep Learning, Object Detection, Testing, Formal Speciﬁcation,

Image Processing.

Abstract:

Automated driving systems (ADS) are a major trend, and the safety of such critical systems has become one of the most important research topics. However, ADS are complex systems that involve various elements. Moreover, it is difficult to ensure safety using conventional testing methods due to the diversity of driving environments. Deep Neural Networks (DNN) are effective for object detection processing that takes diverse driving environments as input. A method such as Intersection over Union (IoU), which defines a threshold value for the discrepancy between the bounding box of the inference result and the bounding box of the ground-truth label, can be used to test the DNN. However, such tests cannot sufficiently determine to what extent the DNN meets the specifications of ADS. Therefore, we propose a method for converting formal specifications of ADS written in Bounding Box Specification Language (BBSL) into tests for object detection. BBSL is a language that can mathematically describe the specification of OEDR (Object and Event Detection and Response), one of the tasks of ADS. Using these specifications, we define specification based testing of object detection for ADS. Then, we show by evaluation that this test is more safety-conscious for ADS than tests using IoU.

1 INTRODUCTION

Automated driving systems (ADS) are actively developed by several manufacturers, and their failure can cost human life (Devi et al., 2020). Therefore, ensuring their safety has become one of the most important research topics. However, ADS are complex systems including various elements, such as machine learning, route search algorithms and sensing technology. Furthermore, the driving environment surrounding those systems is diverse. Therefore, it is difficult to design specifications and test against them as in general software development methods. To solve this problem, government agencies in various countries are researching frameworks for designing and testing ADS by defining and designing multiple scenarios of driving environments, systematizing use cases, setting safety standards, and establishing evaluation frameworks. For example, the National Highway Traffic Safety Administration (NHTSA) in the United States first determines the level of automation of the automated driving system to be developed, and then clarifies the Operational Design Domain (ODD) and the Object and Event Detection and Response (OEDR). ODD is the specific conditions under which an ADS or its functions are designed to operate, such as road types, speed limits, lighting conditions, weather conditions, and other operational constraints. The various driving environments are then classified into somewhat abstract scenarios, such as "merging into another lane at low speed," and OEDRs are designed based on these scenarios to monitor the driving environment and respond appropriately to these objects and events (Thorn et al., 2018). These levels of automation, ODDs, and OEDRs are defined in the SAE J3016 standard (Committee, 2021).

https://orcid.org/0000-0002-3532-6954
https://orcid.org/0000-0002-1209-6375
https://orcid.org/0000-0003-1247-5663
https://orcid.org/0000-0003-1249-7862

Tanaka, K., Aoki, T., Kawai, T., Tomita, T., Kawakami, D. and Chida, N.
Specification Based Testing of Object Detection for Automated Driving Systems via BBSL.
DOI: 10.5220/0011997400003464
In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2023), pages 250-261
ISBN: 978-989-758-647-7; ISSN: 2184-4895
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

Among the components of ADS with the above characteristics, our research focuses on the object detection process. In object detection for ADS, deep neural networks (DNN) are used to cope with large amounts of input. As shown in Figure 1, this DNN typically takes an image as input and outputs labeled rectangles called bounding boxes as the inference result. The general approach to testing such a DNN is to match the label inferred by the DNN with the ground-truth label, given an image. However, it is difficult for a DNN to detect the position of a particular object in perfect agreement with its ground-truth label. Therefore, in practice, it is tested using a threshold on Intersection over Union (IoU) (Everingham and Winn, 2012). IoU is a number that quantifies the degree of overlap between two boxes. In the case of object detection, IoU evaluates the overlap of the ground-truth label and the inferred label. For example, Figure 2 shows two images with a ground-truth label (red bounding box) and an inferred label (green bounding box). In this case, the IoU is about 1/3 in both images. However, given a safety requirement to stop when there is a vehicle in the direction of travel, the DNN that infers the green bounding box in the left image violates the safety requirement, while the DNN that infers the green bounding box in the right image satisfies it. This shows that testing object detection using IoU is at variance with the specifications for ADS. Therefore, it is necessary to study specification-based testing methods for ADS.
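The mismatch between IoU and a safety requirement can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' implementation: the box coordinates, the `must_stop` rule and the `STOP_LINE_Y` value are hypothetical and chosen only so that two inferred boxes have the same IoU of 1/3 against the ground truth while disagreeing on a simple stop requirement.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in image coordinates (origin at the top left)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(b: Box) -> float:
    # A box with no extent (e.g. an empty intersection) has area 0.
    return max(0.0, b.x_max - b.x_min) * max(0.0, b.y_max - b.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two boxes."""
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter_area = area(inter)
    union_area = area(a) + area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def must_stop(b: Box, stop_line_y: float) -> bool:
    """Hypothetical requirement: stop when the box reaches the stop line."""
    return b.y_max >= stop_line_y


STOP_LINE_Y = 180                        # hypothetical value
ground_truth = Box(0, 100, 100, 200)
shifted_up = Box(0, 50, 100, 150)        # inferred box displaced upward
shifted_down = Box(0, 150, 100, 250)     # inferred box displaced downward

for name, box in [("up", shifted_up), ("down", shifted_down)]:
    agrees = must_stop(box, STOP_LINE_Y) == must_stop(ground_truth, STOP_LINE_Y)
    print(name, round(iou(ground_truth, box), 3), "agrees with ground truth:", agrees)
```

Both inferred boxes receive the same IoU, so any IoU threshold treats them identically, yet only one of them leads to the same stop decision as the ground truth.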

Figure 1: DNN inputs and outputs.

Figure 2: Problems when thresholds are deﬁned in the IoU.

In order to achieve specification based testing of the object detection process, a rigorously defined specification of how the system should operate in a given driving environment is required. For OEDR, one of the tasks of automated driving systems, there is a language called BBSL for writing specifications (Tanaka et al., 2022). BBSL is a language that can describe specifications mathematically by representing objects such as other vehicles and pedestrians as bounding boxes and using positional relationships between bounding boxes. This paper proposes a method for specification based testing of the object detection process, which has a particular impact on safety among the components of automated driving systems. Using a simple example specification, we show that the test is specification based and covers important safety aspects that cannot be considered in conventional IoU based testing methods.

2 RELATED WORK

A common ADS is composed of four functional modules, namely, the sensing module, the perception module, the planning module and the control module, as shown in Figure 3. The purpose of our study is to judge whether a bug exists in the perception module among these.

Figure 3: Typical architecture of an ADS.

The general approach to testing a DNN in the perception module is to match the label inferred by the DNN with the ground-truth label, given an image. Usually, these ground-truth labels are obtained by manual labeling (Sun et al., 2019; Kondermann et al., 2016). Then, IoU, also known as the Jaccard index, is used with a threshold to determine whether the positional descriptions of the ground-truth label and the inferred label match or not. IoU computes the discrepancy between the bounding box of the inference result and the bounding box of the ground-truth label, whereas our study uses the specification of ADS and judges the result based on whether the specification is satisfied.

In perception testing, due to the huge input space of DNN models, it is a great challenge to specify the oracles for all the input images. One solution to this problem is metamorphic testing. Metamorphic testing was introduced to tackle the problem of the absent test oracle in traditional software testing (Chen et al., 2020). This approach describes the system functionality in terms of generic relations (metamorphic relations) between inputs rather than as mappings between input and output. In ADS, various metamorphic relations have been proposed, over images and over frames in a scenario. For example, in object detection, there is a metamorphic relation that objects detected in the original image should also be detected in the synthetic images (Shao, 2021), and for LiDAR based object detection, there is a metamorphic relation over the image that the noise points outside the Region of Interest (ROI) should not affect the detection of objects within the ROI (Zhou and Sun, 2019). Also, two metamorphic relations over frames in a scenario have been proposed for identifying temporal and stereo inconsistencies, respectively, that exist in different frames of a scenario (Ramanagopal et al., 2018). The temporal metamorphic relation says that an object detected in a previous frame should also be detected in a later frame, and the stereo metamorphic relation is defined in a similar way, for regulating the spatial consistency of the objects in different frames of a scenario.

Recently, formal specifications based on temporal logics have been adopted in the monitoring of the perception module of ADS. In general, temporal logics are a family of formalisms used to express temporal properties of systems. For example, Timed Quality Temporal Logic (TQTL) is a formalism that can be used to express temporal properties that should be held by the perception module during object detection (Dokhanchi et al., 2018). Conceptually, the properties expressed by TQTL are similar to the ones in the metamorphic relations mentioned above (Ramanagopal et al., 2018). However, by adopting such a formal specification to express these properties, one can synthesize a monitor that automatically checks whether the system execution satisfies them. TQTL was later extended to Spatio-Temporal Quality Logic (STQL) (Balakrishnan et al., 2021), which has enriched syntax to express more refined properties over the bounding boxes used in object detection. In our study, we test the perception module of ADS using a formal specification language called BBSL. BBSL does not currently use temporal logics, so temporal properties cannot be expressed. However, BBSL can express properties related to the positions of bounding boxes in more detail than STQL.

3 BOUNDING BOX SPECIFICATION LANGUAGE

The specification-based test proposed in this paper uses specifications written in a formal specification language called BBSL (Tanaka et al., 2022). BBSL is a language that describes objects in an image as bounding boxes and the positional relationships between the bounding boxes at a level of abstraction that can be defined manually. By defining a bounding box as a two-dimensional interval in interval analysis (Moore et al., 2009), BBSL strictly describes positional relationships on an image as relationships on a bounding box (or set of bounding boxes). In addition, special relations and functions are defined for positional relations that cannot be described by interval analysis. In this section, we show the types and relations of BBSL and outline the specifications of ADS written in BBSL.

First, we explain the basic types used in BBSL, interval and bb. The interval type represents an interval such as the decelerating distance in Figure 4. This interval has the same definition as in interval analysis, as shown in Definition 1.

Figure 4: Visual representation of the driving environment and abstract specifications written in BBSL.

Definition 1. Let $\underline{a}, \overline{a}$ be real numbers. Then an interval $a$ is defined as follows:
$$a = [\underline{a}, \overline{a}] = \{x \in \mathbb{R} : \underline{a} \le x \le \overline{a}\}$$

Objects in the image, such as vehicles, are represented by bounding boxes defined as two-dimensional intervals. The bb type represents a bounding box, which has the same definition as in interval analysis. This is shown in Definition 2.

Definition 2. Let $a_i$, $i = 1, \ldots, n$ be intervals. Then a multidimensional interval $a$ is defined as follows:
$$a = (a_1, \ldots, a_n)$$
In particular, when $n = 2$, the multidimensional interval $a$ is called a bounding box.

The magnitude relationship of intervals has the same definition as in interval analysis. However, since interval analysis was designed to calculate real values with rounding errors and measurement errors, many of its relationships cannot be applied to the description of positional relationships. Therefore, BBSL provides unique operations and relationships that are useful for describing image specifications.

First, we introduce the basic relations used to describe the positional relationship between intervals. These definitions are the same as those used in interval analysis and are shown in Definition 3 and Definition 4.

Definition 3. Let $a, b$ be intervals. Then the binary relation $<$ on two intervals is defined as follows:
$$a < b \Leftrightarrow \overline{a} < \underline{b}$$

Definition 4. Let $a, b$ be intervals. Then the equivalence relation $=$ on two intervals is defined as follows:
$$a = b \Leftrightarrow \underline{a} = \underline{b} \text{ and } \overline{a} = \overline{b}$$


With these relationships, it is possible to represent the positional relationship between the vehicle's y-coordinate interval and the stoppingDistance defined as the distance to be maintained, as shown in Figure 5.

Figure 5: Example of simple positional relationships.

The deﬁnition of the inclusion relationship be-

tween intervals is shown in Deﬁnition 5.

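Definition 5. Let $a, b$ be intervals. Then the inclusion relation $\subseteq$ on two intervals is defined as follows:
$$a \subseteq b \Leftrightarrow \underline{b} \le \underline{a} \text{ and } \overline{a} \le \overline{b}$$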

For the positional relationship between these intervals, ≈ is introduced as a shorthand notation to express the relationship where two intervals overlap, which frequently occurs in the specification of automated driving systems. This is shown in Definition 6.

Definition 6. Let $a, b$ be intervals. Then the overlap relation $\approx$ on two intervals is defined as follows:
$$a \approx b \Leftrightarrow \underline{b} \le \overline{a} \text{ and } \underline{a} \le \overline{b}$$

In addition, a function called the PROJ function is provided to map an object to its intervals on the x-axis and y-axis. This is shown in Definition 7.

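Definition 7. Let $a$ be a bounding box $(a_x, a_y)$. Then the projection function $\mathrm{PROJ}$ from bounding box to interval is defined as follows:
$$\mathrm{PROJ}_i(a) = a_i, \quad \mathrm{PROJ}_{\underline{i}}(a) = [\underline{a_i}, \underline{a_i}], \quad \mathrm{PROJ}_{\overline{i}}(a) = [\overline{a_i}, \overline{a_i}], \quad i \in \{x, y\}$$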
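The types and relations above translate directly into code. The following is a minimal sketch under stated assumptions, not the BBSL implementation: the class names `Interval` and `BB` and the function names `lt`, `eq`, `subset`, `overlaps` and `proj` are hypothetical, and only the basic projection onto the x- or y-axis interval is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] as in Definition 1."""
    lo: float
    hi: float


@dataclass(frozen=True)
class BB:
    """Bounding box as a two-dimensional interval (Definition 2, n = 2)."""
    x: Interval
    y: Interval


def lt(a: Interval, b: Interval) -> bool:
    """a < b: a lies entirely before b (Definition 3)."""
    return a.hi < b.lo


def eq(a: Interval, b: Interval) -> bool:
    """a = b: both bounds coincide (Definition 4)."""
    return a.lo == b.lo and a.hi == b.hi


def subset(a: Interval, b: Interval) -> bool:
    """a is included in b (Definition 5)."""
    return b.lo <= a.lo and a.hi <= b.hi


def overlaps(a: Interval, b: Interval) -> bool:
    """a and b share at least one point (Definition 6)."""
    return b.lo <= a.hi and a.lo <= b.hi


def proj(box: BB, axis: str) -> Interval:
    """Project a bounding box onto the x- or y-axis (Definition 7)."""
    if axis == "x":
        return box.x
    if axis == "y":
        return box.y
    raise ValueError("axis must be 'x' or 'y'")


# Hypothetical example values in image coordinates.
vehicle = BB(x=Interval(500, 700), y=Interval(200, 300))
stopping_distance = Interval(275, 375)
print(overlaps(proj(vehicle, "y"), stopping_distance))
```

For two intervals, exactly one of `lt(a, b)`, `lt(b, a)` and `overlaps(a, b)` holds, which is what makes it possible to write case conditions that cover every position of a bounding box without ambiguity.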

BBSL can describe various positional relationships strictly using the PROJ function and interval relationships described above for the bounding boxes representing objects. For example, Figure 6 shows some examples of positional relationships for two bounding boxes a and b described in BBSL.

Figure 6: Examples of various positional relationships that can be distinguished using the PROJ function.

By using such types and functions, BBSL can describe OEDR specifications for ADS strictly from an image perspective. Listing 1 shows an example of a specification described in BBSL. This specification defines the cases in which ADS should or should not stop, depending on the position of a single vehicle. A specification described in BBSL is divided into three blocks, as shown in this specification. The first block, the external function block (lines 1-8 of Listing 1), declares functions that supply the values needed to write this specification. For example, the specification provides a function to check for the presence of a vehicle, a function to return the bounding box surrounding the vehicle, and a function to return an interval representing the stopping distance.

The second block, the precondition block (lines 10-12 of Listing 1), describes the conditions for applying the specification. For example, it states that the specification is written on the assumption that there will always be a vehicle of some kind.

The third block, the case block (lines 14-19 and 21-25 of Listing 1), is the main part of this specification. It describes a case for each reaction of the ADS and the conditions under which the system should react. In Listing 1, the specification strictly describes the need to stop if a vehicle is before or overlaps the stoppingDistance from the viewpoint of the forward camera image, as in Figure 5.

Listing 1: Specifications of ADS response to the distance to the vehicle.

exfunction
    //Judge the existence of the vehicle. True if it exists.
    vehicleExists():bool
    //Calculate the bounding box that surrounds the vehicle.
    vehicle():bb
    //Calculate the interval that represents the range to be stopped.
    stoppingDistance():interval
endexfunction

precondition
    [vehicleExists() = true]
endprecondition

case stop
    let vehicle : bb = vehicle(),
        stoppingDistance : interval = stoppingDistance() in
        PROJ_y(vehicle) ≈ stoppingDistance
        or PROJ_y(vehicle) > stoppingDistance
endcase

case NOT stop
    let vehicle : bb = vehicle(),
        stoppingDistance : interval = stoppingDistance() in
        PROJ_y(vehicle) < stoppingDistance
endcase

4 SPECIFICATION BASED TESTING

As described in the previous section, specifications for ADS written in BBSL are unambiguous and rigorously described. Therefore, specifications written in BBSL can be used to rigorously define specification-based tests for object detection processing. In this section, we make some preparations and define specification-based tests.

A specification written in BBSL describes images with multiple labeled bounding boxes, as shown in Figure 7. First, the set of images with such labeled bounding boxes is defined in Definition 8.

Figure 7: Example of an image with multiple labeled bounding boxes.

Definition 8. Let $I$ be a set of images, $BB$ be a set of bounding boxes, and $L$ be a set of labels such as vehicle and pedestrian. The set of images $I_{BB}$ with multiple labeled bounding boxes is defined as follows:
$$I_{BB} = I \times 2^{BB \times L}$$

Next, a specification written in BBSL for use in testing is defined in Definition 9.

Definition 9. A specification written in BBSL is defined as a pair $S = (C, f)$.
• $C$ is the set of cases. For example, in Listing 1, $C = \{\text{stop}, \text{NOT stop}\}$.
• $f$ is a function $f : I_{BB}^{-} \to 2^{C}$, where $I_{BB}^{-} = \{i \in I_{BB} \mid i \text{ satisfies the precondition}\}$.
Hereafter, $I_{BB}^{-}$ of $f$ in the specification $S = (C, f)$ is denoted as $\mathrm{dom}(f)$.

For the purpose of preparing a test covering all images and of defining a unique test, three types of properties of a specification written in BBSL are defined in Definition 10.

Definition 10. For a specification $S = (C, f)$ written in BBSL, three types of properties are defined as follows:
$$S \text{ is an exhaustive specification} \Leftrightarrow \forall i \in \mathrm{dom}(f).\ (f(i) \neq \emptyset)$$
$$S \text{ is an exclusionary specification} \Leftrightarrow \forall i \in \mathrm{dom}(f), \exists c \in C.\ (f(i) = \{c\} \text{ or } f(i) = \emptyset)$$
$$S \text{ is a non-redundant specification} \Leftrightarrow \forall c \in C, \exists i \in \mathrm{dom}(f).\ (c \in f(i))$$

The specification of Listing 1 is an exhaustive, exclusionary, and non-redundant specification. In this paper, unless otherwise specified, a specification $S$ written in BBSL is assumed to be exhaustive, exclusionary, and non-redundant.
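The three properties can be checked mechanically on a finite sample of inputs. The sketch below is illustrative only: the function names are hypothetical, the toy function `f_listing1` stands in for a Listing 1 style rule applied to the lower edge of a vehicle box, and a check over a sample is evidence rather than a proof over all of dom(f).

```python
from typing import Callable, FrozenSet, Hashable, Sequence

Case = str
Spec = Callable[[Hashable], FrozenSet[Case]]


def is_exhaustive(f: Spec, domain: Sequence[Hashable]) -> bool:
    """Every sampled input satisfies at least one case."""
    return all(len(f(i)) > 0 for i in domain)


def is_exclusionary(f: Spec, domain: Sequence[Hashable]) -> bool:
    """No sampled input satisfies more than one case."""
    return all(len(f(i)) <= 1 for i in domain)


def is_non_redundant(cases: FrozenSet[Case], f: Spec,
                     domain: Sequence[Hashable]) -> bool:
    """Every case is satisfied by at least one sampled input."""
    covered = set()
    for i in domain:
        covered |= f(i)
    return cases <= covered


# Toy stand-in: the input is the lower edge (y) of the vehicle box,
# and 275 is the upper end of a hypothetical stopping zone.
CASES = frozenset({"stop", "NOT stop"})


def f_listing1(y_bottom: int) -> FrozenSet[Case]:
    return frozenset({"stop"}) if y_bottom >= 275 else frozenset({"NOT stop"})


sample = list(range(0, 376))
print(is_exhaustive(f_listing1, sample),
      is_exclusionary(f_listing1, sample),
      is_non_redundant(CASES, f_listing1, sample))
```

A specification that passes all three checks assigns exactly one case to every sampled input, which is the condition the later definitions rely on to obtain a unique expected value.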

Next, the definitions of the basic elements necessary to define a test are given in Definition 11.

Definition 11. The test data $Td$ is a subset of the set of images $I$. The object detection system to be tested is exactly the DNN shown in Figure 1, which takes an image as input and returns an image with inferred labels as output. This is defined by the function $SUT : I \to I_{BB}$. In addition, assume that the ground-truth label is given by the function $GT : I \to I_{BB}$.

Using the above deﬁnitions, a test case is deﬁned

in Deﬁnition 12.

Definition 12. Given an exhaustive, exclusionary, and non-redundant specification $S = (C, f)$, test data $Td$ and ground-truth data $GT$, the test case $CASE$ is defined as follows:
$$CASE = \{(td, f(GT(td))) \mid td \in Td\}$$

In Definition 12, $f(GT(td))$ plays the role of a specification-based pseudo-oracle.

Finally, the decision conditions for the specification-based test are defined in Definition 13.

Definition 13. Given an exhaustive, exclusionary, and non-redundant specification $S = (C, f)$, $SUT$, and a test case $CASE$, the test decision condition $P : CASE \to \{T, F\}$ is defined for any case $(td, c) \in CASE$ as follows:
$$P(td, c) = \begin{cases} T & \text{if } f(SUT(td)) = c \\ F & \text{otherwise} \end{cases}$$

The above definitions enable specification-based testing using specifications written in BBSL.
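Definitions 12 and 13 amount to a short test harness. The following is a minimal sketch with hypothetical names and data: `build_cases` and `decide` mirror CASE and P, images are represented by string identifiers, and a labeled image is reduced to the y-interval of a single vehicle so that the example stays self-contained.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Image = str
Labeled = Tuple[int, int]          # (y_min, y_max) of the single vehicle box
Cases = FrozenSet[str]


def f(labeled: Labeled) -> Cases:
    """Toy specification: NOT stop only if the box ends before y = 275."""
    _, y_max = labeled
    return frozenset({"NOT stop"}) if y_max < 275 else frozenset({"stop"})


def build_cases(test_data: List[Image],
                gt: Callable[[Image], Labeled]) -> List[Tuple[Image, Cases]]:
    """CASE: pair each test image with the pseudo-oracle f(GT(td))."""
    return [(td, f(gt(td))) for td in test_data]


def decide(td: Image, expected: Cases,
           sut: Callable[[Image], Labeled]) -> bool:
    """P(td, c): True exactly when f(SUT(td)) equals the expected value."""
    return f(sut(td)) == expected


# Hypothetical ground truth and inference results.
GT: Dict[Image, Labeled] = {"img1": (200, 300), "img2": (100, 200)}
SUT: Dict[Image, Labeled] = {"img1": (210, 280), "img2": (150, 290)}

cases = build_cases(["img1", "img2"], GT.__getitem__)
results = [decide(td, c, SUT.__getitem__) for td, c in cases]
print(results)
```

In this toy data the first inference is misaligned but still leads to the expected case, while the second crosses the boundary of the stopping zone and is therefore judged F.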


5 EXPERIMENT

In this section, we prepare an object detection system for ADS, a ground truth dataset with two-dimensional bounding boxes, and specifications written in BBSL, and compare the proposed test method with the IoU method.

5.1 Preparations

First, we used the KITTI dataset (Geiger et al., 2013)

for T d

and T d

(T d

∩ T d

0) as the two types o f

test data sets and GT as the ground truth label. These are 349 and 1300 images from the forward camera of ADS, respectively, as shown in Table 1. Each image contains one or more vehicles, and the number of vehicles in the dataset is 2736 and 5644, respectively. The ground truth labels are given by enclosing each vehicle in a two-dimensional bounding box.

Table 1: Test Data Details.

Name      Number of images    Number of vehicles
$Td_1$    349                 2736
$Td_2$    1300                5644

Next, two object detection systems to be tested are prepared as $SUT_1$ and $SUT_2$, respectively. Both object detection algorithms used in these systems are based on YOLOv3 (Redmon and Farhadi, 2018), and the DNN network used is Darknet-53. In our study, we prepared two types of object detection systems by using yolov3-kitti.weights (https://drive.google.com/file/d/1BRJDDCMRXdQdQs6-x-3PmlzcEuT9wxJV/view) for $SUT_1$ and yolov3.weights (https://github.com/patrick013/Object-Detection---Yolov3/blob/master/model/yolov3.weights) for $SUT_2$ from publicly available weight files, without training them ourselves.

In addition, we prepared four simple specifications written in BBSL as $S_1$, $S_2$, $S_3$ and $S_4$ on which to base our tests. $S_1$ is Listing 1, already described above as an example, which defines the cases in which ADS should or should not stop depending on the distance of the vehicle in front. $S_2$ is shown in Listing 2. This specification defines the cases in which ADS should and should not stop depending on whether the target vehicle encroaches into the area straight ahead of the own vehicle or not.

Listing 2: Specifications of ADS response to the x-axis position of the vehicle.

exfunction
    //Judge the existence of the vehicle. True if it exists.
    Vehicle​Exists():bool
    //Calculate the bounding box that surrounds the vehicle.
    Vehicle():bb
    //Calculate the interval that represents the range to be stopped.
    directionAreaDistance():interval
endexfunction

precondition
    [VehicleExists() = true]
endprecondition

case stop
    let Vehicle : bb = Vehicle(),
        directionAreaDistance : interval = directionAreaDistance() in
        PROJ_x(Vehicle) ≈ directionAreaDistance
endcase

case NOT stop
    let Vehicle : bb = Vehicle(),
        directionAreaDistance : interval = directionAreaDistance() in
        not(PROJ_x(Vehicle) ≈ directionAreaDistance)
endcase

$S_3$ is shown in Listing 3. This specification combines $S_1$ and $S_2$, and specifies a case of Stop if the distance between vehicles is close and the vehicle has entered the travel direction, and a case of Not Stop otherwise.

Listing 3: Specifications with two cases combining $S_1$ and $S_2$.

exfunction
    vehicleExists():bool
    vehicle():bb
    directionAreaDistance():interval
    stoppingDistance():interval
endexfunction

precondition
    [vehicleExists() = true]
endprecondition

case stop
    let vehicle : bb = vehicle(),
        directionAreaDistance : interval = directionAreaDistance(),
        stoppingDistance : interval = stoppingDistance() in
        PROJ_x(vehicle) ≈ directionAreaDistance
        and PROJ_y(vehicle) ≈ stoppingDistance
endcase

case NOT stop
    let vehicle : bb = vehicle(),
        directionAreaDistance : interval = directionAreaDistance(),
        stoppingDistance : interval = stoppingDistance() in
        not(PROJ_x(vehicle) ≈ directionAreaDistance) or
        not(PROJ_y(vehicle) ≈ stoppingDistance)
endcase

Finally, $S_4$ is shown in Listing 4. Like $S_3$, this specification is a combination of $S_1$ and $S_2$. The condition of x_ystop is the same as that of stop in $S_3$, but the condition of Not Stop is divided in more detail into ysafe_xwarning, xsafe_ywarning and NOT warning, depending on the relationship between the y and x coordinates on the image.

Listing 4: Specifications with four cases combining $S_1$ and $S_2$.

exfunction
    vehicleExists():bool
    vehicle():bb
    directionAreaDistance():interval
    stoppingDistance():interval
endexfunction

precondition
    [vehicleExists() = true]
endprecondition

case x_ystop
    let vehicle : bb = vehicle(),
        directionAreaDistance : interval = directionAreaDistance(),
        stoppingDistance : interval = stoppingDistance() in
        PROJ_x(vehicle) ≈ directionAreaDistance
        and PROJ_y(vehicle) ≈ stoppingDistance
endcase

case ysafe_xwarning
    let vehicle : bb = vehicle(),
        directionAreaDistance : interval = directionAreaDistance(),
        stoppingDistance : interval = stoppingDistance() in
        PROJ_x(vehicle) ≈ directionAreaDistance
        and not(PROJ_y(vehicle) ≈ stoppingDistance)
endcase

case xsafe_ywarning
    let vehicle : bb = vehicle(),
        directionAreaDistance : interval = directionAreaDistance(),
        stoppingDistance : interval = stoppingDistance() in
        not(PROJ_x(vehicle) ≈ directionAreaDistance)
        and PROJ_y(vehicle) ≈ stoppingDistance
endcase

case NOT warning
    let vehicle : bb = vehicle(),
        directionAreaDistance : interval = directionAreaDistance(),
        stoppingDistance : interval = stoppingDistance() in
        not(PROJ_x(vehicle) ≈ directionAreaDistance) and
        not(PROJ_y(vehicle) ≈ stoppingDistance)
endcase

Each of these specifications is interpreted by implementing the conditions in Python using BBSL semantics. Since all of the specifications described here describe conditions for a single vehicle object, the test is interpreted for only one vehicle at a time for each image with multiple vehicles in it. Therefore, the number of test cases is 2736 and 5644, which is the number of vehicle objects in $Td_1$ and $Td_2$, respectively.

The implementation of each exfunction was given as a constant. The values are coordinates with the upper left corner as 0 in all images (size is 1242 × 375), and are shown in Table 2. In Table 2, sD() in the Given Exfunctions column stands for stoppingDistance() and dA() for directionAreaDistance().
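A four-case specification of this kind can be interpreted in a few lines of Python. The sketch below is not the authors' code: the function names are hypothetical, and it assumes that directionAreaDistance is compared with the x-axis projection and stoppingDistance with the y-axis projection. The constants are the sD() and dA() values [275, 375] and [420, 821] reported for the experiments.

```python
from typing import Tuple

Interval = Tuple[float, float]               # (lower, upper)
Box = Tuple[float, float, float, float]      # (x_min, y_min, x_max, y_max)

STOPPING_DISTANCE: Interval = (275, 375)     # sD(), y-axis interval
DIRECTION_AREA: Interval = (420, 821)        # dA(), x-axis interval


def overlaps(a: Interval, b: Interval) -> bool:
    """Overlap relation between two closed intervals."""
    return b[0] <= a[1] and a[0] <= b[1]


def classify_s4(box: Box) -> str:
    """Return the single case that a vehicle box satisfies."""
    x_hit = overlaps((box[0], box[2]), DIRECTION_AREA)
    y_hit = overlaps((box[1], box[3]), STOPPING_DISTANCE)
    if x_hit and y_hit:
        return "x_ystop"
    if x_hit:
        return "ysafe_xwarning"
    if y_hit:
        return "xsafe_ywarning"
    return "NOT warning"


# Hypothetical boxes in a 1242 x 375 image.
print(classify_s4((500, 280, 700, 370)))
print(classify_s4((50, 150, 300, 250)))
```

Because the four conditions enumerate every combination of the two overlap tests, each box falls into exactly one case, which matches the exhaustive and exclusionary properties required of a specification.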

Table 2: Details of all tests.

Test    Specification    Given Exfunctions                        Td        SUT
ID1     $S_1$            sD() = [275, 375]                        $Td_1$    $SUT_1$
ID2     $S_1$            sD() = [275, 375]                        $Td_1$    $SUT_2$
ID3     $S_1$            sD() = [250, 375]                        $Td_1$    $SUT_1$
ID4     $S_1$            sD() = [300, 375]                        $Td_1$    $SUT_1$
ID5     $S_2$            dA() = [420, 821]                        $Td_1$    $SUT_1$
ID6     $S_3$            sD() = [275, 375], dA() = [420, 821]     $Td_1$    $SUT_1$
ID7     $S_4$            sD() = [275, 375], dA() = [420, 821]     $Td_1$    $SUT_1$
ID8     $S_3$            sD() = [275, 375], dA() = [420, 821]     $Td_1$    $SUT_2$
ID9     $S_4$            sD() = [275, 375], dA() = [420, 821]     $Td_1$    $SUT_2$
ID10    $S_1$            sD() = [275, 375]                        $Td_2$    $SUT_1$
ID11    $S_2$            dA() = [420, 821]                        $Td_2$    $SUT_1$

5.2 Evaluation

The judgment results of the proposed test and the judgment results with $\mathrm{IoU}_{0.6}$ and $\mathrm{IoU}_{0.8}$ on 11 different tests are shown in Table 3. It can be seen that, unlike the IoU verdict, which is calculated only from the test data and the SUT, the verdict of the proposed method changes depending on the given specification. In addition, even though the test data sets $Td_1$ and $Td_2$ were not prepared artificially, there is a large discrepancy between the IoU test and the proposed test.

Table 3: Test results.

Test ID    $\mathrm{IoU}_{0.6}$ (T/(T + F))    $\mathrm{IoU}_{0.8}$ (T/(T + F))    Proposed test (T/(T + F))
ID1        2180/(2180 + 556) = 79.7%     1432/(1432 + 1304) = 52.3%    2527/(2527 + 209) = 92.4%
ID2        1221/(1221 + 1515) = 44.6%    791/(791 + 1945) = 28.9%      1929/(1929 + 807) = 70.5%
ID3        2180/(2180 + 556) = 79.7%     1432/(1432 + 1304) = 52.3%    2500/(2500 + 236) = 91.4%
ID4        2180/(2180 + 556) = 79.7%     1432/(1432 + 1304) = 52.3%    2524/(2524 + 212) = 92.3%
ID5        2180/(2180 + 556) = 79.7%     1432/(1432 + 1304) = 52.3%    2591/(2591 + 145) = 94.7%
ID6        2180/(2180 + 556) = 79.7%     1432/(1432 + 1304) = 52.3%    2604/(2604 + 132) = 95.2%
ID7        2180/(2180 + 556) = 79.7%     1432/(1432 + 1304) = 52.3%    2502/(2502 + 234) = 91.4%
ID8        1221/(1221 + 1515) = 44.6%    791/(791 + 1945) = 28.9%      2065/(2065 + 671) = 75.5%
ID9        1221/(1221 + 1515) = 44.6%    791/(791 + 1945) = 28.9%      1867/(1867 + 869) = 68.2%
ID10       4842/(4842 + 802) = 85.8%     3182/(3182 + 2462) = 56.4%    5299/(5299 + 345) = 93.9%
ID11       4842/(4842 + 802) = 85.8%     3182/(3182 + 2462) = 56.4%    5314/(5314 + 330) = 94.2%

Furthermore, for test ID1, we measured the judgment result of $\mathrm{IoU}_{0.6}$ and the judgment result of the proposed test for each expected value, as shown in Figure 8. The aggregate results are shown in Table 4. The numbers to the right of



shown in Table 4 are the numbers of applicable test cases, corresponding to



in Figure 8, respectively. In Table 4 and Figure 8, BBSL refers to the proposed test, and expectation is $c$ in the test case $(td, c)$. As can be seen from the results, although the number of test cases is biased, there are test cases that correspond to all of the categories. In particular, the existence of test cases corresponding to



and



indicates that there are cases in which the proposed method makes decisions that differ from those of IoU. In addition, the existence of test cases corresponding to



and



indicates that the proposed test detects malfunctions such as not being able to stop when ADS should stop based on the specification. The above indicates that the proposed method detects the test cases where the IoU is inadequate, provided that these data are valid.

Figure 8: How to classify test cases in Test ID1.

6 DISCUSSION

As shown in Table 3, the proposed test can be performed on various test data sets as long as their ground truth data are two-dimensional bounding boxes. Since many test data sets for object detection systems have two-dimensional bounding boxes as ground truth data, the proposed test can be performed on many existing data sets. In addition, the proposed method determines the tolerance for misalignment in a way that is clearly different from IoU, in that it determines whether a test case is acceptable or not based on the specifications written in BBSL.

Furthermore, we focus on each of the data in Table 4 and discuss its effectiveness based on each example. First, we discuss the test case classified as



in Figure 9, in other words, a case where the expected value is Not stop in the specification and both the proposed test and IoU_0.6 judge T. The white vehicle in Figure 9 is the relevant test case, and the bounding box of the inferred result is shown. Since this white vehicle is sufficiently far from the own vehicle, the expected value is Not stop based on the specification in Listing 1. IoU_0.6 judges the inferred bounding box as T because it is detected with almost no deviation. In such a case, there is clearly no defect with respect to the ADS specification, and the proposed test also judges T, which is reasonable.
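For reference, the IoU_0.6 judgment used throughout this discussion can be sketched as follows. This is a minimal sketch under stated assumptions: boxes are axis-aligned tuples `(x_min, y_min, x_max, y_max)` in pixels, and the threshold comparison is taken to be inclusive (`>=`), which the text does not specify. The names `Box`, `iou`, and `iou_test` are hypothetical.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels; an assumed representation.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_test(ground_truth: Box, inferred: Box, threshold: float = 0.6) -> bool:
    """T (True) when the IoU reaches the threshold, F (False) otherwise."""
    return iou(ground_truth, inferred) >= threshold
```

Note that the measure is symmetric in all directions: a shift of the same size lowers the IoU by the same amount whether it is lateral or vertical.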

Figure 9: Example of a test case corresponding to



Figure 8.

Next, we will discuss the test case classiﬁed as



in other words, a case where the expected value is Stop in the specification and both the proposed test and IoU_0.6 judge T; Figure 10 shows an example. The red vehicle in Figure 10 is the relevant test case, and the bounding box of the inferred result is shown. Since this vehicle is very close to the own vehicle, the expected value is Stop based on the specification in Listing 1. The bounding box in the inferred result has some visible deviation, but


Table 4: Test case classiﬁcation results for test ID1.

expectation    IoU_0.6:T, BBSL:T    IoU_0.6:F, BBSL:T    IoU_0.6:T, BBSL:F    IoU_0.6:F, BBSL:F

Not stop



1744



345



stop



404



103

because it is a large object in the image, the IoU is calculated to be high, and IoU_0.6 judges T. Since the misalignment is in the lateral direction and the distance recognition is correct, there is no defect with respect to the ADS specification, and the proposed test also judges T, which is reasonable.

Figure 10: Example of a test case corresponding to



Figure 8.

Next, we will discuss the test case classiﬁed as



in other words, a case where the expected value is Not stop in the specification and which is judged as T by the proposed test and F by the IoU_0.6 test. The vehicle enclosed by the red ground-truth bounding box in the upper image of Figure 11, hidden behind a white vehicle and a building, is the corresponding test case, and the bounding box of the inferred result is shown in the lower image. Since this vehicle is sufficiently far from the own vehicle, the expected value is Not stop based on the specification in Listing 1. Since the object detection system under test cannot recognize the vehicle in the back and detects only the white car in the front, the IoU is very low and IoU_0.6 judges F. However, as long as the white vehicle in the front is correctly recognized and judged to be at a sufficient distance, there is no defect with respect to the ADS specification, and the proposed test judges T, which is reasonable.

Next, we will discuss the test case classiﬁed as



, in other words, a case where the expected value is Stop in the specification and which is judged as T by the proposed test and F by the IoU_0.6 test. The black vehicle enclosed by the red bounding box in Figure 12 is the relevant test case, and the inferred result is indicated by the purple bounding box. Since this vehicle is very close to the own vehicle, the expected value is Stop based on the specification in Listing 1. Since only part of the vehicle appears in the image, the object detection system recognizes the vehicle as smaller than its actual size indicated by the red bounding box, so the IoU is very low and is determined to be F

Figure 11: Example of a test case corresponding to



Figure 8.

at IoU_0.6. However, most of the discrepancy is in the height of the vehicle, and there is almost no discrepancy in the recognized distance from the own vehicle. Therefore, there is no defect with respect to the ADS specification, and the proposed test judges T, which is reasonable.

Figure 12: Example of a test case corresponding to



Figure 8.

Next, we will discuss the test case classiﬁed as



, in other words, a case where the expected value is Not stop in the specification, the proposed test judges F, and IoU_0.6 judges T. The red vehicle in Figure 13 is the corresponding test case, and the inferred result is indicated by the purple bounding box. Since this vehicle is far from the own vehicle, the expected value is Not stop based on the specification in Listing 1. Since the inferred result shows a slight downward misalignment but almost no upward or lateral misalignment, the IoU is not low and IoU_0.6 judges T. However, because of the downward misalignment, the own vehicle may stop in a situation where stopping is unnecessary, owing to the error in the recognized distance between the


vehicles. Therefore, this is a specification-based fault that reduces the reliability of the ADS, and the proposed test judges it as F, which is reasonable.

Unlike judgment by IoU, the proposed test can reflect the direction of misalignment in its judgment criteria by using the specification written in BBSL. Therefore, as with the red vehicle in Figure 10, even if the misalignment of the inferred result is large, the test case can be judged T on a specification basis. Conversely, even when the misalignment of the inferred result is small, as with the vehicle in Figure 13, the test case can be judged F on a specification basis. This is an important feature of the proposed test.
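The direction sensitivity described above can be illustrated with a small sketch. Listing 1 is not reproduced here, so the code uses a hypothetical stand-in for the specification: the case is decided only by the bottom edge of the bounding box relative to an assumed image row `STOP_LINE_Y` (image y grows downward, so a lower bottom edge means a closer vehicle). All names and values are assumptions made for illustration.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels, with y growing downward.
Box = Tuple[float, float, float, float]

STOP_LINE_Y = 400.0  # hypothetical image row below which a vehicle counts as close


def expected_case(box: Box) -> str:
    """Hypothetical stand-in for the specification: decide by the bottom edge only."""
    return "Stop" if box[3] >= STOP_LINE_Y else "Not stop"


def shift(box: Box, dx: float = 0.0, dy: float = 0.0) -> Box:
    """Translate a box by (dx, dy) pixels."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def spec_verdict(ground_truth: Box, inferred: Box) -> bool:
    """T when the inferred box falls into the same case as the ground truth."""
    return expected_case(inferred) == expected_case(ground_truth)


far_vehicle = (300.0, 330.0, 380.0, 390.0)  # bottom edge above the stop line
lateral = spec_verdict(far_vehicle, shift(far_vehicle, dx=60.0))   # large sideways error
downward = spec_verdict(far_vehicle, shift(far_vehicle, dy=12.0))  # small downward error
```

Under these assumptions `lateral` is True even though the box moved by 60 pixels, while `downward` is False after a shift of only 12 pixels, because only the latter changes the case selected by the stand-in specification.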

Figure 13: Example of a test case corresponding to



Figure 8.

Next, we will discuss the test case classiﬁed as



in other words, a case where the expected value is Stop in the specification, the proposed test judges F, and IoU_0.6 judges T. The vehicle enclosed by the red bounding box in the upper image of Figure 14 is the corresponding test case, and the bounding box of the detection result is shown in the lower image. Since this vehicle is close to the own vehicle, the expected value is Stop based on the specification in Listing 1. IoU_0.6 judges the inferred bounding box as T because it is detected with almost no deviation. However, since the subtle misalignment in the distance between the vehicles crosses the boundary of the stop condition in the specification, it is a specification-based defect that reduces the safety of the ADS, and the proposed test judges F, which is reasonable. Thus, the proposed test strictly determines T or F for test cases near the boundaries of the case conditions described in the specification. Since the boundaries of the case conditions in the specification should be functionally tested with particular care, it is an important feature of the proposed test that this part is rigorously tested.
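A worked example makes the boundary behaviour concrete. The sketch below again assumes a hypothetical one-condition specification (Stop when the bottom edge of the box reaches an assumed row `STOP_LINE_Y`); it is not Listing 1, and the box coordinates are invented. The ground truth lies just inside the Stop case and the inferred box is 4 pixels higher.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels, with y growing downward.
Box = Tuple[float, float, float, float]

STOP_LINE_Y = 400.0  # hypothetical boundary of the stop condition


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def case_of(box: Box) -> str:
    """Hypothetical stand-in for the specification."""
    return "Stop" if box[3] >= STOP_LINE_Y else "Not stop"


ground_truth = (100.0, 200.0, 300.0, 402.0)  # bottom edge just below the boundary
inferred = (100.0, 196.0, 300.0, 398.0)      # same size, 4 px higher

overlap = iou(ground_truth, inferred)                    # 39600 / 41200, about 0.96
iou_verdict = overlap >= 0.6                             # T under IoU_0.6
spec_verdict = case_of(inferred) == case_of(ground_truth)  # F: the case changed
```

Here the intersection is 200 x 198 = 39600 and the union is 40400 + 40400 - 39600 = 41200, so the IoU is about 0.96 and the IoU-based judgment is T, whereas the specification-based judgment is F because the inferred box selects Not stop instead of Stop.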

Next, we will discuss the test case classiﬁed as



in other words, a case where the expected value is Not stop in the specification and both the proposed test and IoU_0.6 judge F. The red vehicle partially hidden by a building in Figure 15 is the relevant test case. This vehicle has a

Figure 14: Example of a test case corresponding to



Figure 8.

sufficient distance from the own vehicle, so the expected value is Not stop based on the specification in Listing 1. Since the object detection system under test cannot recognize this red vehicle at all and there is no overlap with the inferred result of the white vehicle in front of it, the IoU is 0 and IoU_0.6 judges F. The proposed test also judges F: this is an example where the inferred result does not satisfy the conditions of the precondition block of Listing 1, while the ground truth data satisfy Not stop.

Figure 15: Example of a test case corresponding to



Figure 8.

Finally, we will discuss the test case classiﬁed as



, in other words, a case where the expected value is Stop in the specification and both the proposed test and IoU_0.6 judge F; Figure 16 shows an example. The vehicle on the far left in Figure 16 is the relevant test case, and since this vehicle is very close to the own vehicle, the expected value is Stop based on the specification in Listing 1. Since the object detection system under test does not recognize this vehicle at all, the IoU is 0 and IoU_0.6 judges F. This is an example in which the specification in Listing 1 yields Stop for the ground truth data, but the vehicle does not exist in the inferred result, so the conditions of the precondition block of Listing 1 are no longer satisfied.

The examples shown in Figures 15 and 16 are both cases where the inferred results fall outside the scope of the specification. In the first example, the white vehicle in front of the target vehicle is correctly recognized and judged to be at a sufficient distance,


so there is no defect, and it would be reasonable to judge the case as T in the proposed test. However, in the second example, the system fails to detect a vehicle at a position where the own vehicle should stop because of the short distance between the two vehicles, and this is a defect that reduces the safety of the ADS. Thus, there are cases in which the inferred result is outside the scope of the specification even if the image is subject to the specification and the expected value can be defined, and we could not establish how to give an accurate judgment for these cases. Therefore, the proposed test gives priority to safety and defines both cases to be judged as F.
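The safety-first rule for inferred results that fall outside the scope of the specification can be sketched as follows. This is an illustrative sketch only: `toy_spec` is a hypothetical stand-in for Listing 1 that returns `None` when its precondition is not satisfied (modelled here simply as a missing detection), and the function and type names are assumptions.

```python
from typing import Callable, Optional, Tuple

# (x_min, y_min, x_max, y_max) in pixels, with y growing downward.
Box = Tuple[float, float, float, float]
Spec = Callable[[Optional[Box]], Optional[str]]


def toy_spec(box: Optional[Box]) -> Optional[str]:
    """Hypothetical specification; None means the precondition is not satisfied."""
    if box is None:  # no vehicle detected: outside the scope of the specification
        return None
    return "Stop" if box[3] >= 400.0 else "Not stop"


def spec_based_verdict(spec: Spec, ground_truth: Box, inferred: Optional[Box]) -> bool:
    """T only when the inferred result is in scope and selects the expected case."""
    expected = spec(ground_truth)
    if expected is None:
        raise ValueError("ground truth is outside the scope of the specification")
    actual = spec(inferred)
    if actual is None:
        return False  # safety first: judged F whatever the expected case is
    return actual == expected
```

With this rule, a missed vehicle yields F both when the expected value is Not stop and when it is Stop, which corresponds to the two examples in Figures 15 and 16.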

Figure 16: Example of a test case corresponding to



Figure 8.

Based on the above discussion, the proposed test returns valid verdicts as a test of safety and reliability based on the specification. This clearly differs from test methods used to evaluate the performance of object detection systems, such as IoU. Furthermore, it is an important test when incorporating an object detection system into a large piece of software that requires high reliability and high safety, such as an ADS.

7 CONCLUSIONS

By using the proposed test method, the object detection system of an ADS can be tested based on its specifications. Since the test is based on the degree to which the object detection system under test meets the specification when it is incorporated into an ADS with the relevant specification, the test can detect defects that impair safety or reliability and that conventional testing methods do not detect. For these reasons, our test is an important and innovative test for incorporating object detection systems into complex, safety-critical software such as ADS.

Finally, we describe three directions for future work. The first is to formally verify specifications written in BBSL using theorem proving. Since BBSL has not yet been formalized in a theorem prover and no parser has been prepared, the tests in this study were programmed in Python so that the implementation would be equivalent to the specification used in the experiments. This work is important for testing in larger, more realistic environments and will contribute to the development of real-time monitoring tools for object detection systems.

The second is to extend specification-based testing to more complex ADS specifications described in BBSL. The tests in our study used only a simple specification for the relationship between a single object in the image and the own vehicle. However, the descriptive capability of BBSL discussed in this paper is only part of the picture; in practice, BBSL can describe the positional relationships of multiple objects and objects of complex shapes. We think that extending the test to handle these specifications will contribute to the development of even safer ADS.

The third is to propose and evaluate coverage criteria that correlate with the quality of the specification-based tests proposed in our study. It is not known how many and what kinds of test cases are needed for the specification-based test proposed in our study to be sufficient. To increase the utility of this test, we believe it is necessary to propose validity indices for the test, for example, coverage of positions in the image and coverage of the conditions of the specification written in BBSL.
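As one possible reading of these two indices, and not a proposal evaluated in this study, position coverage could be measured on a grid over the image and condition coverage over the cases of the specification. The grid size, the use of box centres, and all names below are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def position_coverage(boxes: Iterable[Box], width: float, height: float,
                      cols: int = 4, rows: int = 3) -> float:
    """Fraction of grid cells that contain the centre of at least one box."""
    hit: Set[Tuple[int, int]] = set()
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Clamp so that centres on or beyond the image border map to an edge cell.
        col = min(max(int(cx / width * cols), 0), cols - 1)
        row = min(max(int(cy / height * rows), 0), rows - 1)
        hit.add((col, row))
    return len(hit) / (cols * rows)


def condition_coverage(observed_cases: Iterable[str], all_cases: Iterable[str]) -> float:
    """Fraction of specification cases exercised by at least one test case."""
    all_set = set(all_cases)
    return len(set(observed_cases) & all_set) / len(all_set)
```

For instance, a data set whose ground-truth boxes all lie in the centre of the image, or whose expected values are all Not stop, would score low on one of the two measures even if it contains many test cases.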
