Subsign Detection and Classification System for Automated Traffic-sign Inventory Systems

Lykele Hazelhoff 1,2, Ron op het Veld, Ivo Creusen 1,2 and Peter H. N. de With 1,2

CycloMedia Technology B.V, Zaltbommel, The Netherlands

Eindhoven University of Technology, Eindhoven, The Netherlands

Keywords: Object Detection, Traffic Sign Recognition, Object Classification.

Abstract:

Road safety is influenced by the accurate placement and visibility of road signs, which are maintained based on inventories of traffic signs. These inventories are created (semi-)automatically from street-level images, based on object detection and classification. These systems often neglect the complementary signs (subsigns) that are present, although these are clearly important for the meaning and validity of signs. This paper presents a generic, learning-based approach for both detection and classification of subsigns, which is based on the same principles as the system employed for finding traffic signs and can be used as an extension to automated inventory systems. The system starts with detection of subsigns in a region below each detected sign, followed by analysis of the results obtained for all capturings of the same sign. When a subsign is found, the corresponding pixel regions are extracted and subjected to classification. This recognition system is evaluated on 3,104 signs (397 with subsign) identified by an existing inventory system. At a detection rate of 98%, only 757 signs (24.4% of the signs) are labeled as containing a subsign, while 91.4% of the subsigns of a class known to our classifier are also classified correctly.

1 INTRODUCTION

Road safety is strongly influenced by the correct placement and clear visibility of traffic signs, which, for example, warn road users of upcoming dangerous situations or inform them about speed limits or other restrictions. The validity and legal meaning of these road signs are commonly affected by co-attached complementary signs, which contain e.g. directions, time restrictions or arbitrary texts. Figure 1 displays examples of such complementary signs. As the visibility of road signs degrades over time due to aging, accidents or vandalism, accurate inventories of road signs are of significant interest to governmental bodies and subcontractors tasked with road maintenance. Moreover, these inventories are applicable to driver assistance systems or autonomous vehicles.

These inventories can be generated manually by tracking all roads, but efficiency can be improved by employing street-level images, together with object detection and classification techniques to retrieve the traffic-sign positions and types. Several such systems exist, e.g. (Hazelhoff et al., 2012), (Maldonado-Bascon et al., 2007), (Maldonado-Bascon et al., 2008), (Overett and Petersson, 2011), (Timofte et al., 2009), (Timofte et al., 2011). These generally start by processing every image to detect the road signs present, which are then often tracked over multiple consecutive frames. Classification of the signs is usually performed either directly after detection, or after tracking. Recognition scores over 90% are reported, where some systems include over 90 different sign types.

The reported systems focus primarily on the recognition of road signs, but commonly ignore the presence of complementary signs. However, these subsigns are of great importance during analysis of the inventory results, and are therefore typically added manually. This is a time-consuming process, especially since subsigns only occur for the minority of signs. Extension of these automated traffic-sign inventory systems with a subsign recognition module would increase the inventory generation efficiency, and would decrease the required manual interaction. However, subsign detection and recognition is rather difficult, as the subsign contents vary greatly and sometimes contain arbitrary texts and/or custom symbols. Furthermore, these signs are smaller than normal road signs and have less discriminative colors, as they are usually white. Also, the capturing quality and conditions vary, as the capturings are made outdoors during different weather conditions and from driving vehicles, typically with large inter-capturing distances at higher driving speeds.

Figure 1: Examples (a)-(d) of complementary signs altering the meaning of road signs.

Hazelhoff L., op het Veld R., Creusen I. and de With P. H. N.
Subsign Detection and Classification System for Automated Traffic-sign Inventory Systems.
DOI: 10.5220/0004654402620268
In Proceedings of the 9th International Conference on Computer Vision Theory and Applications (VISAPP-2014), pages 262-268
ISBN: 978-989-758-004-8
2014 SCITEPRESS (Science and Technology Publications, Lda.)

In the literature, few publications report on the recognition of subsigns. In (Hamdoun et al., 2008), rectangle detection is employed in a region below road signs to retrieve the supplementary signs present, followed by a classification stage to solely retrieve exit-lane subsigns. Rectangle detection is also exploited in (Nienhuser et al., 2010), where the rectangles found are classified using a two-stage cascade, aiming to discriminate both between subsign and non-subsign rectangles and between 4 different subsign types. In (Puthon et al., 2012), a region-growing approach is described and compared against several other techniques, where the proposed method achieves a correct detection rate over 70%.

This paper describes a generic and learning-based approach for both detection and classification of subsigns. The work forms an extension to our existing traffic-sign inventory system (Hazelhoff et al., 2012), but due to the generic nature of our algorithm, it is also applicable to other, similar systems such as (Maldonado-Bascon et al., 2007), (Maldonado-Bascon et al., 2008), (Overett and Petersson, 2011), (Timofte et al., 2009) and (Timofte et al., 2011). Instead of treating the complementary signs as an additional sign class, and thereby searching the complete image for complementary signs, the output of existing road-sign inventory systems, like any of the above-mentioned systems, is exploited. This narrows the search area to the regions below the identified traffic signs, which increases robustness, since objects with an appearance similar to subsigns (i.e. white rectangles) occur frequently in real-world situations. The subsign recognition system exploits both the single-image detections and tracked detections (in this paper referred to as 3D signs) given by existing inventory systems.

The system starts by detecting subsigns in a fixed region below each of the detected signs, as the vast majority of subsigns are located below road signs. Afterwards, the subsign detection results obtained for each detection of a 3D sign are combined to improve robustness. When a subsign is found, the corresponding pixel regions are extracted, which are then subjected to classification, to retrieve either the subsign type or a subsign-with-text code. This system is evaluated on a large, real-world dataset containing 3,104 signs (397 signs with subsign), with 29 different subsign types for classification. It will be shown that subsign detection is indeed possible with reasonable performance, even with a generic concept.

Figure 2: Example of a 3D sign, consisting of multiple detections (a)-(e) of the same traffic sign tracked over multiple consecutive capturings.

The remainder of this paper is organized as fol-

lows. Section 2 contains the system overview of our

subsign detection and classiﬁcation system, which is

described in detail in Sect. 3. The performed exper-

iments and results are found in Sect. 4, followed by

the conclusions in Sect. 5.

2 SYSTEM OVERVIEW

The system for automatic recognition of subsigns de-

scribed in this paper operates on 3D signs detected

by our trafﬁc-sign inventory system (Hazelhoff et al.,

2012). These 3D signs consist of multiple detections

of the same road sign, tracked over consecutive image

frames. An example of an input 3D sign is shown in

Fig. 2. The system overview of the subsign recogni-

tion system is depicted in Fig. 3, and the four primary

modules are brieﬂy described below.

1. Single-image Detection: The region below each detection given by the inventory system is divided into overlapping windows. Each window is described based on densely extracted SIFT descriptors, which are subjected to classification with a linear Support Vector Machine (SVM). The maximum SVM output over all windows is returned for each analyzed detection.

2. Multiview Detection: The single-image detection

results are combined to determine the presence of

a subsign for each 3D sign.

3. Subsign Localization: When a subsign is found

for the 3D sign, the pixel region corresponding to

the subsign, is retrieved for each detection with a

positive SVM output during the single-image de-

tection stage.

SubsignDetectionandClassificationSystemforAutomatedTraffic-signInventorySystems

263

Figure 3: System overview of the subsign recognition system. (Diagram blocks: single-image detection and subsign localization for each detection, multiview detection with a subsign yes/no decision, and subsign classification.)

4. Subsign Classiﬁcation: The identiﬁed subsign re-

gions are subject to classiﬁcation. Afterwards, the

subsign type is retrieved based on weighted vot-

ing.

3 ALGORITHM DESCRIPTION

3.1 Single-image Subsign Detection

First, each detection of the input 3D sign is analyzed independently, where a measure for the possible presence of a complementary sign is computed. This stage starts by extracting the region below the detection, where we employ a region height of twice the detection height, as subsigns are not necessarily located directly below the road signs. Within this region, a sliding-window approach is employed, with a window size of 80% of the detection width and 30% of the detection height, which corresponds to the typical subsign size. Each extracted window is resized to a standard size of 120 × 45 pixels, and afterwards, SIFT descriptors (Lowe, 2004) are extracted from a dense grid at a single scale. These descriptors are selected since they are very robust against variations in lighting conditions and small object deformations, e.g. caused by subsign rotation and skew, which occur commonly. After concatenation of these descriptors, the resulting feature vector is classified using a linear SVM. This classifier is trained on a large training set containing over 30,000 subsign and non-subsign windows. After all windows are analyzed, the maximum SVM output for the subsign class over all evaluated windows is selected as the subsign presence indicator.
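The window geometry described above can be illustrated with a minimal Python sketch. It only enumerates candidate windows; descriptor extraction and SVM scoring are left out. The search region is assumed to have the same width as the detection, and the stride (`step_frac`) is a hypothetical parameter, since the text only states that the windows overlap.

```python
from typing import Iterator, NamedTuple


class Box(NamedTuple):
    """Axis-aligned pixel box: left, top, width, height."""
    x: float
    y: float
    w: float
    h: float


def search_region(sign: Box) -> Box:
    """Region directly below the detection, twice the detection height.

    Assumption: the region is as wide as the detection itself.
    """
    return Box(sign.x, sign.y + sign.h, sign.w, 2.0 * sign.h)


def sliding_windows(sign: Box, step_frac: float = 0.1) -> Iterator[Box]:
    """Yield windows of 80% detection width by 30% detection height.

    step_frac (stride as a fraction of the detection size) is an
    assumption made for this sketch, not a value from the paper.
    """
    region = search_region(sign)
    win_w, win_h = 0.8 * sign.w, 0.3 * sign.h
    step_x, step_y = step_frac * sign.w, step_frac * sign.h
    eps = 1e-9  # tolerance for floating-point rounding at the borders
    y = region.y
    while y + win_h <= region.y + region.h + eps:
        x = region.x
        while x + win_w <= region.x + region.w + eps:
            yield Box(x, y, win_w, win_h)
            x += step_x
        y += step_y
```

Each yielded window would then be cropped, resized to 120 × 45 pixels and scored, with the maximum score over all windows kept as the presence indicator.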

3.2 Multiview Subsign Detection

Traffic-sign inventory systems commonly identify the same road signs in multiple images, where the corresponding detections are tracked over the subsequent image frames. As all different detections contain the same sign, but are captured from different distances and from various viewpoints, the robustness of the subsign detection stage is improved by combining the subsign detection scores obtained for each of the individual detections. As complementary signs may not be visible from all viewpoints, and may be visible more clearly from a close distance, we compare three different combination methods against each other. The first method averages the individual detection scores, the second selects the best score (i.e. the most likely subsign detection) and the third employs the median score. These methods are defined as:

• Average: $d_{3D,av} = \mathrm{mean}(d_{det\,1}, d_{det\,2}, \ldots, d_{det\,N})$

• Max: $d_{3D,ma} = \max(d_{det\,1}, d_{det\,2}, \ldots, d_{det\,N})$

• Median: $d_{3D,me} = \mathrm{median}(d_{det\,1}, d_{det\,2}, \ldots, d_{det\,N})$

In these formulas, $d_{3D}$ denotes the corresponding detection score for the 3D traffic sign (the second subscript indicates the combination method), and $d_{det\,i}$ denotes the detection score for the i-th detection, with $i \in \{1, \ldots, N\}$ and N being the total number of detections of the respective 3D sign.
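As a small illustration of the three combination rules, the following Python sketch combines per-detection scores into a single 3D-sign score. The decision threshold and the function names are assumptions for this sketch; in practice the threshold would follow from the desired operating point on the recall-precision curve.

```python
import statistics
from typing import Callable, Dict, Sequence

# Combination rules keyed by the subscripts used in the formulas.
COMBINERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "av": statistics.mean,    # average of the detection scores
    "ma": max,                # best (highest) detection score
    "me": statistics.median,  # median detection score
}


def combine_scores(scores: Sequence[float], method: str = "av") -> float:
    """Combine single-image detection scores into one 3D-sign score."""
    if not scores:
        raise ValueError("a 3D sign needs at least one detection score")
    return COMBINERS[method](scores)


def has_subsign(scores: Sequence[float], method: str = "av",
                threshold: float = 0.0) -> bool:
    """Subsign decision for a 3D sign; threshold is a hypothetical value."""
    return combine_scores(scores, method) > threshold
```

For four illustrative detection scores 3.4, 1.7, 0.2 and -2.1, the average is 0.8, the maximum 3.4 and the median 0.95, so the max rule reacts most strongly to a single confident view.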

3.3 Subsign Localization

When the presence of a subsign is indicated for a 3D

sign, the subsign pixel region is identiﬁed for all de-

tections for which the single-image detection phase

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

264

results in a positive SVM output for the subsign class.

During detection, only the window with the highest SVM output is selected, whereas in this stage, this window is extended by all adjacent windows which also have a positive SVM output. This involves iterative selection of all windows of which at least 40% is overlapped by the current subsign region, where initially the window with the highest SVM output forms the current subsign region. Figure 4 portrays several examples of obtained subsign regions.
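The iterative growing step can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements from the paper: the overlap is measured as the fraction of a window's own area covered by the current region, and the grown region is represented by the bounding box of all merged windows.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def area(box: Box) -> float:
    """Area of a box; empty or inverted boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def covered_fraction(window: Box, region: Box) -> float:
    """Fraction of the (non-empty) window area that lies inside the region."""
    inter = (max(window[0], region[0]), max(window[1], region[1]),
             min(window[2], region[2]), min(window[3], region[3]))
    return area(inter) / area(window)


def localize_subsign(windows: Sequence[Box], scores: Sequence[float],
                     min_overlap: float = 0.4) -> Optional[Box]:
    """Grow the best-scoring window with overlapping positive windows."""
    positive = [i for i, s in enumerate(scores) if s > 0]
    if not positive:
        return None  # no window has a positive SVM output
    seed = max(positive, key=lambda i: scores[i])
    region = windows[seed]
    remaining: List[int] = [i for i in positive if i != seed]
    changed = True
    while changed:  # repeat until no further window can be merged
        changed = False
        for i in list(remaining):
            if covered_fraction(windows[i], region) >= min_overlap:
                w = windows[i]
                region = (min(region[0], w[0]), min(region[1], w[1]),
                          max(region[2], w[2]), max(region[3], w[3]))
                remaining.remove(i)
                changed = True
    return region
```

Windows with a positive score that do not touch the growing region, such as an isolated false response further down the pole, are left out of the final region.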

Figure 4: Examples (a)-(x) of localized subsigns. This figure also displays examples of subsign classes contained in the sets employed for training of the classification stage.

3.4 Subsign Classiﬁcation

Each localized subsign region is classified using a standard object recognition method, which essentially operates as follows. Each of the retrieved subsign regions is first resized to a predefined size, after which SIFT descriptors (Lowe, 2004) are extracted from a dense grid. These descriptors are then concatenated, and the resulting feature vector is L2-normalized. Then, multi-class classification is performed using linear SVMs in a One-versus-All setup. Subsequently, the classification result for the 3D sign is retrieved based on weighted voting, where the weight of each vote is defined as the difference between the SVM outputs of the winning class and the runner-up class.
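The weighted voting step could look as follows in Python. This is a sketch under the assumption that each localized region yields one One-versus-All SVM output per class; the class labels in the usage note are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def vote(svm_outputs: Dict[str, float]) -> Tuple[str, float]:
    """Return the winning class and its vote weight for one region.

    The weight is the margin between the highest and the second-highest
    One-versus-All SVM output.
    """
    if len(svm_outputs) < 2:
        raise ValueError("at least two classes are required")
    ranked = sorted(svm_outputs.items(), key=lambda kv: kv[1], reverse=True)
    (winner, best), (_, second) = ranked[0], ranked[1]
    return winner, best - second


def classify_3d_sign(per_region_outputs: Iterable[Dict[str, float]]) -> str:
    """Accumulate the weighted votes of all regions of one 3D sign."""
    tally: Dict[str, float] = defaultdict(float)
    for outputs in per_region_outputs:
        label, weight = vote(outputs)
        tally[label] += weight
    if not tally:
        raise ValueError("no localized regions to classify")
    return max(tally, key=lambda label: tally[label])
```

With three regions whose outputs are `{"A": 1.2, "B": 0.1, "none": -0.5}`, `{"A": 0.2, "B": 0.3, "none": -1.0}` and `{"A": 0.9, "B": -0.2, "none": 0.0}`, class A collects a total weight of 2.0 against 0.1 for class B, so a single ambiguous view does not override two confident ones.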

This classification system is trained on the detection output from over 100,000 3D signs, which are manually annotated. During this process, non-ideal detections are removed, as these may influence the classifier training in a negative way. This results in 29 different subsign classes, where frequently occurring texts are included as separate classes, while less frequent texts are covered by a general subsign-with-text class (in practice, the text itself is unconstrained). As non-subsign detections may occur, we have also included a non-subsign class. Figure 4 displays the majority of the subsign categories within our training set.

4 EXPERIMENTS AND RESULTS

4.1 Subsign Detection Dataset Description

The above-described system is evaluated on a large and diverse test set, which does not overlap with any of the training sets used. This set is constructed by applying our road sign recognition system (Hazelhoff et al., 2012) to a moderate-size geographical area, containing rural roads, smaller towns and a part of a city environment. Within this region, street-level panoramic images are captured on each public road, using an inter-capturing interval of 5 meters. This set contains 40,128 images (about 200 km of road) in total. As these images are captured from driving vehicles under various weather and lighting conditions, cover various environments and represent a geographical area, we consider this test set a representative, real-world dataset.

Within this region, 3,104 3D signs are identified by our traffic-sign inventory system (Hazelhoff et al., 2012), where each sign is detected in about 5 different images. These signs are employed to assess the detection and classification performance of our subsign recognition approach. Of the 3,104 signs, 397 (about 12.8%) have a subsign. Although this may look like a low percentage, it corresponds to the real-world occurrence of subsigns. Furthermore, we should note that the signs are captured from various viewpoints and distances (and thus with various resolutions), and under various lighting and weather conditions. For each of the 3D signs, both the presence of a subsign and, if present, its subsign code are marked manually as ground truth. Example images of detected signs and regions below these signs are shown in Fig. 5.

Figure 5: Example images (a)-(k) contained in our test dataset.

We have assessed the detection and classiﬁcation

accuracy separately, as both aim at different aspects

(i.e. the occurrence of a subsign and its speciﬁc type).

4.2 Detection Performance

The detection performance is assessed for both the

single-image detection and the multiview detection

stages. During evaluation, the following performance

metrics are employed:

• Precision: $\frac{TP}{TP + FP}$

• Recall: $\frac{TP}{TP + FN}$

where TP denotes the correct detections (true positives), FP denotes the erroneously found subsigns (false positives) and FN the missed subsigns (false negatives). Fig. 6 displays the recall-precision curves for both the single-image and multiview detection stages. For clarity, we additionally included a zoomed-in version, focusing on the high-recall region, in Fig. 7. We should note that the sharp bend in the curve for the single-image detection occurs because for about 10% of the images, the subsign is completely invisible (e.g. due to occlusions).

Figure 6: Recall-precision curves for both the single-image and multiview detection stages (horizontal axis: recall; vertical axis: precision; curves: single image, 3D,av, 3D,ma and 3D,me).
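The two metrics can be computed directly from the detection counts, as in the Python sketch below. The example counts in the usage note are not stated as such in the text; they are derived here from the reported totals (757 flagged signs, of which 368 are false detections, and 397 signs with a subsign in the test set) and should be read as an illustration.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of the reported subsigns that are correct."""
    if tp + fp == 0:
        raise ValueError("no detections were reported")
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of the subsigns present that are found."""
    if tp + fn == 0:
        raise ValueError("the ground truth contains no subsigns")
    return tp / (tp + fn)
```

With TP = 757 - 368 = 389, FP = 368 and FN = 397 - 389 = 8, `precision(389, 368)` gives about 0.51 and `recall(389, 8)` about 0.98, in line with the reported operating point.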

It can be observed that multiview detection significantly outperforms the single-image detection. This is explained by the fact that many of the contained signs are captured from a relatively large distance, from non-ideal viewpoints, or are (partially) occluded, complicating subsign detection. Moreover, the single-image detection stage is unable to retrieve about 11% of the subsigns present, e.g. due to the aforementioned reasons. When combining the different detections of the same sign, the optimal views can be exploited, which increases the detection performance considerably. We have found that for a high recall, the three multiview methods behave similarly, where the median and average methods outperform the max method in the lower recall region. Numerically, at a detection rate (recall) of 98%, the correct detection score (precision) is 51.3%, resulting in 757 signs out of the 3,104 total 3D signs marked as containing a subsign. Although a precision of 51.3% looks quite low, subsigns only occur in a minority of cases, implying that for inclusion of subsigns in the inventory result, only about 24.4% of the 3D signs have to be checked, thereby reducing the number of manual checks by a factor of 4.1 compared to evaluating all 3D signs by hand.

Figure 7: Recall-precision curve zoomed in around the high-recall region (horizontal axis: recall, from 0.8 to 1; vertical axis: precision; curves: single image, 3D,av, 3D,ma and 3D,me).

Table 1: Summary of the classification results. The last two rows are subclasses of the types known to our classifier.

                     Total          Correctly classified   Found as non-subsign
Total detections     757  100.0%    -     -                -    -
Non-subsign          368   48.6%    0     0%               63   17.1%
Unknown types         17    2.2%    0     0%               0    0%
Known types          372   49.1%    340   91.4%            0    0%
- common subsigns    200   53.8%    178   89.0%            0    0%
- text subsigns      172   46.2%    162   94.2%            0    0%
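The reduction in manual effort follows from two simple ratios, reproduced in the Python sketch below using the reported counts of 757 flagged signs out of 3,104 3D signs. The function names are chosen for this illustration only.

```python
def fraction_to_check(flagged: int, total: int) -> float:
    """Share of all 3D signs that still needs manual inspection."""
    return flagged / total


def reduction_factor(flagged: int, total: int) -> float:
    """How many times fewer signs are inspected than when checking all."""
    return total / flagged


share = fraction_to_check(757, 3104)   # about 0.244, i.e. 24.4%
factor = reduction_factor(757, 3104)   # about 4.1
```

Both values follow from the same pair of counts, so the percentage of signs to check and the reduction factor are each other's reciprocal.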

Considering the processing time, the single-image detection stage takes about 1.9 seconds per detection, resulting in an average processing time of about 9.5 seconds for each 3D sign. We should note that although this looks rather slow, this computation time applies to each 3D sign (not each image) and is valid for a single-threaded implementation measured on a 2009 i7-920 CPU, operating at 2.67 GHz. Also, we should point out that the subsign detection stage adds about 0.9% to the total computational load required for all other components within the road-sign inventory system. Since this task can be performed in parallel for each 3D sign (employing multi-core computers in a distributed computing environment), this processing can be distributed to reduce the elapsed time of this stage to a level suitable for our purpose.

4.3 Localization and Classification Performance

After multiview detection, each 3D sign identified as containing a subsign is subjected to classification. We consider three subgroups, as indicated in Table 1. The falsely detected subsigns are grouped into the non-subsign category; these may be classified as a non-subsign (an additional class in our classification system) but can never be assigned a correct subsign type. The second category contains the minority of subsign types that are not known to our classifier (and are also non-text), and are therefore always classified incorrectly. The third group contains the types known to our classifier. This category is divided into two segments: common subsigns and text subsigns. The first segment contains all subsigns that occur frequently, including broadly used text subsigns, which are all learned as a separate class by our classifier. The second segment contains all other text subsigns, as in practice the text is unconstrained and two texts may differ in only a single letter. The classification results are summarized in Table 1. As can be noticed, 91.4% of the subsigns with a type known to our classifier are recognized correctly. Furthermore, 17.1% of the falsely detected subsigns are recognized as non-subsigns, decreasing the number of false positives and resulting in a correct detection ratio of about 56% (at a detection rate of 98%).
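The improvement of the correct detection ratio can be retraced with a short Python sketch. The counts are taken from Table 1: 757 flagged signs, of which 368 are false detections, and 63 of these false detections are rejected by the non-subsign class while no true subsign is rejected. The function name and its parameters are chosen for this illustration.

```python
def precision_after_rejection(tp: int, fp: int,
                              rejected_fp: int, rejected_tp: int = 0) -> float:
    """Precision once detections labeled non-subsign are discarded."""
    kept_tp = tp - rejected_tp
    kept_fp = fp - rejected_fp
    return kept_tp / (kept_tp + kept_fp)


true_subsigns = 757 - 368  # 389 flagged signs that do carry a subsign
before = precision_after_rejection(true_subsigns, 368, rejected_fp=0)
after = precision_after_rejection(true_subsigns, 368, rejected_fp=63)
```

Here `before` is about 0.51 and `after` about 0.56, while the recall stays the same because no true subsign is labeled as non-subsign.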

When evaluating the processing time, the localiza-

tion and classiﬁcation stages require 2.3 seconds per

3D sign, which results in an average total process-

ing time slightly above 10 seconds per 3D sign for

the complete subsign recognition system. This adds

about 0.9% to the total computational load required

for all other components within the road sign inven-

tory system.

5 CONCLUSIONS AND FUTURE WORK

This paper has presented a system for recognition of subsigns as an extension for automated traffic-sign recognition systems. The system first analyzes the regions below each detected traffic sign (given by the road sign recognition system) using generic object detection methods. Second, the consecutive detections of the same 3D sign are combined to determine the presence of a subsign for each 3D sign. For signs identified as having a complementary sign, the subsign pixel region is determined for each detection, and these regions are then subjected to classification to retrieve the subsign type.

Performance evaluation on a large and diverse test set showed that reliable detection is feasible, where at a detection rate of 98%, about 51% of the detections are correct. Since subsigns are sparse (i.e. only occur below the minority of signs), this results in a significant reduction in the manual effort needed to include subsigns in road-sign inventories. This score may be increased by using the statistics of combinations of primary sign types and subsign occurrences, to exclude certain sign types from subsign detection. Classification of subsigns is challenging, since subsigns are small and have varying resolutions and a high diversity in contents, where falsely detected subsigns are also present. Nevertheless, 91.4% of the subsigns with a class known to our classifier are classified correctly. Additionally, 17.1% of the falsely detected subsigns are recognized as non-subsigns. This implies that with a combined detection and classification approach, the correct detection ratio is further improved to about 56% at a detection rate of 98%.

REFERENCES

Hamdoun, O., Bargeton, A., Moutarde, F., Bradai, B., and Chanussot, L. (2008). Detection and recognition of end-of-speed-limit and supplementary signs for improved European speed limit support. In 15th World Congress on Intelligent Transport Systems (ITS), New York, United States.

Hazelhoff, L., Creusen, I. M., and De With, P. H. N. (2012). Robust detection, classification and positioning of traffic signs from street-level panoramic images for inventory purposes. In Applications of Computer Vision (WACV), Workshop on, pages 313-320.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision (IJCV), 60(2).

Maldonado-Bascon, S., Lafuente-Arroyo, S., Gil-Jimenez, P., Gomez-Moreno, H., and Lopez-Ferreras, F. (2007). Road-sign detection and recognition based on support vector machines. Intelligent Transportation Systems, IEEE Transactions on, 8(2):264-278.

Maldonado-Bascon, S., Lafuente-Arroyo, S., Siegmann, P., Gomez-Moreno, H., and Acevedo-Rodriguez, F. (2008). Traffic sign recognition system for inventory purposes. In Intelligent Vehicles Symposium, 2008 IEEE, pages 590-595.

Nienhuser, D., Gumpp, T., Zollner, J., and Natroshvili, K. (2010). Fast and reliable recognition of supplementary traffic signs. In Intelligent Vehicles Symposium (IV), 2010 IEEE, pages 896-901.

Overett, G. and Petersson, L. (2011). Large scale sign detection using HOG feature variants. In Intelligent Vehicles Symposium (IV), 2011 IEEE, pages 326-331.

Puthon, A.-S., Moutarde, F., and Nashashibi, F. (2012). Subsign detection with region-growing from contrasted seeds. In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on, pages 969-974.

Timofte, R., Zimmermann, K., and Van Gool, L. (2009). Multi-view traffic sign detection, recognition, and 3D localisation. In Applications of Computer Vision (WACV), 2009 Workshop on, pages 1-8.

Timofte, R., Zimmermann, K., and Van Gool, L. (2011). Multi-view traffic sign detection, recognition, and 3D localisation. Machine Vision and Applications.

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

268