APPEARANCE BASED PAINTINGS RECOGNITION FOR A
MOBILE MUSEUM GUIDE
Claudio Andreatta
ITC-irst Istituto per la ricerca scientifica e tecnologica
38050 Povo, Trento, Italy
Fabrizio Leonardi
ITC-irst Istituto per la ricerca scientifica e tecnologica
38050 Povo, Trento, Italy
Corresponding author: andreatta@itc.it
Keywords:
appearance based recognition, image retrieval, color normalization.
Abstract:
This paper presents a prototype of a visual recognition system for a handheld interactive museum guide. Contextualized information about museum drawings may be obtained by the user, without any knowledge of how the system works, simply by pointing a palmtop camera towards the painting and taking a shot. The system was tested and its performance was found to be satisfactory under challenging environmental conditions.
1 INTRODUCTION
Human-computer interaction has become an increasingly important part of our daily lives, and many research projects are focused on finding non-intrusive, simple, and natural technology to allow a casual user to interact with complex systems. In this context, vision based interfaces have many advantages.
New and interesting possibilities are offered by the employment of Personal Digital Assistants (PDA). Nowadays they can not only manage personal information, such as contacts, appointments, and to-do lists, but also connect to the Internet, act as global positioning system (GPS) devices, run multimedia software, and be equipped with sensors such as digital cameras and microphones.
In this paper we propose a system which uses vision recognition techniques to provide a museum visitor with contextual information about a painting, as in (Robertson et al., 2004) and (Albertini et al., 2005). In our test scenario the visitor carries a PDA equipped with a digital camera. To ask for information about a picture, the visitor simply points the PDA camera at the painting and pushes a button. The PDA display notifies the user whether the system has recognized the museum painting or whether the image could not be analyzed (e.g., the item was not correctly framed, or no image analysis was possible due to poor lighting conditions).
In the following we describe the system architecture, the vision recognition engine, and the image processing techniques involved in the preprocessing stage (Section 2). The experimental results are presented and discussed in the concluding section (Section 3).

Figure 1: Simplified system architecture (client side: PDA with camera; server side: recognition engine and presentation provider with the paintings and presentation databases; all communication over HTTP).
2 SYSTEM ARCHITECTURE
The simplified system architecture is depicted in Figure 1. The PDA device is an HP iPAQ 5550 with 128 MB of memory and an Intel X-Scale PXA255 400 MHz processor; it is equipped with a wireless card supporting WiFi connectivity and a LifeView FlyCam CF 1.3M digital camera. The user interface was developed in C# and runs on the Windows Pocket PC 2003 OS. The vision recognition engine and the presentation provider on the server side are implemented in C++ and run on a Linux machine. All the communications between the client and the servers use the standard HTTP protocol.
Figure 2: (a) The PDA in camera mode; (b) a drawing shot example.
When the interface is in camera mode, the display shows what the camera frames. When the museum visitor wants information about a painting, he points the camera toward the painting and pushes a query button. It is preferable that the whole drawing, including the frame, be visible in the shot. The picture is then sent to the recognition engine. Upon a positive recognition the PDA provides a multimedia presentation of the museum painting; otherwise, feedback about the impossibility of analyzing the image is shown, in order to help the user better operate the system.
2.1 Vision Recognition Engine
The purpose of the visual recognition engine is to classify an unknown query image submitted by the museum visitor using the PDA. The query image is compared to all the images in the paintings database, and for each distinct painting (represented by one or more images) a similarity score is computed and evaluated.
The database is built following the learning-by-example paradigm. Several images of the museum paintings, annotated with a unique identifier, are acquired with the palmtop. In the learning phase the behavior of a potential visitor is simulated: the shots are taken not only from a frontal view but at different angles and at different distances (Figure 3). The region of interest (ROI) for further processing is obtained by considering only a part of the shot, cropping the picture: the height and width of the ROI are 3/4 of the original height and width (240x320). In order to simulate even more camera positions, the ROI is moved to 25 different positions. In the query phase, the ROI is centered and represents the inner, most relevant part of the picture.
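As an illustration, the following C++ sketch enumerates the 25 ROI placements; the 5 × 5 grid of offsets and all names are our assumptions, since the paper does not detail the exact placement scheme.

#include <vector>

struct Rect { int x, y, w, h; };

// Enumerate the 25 ROI placements used to simulate camera positions:
// the ROI is 3/4 of the image size and slides over a 5x5 grid of offsets.
std::vector<Rect> roiPlacements(int imgW, int imgH) {
    const int roiW = imgW * 3 / 4;        // e.g. 240 for a 320-wide shot
    const int roiH = imgH * 3 / 4;        // e.g. 180 for a 240-high shot
    const int stepX = (imgW - roiW) / 4;  // 4 intervals -> 5 x-positions
    const int stepY = (imgH - roiH) / 4;  // 4 intervals -> 5 y-positions
    std::vector<Rect> rois;
    for (int j = 0; j < 5; ++j)
        for (int i = 0; i < 5; ++i)
            rois.push_back({i * stepX, j * stepY, roiW, roiH});
    return rois;                          // 25 rectangles
}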
Figure 3: Learning phase: the paintings are photographed simulating casual visitor positions. The whole piece must be depicted in the shot.
The visual recognition engine exploits research results from the field of image retrieval. Many research papers and systems have been presented for image retrieval based on low-level visual features, with the goal of preserving effectiveness while minimizing the size of the image descriptors and the response time (Brunelli and Mich, 2000). The computation of the feature vectors and the retrieval itself are performed by a modified version of the content based image retrieval system COMPASS (Brunelli and Mich, 2000; Andreatta, 2004; Andreatta et al., 2005).
The following low level feature histograms are considered to describe the ROI visual content:
- Intensity: 8 bins,
- Edge magnitude (log mapped): 8 bins,
- Edges along the vertical axis (log mapped): 8 bins,
- Edges along the horizontal axis (log mapped): 8 bins,
- Hue: 8 bins,
- Saturation: 8 bins,
- Intensity co-occurrence: 4 × 4 bin two-dimensional histogram.
The ROI is partitioned in a 5 × 5 fixed grid in order to retain spatial information (see Figure 4(a)), and each region, denoted as $ROI_r$ ($r \in [0, 24]$), is described independently.
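For concreteness, here is a minimal C++ sketch of one of the regional descriptors, the 8-bin intensity histogram computed over the 5 × 5 grid; the flat grayscale buffer and the helper names are illustrative assumptions.

#include <array>
#include <vector>

using Hist8 = std::array<double, 8>;

// Compute a normalized 8-bin intensity histogram for each of the 25 grid
// cells of the ROI; 'gray' is a row-major 8-bit grayscale buffer of size w*h.
std::vector<Hist8> gridIntensityHistograms(const std::vector<unsigned char>& gray,
                                           int w, int h) {
    std::vector<Hist8> hists(25, Hist8{});
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int cell = (y * 5 / h) * 5 + (x * 5 / w);  // grid cell index, 0..24
            hists[cell][gray[y * w + x] / 32] += 1.0;  // 256/8 = 32 levels per bin
        }
    for (Hist8& hst : hists) {                         // normalize to unit sum
        double n = 0.0;
        for (double v : hst) n += v;
        if (n > 0.0) for (double& v : hst) v /= n;
    }
    return hists;
}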
Histograms can be represented as vectors and their difference can be quantified by a metric defined in the associated vector space. A widely used family of metrics is the $L_p$ family, defined as:

$$L_p(x, y) = \Big( \sum_{i=1}^{N} |x_i - y_i|^p \Big)^{1/p}, \quad p \ge 1 \qquad (1)$$
The $L_1$ metric, also known as the Manhattan norm, provides good results and supports efficient comparison. The distance (dissimilarity) between two images is defined as:

$$d(x, y) = \frac{1}{K} \sum_{r} W_r \, L_p(x_r, y_r) \qquad (2)$$
where $x_r$ and $y_r$ are the vector descriptors of the image region $ROI_r$, $W_r$ are the weights associated with each region, and $K$ is a normalizing factor.
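A minimal C++ sketch of Eq. (2) with the $L_1$ metric follows; the descriptor layout and the normalization $K = 2 \sum_r W_r$ (the maximum attainable weighted distance for unit-sum histograms) are our assumptions, not the actual COMPASS interface.

#include <cmath>
#include <vector>

// One normalized histogram per grid region; 25 regions per image.
using Descriptor = std::vector<std::vector<double>>;

// L1 (Manhattan) distance between two histograms, Eq. (1) with p = 1.
double l1(const std::vector<double>& x, const std::vector<double>& y) {
    double d = 0.0;
    for (size_t i = 0; i < x.size(); ++i) d += std::fabs(x[i] - y[i]);
    return d;
}

// d(x,y) = (1/K) * sum_r W_r * L1(x_r, y_r), Eq. (2); K scales the result
// into [0,1], since two unit-sum histograms have L1 distance at most 2.
double dissimilarity(const Descriptor& x, const Descriptor& y,
                     const std::vector<double>& W) {
    double d = 0.0, K = 0.0;
    for (size_t r = 0; r < x.size(); ++r) {
        d += W[r] * l1(x[r], y[r]);
        K += 2.0 * W[r];
    }
    return K > 0.0 ? d / K : 0.0;
}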
The recognition score is computed by inspecting the nearest items in the feature space. Let $d(n) \in [0, 1]$ be the normalized distance between the query image and the $n$-th nearest item in the database, and let $s(n) = 1 - d(n)$ be the corresponding similarity. The recognition score obtained by class $c$ (i.e., by a specific museum painting) is defined as:

$$S(c) = \sum_{n_c} w(n_c) \, s(n_c) \qquad (3)$$

where $n_c$ is the ranking of class $c$ objects in the nearest neighbor list, and $w(n)$ is a tunable weight function. If only the nearest item in the database is considered, the weight function is the Kronecker delta $w(n) = \delta_{1n}$.
Once the recognition score is computed, the visual recognition engine returns to the PDA the sorted list of painting identifiers and scores of the most relevant hypotheses. If the score of the first hypothesis is above a given confidence threshold, the presentation corresponding to the guessed painting is shown; otherwise the client notifies the user of the rejection. The confidence threshold may be tuned on the client side.
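The scoring and the accept/reject decision might look as follows in C++; this sketch assumes $w(n) = \delta_{1n}$, so only the best-ranked database item of each class contributes, and all names are illustrative.

#include <map>
#include <string>
#include <vector>

struct Hit { std::string painting; double dist; };  // normalized d(n) in [0,1]

// S(c) with w(n) = delta_{1n}: the score of class c is the similarity
// s = 1 - d of its best-ranked item in the nearest neighbor list.
std::map<std::string, double> classScores(const std::vector<Hit>& ranked) {
    std::map<std::string, double> score;            // 'ranked' sorted by dist
    for (const Hit& h : ranked)
        if (!score.count(h.painting))               // first occurrence = rank 1
            score[h.painting] = 1.0 - h.dist;
    return score;
}

// Client-side decision: show the presentation only above the threshold.
bool acceptHypothesis(double bestScore, double confidenceThreshold) {
    return bestScore > confidenceThreshold;
}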
2.2 Preprocessing
The shots of museum paintings acquired by the palmtop camera are of poor quality and characterized by low contrast, due to the limited dynamic range of the sensor (a common problem in low cost cameras with CMOS sensors) and to poor lighting conditions (Figure 5(a) depicts the graylevel intensity histogram of a sample image). Moreover, the paintings themselves lack saturated colors, making the color information in most cases unreliable; a preprocessing stage is therefore necessary. In order to normalize and increase the dynamic range of the pictures, color and intensity equalization algorithms are employed.
Among the many color equalization algorithms developed so far, the two most widely used are Gray World (GW) (Buchsbaum, 1980) and White Patch (WP) (Funt and Cardei, 1994). These two models are considered alternatives to each other in color correction methods.
Both models try to emulate two human visual adaptation mechanisms: lightness constancy and color constancy. The Gray World approach is typical of lightness constancy adaptation because it modifies the dynamic range of the histogram under the assumption that the average of the surface reflectance over the entire scene is gray. Alternatively, the White Patch approach is typical of color constancy adaptation: it searches for the lightest patch to be used as a white reference, similar to what the human visual system does. The human visual system is also highly non-linear, since it can act globally and locally at the same time. Among the models that compute local color adaptation using spatial relations and image content we can consider Land's Retinex theory (Land, 1977). A recent approach, called Automatic Color Equalization (ACE), merges the Retinex model and the GW model, performing global and local filtering simultaneously (Rizzi et al., 2002).

Figure 4: Image processing flow: original image; normalized image with the description grid superimposed; detected overexposed areas (seed regions in green, detected blooming area in red); feature region weights used in the comparison.
Even if local adaptive methods give the best results, they are computationally demanding for real-time applications; we therefore adopted a simple and efficient approach based on a variant of the GW algorithm. A contrast stretching transformation was considered: the image is normalized using a piecewise linear function whose control points are determined by inspecting the original histogram and computing the expected gray point as in the GW method. The normalization method enhances the contrast of a color image by adjusting the pixel colors to span as much as possible of the entire range of available colors. The histogram tails are cut by locating the histogram boundaries: 0.1 percent in the black range and 0.5 percent in the white range.
Figure 5: Preprocessing: original image intensity histogram; normalized image histogram with cut tails.
In the acquired images, the elongated white tail and peaks are principally due to the presence of highlights, reflections, and over- or under-exposed areas. In order to recover an illuminant-invariant measure of the painting colors, the GW algorithm is applied to the stretched image, disregarding the histogram tails (see Figures 4 and 5).
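A C++ sketch of the tail-cutting contrast stretch on a single channel follows. The 0.1%/0.5% cut fractions come from the text; the single linear segment is a simplification (the actual method uses a piecewise linear function with a control point at the expected gray point), and all names are ours.

#include <vector>

// Stretch one 8-bit channel so that [lo, hi] maps onto [0, 255], where lo
// and hi cut 0.1% of the pixels in the black tail and 0.5% in the white one.
void stretchChannel(std::vector<unsigned char>& ch) {
    long hist[256] = {0};
    for (unsigned char v : ch) ++hist[v];
    const long n = static_cast<long>(ch.size());

    long acc = 0; int lo = 0, hi = 255;
    for (int i = 0; i < 256; ++i) {                // black tail: 0.1% of pixels
        acc += hist[i];
        if (acc >= n / 1000) { lo = i; break; }
    }
    acc = 0;
    for (int i = 255; i >= 0; --i) {               // white tail: 0.5% of pixels
        acc += hist[i];
        if (acc >= n * 5 / 1000) { hi = i; break; }
    }
    if (hi <= lo) return;                          // degenerate histogram

    for (unsigned char& v : ch) {                  // linear stretch with clamping
        int s = (static_cast<int>(v) - lo) * 255 / (hi - lo);
        v = static_cast<unsigned char>(s < 0 ? 0 : (s > 255 ? 255 : s));
    }
}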
Histogram stretching may introduce fictitious dynamics, so that the feature vector no longer represents the visual content of the image. In order to avoid this problem, a rejection mechanism based on the histogram characteristics was introduced. More specifically, an input image is rejected whenever:
$$I_{mincut} > m_{cut} \;\vee\; I_{maxcut} < M_{cut} \;\vee\; (I_{maxcut} - I_{mincut}) < W \;\vee\; ROI_{min} > m_{roi} \;\vee\; ROI_{max} > M_{roi} \qquad (4)$$
where $I_{mincut}$ and $I_{maxcut}$ are the cut points of the histogram boundaries, $(I_{maxcut} - I_{mincut})$ is the width of the cut histogram, and $ROI_{min}$ and $ROI_{max}$ are the ROI minimum and maximum intensities.
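Transcribed into C++, the rejection test of Eq. (4) might read as follows; the threshold values are hypothetical placeholders, since the tuned settings are not reported in the paper.

// Reject an input image when any condition of Eq. (4) holds; the thresholds
// m_cut, M_cut, W, m_roi, M_roi below are hypothetical, not the tuned values.
bool rejectImage(int iMinCut, int iMaxCut, int roiMin, int roiMax) {
    const int m_cut = 16, M_cut = 240, W = 96;   // assumed threshold settings
    const int m_roi = 224, M_roi = 250;
    return iMinCut > m_cut                       // black cut point too high
        || iMaxCut < M_cut                       // white cut point too low
        || (iMaxCut - iMinCut) < W               // stretched range too narrow
        || roiMin > m_roi                        // ROI minimum intensity too high
        || roiMax > M_roi;                       // ROI maximum intensity too high
}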
2.3 Inhibition of Overexposed Areas
When strong light sources are present or the painting is not correctly framed, parts of the image become overexposed. Blooming occurs: overexposed areas bleed into nearby darker zones and detail is lost.
In order to prevent the influence of such disruptive effects, we developed a strategy for the detection and inhibition of overexposed areas. Potentially overexposed areas are detected and marked as seeds of a region growing algorithm.
Figure 6: Overexposed area intensity profile: normal profile
and blooming effect profile.
The region growing procedure tries to follow the blooming effect, assuming that the intensity function is smooth and monotonically decreasing. This assumption appears reasonable upon inspection of the intensity profile of such areas (Figure 6).
We define as overexposed region seed an image region $R$ that, after the preprocessing stage, has the following properties:

$$R = \{ p \mid I(p) \ge I_{min} \ \text{and} \ S(p) \le S_{max} \}, \quad A(R) \ge A_{min} \qquad (5)$$

where $I$ is the image intensity, $S$ the image saturation, and $A(R)$ the area of region $R$.
The region growing algorithm tracing the blooming effect works as follows: for each pixel $p$ on the boundary of an overexposed region seed $R$, it grows the region by adding a new pixel $q$ from the neighborhood of $p$ according to the following criteria:

$$\begin{aligned}
I(p) - I(q) &\ge I_{mono} && \text{(Monotonicity)} \\
I(q) &\ge I_{min} && \text{(Minimum intensity)} \\
|G(p) - G(q)| &\le G_{smooth} && \text{(Smoothness)} \\
G(q) &\ge G_{min} && \text{(Plateau)}
\end{aligned} \qquad (6)$$

where $G$ is the gradient magnitude.
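The admission test of Eq. (6) can be sketched in C++ as follows; the structure mirrors the four criteria, while the threshold names and values are illustrative assumptions.

#include <cmath>

struct GrowThresholds { double iMono, iMin, gSmooth, gMin; };

// Decide whether candidate pixel q, a neighbor of boundary pixel p, joins
// the growing blooming region: the four tests mirror Eq. (6) one to one.
bool admitPixel(double Ip, double Iq, double Gp, double Gq,
                const GrowThresholds& t) {
    return (Ip - Iq) >= t.iMono              // monotonicity: intensity decays
        && Iq >= t.iMin                      // q is still bright enough
        && std::fabs(Gp - Gq) <= t.gSmooth   // smooth gradient variation
        && Gq >= t.gMin;                     // not yet on the plateau
}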
When the overexposed region has been detected, a weighting factor is computed as:

$$W_r = 1 - \frac{A(R \cap ROI_r)}{A(ROI_r)} \qquad (7)$$
to attenuate the contribution of regions which may depict overexposed areas, as in Figure 4.

Figure 7: The challenging environmental conditions at the exhibition.
3 EXPERIMENTAL RESULTS
AND CONCLUSIONS
The system prototype has been tested with synthetic images and in a real application at an exhibition in the Castello del Buonconsiglio in Trento. The recognition engine achieves a perfect score with synthetic images, but the exhibition testbed was far more interesting and challenging (Figure 7). The exhibition comprised 13 paintings; in the learning phase, about 43 shots of each drawing were taken from different positions (563 images in total) and inserted into the recognition module database. In the testing phase, 70 shots were submitted by the PDA to the recognition engine; the following table summarizes the results:
Results:
  Recognized        77.14%
  False positives    2.86%
  Rejected          20.00%
The high rejection ratio is due to the varying illumination conditions, to the presence of spotlights over the drawings and, as already noted, to the lack of dynamic range of the camera sensor. However, from the casual user's perspective, it may be preferable that the system provide feedback about the impossibility of analyzing the image, and how to solve the problem, rather than produce an incorrect classification that would trigger a misleading presentation.
A prototype of a palmtop museum guide based on computer vision recognition techniques in a challenging environment has been presented, along with encouraging experimental results. As future work we plan to enhance the recognition performance by submitting multiple shots and to improve the feedback provided, in order to guide the user, in a non-obtrusive way, to correctly frame the drawing.
ACKNOWLEDGMENTS
This work was supported by the Provincia Autonoma di Trento, Italy, under the project PEACH: Personal Experience with Active Cultural Heritage (http://peach.itc.it) and by the European Union under the project VIKEF: Virtual Information and Knowledge Environment Framework (http://vikef.net).
REFERENCES
Albertini, A., Brunelli, R., Stock, O., and Zancanaro, M.
(2005). Communicating user’s focus of attention by
image processing as input for a mobile museum guide.
In IUI 2005, International Conference on Intelligent
User Interfaces.
Andreatta, C. (2004). CBIR techniques for object recognition. Technical Report ITC-irst T04-12-01.
Andreatta, C., Lecca, M., and Messelodi, S. (2005).
Memory-based object recognition in digital images. In
VMV 2005, Vision, Modelling, and Visualization.
Brunelli, R. and Mich, O. (2000). Image retrieval by examples. IEEE Trans. on Multimedia, 2(3):164–171.
Buchsbaum, G. (1980). A spatial processor model for object
colour perception. Journal of the Franklin Institute,
310:1–25.
Funt, B. and Cardei, V. (1994). Committee-based color constancy. J. Opt. Soc. Am. A, 11(11):3011–3020.
Land, E. (1977). The retinex theory of color vision. Scientific American, 237(3):2–17.
Rizzi, A., Gatta, C., and Marini, D. (2002). Color correction between gray world and white patch. In Electronic Imaging 2002, San José, California (USA).
Robertson, P., Laddaga, R., and Van Kleek, M. (2004). Virtual mouse vision based interface. In IUI 2004, International Conference on Intelligent User Interfaces.