TOWARDS INTENT DEPENDENT IMAGE ENHANCEMENT
State-of-the-art and Recent Attempts
Marco Bressan, Gabriela Csurka and Sebastien Favre
Xerox Research Centre Europe, 6, ch. de Maupertuis, 38240 Meylan, France
Keywords:
Intent-based image enhancement, semantic content, image understanding, photographic aesthetics.
Abstract:
Image enhancement is mostly driven by intent and its future largely relies on our ability to map the space of
intentions with the space of possible enhancements. Taking into account the semantic content of an image is an
important step in this direction where contextual and aesthetic dimensions are also likely to have an important
role. In this article we detail the state-of-the-art and some recent efforts in semantic or content-dependent
enhancement. Through a concrete example we also show how image understanding and image enhancement
tools can be brought together. We show how the mapping between semantic space and enhancements can
be learnt from user evaluations when the purpose is subjective quality measured by user preference. This is
done by introducing a discretization of both spaces and notions of coherence, agreement and relevance to the
user response. A second example illustrates the feasibility of handling the binary decision of whether or not to enhance an image at all.
1 INTRODUCTION
Considering both digital cameras and camera phones,
it is estimated that almost 400 billion images will be
captured in 2007 alone (Hoffenberg, 2006). Regard-
less of the final medium where the images will be
managed, shared and visualized, the quality expec-
tations of consumers are likely to grow steadily. It
is currently very easy for users to integrate their own
content into workflows such as online photofinishing
or content-sharing communities. The variability of
content type, mainly due to the democratization of the
production and distribution tools, together with the
increased quality expectations, results in a demand
for automated or semi-automated image enhancement tools that can help reduce user interaction.
New features such as automatic color balance or
red-eye correction are now standard components in
mainstream image editing applications. New compa-
nies offer products focused exclusively on automatic
image enhancement to software vendors, camera and
phone manufacturers, printing providers or directly
to end-users. Most of the current offerings follow a classical approach to image enhancement, in which some kind of degradation that has to be compensated for is assumed.
Acquisition conditions, user expertise, compres-
sion algorithms or sensor quality can seriously de-
grade the final image quality and image enhancement
attempts to compensate for this degradation by al-
tering image features for subsequent analysis, dis-
tribution or display. Examples include contrast and
edge enhancement, noise filtering for a wide vari-
ety of noise sources, sharpening, exposure correc-
tion, colour balance adjustment, automatic cropping
or correction of shaky images. Some of these features,
such as noise, can be objectively defined and others,
such as contrast, can be inspired by human percep-
tion. Still, in most cases, the final judgment over the
performance of an enhancement algorithm is subjec-
tive. For example, while some people might prefer to
see the shadowed details made visible by some local
contrast approach, others will appreciate the sensation
of depth caused by the original shadows.
Enhancement is mostly driven by intent. The in-
tention of a photographer to depict a scene will value
those photographs or those enhancement operations
that lead to a more faithful representation of the cap-
tured scene. The intention of a designer or an ad-
vertiser could be to enhance an image in a way op-
timal for transmitting a message, e.g. an emotion.
The intention of a person including a photo of a baby
in a family album can be to simply highlight a fa-
cial expression, at the cost of leaving degradations
untouched or even highlighting them. The intent of
a photofinishing operation is to automate image en-
hancement in order to please the largest possible audi-
ence. Such intent-driven enhancement issues (e.g. enhancing shadow details) can be resolved by user evaluations indicating which approach is preferred by the majority of users.
In general, the future of image enhancement re-
lies on our ability to map the space of intents with
the space of possible enhancements. While we can
assume the space of enhancements to be reasonably
well defined, modelling intentions will remain the challenge for a long time. For this purpose, the present model based
on image degradations is insufficient and we need to
extend its scope to multiple semantic, aesthetic and
contextual dimensions. Recently, we have witnessed
the first efforts in this direction. These efforts have
mostly focused on the semantic dimension and in
many cases were pushed by the advances in object
and scene recognition towards understanding image
content. The assumption here is that semantic con-
tent drives intention. Also, the fact users are starting
to share content and express their preferences online
can be understood as instances from which this map-
ping can be learnt and some recent approaches focus
on this fact. These are probably the first approaches
to take into account the aesthetic dimension. Work-
ing on large amounts of available user preference data
is a very promising direction for understanding intent
space.
This article first details some recent efforts in image enhancement and then focuses on the particular problem of semantically dependent enhancement. In
this case, the concept space is defined by the set of
semantic categories and, eventually, the relationship
among these categories. To illustrate our approach
with experiments, instead of considering the complete
space of possible image enhancements, we restrict
our approach to the variations that might be gener-
ated from a particular image enhancement approach.
Having defined our semantic categories and enhance-
ment space, the mapping is learnt from user prefer-
ence evaluations.
The remainder of this paper is organized as fol-
lows: section 2 describes the prior art in semantic con-
tent dependent image enhancement, including a state-of-the-art review of image enhancement (section 2.1) and se-
mantic image understanding (section 2.2). In sec-
tion 3 we present a semantic content dependent image
enhancement (SCDIE) system and conclude the paper
in section 4.
2 PRIOR ART
There have always been enhancements designed for
specific types of images such as remote sensing im-
agery, medical imaging or document images. In the latter field, recent approaches propose different en-
hancements depending on whether the document is
classified as text, block diagrams, road maps, com-
puter generated images or user photos (Allen et al.,
2004; Ichikawa and Miyasaka, 2005). There are also
enhancements which are specific to the output device
(Furuki and Yamada, 2006) especially those related to
mobile phones (Quelard, 2004).
In the field of photography probably the first en-
hancements clearly linked with semantic content were
human skin and sky: skin dependent exposure cor-
rection (Battiato et al., 2003), skin defect detection and correction (Hillebrand et al., 2003), sky detec-
tion based image orientation correction (Luo, 2003)
or sky-based color enhancements (Luo and Etz, 2002;
Zafarifar and de With, 2006). Skin and sky detection
typically require low-level image understanding since
detection strongly relies on color analysis techniques.
At a higher-level there has been strong research
on face enhancement such as face makeup simula-
tion (Utsugi, 2003), skin smoothing, eye and teeth
whitening filters (Simon and Matraszek, 2006), facial
skin color-based color saturation, white balance and
overall density correction (Mutza, 2006) or adjust-
ing lightness, contrast, and/or the color levels of the
image based on the detected faces (Lin et al., 2002).
Many red-eye detection and correction make use of
face detection as a preprocessing stage for reducing
the number of candidate regions (Gaubatz and Ulich-
ney, 2002; Gasparini and Schettini, 2005).
The attention received by approaches focused on face enhancement is not surprising. Faces, especially known faces, are common fixation points when observing a scene (Buswell, 1935; Henderson and Hollingworth, 1999). Enhancing faces thus makes sense: they are likely to receive much of the viewer's attention. For closely related reasons, photographs with people are more frequent than those without. To estimate this im-
portance we applied a face detector to approximately
130000 images randomly selected from a photofinish-
ing workflow. Roughly two out of every three images
contained at least one face. On the other hand, face detection and recognition have long represented a challenge for the vision community, and solutions and algorithms are now readily available (Zhao et al., 2003; Yang et al., 2002).
General object and scene detection and recogni-
tion approaches have also been proposed and, in the
general case, current performance is considerably be-
low the case of faces. Recent competitions (Evering-
ham et al., 2005; Everingham et al., 2006) show that
the problem is difficult though important advances
were made recently (see section 2.2). Although they mainly focus on object class recognition and detection, some of them can also successfully handle scene categories such as Indoor, Outdoor, Beach, Mountain, City, Suburb, Road, Underwater or Sunset (Li and Wang, 2003; Barnard et al., 2003; Car-
bonetto et al., 2004; Quelhas et al., 2005; Perronnin
et al., 2006; Bosch et al., 2006).
There have been some recent attempts to combine such categories with enhancement. For example, (Chambah et al., 2004) propose an enhancement specific to underwater images. On the other hand, (Gallagher and Bruehs, 2006) propose a system where an improvement parameter for sharpening or noise reduction is generated from a belief map indicating the likelihood that the respective pixels represent faces, flash, sky, or vegetation. Similarly, (Gasparini and Schettini, 2004; Fredembach et al., 2003) identify regions as probable skin, sky, sea or vegetation in order to avoid removing color casts that are intrinsic to those categories, e.g. the blue of the sky.
Most of the efforts described above link semantic content with enhancement under the assumption that semantic content can guide the processing. Nevertheless, other aspects of an image can also lead the enhancement process, such as aesthetics or originality, which, in contrast to classical image quality, are highly subjective measures. Recently, (Datta et al., 2006) attempted to infer these from visual features, using machine learning on user preferences gathered from a peer-rated online photo set.
Image understanding and enhancement may be
also combined with meta-data information. In (Oberhardt et al., 2003), red-eye detection and correction relies on the knowledge that the flash was triggered at the time of capture. Also for correcting eyes, (Sadovsky et al., 2004) use the information stored in the Exchangeable Image File Format (EXIF). This kind of
information is also suggested to improve image cate-
gorization (Lin and Tretter, 2005; Boutell and Luo,
2007). Unfortunately, the presence of meta-data is not always ensured, mainly due to the variability encountered in imaging scenarios where end users directly integrate their own content into workflows. This is likely
to change, as standards are agreed and cameras are
able to include valuable meta-data such as geographic
location or web-retrieved information (O’Hare et al.,
2005).
The challenge of intent-based enhancement is to
estimate the function that maps an image and an inten-
tion to an enhancement. If we take into account only
semantic information and a single label is considered
per image, e.g. indoors, then the space of intents can
be modelled with a single discrete dimension which
corresponds to the categories. Dimensionality in-
creases as we consider multiple categories, locality
constraints, additional information such as meta-data,
etc. On the enhancement side, the types of processing
that can be applied to an image are restricted. In the
following sections we complete this prior art with a
list of typical enhancement techniques (section 2.1)
and with state-of-the-art image understanding tech-
niques that can be used for modelling the semantic
information on an image (section 2.2).
2.1 Image Enhancement
Image enhancement techniques are applied to obtain a
resulting image which is more suitable than the origi-
nal for a specific objective. Visual quality is a sample
objective but, depending on the application, quality
might not be the main purpose of enhancement, e.g.
medical imaging.
Enhancement algorithms can be global, where the
parameters controlling the enhancement approach are
the same over the whole image; or local, where the
parameters can vary spatially and are generally based
on the local characteristics of an image. Many enhancements require user interaction for setting or controlling some of their parameters. In this case, the
enhancement is called manual. When all parame-
ters can be set without interaction, based on general
considerations or on image content, the enhancement
is called automatic. Automatic enhancements fre-
quently come with a preprocessing stage which esti-
mates the parameters of the actual enhancement using
image statistics. For instance, an unsharp mask where
the filter values are different on regions specified by
the user can be considered a local manual enhance-
ment. A common technique for enhancing images is
through Tone Reproduction Curves (TRCs) which are
global mappings of luminance or chrominance chan-
nels. The case where the mapping depends on the
image region is referred to as Tone Reproduction Op-
erator (TRO).
The most common enhancement techniques are
sharpening, exposure correction, color balance and
saturation adjustment, contrast and edge enhance-
ment, blocking artifact reduction and noise reduction.
There are many more enhancements focused on spe-
cific problems such as redeye correction, automatic
cropping, or glass glare removal.
Sharpness refers to the presence of crisp edges
and fine details in an image. Basic sharpening filters (Rosenfeld and Kak, 1982; Gonzalez and Woods, 1992) can work in many cases but, since they are usually high-pass filters, they can also enhance noise.
algorithms robust to noise have been proposed, based
on weighted median filters (Fischer et al., 2002), non-
linear reaction-diffusion (Saito et al., 2003) or locally
adaptive filters (Polesel et al., 2000).
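As a minimal sketch of this basic, noise-sensitive idea (our own illustration, not an implementation of the cited methods), an unsharp mask adds a scaled high-pass residual back to the image:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def unsharp_mask(image, sigma=2.0, amount=0.7):
        # Sharpen by adding back the high-pass residual (image - blur).
        # image: 2D float array (e.g. a luminance channel) in [0, 1];
        # sigma and amount are illustrative defaults, not values taken
        # from the cited works.
        blurred = gaussian_filter(image, sigma=sigma)
        highpass = image - blurred  # high-frequency detail -- and noise
        return np.clip(image + amount * highpass, 0.0, 1.0)

The noise-robust variants cited above essentially replace this fixed high-pass residual with locally adapted or median-based estimates.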
Exposure refers to the average of the global dis-
tribution of intensity along the dynamic range of the
image. Making the image darker or lighter can bring out details from the shadows or give depth to the colors
of the photograph. The automatic setting of expo-
sure, a feature present in most digital cameras, can
yield unrealistic results and exposure correction at-
tempts to overcome this problem. The most common
approach to correcting exposure is to apply gamma
correction to the image intensity. For instance, (Es-
chbach and Fuss, 1999) propose a method to deter-
mine the gamma parameter automatically from the
histogram of the input image.
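As a sketch of the idea, one can choose the gamma so that the current mean gray value is mapped to a target mid-tone; the closed form below is our own simplification, not the actual estimator of the cited work:

    import numpy as np

    def auto_gamma(image, target_mean=0.5):
        # Choose gamma such that mean(image) ** gamma == target_mean,
        # then apply the power law. image: float array in (0, 1].
        mean = float(np.clip(image.mean(), 1e-6, 1.0 - 1e-6))
        gamma = np.log(target_mean) / np.log(mean)
        return image ** gamma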
The human visual system ensures the perceived
color of objects remains relatively constant under
varying illumination and reflectance conditions, a property known as color constancy. When imaging devices are tailored
to common illuminants, e.g. D65, they can introduce
strong color casts when the scene has another light
source. The problem of adjusting the colors to re-
semble perceptual response is called color balance or
white balance and generally consists in a global en-
hancement. The two most common approaches for
color balance are based on two very simple assump-
tions to estimate the color cast. The Gray World
approach assumes the average chrominance on any
given image is approximately gray (Evans, 1951).
The Perfect Reflector algorithm assumes we can find
a specular surface on the image reflecting the actual
color of the light source. Closely related are white
point (Eschbach and Fuss, 1999) and black point
(Adams et al., 2003) approaches. Much research on
this field has derived from the effort of developing
automatic color constancy algorithms (Barnard et al.,
2002) for machine vision purposes.
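The Gray World assumption translates almost directly into code; a minimal sketch, assuming a float RGB input:

    import numpy as np

    def gray_world(image):
        # Scale each channel so the per-channel means become equal to
        # their common gray value (Evans, 1951). image: (H, W, 3) floats.
        channel_means = image.reshape(-1, 3).mean(axis=0)
        gray = channel_means.mean()
        gains = gray / np.maximum(channel_means, 1e-6)
        return np.clip(image * gains, 0.0, 1.0)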
Saturation refers to the vividness of colored ob-
jects in an image. A color with more gray is con-
sidered less saturated, while a bright color, one with
very little gray in it, is considered highly saturated.
The saturation of a color can affect the emotional
reaction to an image. Colors that have low satura-
tions are often seen as dull and boring, but can also
be thought of as restful and peaceful. Highly satu-
rated colors, on the other hand, are more vibrant and
emotionally aggressive. Therefore, color saturation is an important element in an intent-based enhancement system. In a classical automatic enhancement approach, where neither the image content nor the user's intent is known, the system detects and modi-
fies these extremes bringing the image saturation to a
generally acceptable level. An alternative to the direct
modification of the saturation value in HSV space, is
to interpolate or extrapolate between the original im-
age and a black-and-white version of the image (Hae-
berli and Voorhies, 1994). Even if automatic saturation enhancement techniques exist (Eschbach and Fuss, 1999), they must be rather conservative, as saturation preferences vary a lot between individuals and often depend on the semantic content of the image.
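The interpolation/extrapolation scheme of (Haeberli and Voorhies, 1994) can be sketched as follows; the Rec. 601 luma weights are a common choice we assume here, not one mandated by the cited work:

    import numpy as np

    def adjust_saturation(image, alpha):
        # alpha = 0 -> grayscale, 1 -> original, > 1 -> extrapolated
        # (more vivid). image: (H, W, 3) float RGB array.
        luma = image @ np.array([0.299, 0.587, 0.114])
        gray = np.repeat(luma[..., None], 3, axis=-1)
        return np.clip(gray + alpha * (image - gray), 0.0, 1.0)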
Contrast refers to the efficient use of the dynamic
range. Improved contrast should make image details
more evident to a human observer. Contrast enhance-
ment can be achieved via global approaches (Tumblin
and Rushmeier, 1993; Eschbach et al., 1995). Spatially uniform contrast enhancement approaches, however, fail to model perceptual attributes for which locality is an important characteristic. Depending on the
aggressiveness of the approach the images can ap-
pear washed-out or artificial. Limitations due to the
global nature of this technique are observed in those
images where luminance is uniformly distributed over
the whole range.
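For concreteness, a generic global contrast stretch, one possible global TRC and not one of the specific methods cited above:

    import numpy as np

    def stretch_contrast(image, low_pct=1.0, high_pct=99.0):
        # Map the low/high luminance percentiles onto the full dynamic
        # range; percentile clipping limits sensitivity to outliers.
        lo, hi = np.percentile(image, [low_pct, high_pct])
        return np.clip((image - lo) / max(hi - lo, 1e-6), 0.0, 1.0)

The failure mode described above is visible directly in this sketch: when luminance already spans the whole range, lo and hi are near 0 and 1 and the stretch is essentially a no-op.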
Local approaches through TROs have also been
proposed (Zuiderveld, 1994; Devlin et al., 2002; Di-
Carlo and Wandell, 2001; Fattal et al., 2002). A com-
plete evaluation of TRO performance focused on high
dynamic range display appears in (Ledda et al., 2005).
More complex approaches rely on generative models
to recover the reflectance typically using edge pre-
serving filters to avoid halo effects (Chiu et al., 1993;
Tumblin and Turk, 1999; Durand and Dorsey, 2002).
Blocking artifacts are the result of coding, resiz-
ing or compressing the image. A traditional approach
to reducing blocking artifacts is to low-pass filter the
pixels directly adjacent to the block boundaries. Us-
ing a Gaussian spatial domain filter (Reeve and Lim,
1984) is very fast; however, it cannot reduce artifacts
that are not confined to pixels next to block bound-
aries. To overcome this problem linear block bound-
ary filters (Avril and Nguyen-Trong, 1992) or separa-
ble anisotropic Gaussian filters perpendicular to the
block boundary (Tzou, 1988) were proposed. The
drawback of these techniques is that they do not adapt to the local characteristics of the signal and merely trade a high-frequency artifact for a low-frequency one. There-
fore, (Ramamurthi and Gersho, 1986; Meier et al.,
1999) propose edge-preserving space-variant region-based filters, and (Xiong et al., 1997; Kim et al., 1998) wavelet transforms to smooth blocking effects while
preserving edges. In an automatic approach it is also important to be able to estimate the blockiness of an image (Minami and Zakhor, 1995; Tan and Ghanbari, 2000; Fan and de Queiroz, 2003) to adjust the level of correction and avoid unnecessary degradation.
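A heavily simplified sketch of the traditional idea described above, low-pass filtering only the pixels directly adjacent to each block boundary; the 8-pixel block size is typical of JPEG, and everything else is our own simplification:

    import numpy as np

    def deblock_rows(channel, block=8):
        # Pull the two rows straddling each horizontal block boundary
        # toward their mean; apply to the transpose for vertical
        # boundaries. channel: 2D float array.
        out = channel.copy()
        for r in range(block, channel.shape[0], block):
            avg = 0.5 * (channel[r - 1] + channel[r])
            out[r - 1] = 0.5 * (channel[r - 1] + avg)
            out[r] = 0.5 * (channel[r] + avg)
        return out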
Imperfect instruments, problems with the data ac-
quisition, transmission and compression can all be
sources of noise on the image. Random image noise
corresponds generally to visible grain or particles
present in the image which are generally caused by
the electronic noise in the input device sensor and cir-
cuitry (e.g. scanner, digital camera). Intensity spikes,
speckle or salt and pepper noise will only affect a
small number of image pixels. They are caused by
flecks of dust on the lens or inside the camera, dust
or scratches on scanned photographs or film, faulty CCD elements, “hot pixels” occurring with long exposures on digital cameras, etc. Banding noise can
be introduced when the data is read from the digital
sensor (e.g. scanner streaks) and scratches on the film
will appear as additional artifacts on the images. One method to remove noise is to convolve the original image with a mask (e.g. a Gaussian); its drawback is the blurring of edges. In contrast, a properly designed median filter is very good at removing salt-and-pepper noise while preserving image detail. Promising denoising results can be achieved using wavelets (Portilla et al., 2003), anisotropic diffusion (Perona and Malik,
1990), and bilateral filtering (Tomasi and Manduchi,
1998). A recent survey of different techniques can be
found in (Motwani et al., 2004).
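For instance, impulse noise removal with a median filter is a one-liner with standard tools (a generic sketch, not tied to any of the cited implementations):

    from scipy.ndimage import median_filter

    def remove_salt_and_pepper(image, size=3):
        # Replace each pixel by the median of its size x size window;
        # unlike linear smoothing, this preserves step edges well.
        return median_filter(image, size=size)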
Image blur is a form of bandwidth reduction typ-
ically caused by relative motion between the camera
and the original scene or by an optical system that
is out of focus. It can affect the whole or part of an image, and many cameras today have built-in solutions to stabilize image capture. There are differ-
ent techniques available for solving the restoration
problem from blind de-convolution methods (Zhang
et al., 2000; Stern et al., 2002) to approaches that
combine power-laws with wavelet domain constraints
(Jalobeanu et al., 2002; Neelamani et al., 2004). A
method to remove the specific blur due to camera
shake was proposed in (Fergus et al., 2006). The au-
tomatic implementation of these techniques requires
the estimation of the level of blur or motion blur for
the different image regions, prior to correction.
An example of a completely automatic system in-
cluding most of the mentioned enhancements is Xe-
rox’s Automatic Image Enhancement (XAIE) (Xerox,
2006). This approach is composed of two stages: a
decision mechanism stage and an apply stage. Given
an image, the (low-cost) decision mechanism stage
determines whether a particular enhancement will or will not be applied to an image, typically by anticipating whether it would improve or degrade the image.
For this purpose, the decision mechanism uses statis-
tics such as noise measures or luminance and chromi-
nance distributions on a low resolution version of the
image. Details for particular components of the XAIE
decision mechanism can be found in (Eschbach and
Fuss, 1999; Bressan et al., 2007). The settings for the
decision mechanism are largely based on user prefer-
ence evaluations. The outputs of all decision mechanisms are combined and fed to the apply stage. This
architecture also presents advantages in terms of com-
putational costs since only those enhancements that
can benefit the image are actually applied.
2.2 Image Understanding
Image understanding refers to a set of operations that
transforms pictorial inputs into commonly understood
descriptions. Even though we are far from complete automatic understanding and description of an image, huge advances have been made in recent years in successfully assigning keywords to an image based on its high-level content. These techniques can analyze the whole
scene or focus on objects within the image. Systems
are considered generic when their technology is in-
dependent of the classes or object types. The main
difficulty of such generic systems is that they have to
handle not only the variations in view, imaging, light-
ing and occlusion, typical of the real world, but also
intra-class variations typical of the semantic classes,
e.g. types of chairs.
The most common tasks are recognition, classifi-
cation or detection. Recognition concerns the iden-
tification of particular object instances. Object and
scene classification are the tasks of assigning one or
more general tags to an image. Detection is the problem of determining whether one or more instances of an object occur in an image and, typically, of estimating the locations and scales of the detected instances. From the
perspective of image enhancement, classification and
detection are considered more relevant than recogni-
tion. By far, the “object” that received most of the
attention for detection and recognition has been faces
(Yang et al., 2002).
The first multi-class categorization approaches
were based on image segmentation. Their aim was la-
belling relatively homogeneous image segments with
keywords such as sky, grass, tiger, water, rocks. To
do this they used statistical models to learn a sort
of dictionary between individual image blobs (seg-
ments) and a set of predefined keywords (Barnard
et al., 2003; Carbonetto et al., 2004; Chen and Wang,
2004; Li et al., 2004).
Motivated by an analogy to bag-of-words based
learning methods for text categorization, a large set of
bag of visual word (BOV) based approaches emerged
recently. Similarly to text, an image is character-
ized by a histogram of visual word counts. In con-
trast to text categorization where a dictionary is avail-
able, one of the challenges for images is that the vi-
sual vocabulary has to be built automatically from the
training set. To do this, first some image descrip-
tors are extracted from the image. Those descriptors
are generally based on texture, color, shape, structure
or their combination and are extracted locally on re-
gions of interest (ROI). The ROI can be obtained by
image segmentation as above, by applying specific
interest point detectors (Csurka et al., 2004; Quel-
has et al., 2005), by considering a regular grid (Car-
bonetto et al., 2004; Fei-Fei and Perona, 2005) or sim-
ply random sampling of image patches (Marée et al.,
2005; Nowak et al., 2006). All features extracted are
then mapped to the feature space and clustered to ob-
tain the visual vocabulary. Often a simple K-means
is used, however Gaussian Mixture Models (Farquhar
et al., 2005) or Self Organization Maps (Lefebvre
et al., 2006) can also be used to obtain a soft cluster-
ing, in line with the continuous nature of visual words.
Given a test sample, each feature vector is as-
signed to its closest visual word in the previously
trained vocabulary or to all visual words in a prob-
abilistic manner in the case of a stochastic model.
The histogram is computed by accumulating the oc-
currences of each visual word. Finally, the histogram
is fed to a classifier, for example K nearest neigh-
bor (Bosch et al., 2006), probabilistic latent seman-
tic classifier (Quelhas et al., 2005) or support vector
machines (Csurka et al., 2004).
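The complete BOV pipeline can be summarized in a few lines. The sketch below assumes the local descriptors have already been extracted per image, and uses K-means and a linear SVM (via scikit-learn, a modern convenience rather than the tooling of the cited works) as stand-ins for the clustering and classification choices discussed above:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def build_vocabulary(train_descriptors, k=200):
        # Cluster all training descriptors into k visual words.
        return KMeans(n_clusters=k, n_init=4).fit(np.vstack(train_descriptors))

    def bov_histogram(descriptors, vocabulary):
        # Assign each descriptor to its nearest visual word and count.
        words = vocabulary.predict(descriptors)
        hist = np.bincount(words, minlength=vocabulary.n_clusters)
        return hist / max(hist.sum(), 1)

    def train_classifier(train_descriptors, labels, vocabulary):
        # One normalized word-count histogram per image, fed to an SVM.
        X = np.array([bov_histogram(d, vocabulary) for d in train_descriptors])
        return SVC(kernel="linear").fit(X, labels)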
Though most of the mentioned approaches use a
single visual vocabulary generally built on the whole
training set, this is not always the best option: very good performance is achieved when category labels
are used during the estimation of the visual vocab-
ulary. While (Farquhar et al., 2005) agglomerate
category-specific vocabularies into a single vocabu-
lary, (Perronnin et al., 2006) propose to adapt the vi-
sual vocabulary (universal) trained on the whole train-
ing set to each class using class-specific images. An
image is then characterized by a set of bipartite his-
tograms - one per class - where each histogram de-
scribes whether the image content is best modeled by
the universal vocabulary, or the corresponding class
vocabulary.
One of the drawbacks of the BOV approach is that
it considers the image as a “bag” of independent vi-
sual word instances. (Sivic et al., 2005) tried to over-
come this by building a visual vocabulary of features
describing the co-occurrences of visual words. (Fer-
gus et al., 2003; Leibe et al., 2004; Crandall and Hut-
tenlocher, 2006) proposed building generative mod-
els that take into account relatively strong geomet-
ric constraints between image patches. However, this
requires the alignment and segregation of different
views of objects in the dataset. (Csurka et al., 2005)
propose to incorporate geometric information based
on scale, orientation and closeness of the keypatches
in a boosting framework. The selected weak classi-
fiers are combined with the original BOV classifier. In
(Carbonetto et al., 2004) geometry has been included
through generative MRF models of neighboring re-
lations between segmented regions. (Sudderth et al., 2006; Fidler et al., 2006) propose hierarchical learning of generic parts and feature combinations. The
above approaches aim to handle mainly object classes
and showed performance improvements for classes
such as cars, planes and faces. In contrast, (Boutell
et al., 2006) propose generative models for outdoor
scene configurations, consisting of regions’ identities
(beach, field, mountain, street, suburban, open-water)
and their spatial relations (above, far above, below,
far below, beside, enclosed, and enclosing). However, the improvement achieved by these systems over the BOV approaches is relatively modest compared to the
increased computational cost.
3 SCDIE
Semantic Content Dependent Image Enhancement
(SCDIE) is the result of taking semantic content into account in image enhancement. Classical enhance-
ment is mainly based on dimensions that model low-
level quality measures. SCDIE also considers seman-
tic dimensions and this extension allows for more pre-
cise models, e.g. an overexposed sunny landscape.
SCDIE maps this sample to a particular enhancement
which is then used to improve image quality. We call the domain of this mapping the “Intent Space” (I) and the target the “Enhancement Space” (E). Figure 1 il-
lustrates this approach for the case of SCDIE. Notice that other dimensions can be naturally incorporated into this model.

Figure 1: The classical AIE, extended to SCDIE, and possible extensions to other dimensions of intent-based image enhancement (IBIE).
Such a system can be built from the following four components (a minimal sketch of their interaction follows the list).
1. an image quality measure component which analyzes low-level features related to quality in the image.
2. an image understanding component which assigns
one or more semantic labels to an image. Notice
that we assume quality and understanding are in-
dependent: image labels do not depend on quality.
This useful assumption holds most of the time, but not always, since it depends on the labels we consider.
3. a function that maps the space spanned by image
quality and understanding output into the space of
possible enhancements.
4. an image enhancement component in charge of
applying the resulting enhancement.
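A skeletal sketch of how these components interact; all component implementations are placeholders (assumptions of ours), and only the data flow follows the description above:

    def scdie(image, quality_analyzer, categorizer, mapping, enhancer):
        # quality_analyzer: image -> low-level quality statistics
        # categorizer:      image -> semantic labels (assumed independent
        #                   of quality, as discussed in item 2)
        # mapping:          (statistics, labels) -> point in E
        # enhancer:         (image, enhancement) -> enhanced image
        stats = quality_analyzer(image)        # component 1
        labels = categorizer(image)            # component 2
        enhancement = mapping(stats, labels)   # component 3
        return enhancer(image, enhancement)    # component 4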
We now specify the components in the design of
a particular instance of the system. For the image
quality component we take into account the statis-
tics which are computed in the decision mechanism
stage of XAIE (see section 2.1), e.g. luminance and
chrominance distributions and statistics, noise, edge
levels, blocking artifacts, etc. The understanding
component uses a BOV-based multi-label categorizer
trained on 8 categories: Urban, Portrait, Flowers, In-
teriors, Landscape, Snow and Sky. These categories
were chosen to be representative of images found in
typical imaging scenarios.
For the space of possible enhancements E we
choose to use the topology provided by XAIE. We
consider 7 different enhancement dimensions included in XAIE: contrast, exposure, shadow details, saturation, color balance, noise reduction and sharpness, and discretize them into three intensity levels or
modes: low, default and high. Depending on the en-
hancement, the three bins can have a different inter-
pretation, e.g. dark, default and light for exposure
correction. With this approach, the discretization of E yields $3^7 = 2187$ bins.
XAIE already estimates the mapping function in
the decision stage of the algorithm (see section 2.1).
The decision stage uses only the image quality components in I to determine the best enhancement in E. The decision mechanism does not take into ac-
count the semantic content. Since we assumed inde-
pendence between quality and understanding, we can
extend the mapping provided by the decision mech-
anism by linking semantic categories with enhance-
ment modes (semantic decision) provided the deci-
sion mechanism decides to apply the enhancement
(quality decision). The mapping between categories
and enhancements is learnt through user evaluations.
Given a category, choosing the preferred enhancement is intractable even when E is discretized. So we first restrict the space of enhancements by assuming independence between enhancements and by screening out improbable mappings. The independence assumption reduces the image comparisons to $3 \times 7 = 21$ and the screening leaves out around two thirds of these possibilities. Examples of enhancements left out by the screening process were high saturation on portraits or sharpening of sky images. No category had more than six candidate enhancements after the screening.
A representative set of images was printed from
each category and, for each image, all possible en-
hancements were printed on a single A3 size page.
Participants had to choose for each image the worst
and best enhancement. To analyze a given enhance-
ment on a given category, we considered the follow-
ing criteria:
Agreement: measures either positive or negative agreement among different user test participants on a given image:

$$\frac{1}{N_I}\sum_{i=1}^{N_I}\left[\, g\!\left(\frac{N_b^i + N_n^i/2}{N_U}\right) + g\!\left(\frac{N_w^i + N_n^i/2}{N_U}\right)\right]$$

where $g(x) = x \log_2(x)$, $N_U$ is the number of users and $N_I$ is the number of different images considered. $N_b^i$, $N_w^i$ and $N_n^i$ are the numbers of users that chose the enhancement as being the best, the worst or neither of them for a given image $i$. Agreement measures the entropy of the distribution of preferences over all users for a single image.
Coherence: measures uniformity of opinion across all images of a single category for any given participant:

$$\frac{1}{N_U}\sum_{u=1}^{N_U}\left[\, g\!\left(\frac{N_b^u + N_n^u/2}{N_I}\right) + g\!\left(\frac{N_w^u + N_n^u/2}{N_I}\right)\right]$$

where $N_b^u$, $N_w^u$ and $N_n^u$ are the numbers of images for which user $u$ considered the enhancement as being the best, the worst or neither of them. Coherence measures the entropy of the distribution of the preferences of a single user across all images.
Relevance: Percentage of times the enhancement
was chosen as either the best or the worst.
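The three criteria transcribe directly into code; a sketch assuming the per-image and per-user vote counts have already been tabulated from the evaluation sheets, with g kept exactly as defined above:

    import numpy as np

    def g(x):
        # g(x) = x * log2(x), with the convention g(0) = 0.
        x = np.asarray(x, dtype=float)
        return np.where(x > 0, x * np.log2(np.maximum(x, 1e-12)), 0.0)

    def agreement(nb, nw, nn, n_users):
        # nb, nw, nn: per-image counts of users voting the enhancement
        # best, worst, or neither (arrays of length N_I).
        return np.mean(g((nb + nn / 2) / n_users) + g((nw + nn / 2) / n_users))

    def coherence(nb, nw, nn, n_images):
        # nb, nw, nn: per-user counts of images for which the enhancement
        # was voted best, worst, or neither (arrays of length N_U).
        return np.mean(g((nb + nn / 2) / n_images) + g((nw + nn / 2) / n_images))

    def relevance(times_best_or_worst, total_votes):
        # Fraction of votes in which the enhancement was best or worst.
        return times_best_or_worst / total_votes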
Figure 2: Pairs of (class, enhancement) plotted in the agreement/coherence map. Green means the enhancement was preferred, while red means it was considered to have a bad effect. The size of each circle is proportional to the relevance. Attention should be paid to large circles with a dominant color (red or green) in the right and upper halves of the map.

Figure 2 shows the pairs of (class, enhancement) in an agreement/coherence map, allowing the following interpretation. High agreement and high coherence
is good, meaning that everyone agrees and opinion is
consistent for all category images. Low agreement
and high coherence is bad because it indicates that
the appreciation of the enhancement is highly subjec-
tive and dependant on user preference for categories.
High agreement and low coherence is neutral, proba-
bly indicating that the image set for that category was
poorly chosen. Obviously, low agreement and low co-
herence is bad.
From this evaluation, enhancement improvements
were suggested for three out of the eight selected cat-
egories: flowers, sky and urban.
3.1 Special Effects
In this section we show a second example of an SCDIE subsystem, which tries to handle the problem of de-
tecting images with special effects in order to auto-
matically turn off the enhancement process for those
images. Such an automatic subsystem is interesting
in a printing workflow, where the growing number of
manipulated photos, images with unusual viewpoint,
lighting or artificial images of non-professional pho-
tographers (see examples in Figure 3) are most often
mixed with the rest of their album images. It is under-
stood that photographers generally prefer no further changes in their artificially manipulated images.

Figure 3: Some examples of special effect images.

Figure 4: False Alarm Rate (FA) versus False Rejection Rate (FR) of the BOV-based special effect occurrence detector, plotted as a DET curve.
In the context of SCDIE, this can be translated
as follows. The image understanding component la-
bels those images with Special Effects and the deci-
sion mechanism F maps this label to No enhancement.
To test the feasibility of such a system, we trained
a BOV (Perronnin et al., 2006) on images with and
without special effects collected from two indepen-
dent sources. Images from the first source (1160 with
effect and 966 without) served as training data and im-
ages from the second source (536 with effect and 524
without) were used as test data. Figure 4 shows the
DET (Detection Error Tradeoff) curve (Martin et al., 1997) of the system acting as a special effect occurrence detector. To the best of our knowledge, these are the
first results reported for such a task. Furthermore,
special effects generally belong to clearly defined cat-
egories: blur, painting/stylization, artificial lighting,
etc. We are currently investigating whether training our system on these better-defined subcategories, as well as on the subcategories of natural photos (indoors, outdoors, portrait, city, landscape, etc.), can improve the current performance (EER = 13.7) of our system.
4 CONCLUSION
The future of image enhancement relies on our ability
to map the space of intents with the space of possi-
ble enhancements for a given image. The main chal-
lenges are given by modelling the space of intents and
estimating the actual mapping. When the objective is
quality for user preference, user evaluations can be a
way of estimating the mapping.
The fact that users are sharing their content and posting their opinions online provides a unique opportunity for understanding visual preference in more depth. Of the various facets of this visual preference (perceptual, semantic, aesthetic and contextual), in this paper we mainly focused on the particular problem of semantically dependent image enhancement. To illustrate our approach with experiments, instead of considering the complete space of possible image enhancements, we restricted ourselves to the variations that can be generated by a particular image enhancement approach, and learnt a mapping between our semantic categories and the enhancement space from user preference evaluations.
Finally, a simple example scenario is presented,
showing how an SCDIE system can handle the prob-
lem of detecting images with special effects in a print-
ing workflow in order to automatically turn off the en-
hancement process for those images.
REFERENCES
Adams, J. E., Hamilton, J. F., Gindele, E. B., and Pillman,
B. H. (2003). Method for automatic white balance of
digital images. US Patent 6573932, Kodak.
Allen, D. J., Carley, A. L., and Levantovsky, V. (2004).
Method of adaptively enhancing a digital image. US
Patent 6807313, Oak Technology, Inc. (Sunnyvale,
CA).
Avril, C. and Nguyen-Trong, T. (1992). Linear filtering for reducing blocking effects in orthogonal transform image coding. J. Electronic Imaging, 1(2).
Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D., and Jordan, M. (2003). Matching words and pictures. J. of Machine Learning Research, 3.
Barnard, K., Martin, L., Coath, A., and Funt, B. (2002).
A comparison of computational color constancy algo-
rithms. IEEE Trans. on Image Processing, 11(9).
Battiato, S., Bosco, A., Castorina, A., and Messina, G.
(2003). Automatic global image enhancement by skin
dependent exposure correction. In IEEE-EURASIP
Workshop on Nonlinear Signal and Image Processing.
Bosch, A., Zisserman, A., and Munoz, X. (2006). Scene
classification via pLSA. In ECCV.
Boutell, M. and Luo, J. (2007). Beyond pixels: Exploiting camera metadata for photo classification. Pattern Recognition, Special Issue on Image Understanding for Digital Photos. To appear.
Boutell, M., Luo, J., and Brown, C. (2006). Factor-graphs
for region-based whole-scene classification. In CVPR
Workshop on Semantic Learning Applications in Mul-
timedia.
Bressan, M., Dance, C. R., Poirier, H., and Arregui, D.
(2007). LCE: (automatic) local contrast enhancement.
In SPIE, Electronic Imaging.
Buswell, G. (1935). How People Look at Pictures. Chicago
University Press, Chicago.
Carbonetto, P., de Freitas, N., and Barnard, K. (2004). A
statistical model for general contextual object recog-
nition. In ECCV.
Chambah, M., Semani, D., Renouf, A., Coutellemont, P.,
and Rizzi, A. (2004). Underwater color constancy: en-
hancement of automatic live fish recognition. In SPIE
Electronic Imaging, Science and Technology, volume
5293.
Chen, Y. and Wang, J. Z. (2004). Image categorization by
learning and reasoning with regions. JMLR, 5.
Chiu, K., Herf, K., Shirley, M., Swamy, P., Wang, S., and
Zimmerman, K. (1993). Spatially nonuniform scaling
functions for high contrast images. In Kaufmann, M.,
editor, Proc. Graphics Interface ’93.
Crandall, D. and Huttenlocher, D. (2006). Weakly super-
vised learning of part-based spatial models for visual
object recognition. In ECCV.
Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray,
C. (2004). Visual categorization with bags of key-
points. In ECCV Workshop on Statistical Learning
for Computer Vision.
Csurka, G., Willamowski, J., Dance, C., and Perronnin,
F. (2005). Incorporating geometry information with
weak classifiers for improved generic visual catego-
rization. In Int. Conf. on Image Analysis and Process-
ing.
Datta, R., Joshi, D., Li, J., and Wang, J. (2006). Studying
aesthetics in photographic images using a computa-
tional approach. In Leonardis, A., Bischof, H., and
Pinz, A., editors, ECCV.
Devlin, K., Chalmers, A., Wilkie, A., and Purgathofer, W.
(2002). Star: Tone reproduction and physically based
spectral rendering. In State of the Art Reports, Euro-
graphics.
DiCarlo, J. and Wandell, B. (2001). Rendering high dy-
namic range images. In SPIE: Image Sensors, volume
3965.
Durand, F. and Dorsey, J. (2002). Fast bilateral filtering
for the display of high dynamic range images. ACM
Trans. on Graphics 21, 3.
Eschbach, R. and Fuss, W. (1999). Automatic enhancement
of scanned photographs. In EI Color Imaging: Device
Independent Color, Color Hardcopy and Graphic Arts
IV (ei16).
Eschbach, R., Waldron, B., and Fuss, W. (1995). Image-dependent luminance enhancement. US Patent 5340502, Xerox Corporation.
Evans, R. M. (1951). Method for correcting photographic
color print. US Patent 2571697, Kodak.
Everingham, M., Gool, L. V., Williams, C., and
Zisserman, A. (2005). The pascal visual ob-
ject classes challenge results. http://www.pascal-
network.org/challenges/VOC/voc2005/results.pdf.
Everingham, M., Zisserman, A., Williams, C., and
Gool, L. V. (2006). The pascal visual ob-
ject classes challenge 2006. http://www.pascal-
network.org/challenges/VOC/voc2006/results.pdf.
Fan, Z. and de Queiroz, R. (2003). Identification of bitmap
compression history: Jpeg detection and quantizer es-
timation. IEEE Trans. on Image Processing, 12(2).
Farquhar, J., Szedmak, S., Meng, H., and Shawe-Taylor,
J. (2005). Improving “bag-of-keypoints” image cate-
gorisation. Technical report, University of Southamp-
ton.
Fattal, R., Lischinski, D., and Werman, M. (2002). Gradi-
ent domain high dynamic range compression. ACM
Trans. on Graphics 21, 3.
Fei-Fei, L. and Perona, P. (2005). A Bayesian hierarchical
model for learning natural scene categories. In CVPR,
volume 2.
Fergus, R., Perona, P., and Zisserman, A. (2003). Ob-
ject class recognition by unsupervised scale-invariant
learning. In CVPR.
Fergus, R., Singh, B., Hertzmann, A., Roweis, S., and Free-
man, W. T. (2006). Removing camera shake from a
single image. In SIGGRAPH.
Fidler, S., Berginc, G., and Leonardis, A. (2006). Hier-
archical statistical learning of generic parts of object
structure. In CVPR.
Fischer, M., Parades, J., and Arce, G. (2002). Weighted me-
dian image sharpeners for the world wide web. IEEE
Trans. On Image Processing, 11(7).
Fredembach, C., Schröder, M., and Süsstrunk, S. (2003).
Region-based image classification for automatic color
correction. In IS&T Color Imaging Conference.
Furuki, I. and Yamada, K. (2006). Image enhancement
device and image enhancement method of thermal
printer. US Patent Application 20050168561, Mit-
subishi Denki Kabushiki Kaisha.
Gallagher, A. and Bruehs, W. (2006). Method and system
for improving an image characteristic based on im-
age content. US Patent 20060228040, Eastman Kodak
Company (Rochester, NY).
Gasparini, F. and Schettini, R. (2004). Color balancing of
digital photos using simple image statistics. Pattern
Recognition, 37.
Gasparini, F. and Schettini, R. (2005). Automatic redeye
removal for smart enhancement of photos of unknown
origin. In Int. Conf. on Visual information systems.
Gaubatz, M. and Ulichney, R. (2002). Automatic red-eye
detection and correction. In ICIP.
Gonzalez, R. C. and Woods, R. (1992). Digital image pro-
cessing. Addison-Wesley Pub. Comp, Inc., Reading,
MA.
Haeberli, P. and Voorhies, D. (1994). Image processing by
linear interpolation and extrapolation. IRIS Universe
Magazine, Silicon Graphics, 28.
Henderson, J. and Hollingworth, A. (1999). High-level
scene perception. Annu. Rev. Psychol., 50.
Hillebrand, G., Miyamoto, K., et al. (2003). Skin imaging and analysis systems and methods. US Patent 6571003, The Procter & Gamble Company.
Hoffenberg, S. (2006). Changing cameraphone user be-
haviour. Half-day seminar at Photokina.
Ichikawa, T. and Miyasaka, T. (2005). Web print system
with image enhancement. US Patent 6914694, Seiko
Epson Corporation (Tokyo, JP).
Jalobeanu, A., Blanc-Fraud, L., and Zerubia, J. (2002). Es-
timation of blur and noise parameters in remote sens-
ing. In Int. Conf. on Acoustics, Speech and Signal
Processing.
Kim, N., Jang, I. H., Kim, D., and Hong, W. H. (1998). Reduction of blocking artifact in block-coded images using wavelet transform. IEEE Trans. Circuits
and Systems, 8(3).
Ledda, P., Chalmers, A., Troscianko, T., and Seetzen, H.
(2005). Evaluation of tone mapping operators using
a high dynamic range display. In Proc. ACM SIG-
GRAPH ’05.
Lefebvre, G., Laurent, C., Ros, J., and Garcia, C. (2006).
Supervised image classification by som activity map
comparison. In ICPR.
Leibe, B., Leonardis, A., and Schiele, B. (2004). Com-
bined object categorization and segmentation with an
implicit shape model. In ECCV Workshop on Statisti-
cal Learning for Computer Vision.
Li, J. and Wang, J. Z. (2003). Automatic linguistic indexing
of pictures by a statistical modeling approach. PAMI,
25:9.
Li, Y., Bilmes, J. A., and Shapiro, L. G. (2004). Object class
recognition using images of abstract regions. In ICPR.
Lin, Q., Atkins, C., and Tretter, D. (2002). Image enhance-
ment using face detection. US Patent Application
20020172419, Hewlett Packard Company.
Lin, Q. and Tretter, D. (2005). Camera meta-data for
content categorization. US Patent 6977679, Hewlett
Packard Company.
Luo, J. (2003). Determining orientation of images contain-
ing blue sky. US Patent 6512846, Eastman Kodak
Company (Rochester, NY).
Luo, J. and Etz, S. (2002). A physical model based approach
to detecting sky in photographic images. IEEE Trans.
on Image Processing, 11(3).
Marée, R., Geurts, P., Piater, J., and Wehenkel, L. (2005).
Random subwindows for robust image classification.
In CVPR, volume 1.
Martin, A., Doddington, G., Kamm, T., Ordowski, M., and
Przybocki, M. (1997). The DET curve in assessment
of detection task performance. In EUROSPEECH.
Meier, T., Ngan, K. N., and Crebbin, G. (1999). Reduction
of blocking artifacts in image and video coding. IEEE
Trans. on Circuits and Systems for Video Technology,
9(3).
Minami, S. and Zakhor, A. (1995). An optimization ap-
proach for removing blocking effects in transform
coding. IEEE Trans. Circuits and Systems for Video
Technology, 5(4).
Motwani, M., Gadiya, M., Motwani, R., and Harris, F. C.
(2004). A survey of image denoising techniques. In
Global Signal Processing Expo and Conference.
Mutza, D. (2006). New fujifilm image intelligence: The
next generation of automatic image quality optimiza-
tion. In International Congress of Imaging Science.
Fuji Photo Film (USA).
Neelamani, R., Choi, H., and Baraniuk, R. (2004). For-
ward: Fourier-wavelet regularized deconvolution for
ill-conditioned systems. IEEE Trans. on Signal Pro-
cessing, 52.
Nowak, E., Jurie, F., and Triggs, B. (2006). Sampling strategies for bag-of-features image classification. In ECCV.
Oberhardt, K., Taresch, G., et al. (2003). Method for the automatic detection of red-eye defects in photographic image data. US Patent Application 20030044178, Milde & Hoffberg, L.L.P.
O’Hare, N., Gurrin, C., Lee, H., Murphy, N., Smeaton,
A. F., and Jones, G. J. (2005). My digital photos:
where and when? In Annual ACM international con-
ference on Multimedia.
Perona, P. and Malik, J. (1990). Scale-space and edge de-
tection using anisotropic diffusion. PAMI, 12(7).
Perronnin, F., Dance, C., Csurka, G., and Bressan, M.
(2006). Adapted vocabularies for generic visual cate-
gorization. In European Conf. on Computer Vision.
Polesel, A., Ramponi, G., and Mathews, V. J. (2000). Im-
age enhancement via adaptive unsharp masking. IEEE
Trans. On Image Processing, 9(3).
Portilla, J., Strela, V., Wainwright, M. J., and Simoncelli,
E. P. (2003). Image denoising using scale mixtures of
gaussians in the wavelet domain. IEEE Transactions
on Image Processing, 12(11).
Quelard, S. (2004). Image quality improvement for a CMOS mobile phone digital camera. Technical report, KTH Royal Institute of Technology, Stockholm.
Quelhas, P., Monay, F., Odobez, J.-M., Gatica-Perez, D.,
Tuytelaars, T., and Gool, L. V. (2005). Modeling
scenes with local descriptors and latent aspects. In
ICCV.
Ramamurthi, B. and Gersho, A. (1986). Nonlinear space-
variant postprocessing of block coded images. IEEE
Trans. Acoust., Speech, Signal Processing, ASSP-34.
Reeve, H. C. and Lim, J. S. (1984). Reduction of blocking
effects in image coding. Optical Engineering, 23(1).
Rosenfeld, A. and Kak, A. (1982). Digital picture process-
ing. Academic Press, Inc., New York.
Sadovsky, V., Yuan, P., Ivory, A. S., and Turner, R.
(2004). Automatic analysis and adjustment of digi-
tal images upon acquisition. US Patent Application
20040258308, Microsoft.
Saito, T., Harada, H., Satsumabayashi, J., and Komatsu, T.
(2003). Color image sharpening based on nonlinear
reaction-diffusion. In ICIP.
Simon, R. and Matraszek, W. (2006). Method and system for enhancing portrait images that are processed in a batch mode. US Patent 7050636, Eastman Kodak Company (Rochester, NY).
Sivic, J., Russell, B., Efros, A., Zisserman, A., and Free-
man, W. (2005). Discovering objects and their loca-
tions in images. In ICCV.
Stern, A., Kruchakov, I., Yoavi, E., and Kopeika, N. (2002).
Recognition of motion-blurred images by use of the
method of moments. Applied Optics, 41.
Sudderth, E., Torralba, A., Freeman, W., and Willsky, A.
(2006). Learning hierarchical models of scenes, ob-
jects, and parts. In ICCV.
Tan, K. T. and Ghanbari, M. (2000). Blockiness detection
for mpeg-2-coded video. IEEE Signal Processing Let-
ters, 7.
Tomasi, C. and Manduchi, R. (1998). Bilateral filtering for
gray and color images. In ICCV.
Tumblin, J. and Rushmeier, H. (1993). Tone reproduction
for realistic images. IEEE Computer Graphics and
Applications, 13(6).
Tumblin, J. and Turk, G. (1999). Lcis: A boundary hierar-
chy for detail-preserving contrast reduction. In SIG-
GRAPH.
Tzou, K. H. (1988). Post-filtering of transform-coded im-
ages. In SPIE: Applications of Digital Image Process-
ing XI, volume 974.
Utsugi, R. (2003). Method of correcting face image,
makeup simulation method, makeup method makeup
supporting device and foundation transfer film. US
Patent 6502583, DRDC Limited (Tokyo, JP); Scalar
Corporation (Tokyo, JP).
Xerox (2006). Xerox’s Automatic Image Enhancement System. http://www.xerox.com/downloads/usa/en/f/FILE_PROD_AIE_Brochure.pdf.
Xiong, Z., Orchard, M. T., and Zhang, Y. Q. (1997). A de-
blocking algorithm for jpeg compressed images using
overcomplete wavelet representations. IEEE Trans.
Circuits and Systems for Video Technology, 7(4).
Yang, M.-H., Kriegman, D., and Ahuja, N. (2002). Detect-
ing faces in images: A survey. PAMI, 24(1).
Zafarifar, B. and de With, P. H. N. (2006). Blue sky de-
tection for picture quality enhancement. In Advanced
Concepts for Intelligent Vision Systems.
Zhang, Y., Wen, C., and Zhang, Y. (2000). Estimation of motion parameters from blurred images. Pattern
Recognition Letters, 21.
Zhao, W., Chellappa, R., Phillips, P., and Rosenfeld, A.
(2003). Face recognition: A literature survey. ACM
Comput. Surv., 35.
Zuiderveld, K. (1994). Contrast limited adaptive histogram
equalization. In Press, A., editor, Graphic Gems IV.