AUTOMATIC TRIMAP EXTRACTION FOR EFFICIENT ALPHA MATTING BASED ON GRADIENT FIELD TRANSFORMS

Sang Min Yoon and Holger Graf

GRIS, TU Darmstadt, Rundeturmstrasse 10, Darmstadt, Germany

ZGDV, Computer Graphics Center, Rundeturmstrasse 10, Darmstadt, Germany

Keywords:

Alpha matting, Trimap extraction, Gradient ﬁeld.

Abstract:

Image/Video Matting aims at solving the problem of accurate foreground estimation from a given background

within still images or video sequences. The standard alpha matting method starts from a trimap, which sepa-

rates an input image into three regions: deﬁnitely foreground, deﬁnitely background, and unknown regions.

This paper presents an automatic trimap extraction based on an afﬁne transformation of gradient ﬁelds in or-

der to achieve an improved and robust Image/Video matting method. A gradient ﬁeld based background and

foreground segmentation technique provides a trimap extraction, which is robust to changing light conditions

within semi-transparent objects. Our proposed background subtraction is based on afﬁne transformed gradi-

ent projections of the input and background image and removes the background texture from a given image,

preserving the texture of the foreground objects. The presented automatic trimap extraction method reduces the manual labor involved in extracting target objects and embedding them into a new background image or video sequence, and may find application in the broadcasting and movie industries.

1 INTRODUCTION

Image/video matting, which deals with the extraction of foreground objects from a background image by estimating a color and an opacity value for each pixel, has been extensively studied during the last twenty years. Traditional matting techniques segment the foreground object of a given image or video sequence by its color and opacity characteristics. Current image/video matting technologies and systems try to efficiently extract high-quality mattes from a still image or a video sequence and are mainly used in film production and broadcasting systems. Here, image matting defines a given input image as the composite of a foreground layer and a background layer, combined by a linear blending of opacity values in each pixel (Wang et al, 2007). It was first mathematically defined by (Smith et al, 1996) and models an observed image I as a convex combination of a foreground image F and a background image B by using the alpha matte α:

I = αF + (1 − α)B (1)

where α is a value within [0,1]. Based on equation (1), the matting process has to solve an inverse problem: for a color image, each pixel provides only three constraints (one per color channel) for seven unknowns (α and the three channels each of F and B). The task of alpha matting is hence to recover the values of α, B, and F at every pixel. To properly extract meaningful foreground objects from equation (1), several matting approaches start by segmenting the input image into three regions, referred to as a trimap: definitely foreground, definitely background, and unknown regions. If α = 1 or 0, we classify a pixel within an image as 'definitely foreground' or 'definitely background', respectively. One of the important factors affecting the performance of a matting algorithm is the accuracy of the trimap. The trimap reduces the dimension of the solution space for the matting problem and leads the matting algorithm to generate user-desired results.
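As a small illustration of equation (1) and of the trimap labels, the following sketch composites a foreground over a background and derives a trimap from a known alpha matte. The function names, the tolerance `eps`, and the toy data are assumptions made for this example; the 0/128/255 label values follow the encoding used for the trimaps in this paper.

```python
import numpy as np

def composite(alpha, fg, bg):
    """Equation (1): I = alpha * F + (1 - alpha) * B, applied per pixel."""
    a = alpha[..., None]              # broadcast alpha over the color channels
    return a * fg + (1.0 - a) * bg

def trimap_from_alpha(alpha, eps=1e-6):
    """Label pixels: 0 = definitely background, 255 = definitely foreground, 128 = unknown."""
    trimap = np.full(alpha.shape, 128, dtype=np.uint8)
    trimap[alpha <= eps] = 0
    trimap[alpha >= 1.0 - eps] = 255
    return trimap

# Toy 1 x 3 image: a red foreground over a blue background (hypothetical data).
alpha = np.array([[0.0, 0.5, 1.0]])
fg = np.ones((1, 3, 3)) * [1.0, 0.0, 0.0]
bg = np.ones((1, 3, 3)) * [0.0, 0.0, 1.0]
image = composite(alpha, fg, bg)
trimap = trimap_from_alpha(alpha)
```

In practice the matte is of course unknown; the matting problem runs in the opposite direction, from the image and the trimap to α.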

In this paper, we describe an automatic trimap extrac-

tion methodology by developing an afﬁne transform

of gradient ﬁelds between background image and a

given input image for a robust alpha matting process.

Our proposed background subtraction system robustly extracts the foreground regions in environments where the illumination varies due to changing light conditions. Figure 1 shows the flowchart for automatically extracting a trimap from a given image, based on background subtraction and a closed-form alpha matting technique (Levin et al, 2006).

Yoon S. and Graf H. (2009).

AUTOMATIC TRIMAP EXTRACTION FOR EFFICIENT ALPHA MATTING BASED ON GRADIENT FIELD TRANSFORMS.

In Proceedings of the Fourth International Conference on Computer Graphics Theory and Applications, pages 52-57

DOI: 10.5220/0001766800520057

 SciTePress

Figure 1: Flowchart of our proposed automatic trimap extraction for efﬁcient alpha matting by transformed gradient ﬁelds

using cross-projection tensors.

An affine transform of the cross projection tensors is computed in order to extract the foreground elements from a given input image automatically and to separate the image into the 'definitely foreground', 'definitely background', and 'unknown' regions.

2 PREVIOUS RESEARCH

In this section, we brieﬂy describe some state of

the art research for image/video matting, automatic

trimap extraction and efﬁcient background subtrac-

tion methods.

2.1 Alpha Matting

An early parametric sampling based matting algorithm was proposed by (Ruzon et al, 2000), whose approach is based on a manifold connecting the "frontiers" of each object's color distribution. Building on this approach, Bayesian matting (Chuang et al, 2001) uses a continuously sliding window for the neighborhood definition, which marches inwards from the foreground and background regions. In order to build color distributions, the algorithm uses the foreground and background samples in addition to previously computed F, B and α values. Thus, every pixel within a neighborhood region contributes to modeling the foreground and background Gaussians. However, parametric sampling based matting is weak when the background color distribution is non-Gaussian. The "knockout matting" method (Berman et al, 2000) was developed to avoid the disadvantages of parametric sampling by using a weighted average of known foreground and background pixels. Poisson matting (Sun et al, 2004) solves a Poisson equation for the matte by assuming that the foreground and background vary slowly compared to the matte. This algorithm interacts closely with the user: it begins from a hand-painted trimap and offers painting tools to correct errors within the matte. Defocus matting (McGuire et al, 2005) pulls mattes automatically from video sequences captured with co-axial cameras that differ in depth of field and plane of focus. Instead of requiring a carefully specified trimap, some recently proposed matting approaches allow the user to specify a few foreground and background scribbles as input to extract a matte. However, in order to reduce user involvement, automatic trimap extraction is one of the most important issues in image/video matting. Optical flow based background subtraction (Chuang et al, 2002) or depth map extraction (Joshi et al, 2007) has been suggested to separate foreground and background automatically. A recently published spectral matting algorithm (Levin et al, 2007) can automatically extract a matte from an input image without any user input and provides efficient video matting even for complex scenes, using optical flow to separate foreground regions from the background image.

2.2 Background Subtraction

The principle of background subtraction is to detect moving objects by computing the difference between the current frame and a reference frame. A comprehensive overview and in-depth literature review of background subtraction techniques can be found in (Piccardi, 2004). Several background subtraction methods try to effectively estimate the background model from temporally trained sequences of images. (Wren et al, 1997) proposed to model the background independently at each pixel with a Gaussian probability density function. (Stauffer et al, 1999) extended this uni-modal approach with an adaptive multi-modal background subtraction method that models the pixel color as a mixture of Gaussians. (Oliver et al, 2000) used an eigenspace model for background subtraction. Background subtraction techniques which combine multiple cues such as color and depth maps are also used for video surveillance and monitoring systems (Barotti et al, 2003).
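To make the per-pixel Gaussian idea concrete, here is a minimal sketch of a uni-modal background model in the spirit of the approaches above. It is a simplified illustration, not the implementation of any cited method: the function names, the threshold factor `k`, the floor `min_std`, and the synthetic frames are all assumptions.

```python
import numpy as np

def fit_background(frames):
    """Per-pixel mean and standard deviation over a stack of background frames (T, H, W)."""
    frames = np.asarray(frames, dtype=np.float64)
    return frames.mean(axis=0), frames.std(axis=0)

def foreground_mask(frame, mean, std, k=2.5, min_std=1.0):
    """Mark pixels that deviate more than k standard deviations from the background mean."""
    sigma = np.maximum(std, min_std)   # floor keeps the threshold positive for static pixels
    return np.abs(frame - mean) > k * sigma

# Synthetic training sequence: gray level 100 with mild sensor noise (hypothetical data).
rng = np.random.default_rng(0)
training = 100.0 + rng.normal(0.0, 2.0, size=(50, 4, 4))
mean, std = fit_background(training)

frame = np.full((4, 4), 100.0)
frame[1, 2] = 180.0                    # one pixel covered by a bright object
mask = foreground_mask(frame, mean, std)
```

A model of this kind reacts to any intensity change, which is why illumination changes are a known weakness of intensity-based frame differencing.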

3 AUTOMATIC TRIMAP EXTRACTION

Traditional background subtraction algorithms (Wren et al, 1997, Oliver et al, 2000, Elgammal et al, 2000, Barotti et al, 2003, Han, 2004), which are based on frame differences, cannot extract foreground objects with varying illumination and reflectance, and fail to extract foreground objects that have color values similar to the background. However, an image gradient field based on projection tensors provides a way of removing scene texture edges from images within the illumination parameter space. We now show how to remove the background from a given image by transforming its gradient field using cross projection tensors obtained from a background image of the same scene. The final trimap for alpha matting is obtained by a 2D integration of the modified gradient field of each color channel.
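The 2D integration step can be sketched as a Poisson problem: find the image whose Laplacian equals the divergence of the (modified) gradient field. The paper does not state which solver is used, so the plain Jacobi iteration below, the Dirichlet boundary handling, and all function names are assumptions chosen for brevity.

```python
import numpy as np

def forward_gradient(u):
    """Forward differences; the last column of gx and the last row of gy stay zero."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def divergence(gx, gy):
    """Backward-difference divergence matching forward_gradient."""
    div = np.zeros_like(gx)
    div[:, 1:] += gx[:, 1:] - gx[:, :-1]
    div[:, 0] += gx[:, 0]
    div[1:, :] += gy[1:, :] - gy[:-1, :]
    div[0, :] += gy[0, :]
    return div

def integrate(gx, gy, boundary, iters=2000):
    """Jacobi iterations for laplace(u) = div(g); border pixels of `boundary` stay fixed."""
    f = divergence(gx, gy)
    u = boundary.astype(np.float64)
    for _ in range(iters):
        neighbors = u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
        u[1:-1, 1:-1] = (neighbors - f[1:-1, 1:-1]) / 4.0
    return u

# Round trip on a small random image (hypothetical data): gradients in, image out.
rng = np.random.default_rng(0)
truth = rng.random((8, 8))
gx, gy = forward_gradient(truth)
start = truth.copy()
start[1:-1, 1:-1] = 0.0               # keep only the border values
recovered = integrate(gx, gy, start)
```

For real image sizes a multigrid or FFT based Poisson solver would be a more practical choice than Jacobi iterations, which converge slowly on large grids.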

3.1 Background Subtraction from Transformed Gradient Fields

To extract the foreground regions from a given image, our proposed background subtraction in the gradient fields is motivated by (Agrawal et al, 2006), whose method removes the scene texture that corresponds to the background. For a given image I, the smoothed structure tensor $G_\sigma$ of the gradient image $\nabla I$ is defined as the convolution of the outer product of the gradient field with a Gaussian kernel. Smoothing the structure tensor with a Gaussian kernel reduces local noise and aliasing. Equation (2) shows the structure tensor obtained from the gradient field and the Gaussian kernel.

$$G_\sigma = (\nabla I \nabla I^{T}) * K_\sigma = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} * K_\sigma \qquad (2)$$

where * denotes the convolution, $I_x$ and $I_y$ are the partial derivatives of I, and $K_\sigma$ is a normalized 2D Gaussian kernel of variance σ. The tensor in equation (2), computed for the input and for the background image, can be decomposed into eigenvectors and eigenvalues as shown in equation (3) in order to extract the local intensity structure within the image (Aubert et al, 2002).

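$$G_\sigma = \begin{bmatrix} v_1 & v_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} v_1^{T} \\ v_2^{T} \end{bmatrix} \qquad (3)$$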

Figure 2: (a) Input image for alpha matting. (b) Background image. (c), (d), and (e) Components of the diffusion tensor D used to extract a foreground object.

where $v_1$, $v_2$ denote the eigenvectors corresponding to the eigenvalues $\lambda_1$, $\lambda_2$, respectively, and $\lambda_2 \le \lambda_1$. The eigenvalues and eigenvectors of $G_\sigma$ provide information about the local intensity structures within the image.
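The smoothed structure tensor and its per-pixel eigen-decomposition can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names, the kernel radius of 3σ, the edge-replicating padding, and the convention that `v1` belongs to the larger eigenvalue are all choices made for this example.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian, truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    """Separable Gaussian blur with edge replication; output has the input shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def structure_tensor(img, sigma=1.0):
    """Entries (Gxx, Gxy, Gyy) of the smoothed outer product of the image gradient."""
    iy, ix = np.gradient(img.astype(np.float64))   # derivatives along rows, then columns
    return smooth(ix * ix, sigma), smooth(ix * iy, sigma), smooth(iy * iy, sigma)

def eigen_decompose(gxx, gxy, gyy):
    """Per-pixel eigenvalues (lam1 >= lam2) and unit eigenvectors in (x, y) order."""
    G = np.stack([np.stack([gxx, gxy], -1), np.stack([gxy, gyy], -1)], -2)
    w, v = np.linalg.eigh(G)           # ascending eigenvalues, eigenvectors in columns
    return w[..., 1], w[..., 0], v[..., :, 1], v[..., :, 0]

# A vertical step edge (hypothetical data): the gradient points along x at the edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
lam1, lam2, v1, v2 = eigen_decompose(*structure_tensor(img))
```

On the step edge, `lam1` is positive while `lam2` vanishes, and `v1` points across the edge; in flat regions far from the edge both eigenvalues are close to zero.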

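$$D = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{bmatrix} \begin{bmatrix} u_1^{T} \\ u_2^{T} \end{bmatrix} \qquad (4)$$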

The field of diffusion tensors can be described at each pixel by a 2 × 2 symmetric, positive definite matrix (Weickert, 2004). The diffusion tensor D is constructed by selecting its eigenvectors $u_1$, $u_2$ and eigenvalues $\mu_1$, $\mu_2$ based on the eigenvalues and eigenvectors of $G_\sigma$; D is then obtained as in equation (4). Based on the cross projection tensor of the gradient field between the input and background image, we suppress the texture information of the background while preserving the texture of the foreground. After extracting the transformed gradient field using cross projection tensors, we integrate the transformed gradient field of each channel of the image.

Building on the gradient fields and diffusion tensors introduced above, we now focus on the gradient projection of a given input image, $I_{in}$, and a background image, $I_{bg}$. By transforming $\nabla I_{in}$ with the cross projection tensor $D^{bg}$, which is the diffusion tensor between $\nabla I_{in}$ and $\nabla I_{bg}$, we can remove all edges from $I_{in}$ which are in $I_{bg}$ and retain all edges in $I_{in}$ which are not in $I_{bg}$. Thus, scene texture edges in an input image can be removed by transforming its gradient field using cross projection tensors obtained from a background image of the same scene. For an input image $I_{in}$ and a background image $I_{bg}$, the smoothed structure tensors of the two images are denoted as $G_\sigma^{in}$ and $G_\sigma^{bg}$, respectively.
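The edge suppression described above can be sketched as follows. The rule used here is an assumption made for illustration: where the background is locally flat the tensor is the identity, and where the background has an edge the tensor keeps only the gradient component along that edge. The threshold `tau`, the 3 × 3 box filter standing in for the Gaussian kernel, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def box_smooth(a):
    """3 x 3 box filter with edge replication, a stand-in for the Gaussian kernel."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def cross_projection_tensor(bg, tau=1e-4):
    """Per-pixel 2 x 2 tensor that suppresses gradients across background edges."""
    by, bx = np.gradient(bg.astype(np.float64))
    gxx, gxy, gyy = box_smooth(bx * bx), box_smooth(bx * by), box_smooth(by * by)
    G = np.stack([np.stack([gxx, gxy], -1), np.stack([gxy, gyy], -1)], -2)
    w, v = np.linalg.eigh(G)                         # ascending eigenvalues
    lam_large = w[..., 1]
    tangent = v[..., :, 0]                           # eigenvector of the smaller eigenvalue
    D = np.broadcast_to(np.eye(2), G.shape).copy()   # flat background: keep the gradient
    edge = lam_large > tau
    D[edge] = tangent[edge][:, :, None] * tangent[edge][:, None, :]
    return D

def transform_gradient(img, D):
    """Apply the tensor field to the image gradient, returning (gx, gy)."""
    iy, ix = np.gradient(img.astype(np.float64))
    g = np.stack([ix, iy], -1)
    out = np.einsum("...ij,...j->...i", D, g)
    return out[..., 0], out[..., 1]

# Background with a vertical edge; the input adds a horizontal edge (hypothetical data).
bg = np.zeros((16, 16))
bg[:, 8:] = 1.0
img = bg.copy()
img[10:, :] += 1.0
tx, ty = transform_gradient(img, cross_projection_tensor(bg))
```

In this toy scene the vertical background edge disappears from the transformed gradient, while the horizontal edge that exists only in the input survives away from the background edge.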

Figure 2-(a) and (b) show the input image and the background image. Figure 2-(c), (d), and (e) show the extracted components $D_{11}$, $D_{12} = D_{21}$, and $D_{22}$ of the cross projection tensor.

GRAPP 2009 - International Conference on Computer Graphics Theory and Applications

3.2 Trimap Extraction from Background Subtraction

From the cross projected tensors of each color channel, we extract the trimap of the input image for an efficient alpha matting. The foreground region extracted from the gradient fields is the difference between the gradient field of the input image and the affine transformed gradient field obtained by tensor projection; it includes both definitely foreground pixels and pixels that remain unknown because of illumination or the intricate shape of the target object in the input image. The definitely foreground region is separated from this roughly extracted foreground by an adaptive threshold on the channel difference and the gradient difference. Figure 3 shows the trimaps extracted from input images of fluid and fire. Within the trimaps shown in figure 3, definitely background elements are assigned the value 0, definitely foreground elements 255, and unknown regions 128. As shown in figure 3, the foreground regions extracted by the background subtraction are separated into definitely foreground and unknown regions.
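The labeling step might look like the sketch below. The paper does not give the form of its adaptive threshold, so the rule used here (mean plus `k` standard deviations of the channel difference inside the rough foreground) is a hypothetical placeholder, and the gradient difference is omitted for brevity. Only the 0/128/255 encoding is taken from the text.

```python
import numpy as np

def extract_trimap(image, background, rough_fg, k=0.5):
    """Return a trimap: 0 = definitely background, 128 = unknown, 255 = definitely foreground."""
    # Channel difference: summed absolute color difference to the background image.
    diff = np.abs(image.astype(np.float64) - background).sum(axis=-1)
    inside = diff[rough_fg]
    # Hypothetical adaptive threshold computed from the rough foreground only.
    thresh = inside.mean() + k * inside.std() if inside.size else np.inf
    trimap = np.zeros(rough_fg.shape, dtype=np.uint8)
    trimap[rough_fg] = 128
    trimap[rough_fg & (diff > thresh)] = 255
    return trimap

# Toy scene (hypothetical data): one strongly and one weakly differing pixel.
background = np.zeros((4, 4, 3))
image = background.copy()
image[1, 1] = 200.0
image[1, 2] = 20.0
rough_fg = np.zeros((4, 4), dtype=bool)
rough_fg[1, 1] = rough_fg[1, 2] = True
trimap = extract_trimap(image, background, rough_fg)
```

Pixels outside the rough foreground stay at 0, so the matting step only has to resolve the pixels labeled 128.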

4 CLOSED-FORM MATTING

Among the various image/video matting algorithms, we tested closed-form alpha matting with our automatically extracted trimap. The closed-form matting algorithm, recently presented by (Levin et al, 2006), explicitly derives a cost function from local smoothness assumptions on the foreground and background colors. It eliminates F and B, yielding a quadratic cost function in alpha, which can be solved as a sparse linear system of equations. If F and B are each a linear mixture of two colors over a small window around each pixel, the alpha matting equation (1) can be rewritten as:

$$\alpha_i = \sum_c a^c I_i^c + b, \quad \forall i \in w \qquad (5)$$

where c runs over the color channels, so that equation (5) expresses α as a linear function of the color channels, and $a^c$ and b are constants within the window w. The matting cost function J, which depends on α, a, and b, is shown in equation (6).

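$$J(\alpha, a, b) = \sum_{j \in I} \left( \sum_{i \in w_j} \Big( \alpha_i - \sum_c a_j^c I_i^c - b_j \Big)^2 + \varepsilon \sum_c (a_j^c)^2 \right) \qquad (6)$$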

Figure 3: Extracted trimap from efﬁcient background sub-

traction based on gradient ﬁelds in the case of ﬂuid and ﬁre.

Using equation (5), a and b can be eliminated from the cost function J, yielding a quadratic cost in α alone:

$$J(\alpha) = \alpha^{T} L \alpha \qquad (7)$$

Here L is an N × N operator whose (i, j)-th element is given by:

$$\sum_{k \mid (i,j) \in w_k} \left( \delta_{ij} - \frac{1}{|w_k|} \left( 1 + (I_i - \mu_k)^{T} \Big( \Sigma_k + \frac{\varepsilon}{|w_k|} I_3 \Big)^{-1} (I_j - \mu_k) \right) \right) \qquad (8)$$

where $\delta_{ij}$ is the Kronecker delta, $\Sigma_k$ is a 3 × 3 covariance matrix, $\mu_k$ is a 3 × 1 mean vector of the colors in window $w_k$, $|w_k|$ is the number of pixels in that window, and $I_3$ is the 3 × 3 identity matrix. The operator L, which is called the Matting Laplacian, is the most important analytic result of this approach. The affinity defined in Equation (6) and the one defined in Equation (8) share the property that nearby pixels with similar colors have high affinity values, while nearby pixels with different colors have small affinity values.

Figure 4 shows the recovered alpha values and the embedding into a new background. As shown in figure 4, the illuminated area within the original target object is preserved in the new background, and semi-transparent objects that contain many holes, e.g. fire, are also robustly embedded into the new background image.
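For small images, the Matting Laplacian and the resulting linear system can be written down directly. The sketch below builds a dense L from 3 × 3 windows and solves for α with soft constraints on the known trimap pixels. It is an illustration only: the dense matrix, the constraint weight `lam`, the function names, and the random test image are assumptions, and a practical implementation would use sparse matrices.

```python
import numpy as np

def matting_laplacian(img, eps=1e-7, r=1):
    """Dense Matting Laplacian of a small H x W x 3 image, using (2r+1)^2 windows."""
    h, w, _ = img.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    L = np.zeros((n, n))
    size = (2 * r + 1) ** 2
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = idx[y - r:y + r + 1, x - r:x + r + 1].ravel()
            cols = img[y - r:y + r + 1, x - r:x + r + 1].reshape(size, 3)
            mu = cols.mean(axis=0)
            cov = np.cov(cols, rowvar=False, bias=True)          # 3 x 3 window covariance
            inv = np.linalg.inv(cov + (eps / size) * np.eye(3))
            d = cols - mu
            affinity = (1.0 + d @ inv @ d.T) / size
            L[np.ix_(win, win)] += np.eye(size) - affinity       # window term of the sum
    return L

def solve_alpha(img, trimap, lam=100.0):
    """Minimise alpha^T L alpha plus a penalty tying known pixels to 0 or 1."""
    L = matting_laplacian(img)
    known = (trimap.ravel() != 128).astype(np.float64)
    target = (trimap.ravel() == 255).astype(np.float64)
    alpha = np.linalg.solve(L + lam * np.diag(known), lam * known * target)
    return np.clip(alpha, 0.0, 1.0).reshape(trimap.shape)

# Random 6 x 6 color image with a background column and a foreground column (hypothetical data).
rng = np.random.default_rng(1)
img = rng.random((6, 6, 3))
trimap = np.full((6, 6), 128, dtype=np.uint8)
trimap[:, 0] = 0
trimap[:, -1] = 255
alpha = solve_alpha(img, trimap)
```

Each row of L sums to zero, so a constant α has zero cost; the trimap constraints are what make the solution unique.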

5 EXPERIMENTS

We have implemented the alpha matting algorithm based on gradient fields and conducted experiments on a standard PC with a Pentium4 1.2GHz processor in order to show the efficiency and robustness of our automatic trimap extraction and alpha matting by comparing it to previous research on background subtraction, trimap extraction, and alpha matting methods. Our first benchmark compares against the background subtraction method based on kernel density estimation (Han, 2004). Based on the original image in figure 6 (left), we show the results of our work within the gradient field domain (center) and the foreground objects extracted by (Han, 2004) (right). The light-reflecting area within the image is better reconstructed by our approach than by the other.

We conducted the alpha matting experiment from an automatically extracted trimap. Trimap extraction for semi-transparent objects such as fire or fluids is especially difficult because there are many holes within the interior of the target object. Figure 5 compares the alpha matting of our proposed method with a previous alpha matting method based on dual cameras (McGuire, 2006).

Figure 4: Alpha extraction and matting in a new background.

Figure 5: Comparison of alpha matting between our proposed method and previous matting with depth map (McGuire et al, 2006). Left image is the original input image, the center image is the foreground.

6 CONCLUSIONS AND FUTURE WORK

In this paper, we have presented an automatic trimap extraction using a new form of background subtraction which is robust in environments with varying illumination. The extracted trimap is applied to a closed-form alpha matting process in order to show that automatic trimap extraction within a gradient field is more efficient than other background subtraction methods, which are based on building frame differences or applying optical flow techniques, when handling semi-transparent objects under changing illumination and/or light reflectance. Our experiments showed that the proposed method is robust in alpha matting of semi-transparent objects. Physical phenomena such as fire or fluid were chosen as sample data because of their time-varying shape, their large holes and unknown regions within their inner shape, and their lighting characteristics. We are confident that this approach could reduce the labor involved in extracting target objects and embedding them into a new background image or video sequence.

However, the closed-form alpha matting process that follows our automatic trimap extraction still takes too much time when the unknown regions in a given image are large. We will thus transfer our proposed algorithm to a GPU environment in order to reduce the processing time by computing the gradient fields in parallel and by processing small neighborhood windows of the unknown regions, for an efficient and fast alpha extraction.


Figure 6: Comparison of background subtraction between gradient ﬁeld based afﬁne transform and kernel density estimation.

Gradient ﬁeld based background subtraction is more efﬁcient in the area of illumination change or light’s semi-transmission.

ACKNOWLEDGEMENTS

We experimented with the fire image data from the Max Planck Institute for Informatics (Ihrke et al, 2005) and the fluid image data from the Department of Computer Science at Williams College (McGuire et al, 2006).

REFERENCES

Agrawal, A., Raskar, R., Chellappa, R., 2006. Edge sup-

pression by gradient ﬁeld transformation using cross-

projection tensors. In proceeding of IEEE CVPR

Aubert, G., Kornprobst, P., 2002. Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, Springer-Verlag.

Barotti, S., Lombardi, L., Lombardi, P., 2003. Multi-

Module Switching and Fusion for Robust Video

Surveillance. In proceeding of IEEE International

Conference on Image Analysis and Processing.

Berman, A., Vlahos, P., Dadourian, A., 2000. Comprehen-

sive method for removing from an image the back-

ground surrounding. U.S. Patent.

Chuang, Y.-Y., Curless, B., Salesin, D. H., Szeliski, R.,

2001. A bayesian approach to digital matting. In pro-

ceeding of IEEE CVPR.

Chuang, Y., Agarwala, A., Curless, B., Szeliski, R., 2002.

Video matting of complex scenes. ACM Transaction

Graph.

Elgammal, A., Harwood, D., Davis, L., 2000. Non-

parametric model for background subtraction. In pro-

ceeding of ECCV.

Han, B., Comaniciu, D., Davis, L., 2004. Sequential ker-

nel density approximation through mode propagation:

applications to background modeling. In proceeding

of ACCV.

Ihrke, I., Magnor, M., 2005. GrOVis-Fire: A

Multi-Video Sequence for Volumetric Reconstruc-

tion and Rendering Research. http://www.mpi-

sb.mpg.de/∼ihrke/Projects/Fire/GrOVis Fire/

Joshi, N., Matusik, W., Freeman, W. F., 2007. Explor-

ing Defocus Matting: Non-Parametric Acceleration,

Super-Resolution, and Off-Center Matting . IEEE

Computer Graphics and Applications.

Levin, A., Lischinski, D., Weiss, Y., 2006. A closed form solution to natural image matting. In proceeding of IEEE CVPR.

Levin, A., Rav-Acha, A., Lischinski, D., 2007. Spectral

Matting. In proceeding of IEEE CVPR.

McGuire, M., Matusik, W., Yerazunis, W., 2006. Practi-

cal, Real-time Studio Matting using Dual Imagers. In

European Symposium on Rendering.

McGuire, M., Matusik, W., Pfister, H., Hughes, J. F., Durand, F., 2005. Defocus Video Matting. In proceeding of ACM SIGGRAPH.

Oliver, N., Rosario, B., Pentland, A. P., 2000. A bayesian computer vision system for modeling human interactions. IEEE Trans. on PAMI.

Piccardi, M., 2004. Background subtraction techniques:

a review. IEEE International Conference on System,

Man, and Cybernetics.

Ruzon, M., Tomasi, C., 2000. Alpha estimation in natural

images. In proceeding of IEEE CVPR.

Smith, A., Blinn, J., 1996. Blue screen matting. In proceed-

ing of ACM SIGGRAPH.

Stauffer, C., Grimson, W., 1999. Adaptive background mix-

ture models for real-time tracking. In proceeding of

IEEE CVPR.

Sun, J., Jia, J., Tang, C.-K., Shum, H.-Y., 2004. Poisson

matting. In proceeding of ACM SIGGRAPH.

Tschumperle, D., 2002. PDE’s based Regularization of

Multivalued Images and Applications. PhD Thesis,

Universite de Nice-Sophia Antipolis.

Wang, J., Cohen, M. F., 2007. Image and video matting: a

survey. Foundations and Trends in Computer Graph-

ics and Vision.

Weickert, J., 1997. A review of nonlinear diffusion ﬁltering.

In Scale-Space Theory in Computer Vision, Springer.

Wren, C., Azarbayejani, A., Darrell, T., Pentland, A. P.,

1997. Pﬁnder: real-time tracking of the human body.

IEEE Trans on PAMI.

Zomet, A., Peleg, S., 2002. Multi-sensor super resolution.

IEEE workshop on Application of Computer Vision.
