AUTOMATIC TRIMAP EXTRACTION FOR EFFICIENT ALPHA
MATTING BASED ON GRADIENT FIELD TRANSFORMS
Sang Min Yoon and Holger Graf
GRIS, TU Darmstadt, Rundeturmstrasse 10, Darmstadt, Germany
ZGDV, Computer Graphics Center, Rundeturmstrasse 10, Darmstadt, Germany
Keywords:
Alpha matting, Trimap extraction, Gradient field.
Abstract:
Image/Video Matting aims at solving the problem of accurate foreground estimation from a given background
within still images or video sequences. The standard alpha matting method starts from a trimap, which sepa-
rates an input image into three regions: definitely foreground, definitely background, and unknown regions.
This paper presents an automatic trimap extraction based on an affine transformation of gradient fields in or-
der to achieve an improved and robust Image/Video matting method. A gradient field based background and
foreground segmentation technique provides a trimap extraction, which is robust to changing light conditions
within semi-transparent objects. Our proposed background subtraction is based on affine transformed gradi-
ent projections of the input and background image and removes the background texture from a given image,
preserving the texture of the foreground objects. The presented automatic trimap extraction method reduces
the manual labor of extracting and embedding target objects into a new background image or video
sequence, and may find application in the broadcasting and movie industries.
1 INTRODUCTION
Image/video matting, which extracts foreground objects from a background image by segmenting each pixel according to its color and opacity, has been extensively studied during the last twenty years. Traditional matting techniques segment the foreground object of a given image or video sequence by its color and opacity characteristics. Current image/video matting technologies and systems try to efficiently extract high-quality mattes from a still image or a video sequence and are mainly used in film production and broadcasting systems. Here, image matting models a given input image as the composite of a foreground layer and a background layer, combined by a linear blending of opacity values at each pixel (Wang et al, 2007). It was first mathematically defined by (Smith et al, 1996) and models an observed image I as a convex combination of a foreground image F and a background image B by using the alpha matte α:
$$I = \alpha F + (1 - \alpha) B \qquad (1)$$
where α is a value within [0,1]. From equation (1), the matting process has to solve an inverse problem with several unknowns, given only three constraints. The task of alpha matting is hence to recover the values of α, B and F at every pixel. To properly extract meaningful foreground objects from equation (1), several matting approaches start by segmenting the input image into three regions, referred to as a trimap: definitely foreground, definitely background, and unknown regions. If α = 1 or 0, we classify a pixel within an image as ‘definitely foreground’ or ‘definitely background’, respectively. One of the important factors affecting the performance of a matting algorithm is the accuracy of the trimap. The trimap reduces the dimension of the solution space for the matting problem, and leads the matting algorithm to generate user-desired results.
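As a minimal sketch of equation (1), the following Python function composites a single RGB pixel. The function name `composite` and the representation of a pixel as a tuple of three channel values are assumptions made for illustration only. For a color image, equation (1) yields three equations per pixel (one per channel), while α and the three channels each of F and B are unknown, which is why additional constraints such as a trimap are needed.

```python
def composite(alpha, foreground, background):
    """Blend one RGB pixel according to I = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Apply the convex combination channel by channel.
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


# A pixel that is 25% foreground and 75% background.
pixel = composite(0.25, (200, 100, 0), (0, 100, 200))
print(pixel)  # (50.0, 100.0, 150.0)
```

Matting is the inverse of this operation: only the blended pixel is observed, and α, F and B have to be estimated.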
In this paper, we describe an automatic trimap extrac-
tion methodology by developing an affine transform
of gradient fields between background image and a
given input image for a robust alpha matting process.
Our proposed background subtraction system is ro-
bust in extracting the foreground regions within illu-
mination varying environments caused by changing
light conditions. Figure 1 shows the flowchart for au-
tomatically extracting a trimap from a given image,
based on background subtraction and a closed-form
based alpha matting technique (Levin et al, 2006).
Yoon S. and Graf H. (2009).
AUTOMATIC TRIMAP EXTRACTION FOR EFFICIENT ALPHA MATTING BASED ON GRADIENT FIELD TRANSFORMS.
In Proceedings of the Fourth International Conference on Computer Graphics Theory and Applications (GRAPP 2009), pages 52-57
DOI: 10.5220/0001766800520057
Copyright © SciTePress
Figure 1: Flowchart of our proposed automatic trimap extraction for efficient alpha matting by transformed gradient fields
using cross-projection tensors.
An affine transform of the cross projection tensors is computed in order to extract the foreground elements from a given input image automatically and to separate the image into the ‘definitely foreground’, the ‘definitely background’ and the ‘unknown’ regions.
2 PREVIOUS RESEARCH
In this section, we briefly describe some state of
the art research for image/video matting, automatic
trimap extraction and efficient background subtrac-
tion methods.
2.1 Alpha Matting
An early parametric sampling based matting algorithm was proposed by (Ruzon et al, 2000), whose approach is based on a manifold connecting the "frontiers" of each object's color distribution. Based on this approach, Bayesian matting (Chuang et al, 2001) uses a continuously sliding window for neighborhood definition, which marches inwards from the foreground and background regions. In order to build color distributions, the algorithm uses the foreground and background samples in addition to previously computed F, B and α values. Thus, every pixel within a neighborhood region contributes to modeling the foreground and background Gaussians. However, parametric sampling based matting is weak when the background color distribution is non-Gaussian. The "knockout matting" method (Berman et al, 2000) was developed to avoid the disadvantages of parametric sampling matting by using a weighted average of known foreground and background pixels. Poisson matting (Sun et al, 2004) solves a Poisson equation for the matte by assuming that the foreground and background are slowly varying compared to the matte. This algorithm interacts closely with the user by beginning from a hand-painted trimap and offering painting tools to correct errors within the matte. Defocus matting (McGuire et al, 2005) pulls mattes automatically from video sequences captured with co-axial cameras with different depths of field and planes of focus. Instead of requiring a carefully specified trimap, some recently proposed matting approaches allow the user to specify a few foreground and background scribbles as input to extract a matte. However, in order to reduce user involvement, automatic trimap extraction is one of the most important issues in image/video matting. Optical flow based background subtraction (Chuang et al, 2002) or depth map extraction (Joshi et al, 2007) has been suggested to separate foreground and background automatically. A recently published spectral matting algorithm (Levin et al, 2007) can automatically extract a matte from an input image without any user input and provides an efficient video matting even for complex scenes, using optical flow to separate foreground regions from the background image.
2.2 Background Subtraction
The principle of background subtraction is to detect moving objects by building the difference between the current frame and a reference frame. A comprehensive overview and in-depth literature review of background subtraction techniques can be found in (Piccardi, 2004). Several methods for performing background subtraction try to effectively estimate the background model from temporally trained sequences of images. (Wren et al, 1997) proposed to model the background independently at each pixel, based on a Gaussian probability density function. (Stauffer et al, 1999) extended the uni-modal background subtraction approach by using an adaptive multi-modal background subtraction method that models the pixel color as a mixture of Gaussians. (Oliver et al, 2000) used an eigenspace model for background subtraction. Background subtraction techniques which combine multiple cues such as color and depth maps are also used for video surveillance and monitoring systems (Barotti et al, 2003).
3 AUTOMATIC TRIMAP
EXTRACTION
Traditional background subtraction algorithms (Wren et al, 1997, Oliver et al, 2000, Elgammal et al, 2000, Barotti et al, 2003, Han et al, 2004), which are based on frame differences, cannot extract foreground objects whose illumination and reflectance vary, and fail to extract foreground objects that have similar color values to the background. However, an image gradient field based on projection tensors provides a way of removing scene texture edges from images within the illumination parameter space. We now show how to remove the background from a given image by transforming its gradient field using cross projection tensors obtained from a background image of the same scene. The final trimap for alpha matting is obtained by a 2D integration of the modified gradient field from each color channel.
3.1 Background Subtraction from
Transformed Gradient Fields
To extract the foreground regions from a given image, our proposed background subtraction in the gradient field is motivated by (Agrawal et al, 2006), whose method removes the scene texture that corresponds to the background. For a given image I, the smoothed structure tensor $G_\sigma$ of the gradient $\nabla I$ is defined as the convolution of the outer product of the gradient field with a Gaussian kernel. Smoothing the structure tensor with a Gaussian kernel reduces local noise and aliasing. Equation (2) shows the structure tensor obtained from the gradient field and the Gaussian kernel.
$$G_\sigma = (\nabla I \, \nabla I^T) * K_\sigma = \begin{pmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{pmatrix} * K_\sigma \qquad (2)$$
where * denotes the convolution and $K_\sigma$ is a normalized 2D Gaussian kernel of variance σ. The tensor of equation (2), computed for the input and for the background image, can be decomposed into eigenvectors and eigenvalues as shown in equation (3) in order to extract the local intensity structure within the image (Aubert et al, 2002).
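$$G_\sigma = \begin{pmatrix} v_1 & v_2 \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} v_1^T \\ v_2^T \end{pmatrix} \qquad (3)$$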
Figure 2: (a) Input image for alpha matting. (b) Background image. (c), (d), and (e) Components $D_{11}$, $D_{12}$, and $D_{22}$ of the diffusion tensor D used to extract a foreground object.
where $v_1$, $v_2$ denote the eigenvectors corresponding to the eigenvalues $\lambda_1$, $\lambda_2$ respectively, and $\lambda_1 \geq \lambda_2$. The eigenvalues and eigenvectors of $G_\sigma$ provide information about the local intensity structures within the image.
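The structure tensor and its eigen-decomposition in equations (2) and (3) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the image is a list of rows of grey values, gradients use central differences with clamped borders, and a 3 × 3 binomial kernel stands in for the Gaussian $K_\sigma$. The function names `structure_tensor` and `eig_sym2` are hypothetical.

```python
import math


def structure_tensor(img):
    """Return per-pixel (gxx, gxy, gyy) of a smoothed structure tensor.

    Assumptions of this sketch: central differences with clamped borders,
    and a 3x3 binomial kernel as a small stand-in for the Gaussian K_sigma.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Raw tensor entries g_x^2, g_x g_y, g_y^2 at every pixel.
    raw = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = 0.5 * (px(y, x + 1) - px(y, x - 1))
            gy = 0.5 * (px(y + 1, x) - px(y - 1, x))
            row.append((gx * gx, gx * gy, gy * gy))
        raw.append(row)

    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = [0.0, 0.0, 0.0]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    k = kernel[dy + 1][dx + 1] / 16.0
                    for c in range(3):
                        acc[c] += k * raw[yy][xx][c]
            row.append(tuple(acc))
        out.append(row)
    return out


def eig_sym2(gxx, gxy, gyy):
    """Eigen-decomposition of [[gxx, gxy], [gxy, gyy]] with lam1 >= lam2."""
    mean = 0.5 * (gxx + gyy)
    root = math.hypot(0.5 * (gxx - gyy), gxy)
    lam1, lam2 = mean + root, mean - root
    # Orientation of the dominant eigenvector v1; v2 is perpendicular to it.
    theta = 0.5 * math.atan2(2.0 * gxy, gxx - gyy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return lam1, lam2, v1, v2


# A vertical step edge: the dominant eigenvector points across the edge.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
gxx, gxy, gyy = structure_tensor(image)[1][1]
lam1, lam2, v1, v2 = eig_sym2(gxx, gxy, gyy)
```

On the step edge, $\lambda_1$ is positive and $\lambda_2$ is zero, with $v_1$ along the x axis, which is the pattern the text describes as an edge in the local intensity structure.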
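$$D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix} = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{pmatrix} \begin{pmatrix} u_1^T \\ u_2^T \end{pmatrix} \qquad (4)$$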
The field of diffusion tensors can be described at each pixel by a 2 × 2 symmetric, positive definite matrix (Weickert, 1997). The diffusion tensor D is constructed by selecting its eigenvectors $u_1$, $u_2$ and eigenvalues $\mu_1$, $\mu_2$ based on the eigenvalues and eigenvectors of $G_\sigma$; D is then obtained as in equation (4). Based on the cross projection tensor of the gradient field between the input and background image, we suppress the texture information of the background while preserving the texture of the foreground. After transforming the gradient field with the cross projection tensors, we integrate the transformed gradient field of each channel of the image.
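The construction of D from equation (4) and its application to a gradient vector can be sketched as below. The text does not state how $\mu_1$ and $\mu_2$ are chosen, so the rule used here is an assumption in the spirit of (Agrawal et al, 2006): where the background has an edge, the gradient component across that edge is suppressed, and elsewhere the input gradient is kept unchanged. The function names and the `edge_threshold` parameter are hypothetical.

```python
def cross_projection_tensor(lam1_b, v1_b, v2_b, edge_threshold=1e-3):
    """Build (D11, D12, D22) for one pixel from the background eigen-system.

    Assumed rule (not given in the text): if the background tensor shows an
    edge (lam1_b above a threshold), use mu1 = 0, mu2 = 1 so that gradients
    along v1_b are removed; otherwise use mu1 = mu2 = 1 (identity).
    """
    if lam1_b > edge_threshold:
        mu1, mu2 = 0.0, 1.0
    else:
        mu1, mu2 = 1.0, 1.0
    u1, u2 = v1_b, v2_b
    # D = mu1 * u1 u1^T + mu2 * u2 u2^T, written out entry by entry.
    d11 = mu1 * u1[0] * u1[0] + mu2 * u2[0] * u2[0]
    d12 = mu1 * u1[0] * u1[1] + mu2 * u2[0] * u2[1]
    d22 = mu1 * u1[1] * u1[1] + mu2 * u2[1] * u2[1]
    return d11, d12, d22


def transform_gradient(tensor, gradient):
    """Apply the symmetric 2x2 tensor D to a gradient vector (gx, gy)."""
    d11, d12, d22 = tensor
    gx, gy = gradient
    return (d11 * gx + d12 * gy, d12 * gx + d22 * gy)


# Background has a vertical edge here (gradient direction along x).
d_edge = cross_projection_tensor(0.2, (1.0, 0.0), (0.0, 1.0))
print(transform_gradient(d_edge, (0.5, 0.3)))  # (0.0, 0.3)
```

With this choice, an input gradient that coincides with a background edge is cancelled, while a gradient component along the edge direction, which can only come from the foreground, survives.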
Building on this background on gradient fields and diffusion tensors, we focus on the gradient projection of a given input image $I_o$ and a background image $I_b$. By transforming $I_o$ with the cross projection tensor $D^{OB}$, which is the diffusion tensor between $I_o$ and $I_b$, we can remove all edges from $I_o$ which are in $I_b$ and retain all edges in $I_o$ which are not in $I_b$. Thus, scene texture edges can be removed from an input image by transforming its gradient field using cross projection tensors obtained from a background image of the same scene. For an input image $I_o$ and a background image $I_b$, the smoothed structure tensors are denoted as $G^o_\sigma$ and $G^b_\sigma$, respectively. Figure 2-(a) and (b) show the input image and the background image. Figure 2-(c), (d), and (e) show the extracted $D_{11}$, $D_{12} = D_{21}$, and $D_{22}$ of the cross projection tensors.
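The 2D integration of a gradient field mentioned in Section 3 is commonly posed as a Poisson problem. The sketch below recovers an image from its gradient field by Gauss-Seidel iterations; it is one possible approach and not necessarily the solver used by the authors. The boundary handling (border pixels fixed to known values), the iteration count and the function names are assumptions.

```python
def forward_gradient(img):
    """Forward differences; the last column and row get a zero gradient."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
           for x in range(w)] for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
           for x in range(w)] for y in range(h)]
    return gx, gy


def integrate_gradient(gx, gy, border, iterations=500):
    """Solve laplace(u) = div(gx, gy) with Gauss-Seidel sweeps.

    `border` supplies the fixed values on the image border (an assumed
    Dirichlet condition); interior pixels start at zero.
    """
    h, w = len(gx), len(gx[0])
    u = [[float(v) for v in row] for row in border]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            u[y][x] = 0.0
    for _ in range(iterations):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                # Backward-difference divergence of the gradient field.
                div = gx[y][x] - gx[y][x - 1] + gy[y][x] - gy[y - 1][x]
                u[y][x] = 0.25 * (u[y][x - 1] + u[y][x + 1]
                                  + u[y - 1][x] + u[y + 1][x] - div)
    return u


# Round trip on a small synthetic image: gradient, then integration.
image = [[float(x * x + 2 * y) for x in range(5)] for y in range(5)]
gx, gy = forward_gradient(image)
recovered = integrate_gradient(gx, gy, image)
```

For an unmodified gradient field the round trip reproduces the image. A gradient field transformed by the cross projection tensors is in general no longer the gradient of any image, in which case the Poisson solution is the image whose gradient is closest to it in the least-squares sense.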
3.2 Trimap Extraction from
Background Subtraction
From the cross projection tensors for each color channel, we extract the trimap of the input image for an efficient alpha matting. The foreground regions extracted from the gradient fields include both definitely foreground and unknown regions, the latter caused by illumination or by the intricate shape of the target object in the input image. The rough foreground is the difference between the gradient field of the input image and the affine transformed gradient field obtained by tensor projection. The definitely foreground region is separated from the roughly extracted foreground by an adaptive threshold on the channel difference and the gradient difference. Figure 3 shows the trimaps extracted from the input images for the fluid and fire examples. Within the trimaps shown in figure 3, definitely background elements are assigned 0, definitely foreground elements 255, and unknown regions 128. As shown in figure 3, the foreground regions extracted by the background subtraction are separated into definitely foreground and unknown regions.
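The labeling step can be sketched as follows, using the values 0, 128 and 255 given above. The text does not specify how the adaptive thresholds are computed or how the channel and gradient differences are combined, so both are assumptions here: the thresholds are passed in as parameters, and a pixel of the rough foreground is marked definitely foreground only when both differences reach their thresholds. All names are hypothetical.

```python
BACKGROUND, UNKNOWN, FOREGROUND = 0, 128, 255


def label_trimap(rough_foreground, channel_diff, gradient_diff,
                 channel_threshold, gradient_threshold):
    """Label every pixel as 0 (background), 128 (unknown) or 255 (foreground).

    rough_foreground: rows of booleans from the background subtraction.
    channel_diff, gradient_diff: rows of per-pixel difference magnitudes.
    The thresholds and the "both must pass" rule are assumptions.
    """
    trimap = []
    for mask_row, c_row, g_row in zip(rough_foreground, channel_diff,
                                      gradient_diff):
        row = []
        for is_fg, c, g in zip(mask_row, c_row, g_row):
            if not is_fg:
                row.append(BACKGROUND)
            elif c >= channel_threshold and g >= gradient_threshold:
                row.append(FOREGROUND)
            else:
                row.append(UNKNOWN)
        trimap.append(row)
    return trimap


trimap = label_trimap([[False, True, True]],
                      [[0.9, 0.9, 0.2]],
                      [[0.9, 0.8, 0.9]],
                      channel_threshold=0.5, gradient_threshold=0.5)
print(trimap)  # [[0, 255, 128]]
```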
4 CLOSED-FORM MATTING
Among the various image/video matting algorithms, we tested closed-form alpha matting with our automatically extracted trimap. The closed-form matting algorithm, presented by (Levin et al, 2006), is based on explicitly deriving a cost function from local smoothness assumptions on foreground and background. It eliminates F and B, yielding a quadratic cost function in alpha, which can be easily solved as a sparse linear system of equations. The alpha matting equation (1) can be rewritten, if each of F and B is a linear mixture of two colors over a small window around each pixel, as:
$$\alpha_i = \sum_c a^c I_i^c + b, \quad \forall i \in w \qquad (5)$$
where equation (5) expresses α as a linear function of the color channels c, and $a^c$ and b are constants within the window w. The matting cost function J over α, a and b is shown in equation (6).
$$J(\alpha, a, b) = \sum_{j \in I} \left( \sum_{i \in w_j} \Big( \alpha_i - \sum_c a_j^c I_i^c - b_j \Big)^2 + \varepsilon \sum_c (a_j^c)^2 \right) \qquad (6)$$
Figure 3: Extracted trimap from efficient background sub-
traction based on gradient fields in the case of fluid and fire.
From equation (5), $a^c$ and b can be eliminated from the cost function J, yielding a quadratic cost in α alone:
$$J(\alpha) = \alpha^T L \alpha \qquad (7)$$
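Here L is an N × N operator, whose (i, j)-th element is given by:
$$\sum_{k \mid (i,j) \in w_k} \left( \delta_{ij} - \frac{1}{|w_k|} \left( 1 + (I_i - \mu_k)^T \Big( \Sigma_k + \frac{\varepsilon}{|w_k|} I_3 \Big)^{-1} (I_j - \mu_k) \right) \right) \qquad (8)$$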
where $\delta_{ij}$ is the Kronecker delta, $\Sigma_k$ is the 3 × 3 covariance matrix, $\mu_k$ is the 3 × 1 mean vector of the colors in window $w_k$, and $I_3$ is the 3 × 3 identity matrix. The operator L, which is called the Matting Laplacian, is the most important analytic result from this approach. The affinity defined in Equation (6) and the one defined in Equation (8) share the same property that nearby pixels with similar colors have high affinity values, while nearby pixels with different colors have small affinity values.
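To make the role of the Matting Laplacian concrete, the sketch below builds it for a tiny one-dimensional grey-level signal and solves for α under trimap constraints. It uses the scalar analogue of equation (8), in which the covariance matrix reduces to the window variance, and enforces the known trimap values with a soft penalty weight `lam`, as is commonly done for closed-form matting. The 1D setting, the dense Gaussian elimination, the parameter values and all function names are simplifying assumptions; a practical implementation would use color windows and a sparse solver.

```python
def matting_laplacian_1d(signal, eps=1e-7, radius=1):
    """Grey-level matting Laplacian for a 1D signal (window = 2*radius + 1)."""
    n = len(signal)
    size = 2 * radius + 1
    lap = [[0.0] * n for _ in range(n)]
    for k in range(radius, n - radius):  # window centres
        idx = range(k - radius, k + radius + 1)
        mean = sum(signal[i] for i in idx) / size
        var = sum((signal[i] - mean) ** 2 for i in idx) / size
        for i in idx:
            for j in idx:
                delta = 1.0 if i == j else 0.0
                affinity = (1.0 + (signal[i] - mean) * (signal[j] - mean)
                            / (eps / size + var)) / size
                lap[i][j] += delta - affinity
    return lap


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def alpha_from_trimap(signal, trimap, lam=100.0):
    """Minimise alpha^T L alpha plus lam * (alpha - known)^2 on known pixels."""
    system = matting_laplacian_1d(signal)
    rhs = [0.0] * len(signal)
    for i, label in enumerate(trimap):
        if label != 128:  # 0 = background, 255 = foreground, 128 = unknown
            system[i][i] += lam
            rhs[i] = lam * (1.0 if label == 255 else 0.0)
    return solve(system, rhs)


signal = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
alpha = alpha_from_trimap(signal, [0, 128, 128, 128, 128, 255])
```

For this linear ramp with only the two end pixels labeled, the recovered α follows the intensities closely, because an α that is a linear function of the intensities within every window costs almost nothing under the quadratic form of equation (7).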
Figure 4 shows the recovered alpha values and the embedding into a new background. As shown in figure 4, the illuminated area within the original target object is reproduced in the new background, and semi-transparent objects that contain many holes, e.g. fire, are also robustly embedded into a new background image.
5 EXPERIMENTS
Figure 4: Alpha extraction and matting in a new background.
Figure 5: Comparison of alpha matting between our proposed method and previous matting with depth map (McGuire et al, 2006). Left image is the original input image, the center image is the foreground.
We have implemented the alpha matting algorithm from the gradient fields and conducted experiments on a standard PC with a Pentium 4 1.2 GHz processor in order to show the efficiency and robustness of our automatic trimap extraction and alpha matting by comparing it to previous work on background subtraction, trimap extraction, and alpha matting. Our first benchmark compares against the background subtraction method based on kernel density estimation (Han et al, 2004). Based on the original image in figure 5 (left), we show the results of our work within the gradient field domain (center) and the foreground objects extracted by (Han et al, 2004) (right). The light-reflecting area within the image is reconstructed better by our approach than by the other.
We conducted the alpha matting experiment from an automatically extracted trimap. Trimap extraction for semi-transparent objects such as fire or fluids is especially difficult because there are many holes within the interior of the target object. Figure 6 shows the alpha matting of our proposed method and of a previous alpha matting method based on dual cameras (McGuire et al, 2006).
6 CONCLUSIONS AND FUTURE
WORKS
Within this paper, we have presented an automatic trimap extraction using a new form of background subtraction which is robust within illumination variant environments. The extracted trimap is applied to a closed-form alpha matting process in order to show that an automatic trimap extraction within a gradient field is more efficient than other background subtraction methods, which are based on building frame differences or applying optical flow techniques, for semi-transparent objects under changing illumination and/or light reflectance. Our experiments showed that the proposed method is robust in alpha matting of semi-transparent objects. Physical phenomena, such as fire or fluid, have been chosen as sample data because of their time-varying shape, which comprises large holes and unknown regions in the interior, and their lighting characteristics. We are confident that this approach could reduce the labor of extracting and embedding target objects into a new background image or video sequence.
However, the closed-form alpha matting process which follows our automatic trimap extraction still takes too much time when the unknown regions in a given image are large. We will thus transfer our proposed algorithm to a GPU environment in order to reduce the processing time by parallel calculation of the gradient fields and small neighborhood window processing of unknown regions for an efficient and fast alpha extraction.
Figure 6: Comparison of background subtraction between gradient field based affine transform and kernel density estimation.
Gradient field based background subtraction is more efficient in the area of illumination change or light’s semi-transmission.
ACKNOWLEDGEMENTS
We have experiments with the fire image data from
the Informatik, Max-Plank Institute (Ihrke et al, 2005)
and fluid image data of the department of computer
science at Williams College (McGuire et al, 2006).
REFERENCES
Agrawal, A., Raskar, R., Chellappa, R., 2006. Edge sup-
pression by gradient field transformation using cross-
projection tensors. In proceeding of IEEE CVPR
Aubert, G., Kornprobst, P., 2002. Mathematical Problems
in Image Processing: Partial Differential Equations
and the Calculus of Variations. Applied Mathematical
Series, Springer-Verlag.
Barotti, S., Lombardi, L., Lombardi, P., 2003. Multi-
Module Switching and Fusion for Robust Video
Surveillance. In proceeding of IEEE International
Conference on Image Analysis and Processing.
Berman, A., Vlahos, P., Dadourian, A., 2000. Comprehen-
sive method for removing from an image the back-
ground surrounding. U.S. Patent.
Chuang, Y.-Y., Curless, B., Salesin, D. H., Szeliski, R.,
2001. A bayesian approach to digital matting. In pro-
ceeding of IEEE CVPR.
Chuang, Y., Agarwala, A., Curless, B., Szeliski, R., 2002.
Video matting of complex scenes. ACM Transaction
Graph.
Elgammal, A., Harwood, D., Davis, L., 2000. Non-
parametric model for background subtraction. In pro-
ceeding of ECCV.
Han, B., Comaniciu, D., Davis, L., 2004. Sequential ker-
nel density approximation through mode propagation:
applications to background modeling. In proceeding
of ACCV.
Ihrke, I., Magnor, M., 2005. GrOVis-Fire: A
Multi-Video Sequence for Volumetric Reconstruc-
tion and Rendering Research. http://www.mpi-
sb.mpg.de/ihrke/Projects/Fire/GrOVis Fire/
Joshi, N., Matusik, W., Freeman, W. F., 2007. Explor-
ing Defocus Matting: Non-Parametric Acceleration,
Super-Resolution, and Off-Center Matting . IEEE
Computer Graphics and Applications.
Levin, A., Lischinski, D., Weiss, Y., 2006. A closed form
solution to natural image matting. In proceeding of
IEEE CVPR.
Levin, A., Rav-Acha, A., Lischinski, D., 2007. Spectral
Matting. In proceeding of IEEE CVPR.
McGuire, M., Matusik, W., Yerazunis, W., 2006. Practi-
cal, Real-time Studio Matting using Dual Imagers. In
European Symposium on Rendering.
McGuire, M., Matusik, W., Pfister, H., Hughes, J. F., Durand, F.,
2005. Defocus Video Matting. In proceeding
of ACM SIGGRAPH.
Oliver, N., Rosario, B., Pentland, A. P., 2000. A bayesian computer vision system for
modeling human interactions. IEEE Trans. on PAMI.
Piccardi, M., 2004. Background subtraction techniques:
a review. IEEE International Conference on System,
Man, and Cybernetics.
Ruzon, M., Tomasi, C., 2000. Alpha estimation in natural
images. In proceeding of IEEE CVPR.
Smith, A., Blinn, J., 1996. Blue screen matting. In proceed-
ing of ACM SIGGRAPH.
Stauffer, C., Grimson, W., 1999. Adaptive background mix-
ture models for real-time tracking. In proceeding of
IEEE CVPR.
Sun, J., Jia, J., Tang, C.-K., Shum, H.-Y., 2004. Poisson
matting. In proceeding of ACM SIGGRAPH.
Tschumperle, D., 2002. PDE’s based Regularization of
Multivalued Images and Applications. PhD Thesis,
Universite de Nice-Sophia Antipolis.
Wang, J., Cohen, M. F., 2007. Image and video matting: a
survey. Foundations and Trends in Computer Graph-
ics and Vision.
Weickert, J., 1997. A review of nonlinear diffusion filtering.
In Scale-Space Theory in Computer Vision, Springer.
Wren, C., Azarbayejani, A., Darrell, T., Pentland, A. P.,
1997. Pfinder: real-time tracking of the human body.
IEEE Trans on PAMI.
Zomet, A., Peleg, S., 2002. Multi-sensor super resolution.
IEEE workshop on Application of Computer Vision.