A SVD BASED IMAGE COMPLEXITY MEASURE

David Gustavsson, Kim Steenstrup Pedersen and Mads Nielsen

Department of Computer Science, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen, Denmark

Keywords:

Image complexity measure, Geometry, Texture, Singular value decomposition, SVD, Truncated singular value

decomposition, TSVD, Matrix norm.

Abstract:

Images are composed of geometric structures and texture, and different image processing tools - such as

denoising, segmentation and registration - are suitable for different types of image contents. Characterization

of the image content in terms of geometric structure and texture is an important problem that one is often faced

with. We propose a patch based complexity measure, based on how well the patch can be approximated using

singular value decomposition. As such the image complexity is determined by the complexity of the patches.

The concept is demonstrated on sequences from the newly collected DIKU Multi-Scale image database.

1 INTRODUCTION

Images contain a mix of different types of informa-

tion, from highly stochastic textures such as grass and

gravel to geometric structures such as houses and cars.

Different image processing tools are suitable for different types of image content, and most tools are very

image content dependent. The deﬁnition of what is

texture and geometry is not particularly agreed upon

in the computer vision community. Our hypothesis

is that the separation between geometry and texture

is deﬁned through the purpose of the method and the

scale of interest. What may be considered an unim-

portant structure / texture in one application may be

considered important in another.

For example, segmentation of an image contain-

ing objects with clear geometric structures forming

boundaries calls for edge-based or geometry-based

methods such as watersheds (Olsen and Nielsen,

1997), the Mumford-Shah model (Mumford and Shah,

1985), level sets (Sethian, 1999), or snakes (Kass

et al., 1988), whereas segmentation of an image containing objects only discernible by differences in texture calls for texture based segmentation methods

(Randen and Husoy, 1999). That is, the type of ob-

jects we are attempting to segment deﬁnes our scale

of interest, i.e. what type and scale of structure we

include in the model of a segment.

In denoising, an image containing geometric structures calls for e.g. an edge preserving method such as

anisotropic diffusion (Weickert, 1998) or total varia-

tion image decomposition (Rudin et al., 1992). For

images containing small scale texture, a patch based

denoising method such as non-local means filtering

may be more appropriate (Buades et al., 2008). Again

we see that depending on the purpose we include

structures at ﬁner scales into the model of the prob-

lem as needed.

As a ﬁnal example, we mention that total varia-

tion (TV) image decomposition, and other functional

based methods, are very successful for inpainting images containing geometric structures (Chan and Shen, 2005). Unfortunately, the functional based methods fail to faithfully reconstruct regions containing small scale structures, whereas texture based methods manage to reconstruct such images (Efros and Leung,

1999; Criminisi et al., 2004; Gustavsson et al., 2007;

Cuzol et al., 2008). In the functional approaches the

focus is solely on large scale structures or geometry,

whereas in the texture methods small scale texture is

included in the model.

Prior knowledge about the methods and the image content is therefore essential for successfully solving a task. A natural question is: "For a given type of images, which types of methods are suitable?" Often one wants to characterize the methods by analyzing the types of images that they are (un)suitable for. To be able to characterize the methods in this way, the images must be characterized with respect to the image contents. An image complexity measure is needed, i.e. a measure that quantifies the image contents with respect to geometric structure and texture or scale of interest.

A patch based complexity measure using Singular

Gustavsson D., Steenstrup Pedersen K. and Nielsen M. (2009).

A SVD BASED IMAGE COMPLEXITY MEASURE.

In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, pages 34-39

DOI: 10.5220/0001785400340039

 SciTePress

Value decomposition (SVD) is presented. The com-

plexity for the patch is determined by the number of

singular values that are required for good approxima-

tion - the matrix rank of a good approximation. The

number of singular values that are required for ap-

proximating an image patch is used for characteriz-

ing the patch content. The global complexity measure

for the image is computed as the mean complexity of

all patches in the image. The proposed complexity

measure is evaluated on the baboon image and on the

newly collected DIKU Multi-Scale image sequence

database.

2 COMPLEXITY MEASURE

In the following section images are viewed as matri-

ces, hence the image complexity measure transforms

into a matrix complexity measure. Basic matrix prop-

erties are used extensively in the following section,

which can be found in e.g. (Golub and Van Loan, 1996).

One obvious approach is to approximate a matrix $A$ with a simpler matrix $A_k$ and measure the error (residual) between the original matrix $A$ and the approximation $A_k$. Here $k$ is a parameter used for computing the approximation $A_k$. We assume that, as the parameter $k$ increases, the error between $A$ and $A_k$ decreases (or at least does not increase) and that the error becomes 0 as $k \to \infty$. The approximation $A_k$ should also be simpler than $A$. To be able to use this approach, an error measure between matrices and a matrix complexity measure must be defined.

2.1 Error Measure - Matrix Norms

To measure the difference between the original image $A$ and a simpler approximation $A_k$ of $A$, it is natural to use a matrix norm $\|A - A_k\|$. One of the most commonly used matrix norms is the Frobenius norm (which corresponds to the $L_2$-norm). Let $A$ be an $m \times n$ matrix with elements $a_{ij}$; the Frobenius norm of $A$ is defined as

\[ \|A\|_F = \Big( \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij}^2 \Big)^{1/2}. \tag{1} \]

Another common type of matrix norm is the so-called induced matrix norm. Let $A$ be an $m \times n$ matrix and $x \in \mathbb{R}^n$ a column vector (i.e. $x = (x_1, \cdots, x_n)^T$); the matrix norm induced by the vector norm $\|x\|$ is defined as

\[ \|A\| = \sup_{\|x\| = 1} \frac{\|Ax\|}{\|x\|} \tag{2} \]

(or in words, the smallest number $\alpha$ such that $\frac{\|Ax\|}{\|x\|} \leq \alpha$ for all $x \neq 0$). The matrix norm is here defined in terms of a vector norm $\|x\|$. The induced matrix norm can be viewed as how much the matrix $A$ expands the vectors and is actually an operator norm. Different vector norms can be used to induce different matrix norms; the most common are the $p$-norms defined as

\[ \|x\|_p = \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p} \tag{3} \]

and especially the 2-norm $\|x\|_2 = (x^T x)^{1/2}$. The matrix norm induced by the 2-norm is

\[ \|A\|_2 = \sup_{\|x\|_2 = 1} \frac{\|Ax\|_2}{\|x\|_2} \tag{4} \]

Both the Frobenius matrix norm and the matrix 2-norm are invariant under orthogonal transformation and will be used in the following sections.
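As a small numerical illustration of the two norms and their orthogonal invariance, the following sketch uses NumPy on a random matrix. The matrix `A`, the orthogonal matrix `Q` and all variable names are illustrative assumptions and are not part of the original experiments. It also uses the standard fact that the induced 2-norm equals the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))  # hypothetical 6x4 "image" with values in [0, 1)

# Frobenius norm, Eq. (1): square root of the sum of squared elements.
fro_manual = np.sqrt(np.sum(A ** 2))
fro_numpy = np.linalg.norm(A, "fro")

# Induced 2-norm, Eq. (4): equals the largest singular value of A.
two_numpy = np.linalg.norm(A, 2)
sigma_max = np.linalg.svd(A, compute_uv=False)[0]

# A random orthogonal matrix Q obtained from a QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# Each comparison checks one of the statements made in the text.
print(np.isclose(fro_manual, fro_numpy))
print(np.isclose(two_numpy, sigma_max))
print(np.isclose(np.linalg.norm(Q @ A, "fro"), fro_numpy))
print(np.isclose(np.linalg.norm(Q @ A, 2), two_numpy))
```

Multiplying by `Q` changes the entries of the matrix but leaves both norms unchanged, which is the property used later to express the residual in terms of singular values.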

2.2 Matrix Complexity Measure -

Matrix Rank

Given a matrix $A$, a simpler matrix approximation $A_k$ of $A$ should be constructed. But first one must define what 'simpler' means. A natural approach to quantify the complexity of a matrix is by the rank of the matrix, and a simpler approximation of a matrix can be viewed as a matrix with lower rank.

Let $A$ be an $m \times n$ matrix; then the rank of $A$ can be viewed as the dimension of the subspace spanned by the columns of $A = (a_1, \cdots, a_n)$:

\[ \mathrm{rank}(A) = \dim( \mathrm{span}\{a_1, \cdots, a_n\} ). \tag{5} \]
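To make the rank definition concrete, here is a minimal sketch with two hand-made matrices (both hypothetical examples, not taken from the paper). `np.linalg.matrix_rank` estimates the rank numerically by counting singular values above a tolerance.

```python
import numpy as np

# Hypothetical 4x3 matrix whose third column is the sum of the first two,
# so its columns span only a 2-dimensional subspace.
a1 = np.array([1.0, 0.0, 2.0, 1.0])
a2 = np.array([0.0, 1.0, 1.0, 3.0])
A = np.column_stack([a1, a2, a1 + a2])
rank_A = np.linalg.matrix_rank(A)

# A constant "patch": every column is the same vector, so the rank is 1.
flat_patch = np.full((5, 5), 0.5)
rank_flat = np.linalg.matrix_rank(flat_patch)

print(rank_A, rank_flat)  # 2 1
```

A patch without any variation is thus as simple as a nonzero matrix can be under this measure.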

2.3 Optimal Rank k Approximation

It is well known from matrix theory that an $m \times n$ matrix $A$ can be decomposed into

\[ A = U \Sigma V^T \tag{6} \]

where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix and $\Sigma$ is an $m \times n$ diagonal matrix with elements $\sigma_1, \cdots, \sigma_l$ where $l = \min\{m, n\}$. This is the so-called Singular Value Decomposition (SVD), where the $\sigma_i$ are called singular values and the column vectors $u_i$ and $v_i$ of $U$ and $V$ are called singular vectors. The entries in $\Sigma$ are ordered such that

\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_l \geq 0. \]

Using the fact that the Frobenius norm is invariant under multiplication by orthogonal matrices gives

\[ \|A\|_F = \|\Sigma\|_F = \Big( \sum_{i=1}^{l} \sigma_i^2 \Big)^{1/2}. \tag{7} \]


Let $\Sigma_k$ be the $m \times n$ matrix containing the $k$ largest singular values on the diagonal and let

\[ A_k = U \Sigma_k V^T. \tag{8} \]

$A_k$ is the so-called Truncated Singular Value Decomposition (TSVD) approximation of $A$ where the first $k$ singular values are used, and if $\mathrm{rank}(A) \geq k$ then $\mathrm{rank}(A_k) = k$. The image approximation residual is defined as $A - A_k$ and if, again, $\mathrm{rank}(A) \geq k$ then $\mathrm{rank}(A - A_k) = \mathrm{rank}(A) - k$.

The reconstruction error or the residual error for the Frobenius norm is

\[ \|A - A_k\|_F = \Big( \sum_{i=k+1}^{l} \sigma_i^2 \Big)^{1/2} \tag{9} \]

and for the 2-norm

\[ \|A - A_k\|_2 = \sigma_{k+1}. \tag{10} \]

Since $\mathrm{rank}(A_k) \leq \mathrm{rank}(A)$, $A_k$ is simpler in the sense that its rank is not larger (and usually the rank is lower). Furthermore $A_k$ is the best rank-$k$ approximation of $A$ in the sense that

\[ A_k = \arg\min_{\mathrm{rank}(B) = k} \|A - B\|_2 \tag{11} \]

So any matrix $B$ with rank $k$ has at least as large a reconstruction error using the 2-norm as $A_k$. $A_k$ is also the best rank-$k$ approximation using the Frobenius norm. Singular Value Decomposition can be viewed as a method for finding the optimal basis and is related to other optimal basis methods such as Independent Component Analysis (ICA) (Hyvärinen, 1999) and Karhunen-Loève Expansion (Kirby, 2000).
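The truncated approximation of Eq. (8) and the residual formulas of Eqs. (9) and (10) can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper name `tsvd_approx`, the random 8 x 8 test patch and the choice `k = 3` are all hypothetical. Note that NumPy indexes from zero, so `s[k]` holds $\sigma_{k+1}$.

```python
import numpy as np

def tsvd_approx(A, k):
    """Rank-k truncated SVD approximation A_k = U Sigma_k V^T (Eq. 8)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # s is sorted in descending order, so the first k entries are the largest.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return A_k, s

rng = np.random.default_rng(1)
A = rng.random((8, 8))  # hypothetical 8x8 patch
k = 3
A_k, s = tsvd_approx(A, k)

err_fro = np.linalg.norm(A - A_k, "fro")
err_two = np.linalg.norm(A - A_k, 2)

print(np.isclose(err_fro, np.sqrt(np.sum(s[k:] ** 2))))  # Eq. (9)
print(np.isclose(err_two, s[k]))                          # Eq. (10)
print(np.linalg.matrix_rank(A_k) == k)                    # rank(A_k) = k
```

Because the residual depends only on the discarded singular values, the error for every possible `k` can be read off from a single SVD without forming $A_k$ explicitly.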

There are two possibilities to compare images by comparing the norm of the residual. Either the number of singular values, $k$, is fixed and the reconstruction errors $\|A_k - A\|$ using $k$ singular values are compared, or the reconstruction error, $\sigma_{err}$, is fixed and as many singular values are used as are required for the reconstruction error to be lower than $\sigma_{err}$. Either the rank $k$ or the reconstruction error $\sigma_{err}$ is kept fixed.

Let $k$ be the number of singular values that should be used in the reconstruction. The residual error (using either the 2-norm or the Frobenius norm) is

\[ \|A - A_k\| = \sigma_{err} \tag{12} \]

and $\sigma_{err}$ is called the singular value reconstruction error using $k$ singular values.

Let $\sigma_{err}$ be a fixed reconstruction error and let $k$ be the smallest integer such that

\[ \|A - A_k\| \leq \sigma_{err} \tag{13} \]

$k$ is called the singular value reconstruction index (SVRI) at level $\sigma_{err}$. The SVRI states the smallest number of singular values that are required to get a reconstruction with a reconstruction error smaller than $\sigma_{err}$.
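A minimal sketch of the SVRI of Eq. (13) follows, computed directly from the singular values via Eqs. (9) and (10). The function name `svri`, its `norm` argument and the test matrices are assumptions made for illustration. The text does not state whether the error level is normalized, so the raw norm is used here.

```python
import numpy as np

def svri(A, sigma_err, norm="fro"):
    """Smallest k such that ||A - A_k|| <= sigma_err (Eq. 13)."""
    if sigma_err < 0:
        raise ValueError("sigma_err must be non-negative")
    s = np.linalg.svd(A, compute_uv=False)
    if norm == "fro":
        # residual[k] = sqrt(sum of sigma_i^2 for i > k), k = 0..l (Eq. 9)
        tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
        residual = np.sqrt(tail)
    elif norm == 2:
        # residual[k] = sigma_{k+1}, and 0 when all values are used (Eq. 10)
        residual = np.concatenate([s, [0.0]])
    else:
        raise ValueError("norm must be 'fro' or 2")
    # residual is non-increasing, so the first index meeting the bound is k.
    return int(np.argmax(residual <= sigma_err))

D = np.diag([3.0, 2.0, 1.0])     # singular values 3, 2, 1
flat = np.full((5, 5), 0.5)      # constant patch, rank 1

print(svri(D, 1.5))              # 2
print(svri(D, 2.1, norm=2))      # 1
print(svri(D, 2.1))              # 2
print(svri(flat, 0.1))           # 1
```

The second and third calls show that the same error level can give different indices under the two norms, since the Frobenius residual accumulates all discarded singular values while the 2-norm residual only sees the largest one.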

2.4 Global Measure

Instead of computing an approximation of the full image, which is not feasible for high resolution images, a patch based approach is adopted. The singular value reconstruction index at level $\sigma_{err}$ is computed for each $p \times p$ patch in the image.

Based on the patch complexities an image complexity measure should be computed. The obvious candidates are the mean or the mode complexity computed over all patches in the image. The mean patch complexity is used as the complexity measure for the image. The interpretation of the mean is simply the average number of singular values required to approximate the patches in the image such that the reconstruction error is less than $\sigma_{err}$.
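The global measure can be sketched as the mean SVRI over a tiling of the image. Several details here are assumptions, since the text does not specify them: non-overlapping tiles, the Frobenius norm, intensities in [0, 1], and the function names `patch_svri` and `image_complexity`. The two test images are synthetic stand-ins.

```python
import numpy as np

def patch_svri(patch, sigma_err):
    """Smallest k with Frobenius residual <= sigma_err for one patch."""
    s = np.linalg.svd(patch, compute_uv=False)
    tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
    return int(np.argmax(np.sqrt(tail) <= sigma_err))

def image_complexity(image, p, sigma_err):
    """Mean SVRI over non-overlapping p x p patches (tiling is an assumption)."""
    rows, cols = image.shape
    ks = [
        patch_svri(image[r:r + p, c:c + p], sigma_err)
        for r in range(0, rows - p + 1, p)
        for c in range(0, cols - p + 1, p)
    ]
    return float(np.mean(ks))

rng = np.random.default_rng(2)
flat = np.full((50, 50), 0.5)   # no structure: every patch has rank 1
noise = rng.random((50, 50))    # stand-in for a highly stochastic texture

c_flat = image_complexity(flat, p=25, sigma_err=0.35)
c_noise = image_complexity(noise, p=25, sigma_err=0.35)
print(c_flat, c_noise)
```

The flat image needs a single singular value per patch, whereas the noise image needs many more, which is the qualitative behaviour the measure is meant to capture.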

Figure 1: Image sequences - 02, 05 and 08 - from the DIKU

Multi-Scale image database (used in the experiments) at

three capture scales.

3 DIKU MULTI-SCALE IMAGE

DATABASE

The newly collected DIKU Multi-Scale image database (Gustavsson et al., 2009), which contains sequences of the same scene captured using varying focal lengths (called capture scales), will be used to analyze the distribution of singular values in natural image patches and how the image content changes over different capture scales.

The database contains sequences of natural images - both man-made and natural environments - with a large variety of scenes and distances to the main object in the scene. Each sequence contains 15 high resolution images of the same scene captured using different focal lengths. The zoom factor is roughly 16x and the naming convention is that image 1 is the least zoomed and 15 the most zoomed. Three examples of sequences are shown in figure 1.

VISAPP 2009 - International Conference on Computer Vision Theory and Applications

Furthermore, the part of the scene that is present at all capture scales has been extracted, resulting in a sequence of regions containing the same part of the scene captured at different capture scales. The part of the scene present in the image to the right in figure 1 has been extracted from the remaining 14 images (of which two are shown in the figure).

Three sequences - 02 building with windows, 05 building without windows and 08 tree trunk - shown in figure 1 are used in the experiments. The image contents are very different at the different capture scales, as can be seen in the 80 × 80 extracted patches shown in figure 2. For example, in the most zoomed image a brick almost covers the whole 80 × 80 patch, while in the least zoomed image a large part of the brick wall is contained in the patch. (The 80 × 80 patches are only shown to visualize the content differences, while the complete regions are used in the experiments.)

4 SINGULAR VALUE

DISTRIBUTION IN NATURAL

IMAGES

The proposed method depends on the distribution of singular values in natural image patches. The distribution of principal components and independent components in natural images has received a lot of attention for some years, partly because of its relation to front-end vision (van Hateren and van der Schaaf, 1998).

To analyze the distribution of singular values in natural image patches, 1000 randomly selected 25 × 25 patches have been extracted from each image in the DIKU Multi-Scale image database - approximately 800000 patches - and the corresponding singular values have been computed.

The ﬁrst, not so surprising, conclusion is that patches

in natural images almost always have full rank - i.e.

the singular values are almost always strictly larger

than 0.
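The sampling procedure can be sketched as follows. This is an illustration only: a synthetic random image stands in for a database image, intensities are assumed to lie in [0, 1], and the helper name `sample_patch_singular_values` and the tolerance `1e-10` used to call a singular value "strictly positive" are arbitrary choices.

```python
import numpy as np

def sample_patch_singular_values(image, p, n_patches, rng):
    """Singular values of n_patches randomly placed p x p patches."""
    rows, cols = image.shape
    r = rng.integers(0, rows - p + 1, size=n_patches)  # top-left rows
    c = rng.integers(0, cols - p + 1, size=n_patches)  # top-left columns
    patches = np.stack([image[i:i + p, j:j + p] for i, j in zip(r, c)])
    # np.linalg.svd broadcasts over the leading axis: shape (n_patches, p).
    return np.linalg.svd(patches, compute_uv=False)

rng = np.random.default_rng(3)
image = rng.random((200, 300))  # synthetic stand-in, not a database image
S = sample_patch_singular_values(image, p=25, n_patches=1000, rng=rng)

# Fraction of patches whose smallest singular value is numerically nonzero.
full_rank_fraction = np.mean(S[:, -1] > 1e-10)

# Empirical distribution of the largest singular value of each patch.
hist_sigma1, edges = np.histogram(S[:, 0], bins=50, range=(0, 25), density=True)
print(S.shape, full_rank_fraction)
```

Under the assumed [0, 1] intensity range, the largest singular value of a 25 × 25 patch cannot exceed 25, which is attained by a patch of all ones.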

The distributions of the singular values $\sigma_1$ and $\sigma_2$ are shown in figure 4. The variance for the distribution of $\sigma_1$ is large, and it is interesting that many patches have values close to 25. The distribution for $\sigma_2$ is peaked at zero but also has 'heavy tails' - values relatively far from zero. This is also the case for $\sigma_i$ where $i > 2$.

Figure 2: 80 × 80 patches extracted from the three sequences shown in figure 1 at 3 different scales (index 1, 6 and 15). The patches show the content differences at the different capture scales.

In figure 3 the patches with the largest $\sigma_{25}$ (top) and smallest $\sigma_{25}$ (bottom) in five different images are shown. The content differences between the patches are striking - the patches with the largest $\sigma_{25}$ all contain large variations, while the patches with the lowest $\sigma_{25}$ contain no or very little visible variation.

Figure 3: Each column shows the patch with the largest (top) and smallest (bottom) $\sigma_{25}$ in the same image. The content difference is striking and clearly indicates the importance of the small singular values for characterizing the image content.

The distributions of the small singular values are peaked at zero, but also show some variation and 'heavy tails'. Visual comparison of patches with high and low $\sigma_{25}$ clearly indicates a content difference, which implies that the singular value reconstruction index is suitable for measuring image content.

5 EXPERIMENTS

5.1 The Baboon Image

The baboon image is used only for demonstrating the method. The baboon is a good test image because it contains both very complex texture and large regions with geometric structures. In figure 5 the spatial distribution of complexity is shown using different patch sizes and error levels. White regions indicate high complexity and black regions indicate low complexity. The highly stochastic texture returns high complexity values at all scales and error levels, while the geometric structures return low complexity. As the patch size grows larger the spatial distribution of complexity gets smoother.

Figure 4: The distribution of singular values $\sigma_1$ (left panel, horizontal axis 0 to 25) and $\sigma_2$ (right panel, horizontal axis 0 to 7) for natural image patches of size 25 × 25. The variance for the distribution of $\sigma_1$ is large (as expected); the distribution for $\sigma_2$ is peaked at zero but also has 'heavy tails'.
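A per-patch complexity map of the kind described for the baboon image can be sketched as below. The baboon image itself is not used. A synthetic image with a flat left half and a noisy right half stands in for it, and the function name `complexity_map`, the `stride` parameter and the use of the Frobenius norm are assumptions.

```python
import numpy as np

def complexity_map(image, p, sigma_err, stride=None):
    """SVRI of each p x p patch on a regular grid (Frobenius residual)."""
    stride = p if stride is None else stride
    rows, cols = image.shape
    r_idx = range(0, rows - p + 1, stride)
    c_idx = range(0, cols - p + 1, stride)
    cmap = np.zeros((len(r_idx), len(c_idx)), dtype=int)
    for a, r in enumerate(r_idx):
        for b, c in enumerate(c_idx):
            s = np.linalg.svd(image[r:r + p, c:c + p], compute_uv=False)
            tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
            # First k whose residual is within the error level.
            cmap[a, b] = np.argmax(np.sqrt(tail) <= sigma_err)
    return cmap

rng = np.random.default_rng(4)
image = np.full((60, 60), 0.5)          # left half: flat region
image[:, 30:] = rng.random((60, 30))    # right half: stochastic texture

cmap = complexity_map(image, p=15, sigma_err=0.3)
print(cmap)
```

Displayed as a grey-level image, the left columns of `cmap` would appear dark (low complexity) and the right columns bright (high complexity). A smaller `stride` gives a denser map at a higher computational cost.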

5.2 DIKU Multi-Scale Image Database

The image complexity measure is computed over the

different capture scales using different patch sizes and

error levels. The results are shown in ﬁgure 6.

The plots to the left and right in figure 6 have the same error level, 0.35, but different patch sizes, 15 and 25 pixels respectively. Still, the shapes of the curves are very similar. On the other hand, the plots in the middle and to the right have the same patch size - 25 pixels - but different error levels - 0.05 and 0.35 - and the curves are very different, which indicates that the error level is more important than the patch size.

For sequence 02 the complexity at error level 0.05 first decreases roughly for the first 7 capture scales, and then increases for the last 7 capture scales. For sequence 08 the complexity at error level 0.05 decreases quite rapidly at the first scales and then decreases more slowly for the remaining capture scales. For sequence 05 the complexity decreases with increasing capture scale.

Figure 5: Patch based complexity measure of the baboon image. Different patch sizes are used in the columns (from left to right: 9, 15 and 25 pixels), and different reconstruction errors are used in the rows (from top to bottom: 0.1, 0.3 and 0.5).

The average number of singular values required for an approximation at a fixed error level varies a lot over the different capture scales. This indicates that the contents, in terms of complexity, change over the capture scales, which is clearly visible in figure 2.

6 CONCLUSIONS

A patch based image complexity measure based on

the number of singular values that are required to ap-

proximate a patch at a given error level is presented.

The number of singular values is used to character-

ize the image content in terms of geometric structures

and texture.

The proposed method is motivated by the optimal

rank-k property of the truncated singular value ap-

proximation. The distribution of singular values in

patches from natural images seems to be peaked at

zero and have 'heavy-tails'. The image content in patches with a relatively large smallest singular value is very different from that in patches with a relatively small smallest singular value.

Figure 6: Complexity measure (y-axis, "Image Complexity") computed over different capture scales (x-axis, "Captured Scale") for seq 02, seq 05 and seq 08, using different patch sizes and error levels. From left to right: patch size 15 and $\sigma_{err}$ = 0.05, patch size 25 and $\sigma_{err}$ = 0.35, and patch size 15 and $\sigma_{err}$ = 0.05.

ACKNOWLEDGEMENTS

This research was funded by the EU Marie Curie Research Training Network VISIONTRAIN MRTN-CT-2004-005439 and the Danish Natural Science Research Council project Natural Image Sequence Analysis (NISA) 272-05-0256. The authors want to thank Prof. Christoph Schnörr (Heidelberg University) and PhD Niels-Christian Overgaard (Lund University) for sharing their knowledge.

REFERENCES

Buades, A., Coll, B., and Morel, J.-M. (2008). Nonlocal

image and movie denoising. IJCV, 76(2):123–139.

Chan, T. F. and Shen, J. (2005). Variational image inpaint-

ing. Communications on Pure and Applied Mathemat-

ics, 58.

Criminisi, A., Pérez, P., and Toyama, K. (2004). Region filling and object removal by exemplar-based image inpainting. IEEE IP, 13(9):1200–1212.

Cuzol, A., Pedersen, K. S., and Nielsen, M. (2008). Field

of particle ﬁlters for image inpainting. JMIV, 31(2-

3):147–156.

Efros, A. A. and Leung, T. K. (1999). Texture synthesis by

non-parametric sampling. In ICCV, pages 1033–1038.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins, 3rd edition.

Gustavsson, D., Pedersen, K. S., and Nielsen, M. (2007).

Image inpainting by cooling and heating. In SCIA 07,

pages 591–600.

Gustavsson, D., Pedersen, K. S., and Nielsen, M. (2009). A

multi-scale study of the distribution of geometry and

texture in natural images. In Preparation.

Hyvärinen, A. (1999). Survey on independent component analysis. Neural Computing Surveys, 2:94–128.

Kass, M., Witkin, A., and Terzopoulos, D. (1988). Snakes:

Active contour models. IJCV, (4):321–331.

Kirby, M. (2000). Geometric Data Analysis. John Wiley &

Sons, Inc., New York, NY, USA.

Mumford, D. and Shah, J. (1985). Boundary detection by minimizing functionals. In CVPR, pages 22–26.

Olsen, O. F. and Nielsen, M. (1997). Multi-scale gradi-

ent magnitude watershed segmentation. In ICIAP 97,

pages 6–13.

Randen, T. and Husoy, J. H. (1999). Filtering for tex-

ture classiﬁcation: A comparative study. IEEE PAMI,

21(4):291–310.

Rudin, L. I., Osher, S., and Fatemi, E. (1992). Nonlinear

total variation based noise removal algorithms. Phys.

D, 60(1-4):259–268.

Sethian, J. A. (1999). Level Set Methods and Fast Marching

Methods. Cambridge University Press.

van Hateren, J. H. and van der Schaaf, A. (1998). Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. Royal Soc. Lond. B, 265:359–366.

Weickert, J. (1998). Anisotropic Diffusion in Image Pro-

cessing. ECMI. Teubner-Verlag.
