EFFICIENT IMAGE REDUCTION FOR FAST INTELLIGIBLE
CLASSIFICATION
Marc Joliveau
CIRRELT, Universit
´
e de Montr
´
eal, C.P. 6128, succ. Centre-Ville, Montr
´
eal, Qu
´
ebec, H3C 3J7, Canada
Keywords:
Image reduction, Image learning, Classification, Dimensionality problem, Image databases.
Abstract:
In the past decades, many domains collected great amounts of data, particularly multimedia files, and stored
them in large databases. Therefore, area such as similarity search for image learning have received much
attention in the recent years. This paper presents an innovative way to strongly reduce dimension and keep
relations between components of an image data set. Our method is validated on the Mnist learning database
containing 70000 pictures of handwritten digits. Results demonstrate that the proposed approach is very
efficient. It allows to accurately classify, learn, and identify digits using very short computation time in
comparison with those obtained with original full-size images.
1 INTRODUCTION
For the past few decades, lots of application domains
collected large amounts of data, resulting in very large
databases. Among them, multimedia files like sounds,
videos or images received lots of interest. These files,
and specifically images, interact with a large variety
of domains like medicine, traffic, security, etc. Aim is
generally to automatically identify elements shown in
images to label and classify them. However, the great
complexity of multimedia files complicates compar-
isons and similarity searches between them.
This paper introduces an innovative way to strongly
reduce dimension of images stored in a database,
while conserving relations between them. Proposed
method is based on the adaptation of a spatio-
temporal dimension reduction method called Space-
Time Principal Component Analysis (STPCA), that
has been introduced in (Joliveau, 2008).
STPCA method has already been accurately exploited
in several contexts with various objectives. First,
in (Joliveau and DeVuyst, 2007) authors proposed
a variant of STPCA that allows for the handling of
missing data with a very weak loss of accuracy. Dif-
ferent applications of STPCA with sensor-based traf-
fic data to interact with Intelligent Transportation Sys-
tems (ITS) have been presented in (Joliveau and De-
Vuyst, 2008; Joliveau, 2008; Bauzer-Medeiros et al.,
2008; Bauzer-Medeiros et al., 2009). More pre-
cisely, STPCA method is used for the derivation of
both typical and atypical spatio-temporal patterns that
emit accurate predictions on traffic atypical behaviour
through a network.
STPCA allows both to reduce dimension and to sum-
marize data. However, until now, mostly summariz-
ing abilities of the method (i.e., the so-called STPCA
estimate denoted by
ˆ
X
n
) have been exploited. This
paper proposes the adaptation of STPCA to databases
that contain images in order to benefit from its accu-
racy of dimensionality reduction, which is provided
by the STPCA very low-dimension descriptor (i.e.,
the so-called reduced-order matrix denoted by
˜
X
n
).
To validate our approach, experiments are performed
on the Mnist data set that contains 70000 images of
handwritten digits.
The paper is organized as follows. Section 2 gives
a brief reminder of STPCA and explains adaptation
of this spatio-temporal method to image processing.
Then, section 3 presents numerical experiments on
Mnist grayscale handwritten images data set. This
section proves the high accuracy of our method in re-
ducing dimension of images while conserving enough
relations between them. It enables us to classify pic-
tures and identify digits drawn on them. Section 4
briefly presents related methods and discusses pos-
sible combinations with our proposed approach. Fi-
nally, section 5 concludes the paper and indicates fur-
ther research directions.
157
Joliveau M. (2010).
EFFICIENT IMAGE REDUCTION FOR FAST INTELLIGIBLE CLASSIFICATION.
In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Artificial Intelligence, pages 157-162
DOI: 10.5220/0002691801570162
Copyright
c
SciTePress
2 SPACE-TIME PRINCIPAL
COMPONENT ANALYSIS
In (Joliveau, 2008), author introduced the Space-Time
Component Analysis (STPCA), a new method to de-
velop descriptors of spatio-temporal data series. This
method is based on the simultaneous application of a
Principal Component Analysis (PCA) (Jolliffe, 1986)
to both spatial and temporal dimensions. A brief sum-
mary of the method and its adaptation to image reduc-
tion is presented in this section.
2.1 Data Conditions to Process Images
with STPCA
To be applied, STPCA method needs some conditions
on the data to be respected. The method was first
introduced in spatio-temporal data reduction context
to process traffic sensor-based time series. In this
context, STPCA assumes that there are data measured
at I different fixed locations (e.g., georeferenced sen-
sor), available for N realizations (e.g., days), and
that data are collected at the same frequency for all
locations on a time duration J. Measurements can
thus be stored into a matrix X
n
, where n symbolizes
the realization index. Each line of X
n
corresponds to
a sensor and each column to a time of measurement.
In order to apply STPCA to a grayscale image data
set, we consider that each realization n corresponds to
an image. Each image can easily be represented by a
matrix X
n
containing the shade of gray of each pixel.
Thus, when applying method to images, I represents
the number of rows in the image, J its number of
columns, and N the number of images in the data set.
As STPCA needs all realization matrices X
n
to have
the same size, the only condition to apply STPCA to
an image data set is that all images have to share the
same resolution.
2.2 STPCA Definition
The STPCA method can be decomposed in three
steps:
1. Assemble realization matrices horizontally (for
realizations row analysis) in a single matrix Y,
and vertically (for realizations column analysis)
in a matrix Z. According to this definition, ma-
trix Y represents a big image corresponding to the
concatenation of all data set images one beside the
other, while matrix Z represents a big image cor-
responding to the concatenation of all data set im-
ages on top of each other. Matrix Y thus contains I
rows and (J ×N) columns, whereas matrix Z con-
tains (I × N) rows and J columns.
2. Compute the singular value decomposition for
matrices Y and Z, as follows. For Gram ma-
trix G
row
= YY
T
, compute the K first eigenvec-
tors (Ψ
k
)
k=1...I
, with K I, storing them in a ma-
trix P.
P = col (Ψ
1
, Ψ
2
, . . . , Ψ
K
)
For Gram matrix G
col
= Z
T
Z, compute the L first
eigenvectors (Φ
l
)
l=1...J
, with L J, storing them
in a matrix Q.
Q = col (Φ
1
, Φ
2
, . . . , Φ
L
)
3. Finally, the STPCA estimate
ˆ
X
n
of a realization
matrix X
n
is defined by:
ˆ
X
n
= PP
T
X
n
QQ
T
.
Let us emphasize that the corresponding reduced-
order coefficient matrix, also called STPCA descrip-
tor, of the realization n is given by:
˜
X
n
= P
T
X
n
Q,
of size K × L where K and L are chosen to be small.
Until now, all applications using STPCA focuses on
STPCA estimate
ˆ
X
n
in order to efficiently summarize
data (Joliveau, 2008; Bauzer-Medeiros et al., 2008),
estimate missing values (Joliveau and DeVuyst, 2007)
or extract knowledge (Joliveau, 2008; Joliveau and
DeVuyst, 2008; Bauzer-Medeiros et al., 2009). How-
ever, reduced-order matrices
˜
X
n
offer a very strong
dimensionality reduction that could be efficiently ex-
ploited for classification or fast similarity search. As
already demonstrated, STPCA can be easily adapted
to pictures. So we propose to test and validate the use
of STPCA reduced-order descriptor
˜
X
n
on a grayscale
images data set.
3 REDUCTION OF GRAYSCALE
IMAGES
The aim of the proposed approach is to efficiently re-
duce dimension of grayscale images while conserv-
ing the most possible of their intelligible relations.
All proposed experiments have been processed using
a 2.8 GHz processor with 2.5 GB of memory.
3.1 The Mnist Database
The Mnist database
1
contains 70000 images of hand-
written digits. This image database is decomposed
1
Mnist database is available on web-
site http://yann.lecun.com/exdb/mnist/
ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence
158
in two parts: a training set of 60000 pictures and a
test set of 10000 pictures. The size of each image
is 28 × 28.
Figure 1: Example of the 50 first Mnist handwritten digit
images.
Figure 1 illustrates the 50 first images of Mnist
database. Image database comes with a label file con-
taining the corresponding digit of each picture. Mnist
is thus considered as ”a good database for people who
want to try learning techniques and pattern recogni-
tion methods on real-world data” (Y. LeCun, 1998,
http://yann.lecun.com/exdb/mnist/).
3.2 Image Reconstruction
To give a survey of STPCA performance with images,
we first took interest on the STPCA estimate
ˆ
X
n
of
each picture n from the Mnist training set. An STPCA
has thus been performed on this data set with input
parameters I = 28, J = 28 and N = 60000. Reduc-
tion parameter K and L have been determined such
that reduced-order matrices keep the relation between
images height and width. So in our case, we only con-
sider values such that K = L.
Figure 2: STPCA estimates of Mnist images according to
square reduced-order matrices size from 1 × 1 to 28 × 28.
Figure 2 shows the STPCA estimate of the 16 first
Mnist images according to values of reduction param-
eters. Each row in the figure refers to one image and
each column illustrates the STPCA estimate accord-
ing to chosen value for K and L. On the first column,
we can see the image obtained with K = L = 1, on the
second column the image obtained with K = L = 2
and so on.
The initial analysis indicates that using very weak val-
ues of reduction parameters, such as reduced-order
matrices are of size 4 × 4 or 5 × 5, leads to a suf-
ficient degree of precision for a human to recognize
represented digits. These reductions respectively cor-
respond to compression factors of order 50 and 30.
3.3 Inter-Image Relations Conservation
STPCA estimate is interesting as it provides an under-
standable representation of images. However, in order
to think about memory economy and fast computa-
tion, we now will consider only reduced-order ma-
trix (or STPCA descriptor)
˜
X
n
of each image n.
What we measure first is the impact of the reduc-
tion process on the similarity relations between im-
ages. We thus perform a STPCA to Mnist training
set and then we compute the k-nearest neighbors of
each STPCA descriptor matrix
˜
X
n
(for n = 1, . . . , N)
among the N 1 other reduced-order matrices. Dis-
tance between two matrices is computed according to
the Frobenius norm of their difference. The Frobenius
norm kAk
F
of a matrix A is given by:
kAk
F
=
v
u
u
t
I
i=1
J
j=1
(A
i, j
)
2
,
where A
i, j
corresponds to the value at row i and col-
umn j of matrix A.
Quality of the obtained classification is then mea-
sured by the recognition rate τ
rec
(n) of each image n.
Recognition rate of an image n indicates the propor-
tion of neighbors of n that share the same label as
it, i.e., the proportion of its neighbors that shows the
same digit as n. Recognition rate of an image n is
given by:
τ
rec
(n) =
|N (n, L(n))|
|N (n)|
where N (n) is the set that contains the k-nearest
neighbors of n, N (n, L(n)) is a subset of N (n) con-
cerning only the neighbors with label equal to L(n),
the label of n, and |N | symbolizes the number of ele-
ment contained in a set N .
Figure 3 illustrates the average recognition rate ob-
tained on Mnist training set according to the size of
STPCA descriptor matrices for four different values
of k. The corresponding recognition rate when using
the original Mnist images is also represented on each
figure by a dashed line.
These tests validate STPCA descriptor accuracy to
both strongly reduce dimension and to conserve re-
lations between images. For any value of k, since
reduced-order matrices are at least of size 3 × 3,
recognition rate is greater or equal than 80%. Thanks
EFFICIENT IMAGE REDUCTION FOR FAST INTELLIGIBLE CLASSIFICATION
159
Figure 3: Average recognition rate according to reduced-
order matrix size for different values of number of neigh-
bors k.
to noise reduction achieved by our approach, with
very small values of k, STPCA descriptor reaches a
better recognition rate than those obtained with origi-
nal images. When k = 20 and k = 100, since reduced-
order matrices size is equal to 4×4 and 5 ×5, method
manages to reach better recognition rates with origi-
nal data.
Moreover, it must be emphasized that the proposed
values of k are very weak (less or equal than 100)
in comparison to the number of images in the data
set (60000).
3.4 Fast Learning of Digits
Good quality of previous results leads us to
use STPCA descriptor for fast learning tasks. Learn-
ing the digits using STPCA descriptor can be done as
follows.
First, an STPCA is performed on the Mnist training
set. This allows us both to learn the eigenmodes ma-
trices P and Q and to compute STPCA descriptor of
each image in the training set. Then, STPCA de-
scriptor of test set images is computed from matri-
ces P and Q learned in the previous step. We assume
that we know digit associated to each training set im-
age but, in the process, we ignore digits associated to
test set images. The last step consists of computing
the k-nearest training set reduced-order matrices of
each reduced test image. We finally allocate to each
image in the test set the most frequently represented
digit among its neighbors. The obtained classification
quality is measured by the test error rate that indicates
the proportion of irrelevant digit allocations.
Figure 4 illustrates evolution of test error rate ac-
cording to reduced-order matrices size for different
values of k (the number of neighbors considered).
Test error rate level obtained with original images of
size 28 × 28 is also represented by a dashed line on
each subfigure. Once again, results are surprisingly
Figure 4: Test error rate according to reduced-order matri-
ces size for different values of k.
good, as for every values of k test error rate is less
than 5% since reduced-order matrices are at least of
size 4 × 4. Moreover, for all values of k, since re-
duced matrices size is at least 5 × 5, test error rate is
weaker than those obtained with original full-size im-
ages, reaching values less than 2.5%.
Figure 5: k-nearest neighbors computation times according
to reduced-order matrices size.
Figure 5 shows the k-nearest neighbors compu-
tation times (in seconds) according to reduced-order
matrices size. As chosen values of parameter k are
very weak (less than 10) in comparison to the num-
ber of element in the data set, k does not significantly
influence the computation times. On this figure, we
can also see the k-nearest neighbors computation time
when using the original 28 × 28 images represented
by a dashed line.
More than providing an accurate learning of dig-
its, STPCA allows to learn them much faster than
with original images. For example, computing
the k-nearest neighbors of reduced-order matrices of
size 5 × 5 takes 25 times less computation time than
with original images. If we refer to figure 4, learn-
ing digits with k = 1 and reduced-order matrices of
this size leads to a very weak test error rate of or-
der 3%. These results are even more surprising when
we consider that application of STPCA to both learn
eigenmodes and to compute reduced-order matrices is
performed in less than 1 minute !
ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence
160
4 COMPARISON TO RELATED
WORK
This paper concerns a method to dramatically reduce
images while keeping relations between them. Nu-
merical experiments to validate our approach are done
through the evaluation of its abilities to learn hand-
written digit pictures. This section briefly comments
on related work in image reduction and Mnist images
learning. It presents the different relations between
our work and methods introduced in the literacy.
4.1 Image Reduction
In the last decades, the problem of dimensionality re-
duction i.e., finding meaningful low-dimensional
structures hidden in high-dimensional observations,
received high interests from scientists. When working
with images, the aim of methods solving this problem
is generally to estimate the underlying geometry of
the data set.
Classical techniques principally concern Principal
Component Analysis (PCA) and MultiDimensional
Scaling (MDS). PCA (Jolliffe, 1986) applies a sin-
gular value decomposition on the data, whereas
MDS (Kruskal and Wish, 1978) considers pairwise
distances between data points (i.e., images) and
projects them on a euclidean space such that two sim-
ilar objects are represented by points close to each
other, and two dissimilar objects by faraway points.
More recently, new methods such as Isomap (Tenen-
baum et al., 2000) or Local Linear Embed-
dings (LLE) (Roweis and Saul, 2000) have been intro-
duced to reduce images. Unlike classical approaches,
these methods are capable to discover the non-linear
degrees of freedom that underlie complex natural ob-
servations (e.g., handwritten digits). Isomap applies a
MDS from the matrix of geodesic distances between
points while LLE uses a weighted neighborhood ap-
proach.
Due to the too important level of memory required
by MDS, LLE and Isomap when performed on the
70000 images of Mnist, we can only compare our ap-
proach to PCA. PCA reduces the dimension of objects
in only one direction (i.e., row or column for images)
while STPCA reduces both images directions simul-
taneously, which allows to reach better compression
factor (Joliveau, 2008). This property is validated
by results shown on table 1 that compares perfor-
mances of STPCA and PCA when learning digits of
Mnist. PCA-Row reduces the dimension in rows di-
rection according to reduction parameter K and PCA-
Col reduces the dimension in columns direction ac-
cording to reduction parameter L. For the three pro-
Table 1: Comparison between PCA and STPCA of test er-
ror rate and k-nearest neighbors computation time obtained
while learning digits of Mnist.
PCA-Row PCA-Col STPCA
Test error rate 4.91% 5.06% 3.24%
knn CPU-Time 265 sec 285 sec 57 sec
(a) - Parameters K = 5 & L = 5
PCA-Row PCA-Col STPCA
Test error rate 4.49% 4.89% 2.84%
knn CPU-Time 315 sec 331 sec 77 sec
(b) - Parameters K = 6 & L = 6
PCA-Row PCA-Col STPCA
Test error rate 4.39% 4.88% 2.70%
knn CPU-Time 367 sec 380 sec 100 sec
(c) - Parameters K = 7 & L = 7
posed configurations of reduction parameters, K =
L = 5 (Tab. 1a), K = L = 6 (Tab. 1b) and K = L = 7
(Tab. 1c), STPCA outperforms PCA for both test error
rate and computation time. Thus, STPCA allows to
reduce dimensionality more strongly than PCA while
conserving more intelligible information.
4.2 Digits Learning
Since Mnist data set publication, several methods
have been introduced to learn the handwritten digits
it contains. Authors of (LeCun et al., 1998) give an
assessment of performances of many learning tech-
niques such as linear classifiers, k-nearest neighbors,
virtual SVM, multi-layer neural networks, and convo-
lutionnal net, to identify Mnist digits. Some of these
techniques require preprocessing on images such as
deskewing or subsampling. Other methods have also
been proposed: the combination of trainable fea-
ture extractor with Support Vector Machines (SVM)
in (Lauer et al., 2007), k-nearest neighbors with non-
linear deformation in (Keysers et al., 2007), and
the combination of unsupervised sparse feature and
a SVM in (Labusch et al., 2008). At the moment,
best results are obtained by (Ranzato et al., 2006)
whose large convolution net reached an error test of
only 0.39%.
Compared to the mentioned methods, best test error
obtained with our learning approach (2.42%) is not
as good as those presented in (Ranzato et al., 2006)
but it is better than those obtained by other proposed
approaches, including all multi-layer neural networks
from (LeCun et al., 1998). It must also be noted
that our approach is not directly designed for learn-
ing tasks but to intelligibly reduce dimensionality of
images. Thus, it can be used as preprocessing for
these learning approaches to reduce the data size, to
EFFICIENT IMAGE REDUCTION FOR FAST INTELLIGIBLE CLASSIFICATION
161
improve their computation times, and probably to in-
crease results quality due to the denoising effect of
STPCA.
5 CONCLUSIONS
This paper presents the adaptation of the Space
Time Principal Component Analysis (STPCA), a bi-
dimensional innovative reduction method, to images
data sets. Unlike to the other STPCA applications
proposed in the literacy that essentially exploit sum-
marizing abilities of the method, our approach fo-
cuses more on its dimension reduction capacities pro-
vided by the STPCA descriptor.
Numerical experiments on the Mnist data set contain-
ing 70000 handwritten digit pictures validate accu-
racy of the proposed approach to both strongly reduce
dimension and to conserve relations between images.
Denoising effect when reducing images dimension
even allows more accurate and intelligible classifica-
tion with low-dimension images descriptors than with
original pictures. Identification of represented digits
on Mnist test set images after a learning process on
the Mnist training set also demonstrates a very good
behaviour of the introduced approach. Such an identi-
fication process leads to a weaker test error rate when
it is performed from the 5 × 5 reduced-order matri-
ces computed for each image than those obtained with
original images of size 28 × 28. Using the proposed
approach, we thus obtain results at least as accurate
and intelligible from reduced data than from original
full-size ones with weaker computation times.
Future works concern the use of reduced images as in-
put data in more complex digit learning systems like
support vector machines or large convolutional nets,
that could improve both their computability and accu-
racy, as much as its combination to non-linear fea-
ture extractors such as LLE or Isomap. Moreover,
it could also be interesting to integrate proposed ap-
proach to complex databases or datawarehouses man-
agement systems.
REFERENCES
Bauzer-Medeiros, C., Joliveau, M., Jomier, G., and De-
Vuyst, F. (2008). Managing sensor data on urban traf-
fic. Advances in Conceptual Modeling Challenges
and Opportunities, 5232/2008:385–394.
Bauzer-Medeiros, C., Joliveau, M., Jomier, G., and De-
Vuyst, F. (2009). Managing sensor traffic data and
forecasting unusual behaviour propagation. Geoinfor-
matica.
Joliveau, M. (2008). Reduction of urban traffic time series
from georeferenced sensors, and extraction of spatio-
temporal series. PhD thesis, Ecole Centrale Des Arts
Et Manufactures (Ecole Centrale Paris).
Joliveau, M. and DeVuyst, F. (2007). Space-time sum-
marization of multisensor time series. case of miss-
ing data. In Proc. of 2007 International Workshop
on Spatial and Spatio-Temporal Data Mining, pages
631–636.
Joliveau, M. and DeVuyst, F. (2008). Recherche de mo-
tifs spatio-temporels de cas atypiques pour le trafic
routier urbain. Extraction et Gestion de Connais-
sances EGC 08, Revue des Nouvelles Technologies de
l’Information - RNTI - E11, F. Guillet et B. Trousse,
2:523–534.
Jolliffe, I. (1986). Principal component analysis. Springer-
Verlag, New York.
Keysers, D., Deselaers, T., Gollan, C., and Ney, H. (2007).
Deformation models for image recognition. IEEE
Transactions on Pattern Analysis and Machine Intel-
ligence, 29(8):1422–1435.
Kruskal, J. B. and Wish, M. (1978). Multidimensional Scal-
ing. Sage Publications.
Labusch, K., Barth, E., and Martinetz, T. (2008). Simple
method for high-performance digit recognition based
on sparse coding. IEEE transactions on neural net-
works, 19:1885–1889.
Lauer, F., Suen, C., and Bloch, G. (2007). A trainable fea-
ture extractor for handwritten digit recognition. Pat-
tern Recognition, 40:1816–1824.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998).
Gradient-based learning applied to document recogni-
tion. In Proc. of the IEEE, volume 86, pages 2278–
2324.
Ranzato, M., Poultney, C., Chopra, S., and LeCun, Y.
(2006). Efficient learning of sparse representations
with an energy-based model. In Proc. of Advances
in Neural Information Processing System NIPS 2006.
Roweis, T. and Saul, L. (2000). Nonlinear dimensional-
ity reduction by locally linear embedding. Science,
290:2323–2326.
Tenenbaum, J., de Silva, V., and Langford, J. (2000). A
global geometric framework for nonlinear dimension-
ality reduction. Science, 290:2319–2323.
ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence
162