Table 4: Grid complexity for various datasets (std.: standard grid; acc.: adaptive grid of the accelerated Gaussianization).

Dimension   Type   Grid size
2D-lin.     std.   1056
2D-lin.     acc.   563
2D-nlin.    std.   924
2D-nlin.    acc.   589
4D          std.   2520
4D          acc.   2157
The Gaussianization effectively makes the data more Gaussian, as shown by the improved performance of Gauss-related classifiers in the transformed space. We believe that the small increases in the error rate on the nonlinearly separable data set occur because, in the regions where the adaptive grid is sparse, the displacement vectors of the nonlinear transform are larger, which follows directly from the way the transform is computed. Test-set points falling in regions of the feature space covered by a coarse section of the adaptive grid tend to travel further, potentially across the linear separation surface, but they do so in a grouped manner, so that in the case of the SVM, new support vectors placed there lead to the group being correctly classified. Still, the number of support vectors is smaller than for the original data, as may be observed in Table 2. Conversely, the same behavior works to our advantage on the "iris" dataset.
The main purpose of the accelerated Gaussianization is to reduce the training time (i.e., the time needed to compute the parameters of the elastic transform), so that the Gaussianization can be applied to feature spaces of higher dimension. The accelerated Gaussianization works by reducing the size of the grid on which the elastic transform is computed and by offering a better local initialization for the conjugate-gradient (CG) solver. On the other hand, the parameters of several grid points (i.e., those also present on the previous, coarser grid) are recomputed each time the grid is refined in the respective region. Furthermore, with the adaptive grid, some of the computation time is spent generating and then managing the grid itself. Therefore, the time does not decrease linearly with the number of grid points. Nevertheless, as a whole, we are able to reduce the time needed to train the Gaussianization, because, even if some grid points are recomputed several times, in total a smaller number of equations needs to be solved. Furthermore, the CG solver converges in a smaller number of steps due to the improved initialization.
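As a rough illustration of why the warm start pays off, the following minimal sketch solves a small stand-in linear system on a coarse grid and interpolates the result to initialize CG on a finer grid. The system built by build_system() and the helper names are illustrative assumptions, not the actual elastic-transform equations from the paper.

```python
# Sketch: coarse-to-fine warm starting of the CG solver.
# build_system() is a placeholder for the elastic-transform equations.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

def build_system(n):
    """1-D Laplacian-like SPD system as a stand-in, on a grid of n points."""
    A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n) / n
    return A, b

def refine(x_coarse, n_fine):
    """Interpolate a coarse-grid solution onto the finer grid."""
    xc = np.linspace(0.0, 1.0, len(x_coarse))
    xf = np.linspace(0.0, 1.0, n_fine)
    return np.interp(xf, xc, x_coarse)

iters = {"n": 0}
def count(_):                      # callback: count CG iterations
    iters["n"] += 1

A_c, b_c = build_system(64)        # cheap coarse solve
x_c, _ = cg(A_c, b_c)

A_f, b_f = build_system(256)
iters["n"] = 0
cg(A_f, b_f, x0=refine(x_c, 256), callback=count)   # warm start
warm = iters["n"]

iters["n"] = 0
cg(A_f, b_f, callback=count)                        # cold start, for comparison
print(f"CG iterations: warm={warm}, cold={iters['n']}")
```

In this toy setting the warm-started solve typically needs noticeably fewer iterations, which mirrors the effect of the improved initialization described above.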
The process can be sped up even further if we use faster solvers than CG. A last-resort solution to achieve a significant reduction in complexity for problems of very large size would be to reduce the dimensionality of the feature space before Gaussianization. It remains to be investigated whether this is a viable solution and how exactly it should be implemented.
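One plausible instance of such a pre-processing step is a PCA projection, sketched below; the paper leaves the choice of reduction method open, so treating PCA as the reducer is purely an assumption for illustration.

```python
# Sketch: PCA dimensionality reduction before Gaussianization (assumed method).
import numpy as np

def pca_reduce(X, k):
    """Project the n x d data matrix X onto its k leading principal axes."""
    Xc = X - X.mean(axis=0)                    # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                       # n x k reduced features

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                 # placeholder 10-D features
X_red = pca_reduce(X, 4)                       # reduce to 4-D, then Gaussianize
print(X_red.shape)                             # (500, 4)
```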
We have introduced and successfully tested an adaptive grid setup to speed up the computation of the parameters of the nonlinear multi-class Gaussianization transform. The adaptive grid is computed so as to ensure that the same number of training-set vectors is present in each hyperrectangle with grid points at its corners. The adaptive-grid Gaussianization effectively makes the input data more Gaussian, while reducing the computational complexity.
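For concreteness, the sketch below shows one plausible way to build such an equal-count adaptive grid: a hyperrectangle is split in half along every axis whenever it holds more than max_pts training vectors, so leaf cells end up with comparable point counts. The recursion scheme and names are assumptions; the paper does not spell out this exact procedure.

```python
# Sketch: adaptive grid with roughly equal point counts per cell (assumed scheme).
import numpy as np

def adaptive_grid(X, lo, hi, max_pts, cells, depth=0):
    """Recursively subdivide the box [lo, hi] until each leaf cell
    contains at most max_pts of the rows of X (depth-capped for safety)."""
    inside = np.all((X >= lo) & (X < hi), axis=1)
    if inside.sum() <= max_pts or depth > 12:
        cells.append((lo, hi))                 # leaf: keep this cell
        return
    mid = (lo + hi) / 2.0
    d = len(lo)
    for corner in range(2 ** d):               # visit all 2^d children
        bits = [(corner >> i) & 1 for i in range(d)]
        c_lo = np.where(bits, mid, lo)
        c_hi = np.where(bits, hi, mid)
        adaptive_grid(X[inside], c_lo, c_hi, max_pts, cells, depth + 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))                  # clustered 2-D training data
cells = []
adaptive_grid(X, X.min(0) - 1e-9, X.max(0) + 1e-9, max_pts=20, cells=cells)
print(len(cells), "leaf cells")                # denser regions -> finer cells
```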