is also a linear combination of the functions $K_{x_1}, \dots, K_{x_m}$. But the coefficients of these two linear combinations are different: in the regularized case $c^{\gamma} = (K[x] + \gamma I)^{-1} y$, while in the non-regularized one $c = K[x]^{+} y$.
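The difference between the two coefficient vectors can be illustrated numerically. The following sketch is only illustrative: it assumes a Gaussian (convolution) kernel, synthetic one-dimensional data and NumPy; the names gauss_kernel, c_gamma and c_plus are ours, not from the text.

import numpy as np

# Illustrative sketch (our notation): Gaussian convolution kernel and
# synthetic one-dimensional data; not taken from the paper.
def gauss_kernel(u, v, width=1.0):
    return np.exp(-((u - v) ** 2) / (2.0 * width ** 2))

rng = np.random.default_rng(0)
m = 20
x = rng.uniform(-1.0, 1.0, size=m)               # inputs x_1, ..., x_m
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=m)   # noisy outputs

K = gauss_kernel(x[:, None], x[None, :])         # Gram matrix K[x]
gamma = 1e-2

# Regularized coefficients: c_gamma = (K[x] + gamma I)^(-1) y
c_gamma = np.linalg.solve(K + gamma * np.eye(m), y)

# Non-regularized coefficients: c = K[x]^+ y (Moore-Penrose pseudoinverse)
c_plus = np.linalg.pinv(K) @ y

# Both solutions are linear combinations of the same functions K_{x_1}, ..., K_{x_m}:
# f(t) = sum_i c_i K(t, x_i); only the coefficient vector differs.
def f(t, c):
    return gauss_kernel(np.asarray(t)[:, None], x[None, :]) @ c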
The characterization of the regularized solution of the problem of minimization of the empirical error in reproducing kernel Hilbert spaces was derived in (Wahba, 1990) using Fréchet derivatives (see also (Cucker and Smale, 2002), (Poggio and Smale, 2003)). Our proof, based on the characterization of the adjoint $L_x^{*}$ of the evaluation operator $L_x$, is much simpler; it also includes the non-regularized case and thus shows the effect of regularization.
Theorem 2 shows that the increased “smoothness” of the regularized solution $f^{\gamma}$ is achieved merely by changing the coefficients of the linear combination. In the non-regularized case, the coefficients are obtained from the output data vector $y$ using the Moore-Penrose pseudoinverse of the Gram matrix $K[x]$, while in the regularized one they are obtained using the inverse of the modified matrix $K[x] + \gamma I$. So regularization merely changes the amplitudes, but it preserves the finite set of basis functions from which the solution is composed.
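To make the effect of regularization concrete, a small sweep over $\gamma$ can be run on the same kind of synthetic data. This is a hedged illustration (Gaussian kernel, NumPy, our own variable names), not part of the original analysis: it merely displays that the basis $\{K_{x_1}, \dots, K_{x_m}\}$ stays fixed while the coefficient amplitudes shrink as $\gamma$ grows.

import numpy as np

# Illustrative gamma sweep (our setup: Gaussian kernel, synthetic data).
rng = np.random.default_rng(0)
m = 20
x = rng.uniform(-1.0, 1.0, size=m)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=m)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)   # Gram matrix K[x]

for gamma in (0.0, 1e-3, 1e-1, 1.0):
    if gamma == 0.0:
        c = np.linalg.pinv(K) @ y                    # Moore-Penrose pseudoinverse
    else:
        c = np.linalg.solve(K + gamma * np.eye(m), y)
    # The basis {K_{x_1}, ..., K_{x_m}} never changes; only the amplitudes do,
    # and they shrink as gamma grows.
    print(f"gamma={gamma:g}  max|c_i|={np.max(np.abs(c)):.3e}  sum|c_i|={np.sum(np.abs(c)):.3e}")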
In many practical applications, networks with a much smaller number $n$ of units than the size $m$ of the training sample are used. However, the characterization of theoretically optimal solutions achievable over networks with large numbers of units (equal to the size $m$ of the training data) can be useful for investigating how well such optimal solutions can be approximated by suboptimal ones obtainable over smaller models (Vito et al., 2005; Kůrková and Sanguineti, 2005a; Kůrková and Sanguineti, 2005b).
As mentioned above, for convolution kernels we have $\|f^{\gamma}\|_K \le \sum_{i=1}^{m} |c_i^{\gamma}|$. Instead of calculating the $\|\cdot\|_K^2$ norm, it is easier to use as a stabilizer the $\ell_1$-norm of the output weight vector. For linear combinations of functions of the form $K_x$, this also leads to minimization of the $\|\cdot\|_K^2$ norm.
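A quick numerical sanity check of the bound above can be sketched, again under illustrative assumptions (Gaussian kernel with $K(x, x) = 1$, synthetic data, NumPy); it uses the identity $\|f^{\gamma}\|_K^2 = (c^{\gamma})^{T} K[x]\, c^{\gamma}$ for $f^{\gamma} = \sum_i c_i^{\gamma} K_{x_i}$.

import numpy as np

# Rough check of ||f_gamma||_K <= sum_i |c_i^gamma| for a convolution kernel
# with K(x, x) = 1 (Gaussian); purely illustrative data and names.
rng = np.random.default_rng(1)
m = 30
x = rng.uniform(-1.0, 1.0, size=m)
y = np.cos(4.0 * x) + 0.1 * rng.normal(size=m)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)

gamma = 1e-2
c = np.linalg.solve(K + gamma * np.eye(m), y)

rkhs_norm = np.sqrt(c @ K @ c)     # ||f_gamma||_K^2 = c^T K[x] c
ell1_norm = np.sum(np.abs(c))      # l1-norm of the output weight vector
print(rkhs_norm, ell1_norm)        # the first never exceeds the second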
ACKNOWLEDGEMENTS
This work was partially supported by MŠMT grant COST Intelli OC10047 and the Institutional Research Plan AV0Z10300504.
REFERENCES
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of AMS, 68:337–404.

Bertero, M. (1989). Linear inverse and ill-posed problems. Advances in Electronics and Electron Physics, 75:1–120.

Bishop, C. (1995). Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116.

Cucker, F. and Smale, S. (2002). On the mathematical foundations of learning. Bulletin of AMS, 39:1–49.

Engl, H. W., Hanke, M., and Neubauer, A. (1999). Regularization of Inverse Problems. Kluwer, Dordrecht.

Fine, T. L. (1999). Feedforward Neural Network Methodology. Springer-Verlag, Berlin, Heidelberg.

Friedman, A. (1982). Modern Analysis. Dover, New York.

Girosi, F., Jones, M., and Poggio, T. (1995). Regularization theory and neural networks architectures. Neural Computation, 7:219–269.

Girosi, F. and Poggio, T. (1990). Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247(4945):978–982.

Groetsch, C. W. (1977). Generalized Inverses of Linear Operators. Dekker, New York.

Hansen, P. C. (1998). Rank-Deficient and Discrete Ill-Posed Problems. SIAM, Philadelphia.

Ito, Y. (1992). Finite mapping by neural networks and truth functions. Mathematical Scientist, 17:69–77.

Kecman, V. (2001). Learning and Soft Computing. MIT Press, Cambridge.

Kůrková, V. and Sanguineti, M. (2005a). Error estimates for approximate optimization by the extended Ritz method. SIAM Journal on Optimization, 15:461–487.

Kůrková, V. and Sanguineti, M. (2005b). Learning with generalization capability by kernel methods with bounded complexity. Journal of Complexity, 13:551–559.

Micchelli, C. A. (1986). Interpolation of scattered data: Distance matrices and conditionally positive definite functions. Constructive Approximation, 2:11–22.

Moore, E. H. (1920). Abstract. Bulletin of AMS, 26:394–395.

Penrose, R. (1955). A generalized inverse for matrices. Proceedings of Cambridge Philosophical Society, 51:406–413.

Poggio, T. and Smale, S. (2003). The mathematics of learning: dealing with data. Notices of AMS, 50:537–544.

Schölkopf, B. and Smola, A. J. (2002). Learning with Kernels – Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge.

Tikhonov, A. N. and Arsenin, V. Y. (1977). Solutions of Ill-posed Problems. W.H. Winston, Washington, D.C.

Vito, E. D., Rosasco, L., Caponnetto, A., Giovannini, U. D., and Odone, F. (2005). Learning from examples as an inverse problem. Journal of Machine Learning Research, 6:883–904.

Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.