smaller than an imposed ε. Obviously, the smaller ε is for the given experimental conditions, the more frequently one can expect to select the same optimal model complexity via SRM as via cross-validation (again without actually performing it).
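The agreement argument can be made concrete with a small numeric sketch (an illustration, not the paper's procedure): if the SRM bound C and the cross-validation estimate V differ by at most ε at every candidate complexity, then the complexity minimizing C has a CV score within 2ε of the CV-optimal one, so for small ε the two criteria tend to pick the same model.

```python
import numpy as np

# Hypothetical illustration: V holds cross-validation error estimates for a
# range of model complexities, and C holds SRM-style bounds that deviate from
# V by at most eps at every complexity.
rng = np.random.default_rng(0)

V = rng.uniform(0.1, 0.5, size=20)         # CV error per candidate complexity
eps = 0.01
C = V + rng.uniform(-eps, eps, size=20)    # SRM criterion, within eps of V

k_srm = int(np.argmin(C))                  # complexity selected via SRM
k_cv = int(np.argmin(V))                   # complexity selected via CV

# Excess CV error incurred by trusting SRM instead of CV:
# V[k_srm] <= C[k_srm] + eps <= C[k_cv] + eps <= V[k_cv] + 2*eps.
gap = V[k_srm] - V[k_cv]
assert 0.0 <= gap <= 2 * eps
```

The chained inequality in the comment is the standard argument for why selection by a uniformly ε-close surrogate criterion is at most 2ε-suboptimal.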
For the special case of leave-one-out cross-validation, we observe as a consequence of the bounds we derived that at most a constant difference of order O(√(−ln η / 2)) between C and V can be expected.
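The exact constant is fixed by the derivation in the paper; as a rough numeric illustration only, assuming the Hoeffding-style form √(−ln η / 2) with η the confidence parameter, one can tabulate its magnitude:

```python
import math

def loo_gap_constant(eta):
    """Order-of-magnitude constant sqrt(-ln(eta)/2) for the C-V difference
    under leave-one-out cross-validation (assumed form, for illustration)."""
    return math.sqrt(-math.log(eta) / 2.0)

for eta in (0.10, 0.05, 0.01):
    print(f"eta = {eta:4.2f} -> sqrt(-ln(eta)/2) = {loo_gap_constant(eta):.4f}")
```

As expected, the constant grows only slowly as the confidence level η is tightened, since it depends on η logarithmically.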
Additionally, we showed for what number n of folds the bounds (lower and upper) on the difference are the tightest. Interestingly, these optimal values of n turn out not to depend on the sample size.
Finally, we presented experiments confirming the statistical correctness of the bounds.
ACKNOWLEDGEMENTS
This work was financed by the Polish Government, Ministry of Science and Higher Education, from the funds for science for the years 2010–2012, research project no. N N516 424938.
REFERENCES
Anthony, M. and Shawe-Taylor, J. (1993). A result of Vapnik with applications. Discrete Applied Mathematics, 47(3):207–217.
Bartlett, P. (1997). The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2).
Bartlett, P., Kulkarni, S., and Posner, S. (1997). Covering
numbers for real-valued function classes. IEEE Trans-
actions on Information Theory, 47:1721–1724.
Cherkassky, V. and Mulier, F. (1998). Learning from Data. John Wiley & Sons, Inc.
Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York.
Efron, B. and Tibshirani, R. (1993). An Introduction to the
Bootstrap. London: Chapman & Hall.
Fu, W., Caroll, R., and Wang, S. (2005). Estimating mis-
classification error with small samples via bootstrap
cross-validation. Bioinformatics, 21(9):1979–1986.
Hellman, M. and Raviv, J. (1970). Probability of error, equivocation and the Chernoff bound. IEEE Transactions on Information Theory, IT-16(4):368–372.
Hjorth, J. (1994). Computer Intensive Statistical Methods
Validation, Model Selection, and Bootstrap. London:
Chapman & Hall.
Holden, S. (1996a). Cross-validation and the PAC learning model. Technical Report RN/96/64, Dept. of CS, University College, London.
Holden, S. (1996b). PAC-like upper bounds for the sample complexity of leave-one-out cross-validation. In 9th Annual ACM Workshop on Computational Learning Theory, pages 41–50.
Kearns, M. (1995a). A bound on the error of cross-
validation, with consequences for the training-test
split. In Advances in Neural Information Processing
Systems 8. MIT Press.
Kearns, M. (1995b). An experimental and theoretical comparison of model selection methods. In 8th Annual ACM Workshop on Computational Learning Theory, pages 21–30.
Kearns, M. and Ron, D. (1999). Algorithmic stabil-
ity and sanity-check bounds for leave-one-out cross-
validation. Neural Computation, 11:1427–1453.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence (IJCAI).
Krzyżak, A. et al. (2000). Application of structural risk minimization to multivariate smoothing spline regression estimates. Bernoulli, 8(4):475–489.
Korzeń, M. and Klęsk, P. (2008). Maximal margin estimation with perceptron-like algorithm. In Rutkowski, L., Tadeusiewicz, R., Zadeh, L. A., and Zurada, J. M., editors, Lecture Notes in Artificial Intelligence, pages 597–608. Springer.
Ng, A. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. In 21st International Conference on Machine Learning, ACM International Conference Proceeding Series, volume 69.
Schmidt, J., Siegel, A., and Srinivasan, A. (1995). Chernoff–Hoeffding bounds for applications with limited independence. SIAM Journal on Discrete Mathematics, 8(2):223–250.
Shawe-Taylor, J. et al. (1996). A framework for structural
risk minimization. COLT, pages 68–76.
Vapnik, V. (1995a). The Nature of Statistical Learning The-
ory. Springer Verlag, New York.
Vapnik, V. (1995b). Statistical Learning Theory: Inference
from Small Samples. Wiley, New York.
Vapnik, V. (2006). Estimation of Dependences Based on
Empirical Data. Information Science & Statistics.
Springer, US.
Vapnik, V. and Chervonenkis, A. (1968). On the uniform convergence of relative frequencies of events to their probabilities. Doklady Akademii Nauk, 181.
Vapnik, V. and Chervonenkis, A. (1989). The necessary
and sufficient conditions for the consistency of the
method of empirical risk minimization. Yearbook of
the Academy of Sciences of the USSR on Recognition,
Classification and Forecasting, 2:217–249.
Weiss, S. and Kulikowski, C. (1991). Computer Systems
That Learn. Morgan Kaufmann.
A RELATIONSHIP BETWEEN CROSS-VALIDATION AND VAPNIK BOUNDS ON GENERALIZATION OF LEARNING MACHINES