Calandriello, D., Lazaric, A., and Valko, M. (2017). Efficient second-order online kernel learning with adaptive embedding. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 6140–6150. Curran Associates, Inc.
Cavallanti, G., Cesa-Bianchi, N., and Gentile, C. (2007). Tracking the best hyperplane with a simple budget perceptron. Machine Learning, 69(2):143–167.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3).
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
Dekel, O., Shalev-Shwartz, S., and Singer, Y. (2008). The forgetron: A kernel-based perceptron on a budget. SIAM Journal on Computing, 37(5):1342–1372.
Dekel, O. and Singer, Y. (2007). Support vector machines on a budget. MIT Press.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.
Freund, Y. and Schapire, R. E. (1998). Large margin classification using the perceptron algorithm. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT '98, New York, NY, USA. ACM.
Glasmachers, T. (2016). Finite sum acceleration vs. adaptive learning rates for the training of kernel machines on a budget. In NIPS Workshop on Optimization for Machine Learning.
Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S., and Sundararajan, S. (2008). A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning, pages 408–415. ACM.
Joachims, T. (2006). Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 217–226. ACM.
Kivinen, J., Smola, A. J., and Williamson, R. C. (2003). Online learning with kernels. IEEE Transactions on Signal Processing, 52:2165–2176.
Le, T., Nguyen, T., Nguyen, V., and Phung, D. (2016). Dual space gradient descent for online learning. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems 29, pages 4583–4591. Curran Associates, Inc.
Lin, C.-J. (2001). On the convergence of the decomposition method for support vector machines. IEEE Transactions on Neural Networks, 12(6):1288–1298.
List, N. and Simon, H. U. (2005). General polynomial time decomposition algorithms. In International Conference on Computational Learning Theory, pages 308–322. Springer.
Lu, J., Hoi, S. C., Wang, J., Zhao, P., and Liu, Z.-Y. (2016). Large scale online kernel learning. Journal of Machine Learning Research, 17(47):1–43.
Lu, J., Sahoo, D., Zhao, P., and Hoi, S. C. H. (2018). Sparse passive-aggressive learning for bounded online kernel methods. ACM Transactions on Intelligent Systems and Technology, 9(4).
Mohri, M., Rostamizadeh, A., and Talwalkar, A. (2012). Foundations of Machine Learning. MIT Press.
Nesterov, Y. (2012). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362.
Nguyen, T. D., Le, T., Bui, H., and Phung, D. (2017). Large-scale online kernel learning with random feature reparameterization. In Proceedings of the 26th International Joint Conference on Artificial Intelligence.
Orabona, F., Keshet, J., and Caputo, B. (2009). Bounded kernel-based online learning. Journal of Machine Learning Research, 10:2643–2666.
Osuna, E., Freund, R., and Girosi, F. (1997). An improved training algorithm for support vector machines. In Neural Networks for Signal Processing VII, pages 276–285.
Quinlan, M. J., Chalup, S. K., and Middleton, R. H. (2003). Techniques for improving vision and locomotion on the Sony AIBO robot. In Proceedings of the 2003 Australasian Conference on Robotics and Automation.
Rahimi, A. and Recht, B. (2008). Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, pages 1177–1184.
Shalev-Shwartz, S., Singer, Y., and Srebro, N. (2007). Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning, pages 807–814.
Shigeo, A. (2005). Support Vector Machines for Pattern Classification (Advances in Pattern Recognition). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
Son, Y.-J., Kim, H.-G., Kim, E.-H., Choi, S., and Lee, S.-K. (2010). Application of support vector machine for prediction of medication adherence in heart failure patients. Healthcare Informatics Research, pages 253–259.
Steinwart, I. (2003). Sparseness of support vector machines. Journal of Machine Learning Research, 4:1071–1105.
Steinwart, I., Hush, D., and Scovel, C. (2011). Training SVMs without offset. Journal of Machine Learning Research, 12(Jan):141–202.
Wang, Z., Crammer, K., and Vucetic, S. (2012). Breaking the curse of kernelization: Budgeted stochastic gradient descent for large-scale SVM training. Journal of Machine Learning Research, 13(1):3103–3131.
Wang, Z. and Vucetic, S. (2010). Online passive-aggressive algorithms on a budget. In