# ACTIVE LEARNING IN REGRESSION, WITH APPLICATION TO STOCHASTIC DYNAMIC PROGRAMMING

### Olivier Teytaud, Sylvain Gelly, Jérémie Mary

#### Abstract

We study active learning as a derandomized form of sampling. We show that full derandomization is not suitable in a robust framework, propose partially derandomized samplings, and develop new active learning methods (i) in which expert knowledge is easy to integrate, (ii) with a parameter for the exploration/exploitation dilemma, and (iii) that are less randomized than full-random sampling (yet not deterministic). Experiments are performed in the case of regression for value-function learning on a continuous domain. Our main results are (i) efficient partially derandomized point sets, (ii) moderate-derandomization theorems, (iii) experimental evidence of the importance of the frontier, and (iv) a new regression-specific, user-friendly sampling tool that is less robust than blind samplers but sometimes works very efficiently in large dimensions. All experiments can be reproduced by downloading the source code and running the provided command line.
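The abstract contrasts three regimes: full-random sampling, fully derandomized (deterministic) point sets, and partially derandomized samplings in between. As a generic illustration only, and not the sampling scheme developed in the paper, the sketch below builds one point set of each kind in [0,1)^d: i.i.d. uniform points, the deterministic Halton sequence, and a Halton sequence randomized by a single random shift modulo 1 (a Cranley-Patterson rotation, one standard randomized quasi-Monte Carlo construction). The function names and the limit d ≤ 10 are assumptions of this sketch.

```python
import random
from typing import List

# First ten primes: one coprime base per dimension for the Halton sequence.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Point = List[float]


def radical_inverse(i: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of i about the radix point."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result


def halton(n: int, d: int) -> List[Point]:
    """Fully derandomized: the first n points of the d-dimensional Halton sequence."""
    if d > len(PRIMES):
        raise ValueError("this sketch only supports d <= 10")
    return [[radical_inverse(i + 1, PRIMES[k]) for k in range(d)] for i in range(n)]


def uniform_random(n: int, d: int, rng: random.Random) -> List[Point]:
    """Full-random (blind) sampling: n i.i.d. uniform points in [0,1)^d."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]


def shifted_halton(n: int, d: int, rng: random.Random) -> List[Point]:
    """Partially derandomized: Halton points moved by one random shift modulo 1
    (Cranley-Patterson rotation). Only d random numbers are drawn in total."""
    shift = [rng.random() for _ in range(d)]
    return [[(x + s) % 1.0 for x, s in zip(p, shift)] for p in halton(n, d)]


if __name__ == "__main__":
    rng = random.Random(0)
    for name, pts in [("random", uniform_random(5, 2, rng)),
                      ("halton", halton(5, 2)),
                      ("shifted", shifted_halton(5, 2, rng))]:
        print(name, [[round(x, 3) for x in p] for p in pts])
```

The shifted set keeps the even coverage of the Halton points (all pairwise offsets are unchanged modulo 1) while remaining random, so independent runs differ. This is one simple way to sit between blind random sampling and full derandomization; the paper's own point sets and theorems should be consulted for the specific constructions it studies.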

#### Paper Citation

#### in Harvard Style

Teytaud O., Gelly S. and Mary J. (2007). **ACTIVE LEARNING IN REGRESSION, WITH APPLICATION TO STOCHASTIC DYNAMIC PROGRAMMING**. In *Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO*, ISBN 978-972-8865-82-5, pages 198-205. DOI: 10.5220/0001645701980205

#### in BibTeX Style

@conference{icinco07,

author={Olivier Teytaud and Sylvain Gelly and Jérémie Mary},

title={ACTIVE LEARNING IN REGRESSION, WITH APPLICATION TO STOCHASTIC DYNAMIC PROGRAMMING},

booktitle={Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},

year={2007},

pages={198-205},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0001645701980205},

isbn={978-972-8865-82-5},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO

TI - ACTIVE LEARNING IN REGRESSION, WITH APPLICATION TO STOCHASTIC DYNAMIC PROGRAMMING

SN - 978-972-8865-82-5

AU - Teytaud O.

AU - Gelly S.

AU - Mary J.

PY - 2007

SP - 198

EP - 205

DO - 10.5220/0001645701980205