when hitting boundaries). A main advantage of EAS
is that it works in high dimensionalities, for which
very few published papers report good active-learning results.
3. In some cases, GLD is far better than GLDfff,
but in one case GLD is the worst algorithm and
GLDfff is (very significantly) the best one. As the
main difference between these two algorithms is that
GLD samples the frontier more strongly, this points
out the simple fact that the frontier of the state
space can be very important: sometimes it is very
relevant (e.g. stock management, in which marginal
costs can be approximated, to first order, from the
cost-to-go at the corners) and sometimes it is pointless
and expensive (GLD puts 2^d points on the corners
among the 2^d + 1 first points!); see the illustrative
sketch below.
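The corner-sampling behaviour mentioned in point 3 can be reproduced in a few lines. The sketch below is an illustrative approximation only (a greedy farthest-point rule on [0,1]^d started from the centre), not the GLD implementation used in the experiments; the name greedy_low_dispersion and all parameters are ours. For d = 3 it selects all 2^d corners among the first 2^d + 1 points.

import itertools
import numpy as np

def greedy_low_dispersion(candidates, n_points, d):
    """Greedy farthest-point selection: start from the centre of [0,1]^d and
    repeatedly add the candidate farthest from the points already chosen."""
    chosen = [np.full(d, 0.5)]  # centre of the hypercube
    for _ in range(n_points - 1):
        # distance of every candidate to every already-chosen point
        dists = np.linalg.norm(
            candidates[:, None, :] - np.array(chosen)[None, :, :], axis=2)
        # add the candidate whose nearest chosen point is farthest away
        chosen.append(candidates[np.argmax(dists.min(axis=1))])
    return np.array(chosen)

d = 3  # the claim is checked here for d = 3
corners = np.array(list(itertools.product([0.0, 1.0], repeat=d)))
rng = np.random.default_rng(0)
candidates = np.vstack([corners, rng.random((200, d))])  # corners + interior points

pts = greedy_low_dispersion(candidates, 2 ** d + 1, d)
corner_set = set(map(tuple, corners))
n_corner = sum(tuple(p) in corner_set for p in pts)
print(f"{n_corner} of the first {len(pts)} points are corners of [0,1]^{d}")
# prints: 8 of the first 9 points are corners of [0,1]^3

This illustrates why such a dispersion-driven sampler spends an exponential number of evaluations on the corners before exploring the interior of the domain.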