# NONLINEAR PROGRAMMING IN APPROXIMATE DYNAMIC PROGRAMMING - Bang-bang Solutions, Stock-management and Unsmooth Penalties

### Olivier Teytaud, Sylvain Gelly

#### Abstract

Many stochastic dynamic programming tasks in continuous action spaces are tackled through discretization. Here we avoid discretization; approximate dynamic programming (ADP) then involves (i) many learning tasks, performed here by Support Vector Machines, for Bellman-function regression and (ii) many non-linear optimization tasks for action selection, for which we compare many algorithms. We include discretizations of the domain among the non-linear programming tools in our experiments, so that we also compare optimization approaches with discretization methods. We conclude that robustness is strongly required of the non-linear optimizers used in ADP. Experimental results show that (i) discretization is sometimes inefficient, but some specific discretization is very efficient for "bang-bang" problems; (ii) simple evolutionary tools outperform quasi-random search in a stable manner; (iii) gradient-based techniques are much less stable; (iv) for most high-dimensional "less unsmooth" problems, Covariance Matrix Adaptation ranks first.
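The action-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy stock problem, the linear stand-in for the learned Bellman function, the minimization convention, and every name below (`select_action`, `bang_bang_candidates`, `toy_value`, and so on) are assumptions made for illustration. The sketch compares plain uniform random search with a discretization restricted to the corners of the action box, which is the kind of candidate set that suits a "bang-bang" problem.

```python
import itertools
import random


def bang_bang_candidates(lower, upper):
    """Corners of the action box: every coordinate sits at one of its bounds."""
    return [list(corner) for corner in itertools.product(*zip(lower, upper))]


def random_candidates(lower, upper, n, rng):
    """n actions drawn uniformly at random inside the action box."""
    return [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(n)]


def q_value(state, action, cost, transition, value):
    """Immediate cost plus the approximated cost-to-go of the next state."""
    return cost(state, action) + value(transition(state, action))


def select_action(state, candidates, cost, transition, value):
    """Return the candidate action with the smallest q_value."""
    return min(candidates,
               key=lambda a: q_value(state, a, cost, transition, value))


# Hypothetical toy problem: one stock, action = (amount bought, amount sold).
def toy_cost(stock, action):
    return -3.0 * action[0] + 2.0 * action[1]


def toy_transition(stock, action):
    return stock + action[0] - action[1]


def toy_value(stock):
    # Stand-in for a learned Bellman function (the paper uses SVM regression).
    return 0.5 * stock


LOWER, UPPER = [0.0, 0.0], [1.0, 1.0]
STATE = 2.0
rng = random.Random(0)

corner_action = select_action(
    STATE, bang_bang_candidates(LOWER, UPPER),
    toy_cost, toy_transition, toy_value)
random_action = select_action(
    STATE, random_candidates(LOWER, UPPER, 50, rng),
    toy_cost, toy_transition, toy_value)

corner_q = q_value(STATE, corner_action, toy_cost, toy_transition, toy_value)
random_q = q_value(STATE, random_action, toy_cost, toy_transition, toy_value)
print("corner search:", corner_action, corner_q)
print("random search:", random_action, random_q)
```

Because the toy objective is linear in the action, its minimum over the box lies at a corner, so four corner evaluations are enough while 50 random points can only approach that value. On a problem whose optimum lies inside the box, the corner set would miss it, which is why the choice of optimizer depends on the problem class.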

#### Paper Citation

#### in Harvard Style

Teytaud O. and Gelly S. (2007). **NONLINEAR PROGRAMMING IN APPROXIMATE DYNAMIC PROGRAMMING - Bang-bang Solutions, Stock-management and Unsmooth Penalties**. In *Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO*, ISBN 978-972-8865-82-5, pages 47-54. DOI: 10.5220/0001645800470054

#### in Bibtex Style

@conference{icinco07,

author={Olivier Teytaud and Sylvain Gelly},

title={NONLINEAR PROGRAMMING IN APPROXIMATE DYNAMIC PROGRAMMING - Bang-bang Solutions, Stock-management and Unsmooth Penalties},

booktitle={Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},

year={2007},

pages={47-54},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0001645800470054},

isbn={978-972-8865-82-5},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO

TI - NONLINEAR PROGRAMMING IN APPROXIMATE DYNAMIC PROGRAMMING - Bang-bang Solutions, Stock-management and Unsmooth Penalties

SN - 978-972-8865-82-5

AU - Teytaud O.

AU - Gelly S.

PY - 2007

SP - 47

EP - 54

DO - 10.5220/0001645800470054