# Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

### Saba Q. Yahyaa, Madalina M. Drugan, Bernard Manderick

#### Abstract

We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to explore the Pareto-optimal arms efficiently. We consider two partial order relations for ordering the mean vectors: Pareto dominance and scalarization functions. Pareto-KG finds the optimal arms using Pareto search, while scalarized KG transforms each multi-objective arm into a single-objective arm to find the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the KG policy with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
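The two order relations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that rewards are maximized, that the true mean vectors are known (a bandit algorithm would use estimates), and that the scalarization is a plain weighted sum, which is only one possible choice. All function names and the example mean vectors are hypothetical.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (assumption: maximization)."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_optimal_arms(means: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarization(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a mean vector into one value with a weighted sum (assumed form)."""
    return sum(w * m for w, m in zip(weights, mean))


def best_scalarized_arm(means: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the arm with the largest scalarized mean for the given weights."""
    return max(range(len(means)), key=lambda i: linear_scalarization(means[i], weights))


# Hypothetical two-objective mean vectors for five arms.
means = [(0.55, 0.50), (0.53, 0.51), (0.52, 0.54), (0.50, 0.57), (0.30, 0.30)]

front = pareto_optimal_arms(means)
balanced = best_scalarized_arm(means, (0.5, 0.5))
first_only = best_scalarized_arm(means, (1.0, 0.0))
```

In this example the Pareto order keeps every non-dominated arm, whereas each weight vector singles out one arm, so covering the whole front by scalarization requires several weight vectors.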

#### Paper Citation

#### in Harvard Style

Yahyaa, S. Q., Drugan, M. M. and Manderick, B. (2014). **Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms**. In *Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART*, ISBN 978-989-758-015-4, pages 74-83. DOI: 10.5220/0004796600740083

#### in Bibtex Style

@conference{icaart14,

author={Saba Q. Yahyaa and Madalina M. Drugan and Bernard Manderick},

title={Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms},

booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},

year={2014},

pages={74-83},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0004796600740083},

isbn={978-989-758-015-4},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART

TI - Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

SN - 978-989-758-015-4

AU - Yahyaa, S. Q.

AU - Drugan, M. M.

AU - Manderick, B.

PY - 2014

SP - 74

EP - 83

DO - 10.5220/0004796600740083