Reinforcement Learning for Multi-purpose Schedules

Kristof Van Moffaert, Yann-Michaël De Hauwere, Peter Vrancx, Ann Nowé


In this paper, we present a learning technique for determining schedules for general devices that balance two objectives: user convenience and energy savings. The proposed learning algorithm is based on Fitted-Q Iteration (FQI) and analyzes the usage and the users of a particular device to decide on an appropriate profile of start-up and shutdown times for that equipment. The algorithm is experimentally evaluated on real-life data, showing that close-to-optimal control policies can be learned in only a few iterations. Our results show that the algorithm can propose intelligent schedules that reflect how much emphasis the user places on each objective.
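To make the idea concrete, the following is a minimal sketch of batch Fitted-Q Iteration on a toy on/off device. It is not the authors' implementation: the hourly state `(hour, is_on)`, the presence profile `PRESENT`, the start-up cost, the weight `w` that trades convenience against energy, and the simple per-input averaging regressor are all assumptions made for illustration. The abstract does not specify the state features, reward, or regressor actually used.

```python
import random
from collections import defaultdict

HOURS = 24
ACTIONS = (0, 1)          # 0 = keep device off, 1 = keep device on
GAMMA = 0.9               # discount factor (assumed)
STARTUP_COST = 0.5        # extra energy spent when switching on (assumed)
PRESENT = [9 <= h < 17 for h in range(HOURS)]  # toy usage profile (assumed)


def step(state, action, w):
    """Toy device model: returns (reward, next_state) for one hourly slot."""
    hour, is_on = state
    convenience = (1.0 if action == 1 else -1.0) if PRESENT[hour] else 0.0
    energy = -(1.0 + (STARTUP_COST if not is_on else 0.0)) if action == 1 else 0.0
    # Hypothetical scalarization: w weighs convenience, (1 - w) weighs energy.
    reward = w * convenience + (1.0 - w) * energy
    return reward, ((hour + 1) % HOURS, action)


def collect_batch(w, n_samples=5000, seed=0):
    """Sample random one-step transitions (s, a, r, s') from the toy model."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n_samples):
        state = (rng.randrange(HOURS), rng.choice(ACTIONS))
        action = rng.choice(ACTIONS)
        reward, next_state = step(state, action, w)
        batch.append((state, action, reward, next_state))
    return batch


def fit_averager(inputs, targets):
    """Stand-in regressor: mean target per (state, action) input."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


def fitted_q_iteration(batch, n_iterations=50):
    """Repeatedly regress Q onto one-step bootstrapped targets."""
    q = {}  # Q_0 is zero everywhere (missing keys read as 0.0)
    inputs = [(s, a) for s, a, _, _ in batch]
    for _ in range(n_iterations):
        targets = [
            r + GAMMA * max(q.get((s_next, b), 0.0) for b in ACTIONS)
            for _, _, r, s_next in batch
        ]
        q = fit_averager(inputs, targets)
    return q


def greedy_schedule(q):
    """Roll the greedy policy forward from midnight with the device off."""
    state, schedule = (0, 0), []
    for _ in range(HOURS):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        schedule.append(action)
        state = ((state[0] + 1) % HOURS, action)
    return schedule
```

Calling `greedy_schedule(fitted_q_iteration(collect_batch(w)))` for different values of `w` gives one on/off decision per hour. In this toy model, a larger `w` favours keeping the device on while the user is present, and a smaller `w` favours keeping it off. A tree-based or neural regressor could replace `fit_averager` without changing the outer loop.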



Paper Citation

in Harvard Style

Van Moffaert K., De Hauwere Y., Vrancx P. and Nowé A. (2013). Reinforcement Learning for Multi-purpose Schedules. In Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-8565-39-6, pages 203-209. DOI: 10.5220/0004187202030209
