Reinforcement Learning for Multi-purpose Schedules
Kristof Van Moffaert, Yann-Michaël De Hauwere, Peter Vrancx, Ann Nowé
2013
Abstract
In this paper, we present a learning technique for determining schedules for general devices that balances two objectives: user convenience and energy savings. The proposed learning algorithm is based on Fitted-Q Iteration (FQI) and analyzes the usage and users of a particular device to decide upon an appropriate profile of start-up and shutdown times for that equipment. The algorithm is experimentally evaluated on real-life data, showing that close-to-optimal control policies can be learned within only a few iterations. Our results show that the algorithm is capable of proposing intelligent schedules that reflect how much emphasis the user places on each objective.
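To make the approach concrete, the sketch below shows how scalarized, tree-based Fitted-Q Iteration might be applied to logged device transitions. This is a minimal illustration under stated assumptions, not the paper's implementation: the binary on/off action set, the linear scalarization weight w, the reward components, and the ExtraTreesRegressor settings are all assumptions, and the helper names (scalarized_reward, fitted_q_iteration, greedy_action) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Assumed action set: 0 = keep device off, 1 = switch device on.
ACTIONS = np.array([0, 1])

def scalarized_reward(comfort, energy_cost, w):
    """Linear scalarization of the two objectives (an assumption;
    the paper may combine convenience and savings differently)."""
    return w * comfort - (1.0 - w) * energy_cost

def fitted_q_iteration(transitions, gamma=0.95, n_iterations=20):
    """Tree-based FQI in the style of Ernst et al. (2005).

    transitions: list of (state, action, reward, next_state) tuples,
    where states are 1-D feature vectors (e.g. time of day, occupancy).
    """
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions]).reshape(-1, 1)
    rewards = np.array([t[2] for t in transitions])
    next_states = np.array([t[3] for t in transitions])

    X = np.hstack([states, actions])  # regress Q on (state, action) pairs
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = rewards  # first iteration fits the immediate reward
        else:
            # Bellman backup: r + gamma * max_a' Q_{N-1}(s', a')
            q_next = np.column_stack([
                q.predict(np.hstack([next_states,
                                     np.full((len(next_states), 1), a)]))
                for a in ACTIONS
            ])
            targets = rewards + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q

def greedy_action(q, state):
    """Pick the action with the highest fitted Q-value."""
    values = [q.predict(np.hstack([state, [a]]).reshape(1, -1))[0]
              for a in ACTIONS]
    return ACTIONS[int(np.argmax(values))]
```

Sweeping the weight w between 0 and 1 when building the reward signal would trade user convenience off against energy savings, mirroring the user-controlled emphasis described in the abstract.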
Paper Citation
in Harvard Style
Van Moffaert K., De Hauwere Y., Vrancx P. and Nowé A. (2013). Reinforcement Learning for Multi-purpose Schedules. In Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-8565-39-6, pages 203-209. DOI: 10.5220/0004187202030209
in Bibtex Style
@conference{icaart13,
author={Kristof Van Moffaert and Yann-Michaël De Hauwere and Peter Vrancx and Ann Nowé},
title={Reinforcement Learning for Multi-purpose Schedules},
booktitle={Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2013},
pages={203-209},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004187202030209},
isbn={978-989-8565-39-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Reinforcement Learning for Multi-purpose Schedules
SN - 978-989-8565-39-6
AU - Van Moffaert K.
AU - De Hauwere Y.
AU - Vrancx P.
AU - Nowé A.
PY - 2013
SP - 203
EP - 209
DO - 10.5220/0004187202030209
ER -