Beni, G. and Wang, J. (1993). Swarm intelligence in cellular robotic systems. In Robots and Biological Systems: Towards a New Bionics?, pages 703–712. Springer.
Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In AAAI, pages 183–188.
Grefenstette, J. J. (1988). Credit assignment in rule discovery systems based on genetic algorithms. Machine Learning, 3(2-3):225–245.
Iima, H. and Kuroe, Y. (2008). Swarm reinforcement learning algorithms based on Sarsa method. In SICE Annual Conference, 2008, pages 2045–2049. IEEE.
Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285.
McCallum, R. A. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pages 190–196.
McCallum, R. A. (1995). Instance-based utile distinctions
for reinforcement learning with hidden state. In ICML,
pages 387–395.
Miyazaki, K. and Kobayashi, S. (2003). An extension of profit sharing to partially observable Markov decision processes: Proposition of PS-r* and its evaluation. Journal of Japanese Society for Artificial Intelligence, 18(5):286–296.
Miyazaki, K., Yamamura, M., and Kobayashi, S. (1994). A theory of profit sharing in reinforcement learning. Journal of Japanese Society for Artificial Intelligence, 9(4):580–587. (in Japanese).
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937.
Nomura, T. and Kato, S. (2015). Dynamic subgoal generation using evolutionary computation for reinforcement learning under POMDP. International Symposium on Artificial Life and Robotics, 2015(22):322–327.
Peng, J. and Williams, R. J. (1994). Incremental multi-step Q-learning. In Machine Learning Proceedings 1994, pages 226–232. Elsevier.
Poupart, P., Vlassis, N., Hoey, J., and Regan, K. (2006). An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 697–704. ACM.
Ross, S., Chaib-draa, B., and Pineau, J. (2008). Bayesian reinforcement learning in continuous POMDPs. In International Conference on Robotics and Automation (ICRA).
Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76.
Sridharan, M., Wyatt, J., and Dearden, R. (2010). Planning to see: A hierarchical approach to planning visual actions on a robot using POMDPs. Artificial Intelligence, 174(11):704–725.
Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems, pages 1038–1044.
Suzuki, K. and Kato, S. (2017). Hybrid learning using profit sharing and genetic algorithm for partially observable Markov decision processes. In International Conference on Network-Based Information Systems, pages 463–475. Springer.
Thomson, B. and Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech & Language, 24(4):562–588.
Uemura, W., Ueno, A., and Tatsumi, S. (2005). An episode-based profit sharing method for POMDPs. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 88(6):761–774. (in Japanese).
Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4):279–292.
Whitehead, S. D. and Ballard, D. H. (1990). Active perception and reinforcement learning. Neural Computation, 2(4):409–419.
Whitehead, S. D. and Ballard, D. H. (1991). Learning to perceive and act by trial and error. Machine Learning, 7(1):45–83.
Wiering, M. and Schmidhuber, J. (1997). HQ-learning. Adaptive Behavior, 6(2):219–246.
Wiering, M. and Schmidhuber, J. (1998). Fast online Q(λ). Machine Learning, 33(1):105–115.
Yamamura, M., Miyazaki, K., and Kobayashi, S. (1995). A survey on learning for agents. The Japanese Society for Artificial Intelligence, 10(5):683–689.