lative simple one. Our setting of the learning methods relies relatively heavily on the random-walk exploration of the ε-greedy heuristic because of the activities of the targets. In addition, we set the number of targets so that almost half of the areas are occupied, since unfairness depends on resource capacities. Investigating different classes of problems is left for future work.
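For context, the following is a minimal Python sketch of the ε-greedy rule referred to above; the Q-table layout and parameter values are illustrative, not those of our experiments.

```python
import random

def epsilon_greedy(q_row, eps=0.1):
    """Select an action index from one row of a Q-table.

    With probability eps the agent picks a uniformly random action;
    this exploration is the source of the random-walk behavior
    mentioned above.
    """
    if random.random() < eps:
        return random.randrange(len(q_row))  # explore
    # Exploit: index of the maximum estimated value.
    return max(range(len(q_row)), key=q_row.__getitem__)

# Example: three actions with estimated values.
print(epsilon_greedy([0.2, 0.8, 0.5], eps=0.1))  # usually 1
```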
Several studies, such as Nash-Q learning (Hu and Wellman, 2003), have addressed the individuality of agents. While such studies mainly focus on selfish agents, we are interested in cooperative actions that take unfairness into account. As a first study, we addressed the effects and influence of action selection based on leximin, which can be applied in a decentralized manner.
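As a reference for the leximin criterion (Moulin, 1988; Bouveret and Lemaître, 2009), the sketch below compares candidate joint actions by their vectors of per-agent values; the candidate format is a hypothetical illustration, not the exact formulation used by our decentralized solver.

```python
# Leximin compares value vectors lexicographically on their sorted
# (ascending) forms, so the worst-off agent is improved first.

def leximin_key(values):
    """Ascending sort of a per-agent value vector."""
    return tuple(sorted(values))

def leximin_best(candidates):
    """Return the (action, values) pair that is leximin-maximal."""
    return max(candidates, key=lambda c: leximin_key(c[1]))

# Example: three joint actions with per-agent value vectors.
candidates = [
    ("a1", [3.0, 1.0, 2.0]),
    ("a2", [2.0, 2.0, 2.0]),  # leximin-preferred: highest worst value
    ("a3", [5.0, 0.5, 4.0]),
]
print(leximin_best(candidates)[0])  # -> "a2"
```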
On the other hand, the results reveal the necessity of dedicated learning rules; our proposed approach is effective only in relatively simple cases. Since unfairness depends on the values in the Q-tables, the leximin case requires further discussion. In particular, normalization methods that compensate for the different learning progress of individual agents may be important for fairness.
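As one purely illustrative possibility (an assumption, not a method evaluated here), each agent could min-max rescale its own Q-values before the leximin comparison, so that differences in learning progress do not dominate the ordering.

```python
# Illustrative per-agent normalization before leximin comparison;
# the choice of normalization is left open in this paper.

def normalize_q(q_values, eps=1e-9):
    """Min-max rescale one agent's Q-values to [0, 1] so that agents
    whose learning has progressed differently become comparable."""
    lo, hi = min(q_values), max(q_values)
    return [(q - lo) / (hi - lo + eps) for q in q_values]

# Agent 0 has learned longer (larger magnitudes) than agent 1,
# yet both map to the same relative scale.
print(normalize_q([10.0, 40.0, 25.0]))  # -> [0.0, ~1.0, 0.5]
print(normalize_q([0.1, 0.4, 0.25]))    # -> [0.0, ~1.0, 0.5]
```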
We employed an exact solution method to select joint actions. However, approximation methods are necessary for large and complex problems, since the time and space complexity of the exact method increases exponentially with the number of separators. Such approximation is also a challenging problem.
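To make the exponential growth concrete, the sketch below counts the entries of a DPOP-style utility table over a separator (Petcu and Faltings, 2005); the domain and separator sizes are arbitrary examples.

```python
def table_entries(domain_size, separator_size):
    """Entries in an exact method's utility table over a separator.

    Memory and time grow as d**s, which is why approximation
    becomes necessary when separators are large.
    """
    return domain_size ** separator_size

for s in (2, 5, 10):
    print(s, table_entries(10, s))  # 100, 100000, 10000000000
```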
7 CONCLUSIONS
We addressed action selection based on unfairness among agents in a decentralized reinforcement learning framework and experimentally investigated the effect and influence of the leximin criterion in action selection. Even though the proposed approach worked effectively in relatively simple settings, our results uncovered several exploration and learning issues. Our future work will analyze the relationship between the proposed cooperative action selection and the learning rules, as well as applications to other problem domains. Improving the learning rules to manage unfairness information and developing scalable solution methods for joint action selection are also important challenges.
ACKNOWLEDGEMENTS
This work was supported in part by JSPS KAKENHI
Grant Number JP16K00301.
REFERENCES
Bouveret, S. and Lemaître, M. (2009). Computing leximin-
optimal solutions in constraint networks. Artificial In-
telligence, 173(2):343–364.
Farinelli, A., Rogers, A., Petcu, A., and Jennings, N. R.
(2008). Decentralised coordination of low-power em-
bedded devices using the max-sum algorithm. In 7th
International Joint Conference on Autonomous Agents
and Multiagent Systems, pages 639–646.
Hu, J. and Wellman, M. P. (2003). Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4:1039–1069.
Matsui, T. and Matsuo, H. (2014). Complete distributed
search algorithm for cyclic factor graphs. In 6th In-
ternational Conference on Agents and Artificial Intel-
ligence, pages 184–192.
Matsui, T., Silaghi, M., Hirayama, K., Yokoo, M., and Mat-
suo, H. (2014). Leximin multiple objective optimiza-
tion for preferences of agents. In 17th International
Conference on Principles and Practice of Multi-Agent
Systems, pages 423–438.
Matsui, T., Silaghi, M., Okimoto, T., Hirayama, K., Yokoo,
M., and Matsuo, H. (2015). Leximin asymmetric mul-
tiple objective DCOP on factor graph. In 18th International Conference on Principles and Practice of Multi-Agent Systems, pages 134–151.
Modi, P. J., Shen, W., Tambe, M., and Yokoo, M. (2005).
Adopt: Asynchronous distributed constraint optimi-
zation with quality guarantees. Artificial Intelligence,
161(1-2):149–180.
Moulin, H. (1988). Axioms of Cooperative Decision Making. Cambridge University Press.
Netzer, A. and Meisels, A. (2013a). Distributed envy minimization for resource allocation. In 5th International Conference on Agents and Artificial Intelligence, volume 1, pages 15–24.
Netzer, A. and Meisels, A. (2013b). Distributed local search for minimizing envy. In 2013 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pages 53–58.
Nguyen, D. T., Yeoh, W., Lau, H. C., Zilberstein, S., and
Zhang, C. (2014). Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. In
28th AAAI Conference on Artificial Intelligence, pages
1447–1455.
Petcu, A. and Faltings, B. (2005). A scalable method for
multiagent constraint optimization. In 19th Internati-
onal Joint Conference on Artificial Intelligence, pages
266–271.
Zhang, C. and Lesser, V. (2011). Coordinated multi-
agent reinforcement learning in networked distributed
POMDPs. In 25th AAAI Conference on Artificial Intel-
ligence, pages 764–770.
Zivan, R. (2008). Anytime local search for distributed constraint optimization. In 23rd AAAI Conference on Artificial Intelligence, pages 393–398.