COALITION FORMATION WITH UNCERTAIN TASK EXECUTION
Hosam Hanna
GREYC - University of Caen
Bd Maréchal Juin
14032 Caen - France
Keywords:
Coalition formation, Uncertainty, Group decision, Markov decision process.
Abstract:
We address the problem of coalition formation in environments where task execution is uncertain. Although previous works provide good solutions for the coalition formation problem, the uncertain task execution problem is not taken into account. In environments where task execution is uncertain, an agent can't be sure whether he will be able to execute all the subtasks that are allocated to him or he will ignore some of them. That is why forming coalitions to maximize the real reward is an unrealizable operation. In this paper, we propose a theoretical approach to forming coalitions under uncertain task execution. We view the formation of a coalition to execute a task as (1) a decision to make and (2) an uncertain source of gain. We then associate the allocation of a task to a coalition with an expected reward that represents what the agents expect to gain by forming this coalition to execute this task. Thus, the agents' aim is to form coalitions that maximize the expected reward instead of the real reward. To reach this objective, we formalize the coalition formation problem as a Markov Decision Process (MDP). We consider the situation where decisions are taken by one agent that develops and solves the corresponding MDP. An optimal coalition formation which maximizes the agents' expected reward is then obtained.
1 INTRODUCTION
Coalition formation is an important cooperation method for applications where an agent can't efficiently execute a task by himself. The coalition formation problem has been widely studied and many approaches have been proposed. In game theory, we find works that treated this problem without taking into account the limited computation time (Aumann, 1959), (Bernheim et al., 1987), and (Kahan and Rapoport, 1984). In cooperative environments, many algorithms were suggested to answer the question of group formation (Shehory and Kraus, 1998). In multiagent systems, there are several coalition formation mechanisms that include a protocol as well as strategies to be implemented by agents given the protocol (Klusch and Shehory, 1996), (Shehory and Kraus, 1998), (Zlotkin and Rosenschein, 1994), (Learman and Shehory, 2000). All these works share common assumptions: resources consumption is perfectly controlled by agents, and the formation of a coalition to execute a task is a certain source of reward. In other words, an agent can exactly determine the quantity of resources he will consume to execute any subtask, and the formation of a coalition to execute a task is sufficient to obtain the corresponding reward. In this study, we relax this assumption in order to adapt coalition formation to more realistic cases, and we investigate the problem of coalition formation in environments where agents have uncertain behaviors.
Several works have investigated the coalition formation problem where the coalition value is uncertain or known only to a limited degree of certainty. In (Ketchpel, 1994), the author considered the case where agents do not have access to the coalition value function, and he proposed a two-agent auction mechanism to determine the coalitions of agents that will work together and to decide how to reward the agents. In (Blankenburg et al., 2003), the authors studied situations where the coalition value is known only to a limited degree of certainty. They proposed to use fuzzy quantities instead of real numbers in order to express the coalition values. A fuzzy kernel concept has been introduced in order to yield stable solutions. Although the complexity of the fuzzy kernel is exponential, it has been shown that this complexity can be reduced to a polynomial one by placing a cap on the size of coalitions. The uncertainty on the coalition value can also be due to an unknown execution cost. In fact, when agents reason in terms of utility, the net benefits of a
coalition are defined as the coalition value minus the execution costs of all the coalition's members. When an agent of the coalition does not know with certainty the execution costs of the other members, he is uncertain regarding both the coalition's net benefits and his own net benefits. A protocol allowing agents to negotiate and form coalitions in such a case has been proposed in (Kraus et al., 2003) and (Kraus et al., 2004). Another source of uncertainty on the coalition value can be imperfect or deceiving information. A study of this case has been proposed in (Blankenburg and Klusch, 2004). In (Chalkiadakis and Boutilier, 2004), the authors proposed a reinforcement learning model that allows agents to refine their beliefs about others' capabilities.
Although these previous works deal with an important uncertainty issue (uncertain coalition value), they make several restrictive assumptions regarding other possible sources of uncertainty, such as uncertain resources consumption (uncertain task execution), which can be due to the uncertain agent's behavior and to the environment's dynamism. In addition, they do not take into account the effects of forming a coalition on the future possible formations, so a long-term coalition formation plan cannot be provided. In applications such as planetary rovers, for example, an agent is confronted with an ambiguous environment where he cannot control his resources consumption when executing tasks as well as he does in the laboratory. A coalition formation plan is important so that agents adapt coalition formation to their uncertain behaviors. The problem is more complex when resources consumption is uncertain for all the agents. Unfortunately, in such a system, an agent can't be sure whether he (or another agent) will be able to execute all the subtasks that are allocated to him or will ignore some of them. So, forming coalitions to maximize the agents' real reward is a complex (even unrealizable) operation. In fact, a task is considered as non-executed if at least one of its subtasks is not executed. That is why forming a coalition to execute a task is a necessary but not sufficient condition to obtain a reward, and the agents' reward must be subjected to the task execution and not only to the coalition formation and task allocation. In this paper, we take these issues into account and present a probabilistic model, based on a Markov Decision Process (MDP), that provides a coalition formation plan for environments where resources consumption is uncertain. We will show that, according to each possible resources consumption, agents can decide in an optimal way which coalition they must form.
We begin in Section 2 with a presentation of our framework. In Section 3, we sketch our solution approach. We explain how to form coalitions via an MDP in Section 4.
2 FRAMEWORK
We consider a situation where a set of $m$ fully-cooperative agents, $A = \{a_1, \dots, a_m\}$, have to cooperate to execute a finite set of tasks $T = \{T_1, \dots, T_n\}$ in an uncertain environment. The tasks will be allocated in a commonly known order: without loss of generality, we assume that this ordering is $T_1, T_2, \dots, T_n$. Each agent $a_k$ has a bounded quantity of resources $R^k$ that he uses to execute tasks. Each task consists of subtasks: for simplicity, we assume that every task $T_i \in T$ is composed of $q$ subtasks such that $T_i = \{t_i^1, \dots, t_i^q\}$. Agent $a_k$ is able to perform only a subset $E_i^k \subseteq T_i$ of the subtasks of a given task $T_i$. We assume that each task $T_i \in T$ satisfies the condition $T_i \subseteq \bigcup_{a_k \in A} E_i^k$; otherwise it is an unrealizable task. For each subtask $t_i^l$, $T_i \in T$, $l = 1, \dots, q$, we can define the set of agents $AE(t_i^l)$ that are able to perform $t_i^l$ as follows: $AE(t_i^l) = \{a_k \in A \mid t_i^l \in E_i^k\}$. Since an agent can't execute a task $T_i$ by himself, a coalition of agents must be formed in order to execute this task. Such a coalition can be defined as a $q$-tuple $\langle a_1, \dots, a_q \rangle$ where agent $a_l \in A$ executes subtask $t_i^l \in E_i^{a_l}$. We let $C(T_i)$ denote the set of all possible coalitions that can perform task $T_i$; it can be defined as follows: $C(T_i) = \{\langle a_1, \dots, a_q \rangle \mid a_l \in A,\ t_i^l \in T_i,\ t_i^l \in E_i^{a_l},\ l = 1, \dots, q\}$. A task is considered as realized if and only if all its subtasks have been performed. For each realized task $T_i$, agents obtain a reward. We consider a general situation where the tasks can be executed with different qualities. For example, two agents can take photos of the same object, but the resolution can be different. The reward corresponding to the execution of a task thus depends on the coalition that executes it. We assume that agents have a function $w(T_i, c)$ that expresses the reward that can be obtained if coalition $c$ executes task $T_i$.
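To make the framework concrete, the following is a minimal Python sketch of the data model it describes. The names (Agent, coalitions, w), the dictionary layout of $E_i^k$, and the placeholder reward value are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Tuple

# Illustrative data model: agents with bounded resources R^k, tasks T_i split
# into q subtasks t_i^1..t_i^q, and the set C(T_i) of coalitions able to run T_i.
Subtask = Tuple[int, int]        # (task index i, subtask index l)
Coalition = Tuple[str, ...]      # q-tuple of agent names; slot l executes t_i^l


@dataclass
class Agent:
    name: str
    resources: float                                           # R^k, bounded quantity
    executable: Dict[int, set] = field(default_factory=dict)   # E_i^k per task i


def coalitions(agents: List[Agent], task: int, q: int) -> List[Coalition]:
    """Enumerate C(T_i): every q-tuple whose slot-l agent can perform t_i^l."""
    able = [[a.name for a in agents if l in a.executable.get(task, set())]
            for l in range(q)]
    if any(not slot for slot in able):        # T_i not covered: unrealizable task
        return []
    return [tuple(c) for c in product(*able)]


def w(task: int, c: Coalition) -> float:
    """w(T_i, c): reward if coalition c executes T_i (placeholder value)."""
    return 10.0
```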
3 SOLUTION APPROACH
The key idea in our approach is to view the formation of a coalition to execute a task as a decision to make that provides an expected reward instead of a real gain. What does one expect to gain by forming coalition $c$ to execute task $T_i$? In fact, when $T_i$ is allocated to $c$, the agents expect to obtain two values. The first one is the value $w(T_i, c)$, which is subjected to the execution of task $T_i$. The second expected value expresses the gain that can be obtained from future formations and allocations, taking into consideration the resources quantity consumed to execute $T_i$. Indeed, when a coalition executes a task, the agents' available resources are reduced, and the chances to execute other
tasks are then reduced. As the resources collection consumed to execute task $T_i$ depends on the coalition $c$ executing $T_i$, the gain the agents can obtain from future formations and allocations also depends on coalition $c$. Finally, the expected reward associated with the formation of a coalition to execute a task is the sum of these two expected values. It is necessary to recall here that our expected reward definition is different from the expected coalition value defined in (Chalkiadakis and Boutilier, 2004) for a Bayesian reinforcement learning model. In fact, the expected coalition value notion is used to express what an agent, based on his expectations regarding the capabilities of other agents, believes about the value of any coalition. In addition, this notion doesn't allow agents to take into account the impact of the formation of a coalition on the gain that can be obtained from the formation of other coalitions (our second expected value).
Differently from known coalition formation methods that maximize the agents' real gain (or another type of gain that doesn't include the impact of the formation of a coalition on future formations), the goal of our agents is defined as follows: for each task $T_i$, form a coalition $c$ in such a way that it maximizes the agents' long-term expected reward. To realize this objective, we have to treat the uncertain resources consumption and to formalize the expected reward associated with coalition formation. We will use a discrete representation of resources consumption and then define an execution probability distribution. Finally, we formalize the coalition formation problem by a Markov decision process (MDP). It is well known that solving an MDP allows one to determine an optimal policy maximizing the long-term expected reward (Bellman, 1957; Puterman, 1994).
3.1 Uncertain Resource Consumption
In order to deal with the uncertain resources consumption, we assume that the execution of subtask $t_i^l \in T_i$ by agent $a_k$ can consume one quantity of resources from a finite set $R_k^{t_i^l}$ of possible quantities of resources. For simplicity, we assume that there are $p$ resources quantities in the set $R_k^{t_i^l}$. Agent $a_k$ doesn't know which quantity of resources will be consumed, but he can anticipate it using some probability distribution:
Definition 3.1 With each agent $a_k \in A$ is associated an execution probability distribution $PE_k$ where, $\forall t_i^l \in T_i$, $\forall r \in R_k^{t_i^l}$, $PE_k(r, t_i^l)$ represents the probability of consuming the resources quantity $r$ at the time of the execution of subtask $t_i^l$ by agent $a_k$.
If a coalition $c = \langle a_1, \dots, a_q \rangle \in C(T_i)$ executes task $T_i$, a resources collection such as $\langle r_1, \dots, r_q \rangle$ can be consumed, where agent $a_k$ consumes quantity $r_k$ to perform subtask $t_i^k$. Since one of $p$ resources quantities can be consumed by each agent $a_k$ to execute subtask $t_i^k$, the execution of $T_i$ by $c$ consumes one collection among $p^q$ resources collections. We let $H_i^c$ denote the set of all these resources collections. The probability $Pr(\langle r_1, \dots, r_q \rangle, T_i)$ of consuming collection $\langle r_1, \dots, r_q \rangle \in H_i^c$ at the time of the execution of $T_i$ by $c$ is then the probability that each agent $a_k$ consumes the quantity $r_k$. Using Definition 3.1, this probability can be defined as follows:

$$Pr(\langle r_1, \dots, r_q \rangle, T_i) = \prod_{k=1}^{q} PE_{a_k}(r_k, t_i^k) \quad (1)$$

Since $PE_k$ is a probability distribution on $R_k^{t_i^l}$, we have $\sum_{r \in R_k^{t_i^l}} PE_k(r, t_i^l) = 1$; it is then easy to verify that $Pr$ is a probability distribution on $H_i^c$: $\sum_{h \in H_i^c} Pr(h, T_i) = 1$.
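Continuing the illustrative sketch above, Definition 3.1 and equation (1) could be encoded as follows: a per-agent execution distribution $PE_k$ stored as a mapping, the enumeration of $H_i^c$, and the product over coalition members giving $Pr(h, T_i)$. The dictionary layout of PE and the helper names are assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

# PE[agent name, (i, l)] maps each possible resource quantity r to PE_k(r, t_i^l).
PE = Dict[Tuple[str, Tuple[int, int]], Dict[float, float]]


def collections(pe: PE, coalition, task: int) -> List[Tuple[float, ...]]:
    """H_i^c: the p^q resources collections that c may consume to execute T_i."""
    per_slot = [sorted(pe[(agent, (task, l))]) for l, agent in enumerate(coalition)]
    return [tuple(h) for h in product(*per_slot)]


def collection_probability(pe: PE, coalition, task: int, h) -> float:
    """Equation (1): Pr(<r_1,...,r_q>, T_i) = prod_k PE_{a_k}(r_k, t_i^k)."""
    p = 1.0
    for l, (agent, r) in enumerate(zip(coalition, h)):
        p *= pe[(agent, (task, l))].get(r, 0.0)
    return p
```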
3.2 Coalition Expected Reward
In our context, a specific agent, the "controller", is in charge of forming coalitions and allocating tasks. The controller views the formation of a coalition to execute a task as a decision to make. When such a decision is made, a coalition is formed, a task is allocated to this coalition, and a resources collection will be consumed to execute the allocated task. As we have shown in Section 3, the decision to form a coalition to execute a task is associated with an expected reward. In the following, we show how the controller can calculate this expected reward.
The controller observes the state of the system as the couple formed by the available resources of all the agents and the set of formed coalitions and allocated tasks. Being in a state $S$, the decision that consists in forming a coalition $c$ to execute a task $T_i$ drives the system into a new state $S^h$ in which task $T_i$ has been allocated to coalition $c$ and a resources collection $h \in H_i^c$ is anticipated to be consumed when $c$ executes $T_i$. In order to take into account the uncertain task execution, the controller must anticipate all the possible resources collections that can be consumed when $c$ executes $T_i$; each possible consumption drives the system into a different state. If the agents of coalition $c$ have enough resources to execute $T_i$ (collection $h$ is less than $c$'s agents' available resources), then the system receives in state $S^h$ an immediate gain $w(T_i, c)$ (first expected value), else it receives zero. From state $S^h$ another decision can be made and another reward can thus be obtained (second expected value). We let $V[S^h]$ denote the gain in state $S^h$ and we define it as the sum
of these two rewards (see Section 4.3 for the mathematical definition). Being in state $S$, the probability of gaining $V[S^h]$ if coalition $c$ is formed to execute $T_i$ is expressed by the probability of consuming resources collection $h$, because the system reaches state $S^h$ if collection $h$ has been consumed. This probability is defined by equation 1. We can say now that, being in state $S$, the decision to form coalition $c$ to execute $T_i$ drives the system to state $S^h$ and allows it to gain $V[S^h]$ with probability $Pr(h, T_i)$, where $h \in H_i^c$. The expected reward of this decision can be defined as follows:

$$E(\text{Forming } c \text{ to execute } T_i) = \sum_{h \in H_i^c} Pr(h, T_i) \times V[S^h] \quad (2)$$
We note that the expected reward associated with a decision made in the state $S$ depends on the gain that can be obtained in each state $S^h$, and so on. The question is then: being in a state $S$ and knowing that there are $|C(T_i)|$ coalitions capable of executing $T_i$, which decision does the controller have to make in order to maximize his long-term expected reward? To answer this question, we formalize our coalition formation problem using a probabilistic model called a Markov Decision Process (MDP). We will show that the MDP allows one to determine an optimal coalition formation policy that defines, for each system state, the coalition to form in order to maximize the system's long-term expected reward.
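A minimal sketch of equation (2), reusing the helpers from the previous sketches and assuming a caller-supplied function that returns $V[S^h]$ for each anticipated collection $h$ (its recursive definition only appears in Section 4.3).

```python
def expected_reward(pe, coalition, task, successor_value) -> float:
    """Equation (2): E(forming c to execute T_i) = sum_h Pr(h, T_i) * V[S^h].

    successor_value(h) must return V[S^h], the gain of the state reached when
    collection h is consumed (defined recursively in Section 4.3).
    """
    return sum(collection_probability(pe, coalition, task, h) * successor_value(h)
               for h in collections(pe, coalition, task))
```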
4 COALITION FORMATION
Coalition formation can be viewed as a sequential decision process. At each step of this process, the decision to form a coalition to execute a task has to be made. In the next step, another decision concerning the next task is made, and so on. The formation of a coalition changes the system's current state into a new one. As shown in the previous section, the probability of transiting from the system's current state to a new state depends only on the system's current state and on the decision made. So, this process is a Markovian one (Papoulis, 1984; Bellman, 1957).
A Markov decision process consists of a set of all the system's states $\mathcal{S}$, a set of actions $\mathcal{AC}$, and a transition model (Bellman, 1957). With each state is associated a reward function and with each action is associated an expected reward. In the following, we describe our MDP via: the states, the actions, the transition model, and the expected reward.
4.1 States Representation
A state $S$ of the set $\mathcal{S}$ represents a situation of coalition formation and resources consumption for all the agents. We let $S_i = \langle B_i, R_i^1, \dots, R_i^m \rangle$ denote the system state at time $i$ where:
• $B_i$ is the set of task-coalition couples representing the coalition formation until time $i$: $B_i = \{(T_f, c_f) \mid f = 1, \dots, i,$ coalition $c_f$ is formed to execute task $T_f\}$;
• $R_i^k$, $k = 1, \dots, m$, is the available resources of agent $a_k$ at time $i$.
At time 0 the system is in the initial state $S_0 = \langle \emptyset, R^1, \dots, R^m \rangle$, where $R^k$ is the initial resources of agent $a_k$. At time $n$ (the number of tasks), the system reaches a final state $S_n$ where there are no more tasks to allocate or no more resources to execute tasks.
4.2 Actions and Transition Model
With each state $S_{i-1} \in \mathcal{S}$ is associated a set of actions $AC(S_{i-1}) \subset \mathcal{AC}$. An action of $AC(S_{i-1})$ consists in forming a coalition $c \in C(T_i)$ to execute task $T_i$ and in anticipating the resources collection which can be consumed to execute $T_i$. We denote such an action by $Form(c, T_i)$. So, the set $AC(S_{i-1})$ contains $|C(T_i)|$ actions. Being in state $S_{i-1} = \langle B_{i-1}, R_{i-1}^1, \dots, R_{i-1}^m \rangle$, the application of action $Form(c, T_i)$ drives the system into a new state $S_i^h$ which can be any state of the following form:

$$S_i^h = \langle B_i^h, R_i^1, \dots, R_i^m \rangle \quad (3)$$

where:
• $c = \langle a_1, \dots, a_q \rangle$;
• $\forall h = \langle r_1, \dots, r_q \rangle \in H_i^c$;
• $B_i^h = B_{i-1} \cup \{(c, T_i)\}$;
• $\forall a_k \in A, a_k \notin c$: $R_i^k = R_{i-1}^k$;
• $\forall a_l = a_k \in c$:
$$R_i^k = \begin{cases} R_{i-1}^k - r_l, & \text{if } R_{i-1}^k \geq r_l \\ 0, & \text{if } r_l > R_{i-1}^k \end{cases}$$

In fact, there are $|H_i^c|$ possible future states because the execution of $T_i$ by coalition $c$ can consume one resources collection of the set $H_i^c$. The case where $r_l > R_{i-1}^k$ corresponds to the situation where agent $a_l = a_k$ tries to execute subtask $t_i^l$ and consumes all his resources $R_{i-1}^k$, but $t_i^l$ is not completely performed because it necessitates more resources ($r_l$). The available resources of $a_l$ are then 0 and task $T_i$ can't be considered as a realized task. If $c$'s agents have enough resources to execute $T_i$, an immediate gain equal to $w(T_i, c)$ will be received in state $S_i^h$. In the other case ($c$'s agents' available resources are not sufficient to completely execute $T_i$), the immediate gain is equal to 0. We let $\alpha(S_i^h)$ denote the immediate gain in state $S_i^h$, thus:

$$\alpha(S_i^h) = \begin{cases} w(T_i, c), & \text{if } \forall a_l = a_k \in c,\ r_l \leq R_{i-1}^k \\ 0, & \text{otherwise: } \exists a_l = a_k \in c,\ r_l > R_{i-1}^k \end{cases} \quad (4)$$

Furthermore, the probability of the transition from state $S_{i-1}$ to a state $S_i^h$, knowing that the action $Form(c, T_i)$ is applied, can be expressed by the probability that coalition $c$ consumes resources collection $h$, thus $Pr(S_i^h \mid S_{i-1}, Form(c, T_i)) = Pr(h, T_i)$. It is important to note that state $S_i^h$ is inevitably different from the state $S_{i-1}$. In fact, the task to allocate in $S_{i-1}$ was $T_i$, while in any state $S_i^h$, $h \in H_i^c$, we form a coalition to execute task $T_{i+1}$. In other words, being in a state $S$ at time $i$, there is no action that can drive the system to a state $S'$ which was the system's state at time $i' \leq i$. Consequently, the developed MDP doesn't contain loops; it is a finite horizon MDP (Sutton and Barto, 1998). This is a very important property, as we will show in the following.
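The transition model of equations (3) and (4) could be sketched as follows, building on the earlier helpers. The State class, the successors generator, and the agent-ordering convention are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    allocations: Tuple[Tuple[int, Tuple[str, ...]], ...]  # B_i: (task, coalition) pairs
    resources: Tuple[float, ...]                          # R_i^1, ..., R_i^m (agent order)


def successors(state: State, agent_names: List[str], coalition, task: int,
               pe, reward: float):
    """Apply Form(c, T_i): yield (Pr(h, T_i), alpha(S_i^h), S_i^h) for every h in H_i^c."""
    for h in collections(pe, coalition, task):
        prob = collection_probability(pe, coalition, task, h)
        res = list(state.resources)
        enough = True
        for agent, r in zip(coalition, h):
            k = agent_names.index(agent)
            if r > res[k]:
                enough = False        # t_i^l left unfinished: a_l exhausts his resources
                res[k] = 0.0
            else:
                res[k] -= r
        alpha = reward if enough else 0.0   # equation (4): w(T_i, c) only if T_i realized
        nxt = State(state.allocations + ((task, coalition),), tuple(res))
        yield prob, alpha, nxt
```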
4.3 Expected Reward
The decision to apply an action depends on the reward that the system expects to obtain by applying this action. We denote by $E(Form(c, T_i), S_{i-1})$ the expected reward associated with the action $Form(c, T_i)$ applied in state $S_{i-1}$. We recall that this expected reward represents what the system, being in state $S_{i-1}$, expects to gain if coalition $c$ is formed to execute task $T_i$. A policy $\pi$ to follow is a mapping from states to actions. For a state $S_{i-1} \in \mathcal{S}$, $\pi(S_{i-1})$ is an action from $AC(S_{i-1})$ to apply. The expected reward of a policy $\pi(S_{i-1}) = Form(c, T_i)$ is $E(Form(c, T_i), S_{i-1})$. An optimal policy is a policy that maximizes the expected reward at each state. In state $S_{i-1}$, an optimal policy $\pi^*(S_{i-1})$ is then the action whose expected reward is maximal. Formally,

$$\pi^*(S_{i-1}) = \arg\max_{c \in C(T_i)} \{E(Form(c, T_i), S_{i-1})\} \quad (5)$$

Solving equation 5 allows one to determine an optimal coalition formation policy at each state $S_{i-1}$. To do this, the expected reward associated with action $Form(c, T_i)$ has to be defined. Defining this expected reward requires, based on equation 2, the definition of the reward associated with each state. We define the reward $V[S_{i-1}]$ associated with a state $S_{i-1} = \langle B_{i-1}, R_{i-1}^1, \dots, R_{i-1}^m \rangle$ as an immediate gain $\alpha(S_{i-1})$ accumulated with the expected reward of the followed policy (reward-to-go). We can formulate $V[S_{i-1}]$ and $E(Form(c, T_i), S_{i-1})$ using Bellman's equations (Bellman, 1957). For every non-terminal state $S_{i-1}$:

$$V[S_{i-1}] = \underbrace{\alpha(S_{i-1})}_{\text{immediate gain}} + \underbrace{E(\pi^*(S_{i-1}))}_{\text{reward-to-go according to } \pi^*} \quad (6)$$

$$E(\pi^*(S_{i-1})) = \max_{c \in C(T_i)} \{E(Form(c, T_i), S_{i-1})\} \quad (7)$$

$$E(Form(c, T_i), S_{i-1}) = \sum_{h \in H_i^c} Pr(h, T_i) \times V[S_i^h] \quad (8)$$

where state $S_i^h$ corresponds to the consumption of resources collection $h$. For a terminal state $S_n$:

$$V[S_n] = \alpha(S_n) \quad (9)$$

Since the obtained MDP has a finite horizon and no loops, several well-known algorithms, such as Value Iteration and Policy Iteration, solve Bellman's equations in a finite time (Puterman, 1994), and an optimal policy is obtained.
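Because the MDP has a finite horizon and no loops, Bellman's equations (5)-(9) can be solved by plain backward induction, as in the sketch below. It assumes hypothetical coalitions_of and tasks arguments and a succ generator (a closure over the agent list, the distributions PE, and w, wrapping the successors sketch above); a real implementation would memoize visited states or use value iteration, as the paper notes.

```python
def solve(state, alpha, task_index, tasks, coalitions_of, succ):
    """Backward induction for equations (5)-(9): return (V[state], best Form(c, T_i))."""
    if task_index >= len(tasks):                     # terminal state: V[S_n] = alpha(S_n)
        return alpha, None
    task = tasks[task_index]
    best_value, best_coalition = None, None
    for c in coalitions_of(task):
        # equation (8): E(Form(c, T_i), S) = sum_h Pr(h, T_i) * V[S_i^h]
        expected = sum(p * solve(nxt, a, task_index + 1, tasks, coalitions_of, succ)[0]
                       for p, a, nxt in succ(state, c, task))
        if best_value is None or expected > best_value:
            best_value, best_coalition = expected, c
    if best_coalition is None:                       # no coalition can execute T_i
        return alpha, None
    # equations (6)-(7): V[S] = alpha(S) + max_c E(Form(c, T_i), S)
    return alpha + best_value, best_coalition
```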
4.4 Optimal Coalition Formation
An optimal coalition formation can be obtained by solving Bellman's equations and then applying the optimal policy at each state, starting from the initial state $S_0$. Here, we distinguish two cases according to the execution model. The first case corresponds to the execution model where tasks must be sequentially executed in the allocation order ($T_1, T_2, \dots, T_n$). In this case, a coalition to execute task $T_{i+1}$ is formed at the end of $T_i$'s execution. Let $\pi^*(S_{i-1}) = Form(c, T_i)$ be the optimal policy to apply in the state $S_{i-1}$. The application of this policy means that the coalition $c$ must be formed to execute task $T_i$. Assuming that resources collection $h$ has been consumed by $c$ to execute $T_i$, the system then reaches the state $S_i = S_i^h$ defined by equation 3. From this new state $S_i$, the controller applies the calculated optimal policy $\pi^*(S_i)$, and so on.
The second case corresponds to the execution model where the controller forms all the possible coalitions before the agents start the execution. In this case, after each coalition formation, the controller has to anticipate the state the system will reach when executing the allocated task. Let $\pi^*(S_{i-1}) = Form(c, T_i)$ be the optimal policy to apply in the state $S_{i-1}$. By applying this optimal policy, coalition $c$ is formed to execute $T_i$. As the execution is not immediate, the controller anticipates the state $S_i$ the system will reach when $c$ executes $T_i$. This state $S_i$ can be any state $S_i^h$, $h \in H_i^c$. The state the system is most likely to reach is the one corresponding to the resources collection that can be consumed with maximal probability. Formally, the state $S_i^h$ the system is most likely to reach when $c$ executes $T_i$ is the state corresponding to the consumption of the resources collection $h$ that satisfies $Pr(h, T_i) = \max_{h' \in H_i^c} \{Pr(h', T_i)\}$. From this new state $S_i = S_i^h$, the controller applies the calculated optimal policy $\pi^*(S_i)$, and so on, until reaching a terminal state $S_n = \langle B_n, R_n^1, \dots, R_n^m \rangle$. Finally, the set $B_n$ contains the formed coalitions and their allocated tasks.
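For the second execution model, a possible sketch of the controller loop is given below: apply the optimal policy, record the decision, then anticipate the successor state with maximal $Pr(h, T_i)$ before deciding for the next task. The function names are carried over from the earlier sketches and remain assumptions.

```python
def plan_all_coalitions(initial_state, tasks, coalitions_of, succ):
    """Second execution model: decide every coalition up front, anticipating after
    each decision the successor state with maximal Pr(h, T_i)."""
    state, alpha, plan = initial_state, 0.0, []
    for i, task in enumerate(tasks):
        _, c = solve(state, alpha, i, tasks, coalitions_of, succ)
        if c is None:                          # no feasible coalition left
            break
        plan.append((task, c))                 # B_n collects the (T_i, c) pairs
        # anticipate the collection h with maximal Pr(h, T_i)
        _, alpha, state = max(succ(state, c, task), key=lambda t: t[0])
    return plan, state
```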
5 CONCLUSION
Approaches that propose solutions for the coalition formation problem with uncertain coalition value do not take into account the uncertain task execution and the impact of the formation of a coalition on the gain that can be obtained from the formation of other coalitions. In this paper, we addressed the problem of coalition formation in environments where the resources consumption is uncertain. We showed that in such an environment, forming a coalition to execute a task has an impact on the possibility of forming other coalitions. Thus, this issue must be taken into account each time agents decide to form a coalition. We introduced the notion of expected reward that represents what agents expect to gain by forming a coalition. The expected reward is defined as the sum of (1) what agents immediately gain if the coalition executes the task and (2) what they expect to gain from future formations. Our key idea is to view the formation of coalitions as a decision to make that provides, due to the uncertain task execution, an expected reward. The agents' aim is then to form coalitions in a way that maximizes their long-term expected reward instead of the real reward. The coalition formation problem has been formalized as a Markov decision process. Since the obtained MDP has a finite horizon, it can be solved in a finite time using well-known algorithms such as value iteration and policy iteration. After solving the MDP, the controller agent can optimally decide, for each task, which coalition must be formed. In other words, it can make optimal decisions about the coalition formation.
REFERENCES
Aumann, R. (1959). Acceptable points in general cooper-
ative n-person games. volume IV of Contributions to
the Theory of Games. Princeton University Press.
Bellman, R. E. (1957). A Markov decision process. Journal of Mathematical Mechanics, 6:679–684.
Bernheim, B., Peleg, B., and Whinston, M. (1987). Coalition-proof Nash equilibria: I concepts. Journal of Economic Theory, 42(1):1–12.
Blankenburg, B. and Klusch, M. (2004). On safe kernel
stable coalition formation among agents. In Proceed-
ings of International Joint conference on Autonomous
Agents & Multi-Agent Systems, AAMAS04.
Blankenburg, B., Klusch, M., and Shehory, O. (2003).
Fuzzy kernel-stable coalition formation between ra-
tional agents. In Proceedings of International Joint
conference on Autonomous Agents & Multi-Agent Sys-
tems, AAMAS03.
Chalkiadakis, G. and Boutilier, C. (2004). Bayesian rein-
forcement learning for coalition formation under un-
certainty. In Proceedings of International Joint con-
ference on Autonomous Agents & Multi-Agent Sys-
tems, AAMAS04.
Kahan, J. and Rapoport, A. (1984). Theories of Coalition
Formation. Lawrence Erlbaum Associations Publish-
ers.
Ketchpel, S. (1994). Forming coalition in the face of uncer-
tain rewards. In Proceedings of AAAI, pages 414–419.
Klusch, M. and Shehory, O. (1996). A polynomial kernel-
oriented coalition formation algorithm for rational in-
formation agents. In Proceedings of ICMAS, pages
157–164.
Kraus, S., Shehory, O., and Taase, G. (2003). Coalition for-
mation with uncertain heterogeneous information. In
Proceedings of the Second International Joint confer-
ence on Autonomous Agents and Multi-Agent Systems,
AAMAS03, Australia.
Kraus, S., Shehory, O., and Taase, G. (2004). The ad-
vantages of compromising in coalition formation with
incomplete information. In Proceedings of Inter-
national Joint conference on Autonomous Agents &
Multi-Agent Systems, AAMAS04.
Learman, K. and Shehory, O. (2000). Coalition forma-
tion for large-scale electronic markets. In Proceedings
of the Fourth International Conference on Multiagent
Systems.
Papoulis, A. (1984). Signal Analysis. International student
edition, McGraw Hill Book Company.
Puterman, M. L. (1994). Markov Decision Processes. John
Wiley & Sons, New York.
Shehory, O. and Kraus, S. (1998). Methods for task allo-
cation via agent coalition formation. Artificial Intelli-
gence, 101:165–200.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learn-
ing: An Introduction. MIT Press, Cambridge MA.
ISBN 0-262-19398-1.
Zlotkin, G. and Rosenschein, J. (1994). Coalition, cryptography, and stability: mechanisms for coalition formation in task oriented domains. In Proceedings of AAAI, pages 432–437.