2 RELATED WORK
A number of strategies have been developed to escape and avoid local optima in distributed search, resulting in several algorithms. The Distributed Breakout Algorithm (DBA) (Yokoo and Hirayama, 1996) is a hill-climbing algorithm that increases the weights of constraints violated at local optima, raising their importance and thus forcing the search to focus on their resolution. The original algorithm has been studied extensively and a number of improved versions have been proposed. Multi-DB (Hirayama and Yokoo, 2002) is an extension of DBA for Complex Local Problems, i.e. DisCSPs with more than one variable per agent. Multi-DB was later improved in (Eisenberg, 2003) by increasing constraint weights only at global optima, with the resulting algorithm called DisBO. (Basharu et al., 2007b) proposed DisBO-wd, an improvement on DisBO in which constraint weights are decayed over time: at each step, a constraint's weight is increased if the constraint is violated and decayed if it is satisfied. SingleDBWD (Lee, 2010) is a version of DisBO-wd for DisCSPs with one variable per agent.
Another DisCSP algorithm is the Distributed Stochastic Algorithm (DSA) (Zhang et al., 2002), a randomised local search algorithm in which each agent uses probability to decide whether to maintain its current assignment or to change its value at local optima. A hybrid of DSA and DBA was proposed in the Distributed Probabilistic Protocol (DPP) (Smith and Mailler, 2010), in which the weights of constraints violated at local optima are increased and agents find better assignments by means of probability distributions.
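The probabilistic decision at the core of DSA can be sketched as follows. The activation probability p and the helper for counting violations are illustrative assumptions rather than the published protocol.

```python
import random

# Hypothetical sketch of a DSA-style step: if an improving value exists,
# the agent moves to it only with probability p; otherwise it keeps its value.
def dsa_step(agent, p=0.5):
    best = min(agent.domain, key=agent.count_violations)   # assumed helper
    if agent.count_violations(best) < agent.count_violations(agent.value):
        if random.random() < p:
            agent.value = best      # probabilistic move to the improving value
    return agent.value
```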
The Stochastic Distributed Penalty Driven Search (StochDisPeL) (Basharu et al., 2006) is an iterative search algorithm for solving DisCSPs in which agents escape local optima by modifying the cost landscape through penalties imposed on domain values. Whenever an agent detects a deadlock (a quasi-local optimum, QLO), StochDisPeL either imposes a temporary penalty (with probability p) to perturb the solution or increases the incremental penalty (with probability 1 − p) to learn about the bad value combination. Incremental penalties are small and remain imposed until they are reset, whereas temporary penalties are discarded immediately after they are used. The penalties-on-values approach has been shown to outperform the weights-on-constraints approach for escaping local optima (Basharu et al., 2007a).
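As a rough illustration of this penalty mechanism, the sketch below chooses between a temporary and an incremental penalty at a deadlock. The penalty sizes and data structures are assumptions, not the values used in (Basharu et al., 2006).

```python
import random

# Hypothetical sketch of a StochDisPeL-style reaction to a deadlock (QLO):
# with probability p impose a temporary penalty on the current value,
# otherwise grow its incremental penalty. Penalty sizes are illustrative.
def on_deadlock(agent, p=0.7, temp_penalty=5, increment=1):
    if random.random() < p:
        agent.temporary_penalty[agent.value] = temp_penalty  # discarded after one use
    else:
        agent.incremental_penalty[agent.value] += increment  # kept until reset
```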
Asynchronous Weak Commitment Search (AWCS) (Yokoo, 1995) is a complete asynchronous backtracking algorithm that dynamically prioritises agents. An agent searches for values in its domain that satisfy all constraints with higher-priority neighbours and, from these values, selects the one that minimises constraint violations with lower-priority neighbours. When an agent does not find a consistent assignment, it sends messages called nogoods to notify its neighbours and then increases its priority by 1. The use of dynamic priorities changes the relative importance of satisfying each agent's constraints.
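A minimal sketch of this AWCS value-selection step is given below. The neighbour partitioning and helper functions are assumed for illustration and omit the full nogood bookkeeping of the published algorithm.

```python
# Hypothetical sketch of AWCS-style value selection: keep only values that
# satisfy all constraints with higher-priority neighbours, then pick the one
# minimising violations with lower-priority neighbours; otherwise backtrack.
def awcs_select(agent):
    consistent = [v for v in agent.domain
                  if agent.violations_with_higher_priority(v) == 0]  # assumed helper
    if consistent:
        return min(consistent, key=agent.violations_with_lower_priority)
    agent.send_nogood_to_neighbours()   # no consistent value: report a nogood
    agent.priority += 1                 # and raise this agent's priority
    return agent.value
```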
3 DynAPP: DYNAMIC AGENT
PRIORITISATION AND
PENALTIES
We propose Dynamic Agent Prioritisation and Penalties (DynAPP) [Algorithms 1-4], a new algorithm that combines two strategies: penalties on values and agent prioritisation. At local optima, the priority of inconsistent agents (those whose current variable assignment leads to constraint violations) is increased and, at the same time, search diversification is encouraged by penalising values that lead to constraint violations.
In DynAPP, variables are initialised with random values and each agent's priority value is set to its agent ID; note that the lower the agent ID, the higher the actual priority of the agent. Each agent then sends its initial variable assignment to its neighbours. Agents take turns to update their AgentView (their knowledge of the current variable assignments) with the messages received and to select a value for their variable that minimises the following cost function:
c(d_i) = viol(d_i) + p(d_i),   i ∈ [1..|domain|]

where d_i is the i-th value in the domain, viol(d_i) is the number of constraints violated if d_i is selected and p(d_i) is the incremental penalty imposed on d_i.
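The value-selection step based on this cost function can be sketched as follows. The viol() helper and the penalty tables are assumed interfaces; the temporary-penalty term reflects the description in the next paragraph rather than the equation itself.

```python
# Sketch of DynAPP-style value selection: pick the domain value minimising
# c(d) = viol(d) + p(d), where p(d) is the incremental penalty on d.
# A temporary penalty, if one was just imposed, is added for this single
# selection and reset afterwards (see the following paragraph).
def select_value(agent):
    def cost(d):
        return (agent.viol(d)
                + agent.incremental_penalty.get(d, 0)
                + agent.temporary_penalty.get(d, 0))
    return min(agent.domain, key=cost)
```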
When a temporary penalty is imposed, it is used to select another improving value and is then immediately reset. A QLO is detected when an agent's AgentView does not change in two consecutive iterations. At a QLO, an agent (as in StochDisPeL; further details on how penalties are imposed can be found in (Basharu et al., 2006)) imposes a temporary penalty (with probability p) or increases the incremental penalty (with probability 1 − p), and also changes its priority value to the priority of the highest-priority neighbour with which it shares a constraint violation, thus elevating itself among its neighbours. The neighbour with the highest priority then reduces