New Flow-based Heuristic for Search Algorithms Solving Multi-agent Path Finding

Jiri Svancara¹ and Pavel Surynek¹,²

¹ Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

² National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

Keywords: Multi-agent Path Finding, A*, Heuristic Function, Multi-commodity Flow, Network Flow, Maximum Flow, Makespan Optimality.

Abstract:

We address the problem of optimal multi-agent path finding (MAPF). The task is to find a sequence of actions for each agent in known terrain so that each agent arrives at its desired destination from a given starting position. Agents are not allowed to collide with each other along their paths. Furthermore, a solution that minimizes the total time is required. We study search-based algorithms that systematically explore the state space. These algorithms require a good heuristic function that improves computational effectiveness by changing the order in which states are expanded. We propose such a heuristic, based on a relaxation of MAPF solving via its reduction to multi-commodity flow over a time expanded graph. The multi-commodity flow is relaxed to a single-commodity flow, which can be solved in polynomial time. We show that the new heuristic is monotone and can therefore be used effectively in search-based algorithms. We also give a theoretical analysis of the new heuristic and compare it experimentally with commonly used baseline heuristics.

1 INTRODUCTION

Multi-agent path finding (MAPF) is the task of finding collision-free paths for a set of mobile agents so that each agent can reach its goal position by following the determined path (Kornhauser et al., 1984; Surynek, 2009; Sharon et al., 2013). The MAPF problem has recently attracted considerable attention from the research community, and many concepts and techniques have been devised to address it.

An abstraction in which the environment with agents is represented by a graph is often used in the literature (Ryan, 2008). Agents in this abstraction are items placed in the nodes of the graph. Edges represent passable regions. Physical space occupancy of agents is represented by the restriction that at most one agent can be placed in each node. Time is discrete, which means that each agent can make a single move per time step.

We address the problem of generating optimal solutions to MAPF, which is computationally hard, as shown in (Ratner and Warmuth, 1990), but well motivated. Optimal solutions are important in navigation domains where time consumption needs to be minimized (see (Sharon et al., 2013) for a detailed survey).

We specifically concentrate on search-based algorithms for MAPF based on A*. Recent developments in A* algorithms for MAPF show that significant progress has been made by integrating sophisticated heuristics into A*. Our contribution follows this direction. We try to improve the baseline A* algorithm for MAPF by incorporating a novel heuristic inspired by network flows. Network flows conceptually resemble the variant of the MAPF problem in which agent identities do not matter (that is, agents are anonymous).

The positive aspect of network flows is that many efficient algorithms exist in this domain. A similar observation has already been made by Ma and Koenig (Ma and Koenig, 2016), who successfully integrated network flow algorithms into a search-based optimal MAPF algorithm called conflict-based search (CBS) (Sharon et al., 2012). We make a similar attempt at network flow integration, but within the framework of the A* algorithm.

Svancara J. and Surynek P.

New Flow-based Heuristic for Search Algorithms Solving Multi-agent Path Finding.

DOI: 10.5220/0006184504510458

In Proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017), pages 451-458

ISBN: 978-989-758-220-2

2 PROBLEM DEFINITION

Our task is to find a sequence of actions for each agent that leads that agent from its initial position to its final desired position without colliding with other agents. When we say position, we refer to a single agent; in contrast, when we say state, we refer to all agents and their positions (for example, by the state of the graph we mean the placement of all agents into nodes of the graph). In general, we can track the positions of all agents by a function $\alpha_k : A \to V$ that gives the position (a node in the graph) of each agent at time step $k$. Formally, we define an instance of MAPF as follows.

Definition 1. An instance of MAPF is an ordered 4-tuple $(G, A, \alpha_0, \alpha_+)$, where $G = (V, E)$ is a graph, $A$ is a set of agents, and $\alpha_0$ and $\alpha_+$ are the initial and final state, respectively.

A solution of MAPF is a sequence of steps that forms a permissible path for each agent. There are many different notions of what a permissible path is. In this paper we allow an agent to move from node u to an unoccupied node v if there exists a directed edge (u, v). Furthermore, we allow an agent to move to an occupied node v if the agent in node v moves to another node in the same time step. This definition follows the variant from (Yu and LaValle, 2013b). It allows agents to move in one direction around a fully occupied cycle. Swapping two adjacent agents with each other is a prohibited move.

As indicated before, we consider the movement of agents as follows. In each time step every agent performs one of the allowed moves. Staying in the same position (no-op) is also an allowed move. The number of time steps it takes to get all agents to their final positions is referred to as the makespan. This model of agent movement is important when we talk about an optimal solution to the MAPF problem. We define an optimal solution as one in which all agents reach their final positions in the fewest time steps, i.e. we want to find a solution with minimal makespan. Finding such an optimal solution is NP-hard (Ratner and Warmuth, 1990; Yu and LaValle, 2013b).

In the following sections, we discuss state-space search algorithms that use heuristic functions with the following important property.

Definition 2. A heuristic function h() is monotone iff it satisfies the following condition for every state a and every state b reachable from a:

h(a) ≤ m(a, b) + h(b),

where m(a, b) is the actual cost of getting from state a to state b.
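The condition in Definition 2 can be checked mechanically on a small, explicitly enumerated state space. The following is a minimal sketch, not part of the original method: the function name `is_monotone` and the dictionary-based encoding of costs and heuristic values are assumptions made purely for illustration.

```python
from itertools import product


def is_monotone(states, cost, h):
    """Check h(a) <= m(a, b) + h(b) for every pair with a known cost.

    cost maps (a, b) to the actual cost m(a, b); pairs that are missing
    are treated as unreachable and skipped. h maps a state to its
    heuristic value. Both encodings are illustrative assumptions.
    """
    return all(
        h[a] <= cost[(a, b)] + h[b]
        for a, b in product(states, repeat=2)
        if (a, b) in cost
    )


# Toy chain x -> y -> z with unit step costs.
toy_states = ["x", "y", "z"]
toy_cost = {("x", "y"): 1, ("y", "z"): 1, ("x", "z"): 2}
exact = {"x": 2, "y": 1, "z": 0}        # true remaining cost
overshoot = {"x": 3, "y": 1, "z": 0}    # drops by 2 over a unit step
```

With these toy values, `exact` satisfies the condition on every pair, while `overshoot` violates it on the step from x to y.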

3 RELATED WORK

There are many approaches to solving the optimal MAPF problem. Solutions may be obtained via reduction of the problem to satisfiability (Kautz and Selman, 1999) or to another NP-hard problem. Another approach is represented by search-based algorithms (Sharon et al., 2013; Sharon et al., 2012; Boyarski et al., 2015).

In this paper we focus mainly on algorithms that improve the basic A* algorithm (Hart et al., 1968), for which we propose a new heuristic. We also recall the solution of MAPF via its reduction to multi-commodity flow, which is the inspiration for our heuristic.

3.1 Operator Decomposition

The standard A* algorithm applied to MAPF has a branching factor that is exponential in the number of agents (Silver, 2005). Each agent can choose one of its neighbors in a non-colliding way, and then all agents proceed according to their choices, which results in a new state. Such an approach is impractical, and therefore the technique of operator decomposition (OD) (Standley, 2010) has been developed to reduce the branching factor.

Instead of moving all agents to their next positions at once, within the OD concept agents advance to the next position one by one in a fixed order. The original operator for obtaining the next state is thus decomposed into a sequence of operators with a small branching factor (the branching factor is bounded by the degree of a node). Under this representation, there are two conceptually different kinds of states, standard and intermediate, as denoted by Standley. Intermediate states correspond to situations in which not all agents have finished their move, while standard states correspond to states in the original representation without OD.

The major strength of OD lies in the fact that the top-level A* algorithm does not need to distinguish between standard and intermediate states. The next node for expansion is selected among both standard and intermediate states, and the cost function applies to both types of states. It may thus happen that a certain intermediate state is never expanded towards a standard state because other states turned out to be better according to the cost function (denoted as c() = g() + h(), where g() is the actual cost from the start to the current state and h() is the heuristic function).

The value of g() is simply 0 for the initial state. For every other state, the value is g() of its predecessor plus one (we assume unit costs of actions). Note that the standard states are exactly those whose g() value is divisible by the number of agents. As a heuristic function, Standley proposes the sum of the lengths of the shortest paths of each agent from its current position to its final position.
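The baseline heuristic mentioned above (sum of individual shortest path lengths) can be sketched with one breadth-first search per goal. This is an illustrative sketch only: the adjacency-dictionary encoding of the graph and the names `bfs_distances` and `sum_of_shortest_paths` are assumptions, not taken from (Standley, 2010).

```python
from collections import deque


def bfs_distances(graph, target):
    """Shortest path length from every node to `target`.

    graph maps each node to the list of its successors (directed edges);
    every node is assumed to appear as a key.
    """
    # Search backwards from the target over reversed edges.
    reverse = {v: [] for v in graph}
    for u, successors in graph.items():
        for v in successors:
            reverse[v].append(u)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u in reverse[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def sum_of_shortest_paths(graph, current, goals):
    """Baseline h(): each agent is routed alone, ignoring all other agents."""
    return sum(bfs_distances(graph, g)[c] for c, g in zip(current, goals))


# Path graph 0 - 1 - 2 - 3, stored as a symmetric directed graph.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

For example, with agents at nodes 0 and 1 and goals 3 and 2, the individual distances are 3 and 1, so `sum_of_shortest_paths(path_graph, (0, 1), (3, 2))` evaluates to 4. In practice the distance tables would be cached per goal rather than recomputed on every call.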


Treatment of collisions between agents as they are advanced to their next positions needs to be done with special care. As we allow agents to move into nodes that are being vacated by other agents, OD must be allowed to temporarily move agents into positions occupied by agents that have not yet finished their move. Temporary collisions are eventually resolved after all agents finish their move and a standard state is reached. For further details about the precise implementation of OD we refer the reader to (Standley, 2010).
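One possible way to generate OD successors under the movement rules of this paper is sketched below. It is a simplified illustration, not Standley's implementation: the representation of an intermediate state as a pair (`pre`, `post`) and the function name `od_successors` are assumptions. `pre` holds the positions in the last standard state, and `post` holds the new positions of the agents that have already moved in this time step.

```python
def od_successors(graph, pre, post):
    """Expand one OD step: move the next agent in the fixed order.

    graph maps a node to the list of its successors. Returns the list of
    extended `post` tuples. A move into a node still held by an agent that
    has not moved yet is allowed temporarily; that agent is later forced
    to leave, because staying or swapping is rejected when its turn comes.
    """
    idx = len(post)                 # agent whose turn it is
    here = pre[idx]
    result = []
    for target in [here] + list(graph[here]):   # no-op first, then moves
        if target in post:
            continue                # node already claimed in this time step
        swapped = any(
            pre[j] == target and post[j] == here for j in range(idx)
        )
        if swapped:
            continue                # swapping adjacent agents is prohibited
        result.append(post + (target,))
    return result


# Path graph 0 - 1 - 2 with agents at nodes 0 and 1.
line = {0: [1], 1: [0, 2], 2: [1]}
start = (0, 1)
```

On this example, the first agent may wait or step onto node 1, which is still occupied. After it steps onto node 1, the only legal move of the second agent is to node 2, so the temporary collision is resolved once the standard state is reached.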

3.2 Independent Subproblems

Another method proposed in (Standley, 2010) to further improve the performance of the A* algorithm is called independence detection. The main idea behind this technique is that the difficulty of optimal MAPF solving grows exponentially with the number of agents. It would be ideal if we could divide the problem into a series of smaller subproblems, solve them independently, and then combine their solutions.

The basic approach, called simple independence detection (SID), assigns each agent to a group so that every group consists of exactly one agent. Then, for each of these groups, an optimal solution is found independently. Every pair of these solutions is evaluated, and if the solutions of two groups are in conflict, the groups are merged and replanned together. If there are no conflicting solutions, the solutions can be merged into a single optimal solution of the original problem.

This approach can be further improved by trying to avoid the merging of groups. Generally, each agent has more than one possible optimal path. However, SID considers only one of these paths. The improvement of SID known as independence detection (ID) works as follows. Let there be two conflicting groups $G_1$ and $G_2$. First, try to replan $G_1$ so that the new solution has the same cost and the steps that are in conflict with $G_2$ are forbidden. If no such solution exists, try to replan $G_2$ similarly. If this is not possible either, merge $G_1$ and $G_2$ into a new group. If either replanning was successful, that group needs to be evaluated against every other group again. This can lead to an infinite cycle. Therefore, if two conflicting groups have already been in conflict before, they are merged without trying to replan. For further details about ID and another improvement called the conflict avoidance table we refer the reader to (Standley, 2010).
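The merging loop of SID can be sketched as follows. This is an illustration under stated assumptions rather than Standley's code: `plan` is a hypothetical callback that returns an optimal joint plan for a group as a dictionary from agent to path, paths are lists of nodes indexed by time step, and an agent is assumed to wait at its last node once its path ends. The replanning step of full ID is not included.

```python
from itertools import combinations


def position(path, t):
    """Node occupied at time t; the agent waits at the end of its path."""
    return path[min(t, len(path) - 1)]


def paths_conflict(p, q):
    """True if two paths share a node at some time step or swap positions."""
    horizon = max(len(p), len(q))
    for t in range(horizon):
        if position(p, t) == position(q, t):
            return True                         # vertex conflict
        if (position(p, t) == position(q, t + 1)
                and position(p, t + 1) == position(q, t)):
            return True                         # prohibited swap
    return False


def simple_independence_detection(agents, plan):
    """Merge conflicting groups until all group plans are pairwise compatible."""
    groups = [frozenset([a]) for a in agents]
    plans = {g: plan(g) for g in groups}
    while True:
        clash = next(
            ((g, h) for g, h in combinations(groups, 2)
             if any(paths_conflict(p, q)
                    for p in plans[g].values() for q in plans[h].values())),
            None,
        )
        if clash is None:
            return plans
        g, h = clash
        merged = g | h
        groups = [x for x in groups if x not in clash] + [merged]
        del plans[g], plans[h]
        plans[merged] = plan(merged)            # replan the merged group jointly
```

The sketch treats the planner as a black box, which mirrors the remark below that SID and ID only divide the problem and leave the actual solving to another algorithm.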

Neither SID nor ID solves MAPF on its own; they only divide the problem into smaller subproblems that are solved by another algorithm.

3.3 Multi-commodity Flow

A reduction to the multi-commodity flow problem has been proposed in (Yu and LaValle, 2012). The reduction shows the correspondence between MAPF and multi-commodity flow over a time expanded graph. First, we define the multi-commodity flow problem.

Definition 3. Let $G = (V, E)$ be a directed graph, where each edge $(u, v) \in E$ has an integer capacity $cap(u, v)$ and a positive cost $cost(u, v)$. Let there be $k$ commodities $K_1, \dots, K_k$ defined as $K_i = (s_i, t_i, d_i)$, where $s_i, t_i \in V$ are the source and sink, respectively, and $d_i$ is the demand. The flow of commodity $i$ along edge $(u, v)$ is given by $f_i : E \to \mathbb{N}$. Find an assignment of flow which satisfies the constraints:

1. Capacity constraint

$$\sum_{i=1}^{k} f_i(u, v) \le cap(u, v)$$

2. Conservation of flows

$$\sum_{w \in V} f_i(u, w) = \sum_{w \in V} f_i(w, u) \quad \text{for } u \ne s_i, t_i$$

3. Demand satisfaction

$$\sum_{w \in V} f_i(s_i, w) = \sum_{w \in V} f_i(w, t_i) = d_i$$

For this definition, we want to find the minimum cost multi-commodity flow, which means minimizing

$$\sum_{(u,v) \in E} \Big( cost(u, v) \sum_{i=1}^{k} f_i(u, v) \Big).$$

This problem with two or more commodities is itself NP-hard (Even et al., 1976).

We also need to define the time expanded graph that will be used. We follow the idea known from domain-independent planning and SAT-based approaches to MAPF, where the representation of states is expanded over all possible time steps (Kautz and Selman, 1999; Yu and LaValle, 2012; Surynek, 2015). A visualization of such a graph with i layers and copies of nodes u and v with the appropriate new edges is shown in Figure 1.

Definition 4. For an input directed graph $G = (V, E)$ we define the time expanded graph with $i$ layers as $G^i = (V^i, E^i)$, where $V^i$ contains nodes $\{v_1, v'_1, \dots, v_i, v'_i\}$ for each $v \in V$. $E^i$ contains the edge $(u'_j, v_{j+1})$ for each $j = 1, \dots, i - 1$ iff $(u, v) \in E$. In addition, we also add edges $(u'_j, u_{j+1})$ and $(u_j, u'_j)$ for each $u$ and $j$. All nodes with the same index $j$ are called the $j$-th layer of the time expanded graph.
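A minimal sketch of this construction is given below. It assumes one particular reading of Definition 4, namely that every node has two copies per layer joined by an edge inside the layer; the tuple labels `("in", v, j)` and `("out", v, j)` and the function name are naming choices made here for illustration only.

```python
def build_time_expanded_graph(graph, layers):
    """Return the edge list of a time expanded graph with `layers` layers.

    graph maps each node to the list of its successors. Every node v has
    an entry copy ("in", v, j) and an exit copy ("out", v, j) in layer j.
    """
    edges = []
    for j in range(1, layers + 1):
        for u in graph:
            # Edge inside the layer: limits node u to one unit of flow.
            edges.append((("in", u, j), ("out", u, j)))
            if j < layers:
                # Waiting in u (no-op) between layers j and j + 1.
                edges.append((("out", u, j), ("in", u, j + 1)))
                # Moving along an original edge (u, v).
                for v in graph[u]:
                    edges.append((("out", u, j), ("in", v, j + 1)))
    return edges


# Two nodes and a single directed edge u -> v.
tiny = {"u": ["v"], "v": []}
tiny_edges = build_time_expanded_graph(tiny, 2)
```

For the two-node example with two layers, the result has four edges inside the layers, two waiting edges, and one movement edge.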


Figure 1: A visualization of time expanded graph with i

layers.

Assume we have an instance of the MAPF problem $(G, A, \alpha_0, \alpha_+)$ that can be solved in makespan $T$. We construct a time expanded graph $G^{T+1}$ with $T + 1$ layers and define an instance of multi-commodity flow on this graph as follows. Let there be $|A|$ commodities (i.e. one for each agent) and label nodes in the first layer as sources according to $\alpha_0$ and, similarly, label nodes in the $(T+1)$-th layer as sinks according to $\alpha_+$. For example, if the starting position of agent $a$ is node $v$ and its goal is node $u$ in the original graph $G$, then the commodity that represents this agent has its source in node $v_1$ and its sink in node $u_{T+1}$ of the time expanded graph. Every edge is given unit capacity and unit cost. For each commodity we have a demand of one.

If we find a solution to such a multi-commodity flow problem, then the edges with unit flow represent the paths of agents. Thus if $f_k(u'_j, v_{j+1}) = 1$ for some edge, agent $k$ moved from node $u$ to node $v$ in time step $j$. If $f_k(u'_j, u_{j+1}) = 1$, agent $k$ did not move in time step $j$. The edges inside a layer make it impossible for more than one agent to occupy a node at the same time. Since all capacities are one, the flow between two nodes inside a layer can be zero or one. This means that only one edge entering $u_j$ can be assigned a flow of one.

Let us note that this definition of the time expanded graph gives correct solutions only for directed graphs. If G is undirected, there exists a flow that swaps the positions of two agents connected by an edge, which is prohibited in the standard MAPF definition. Yu and LaValle (Yu and LaValle, 2012) suggested a graph construction that excludes such flows. However, in this paper we have no need for this construction.

Figure 2: An example of multi-commodity flow solving MAPF with two agents (blue and green). The original problem is in the top part, where nodes are indicated by numbers and goal positions are represented by flags. The solution on the time expanded graph is in the bottom part, where colored edges represent the paths of both agents.

It is shown in (Yu and LaValle, 2013a) that this reduction of a MAPF instance solvable in makespan T to the multi-commodity flow problem is correct.

4 FLOW HEURISTIC

Now we describe our new heuristic, which is intended to be used with the OD algorithm. The flow-based heuristic (or just flow heuristic) is obtained by relaxation of the previously described multi-commodity flow reduction. First, we define flow and flow network.

Definition 5. Let $G = (V, E)$ be a directed graph, where each edge $(u, v) \in E$ has an integer capacity $cap(u, v)$. Let there be nodes $s, t \in V$, denoted as source and sink. The 4-tuple $(G, cap, s, t)$ is a flow network. A flow is a function $f : E \to \mathbb{N}$ with the following properties:

1. Capacity constraint

$$0 \le f(u, v) \le cap(u, v)$$

2. Conservation of flows

$$\sum_{w \in V} f(u, w) = \sum_{w \in V} f(w, u) \quad \text{for } u \ne s, t$$

The construction of the time expanded graph is the same as before. To define the maximum flow problem over this graph $G^i$, we need to assign capacities and one source and one sink. The source is a new node $s$ added to the graph with new edges $(s, v_1)$, where $v$ are the nodes currently occupied by agents. Similarly, the sink is added as a node $t$ with edges $(v_i, t)$, where $v$ are the goals of all agents. All edges in the graph have unit capacity.

In the following description of the flow heuristic, we refer to the pseudocode in Algorithm 1. The heuristic needs the original graph G, the set of agents (or just its size), the current state, and the final state. We initialize the time expanded graph as empty and set the maximal flow to 0. The variable i denotes the number of layers that the time expanded graph should have. The first iteration uses two layers: since the current state $\alpha_a$ is not the final state, there has to be at least one agent that needs to move.

In the while loop, we try to find the minimal number of layers that yields permissible paths for each agent to any goal, not necessarily its own. The loop terminates when the found flow is equal to the number of agents. This means we have found a path for each agent such that the agents do not collide. The flow can never be larger than the number of agents, because there are only |A| edges of unit capacity leaving the source.

A maximal flow over the constructed flow network can be found by any maximum flow algorithm. In our implementation, we use Dinitz's algorithm (Dinitz, 1970). If the maximal flow is not sufficient, the next iteration is performed over a time expanded graph with an additional layer.

The output of the heuristic is simply a number that states how many steps (including no-ops) all the agents together need to make until the final state is reached. This number corresponds to the h() value in the OD algorithm. The steps are represented by edges with non-zero flow between the layers. Thus pathLengths can be calculated simply as $(i - 1) \cdot |A|$.

Algorithm 1: Flow Heuristic.

procedure FLOWHEURISTIC(G, A, α_a, α_+)
    expandedG ← ∅
    flow ← 0
    i ← 1
    while flow < |A| do
        i ← i + 1
        expandedG ← buildExpandedGraph(G, i)
        addSourceAndSink(expandedG, α_a, α_+)
        flow ← maxFlow(expandedG)
    pathLengths ← (i − 1) · |A|
    return pathLengths

Figure 3: An example of paths planned by the ﬂow heuristic

for instance of MAPF from Figure 2. Note that the paths do

not lead each agent to its goal node. However, the time

estimation is correct.
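Algorithm 1 can be turned into a small runnable sketch as follows. Several points are assumptions made for illustration and are not taken from the paper: the maximum flow is computed by simple breadth-first augmentation (Edmonds-Karp style) instead of Dinitz's algorithm, each node is split into an "in" and an "out" copy per layer with the sink attached to the "out" copies of the last layer, a state equal to the final state returns 0, and `max_layers` is a hypothetical guard against unsolvable inputs.

```python
from collections import deque


def build_network(graph, layers, current, goals):
    """Unit-capacity residual network over the time expanded graph."""
    cap = {}

    def add(u, v):
        cap.setdefault(u, {})[v] = 1
        cap.setdefault(v, {}).setdefault(u, 0)     # residual back edge

    for j in range(1, layers + 1):
        for v in graph:
            add(("in", v, j), ("out", v, j))       # one agent per node
            if j < layers:
                add(("out", v, j), ("in", v, j + 1))        # no-op
                for w in graph[v]:
                    add(("out", v, j), ("in", w, j + 1))    # move
    for v in current:
        add("s", ("in", v, 1))
    for v in goals:
        add(("out", v, layers), "t")
    return cap


def max_flow(cap, s, t):
    """Augment along shortest paths until the sink is unreachable."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:               # push one unit of flow
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def flow_heuristic(graph, current, goals, max_layers=64):
    """h() value: (layers - 1) * |A| for the smallest sufficient layer count."""
    agents = len(current)
    if tuple(current) == tuple(goals):
        return 0                                   # assumption: final state
    for layers in range(2, max_layers + 1):
        cap = build_network(graph, layers, current, goals)
        if max_flow(cap, "s", "t") == agents:
            return (layers - 1) * agents
    raise ValueError("no flow of size |A| within max_layers")


# Path graph 0 - 1 - 2 - 3, stored as a symmetric directed graph.
corridor = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

On the corridor example with agents at nodes 0 and 1 and goals 3 and 2, two layers are not enough, but with three layers the anonymous agents can be routed 0 → 1 → 2 and 1 → 2 → 3, so the sketch returns (3 − 1) · 2 = 4. The network is rebuilt from scratch in every iteration here for clarity; the incremental construction discussed in Section 4.1 avoids that cost.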

4.1 Detailed Graph Construction

This heuristic is used with the OD algorithm, which works with standard and intermediate states. So far the heuristic has been described for a standard state. If $\alpha_a$ is an intermediate state in which the first l agents have already moved, small changes must be made.

When adding the source s, the added edges are either $(s, v_1)$ for agents that have not yet moved, or $(s, v_2)$ for agents that have already moved. That is, for agents that have moved, the edges are connected to the corresponding nodes in the second layer. This is always possible because there are always at least two layers. It compensates for the movement of these agents in the current time step.

A further improvement in the performance of the heuristic is to start with more than two layers. A good estimate of the number of layers to start with is obtained by computing, for each agent, the shortest path (ignoring other agents) in the original graph and taking the longest of these paths. The length of this path plus one is the number of layers to start with, because this agent has to travel at least this distance. Other agents may have to travel a shorter distance, but since we are interested in makespan, they have to wait for the last one to arrive. These distances can be computed once in advance for every pair of nodes. If the current state is intermediate and the agent has already moved, the distance is reduced by one.
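The starting number of layers can be sketched as a table lookup after a one-time all-pairs precomputation. The sketch below covers standard states only; the adjustment for intermediate states is left out, and the function names and the adjacency-dictionary graph encoding are assumptions made for illustration.

```python
from collections import deque


def all_pairs_distances(graph):
    """One breadth-first search per node; table[u][v] is the distance u -> v."""
    table = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        table[source] = dist
    return table


def initial_layers(table, current, goals):
    """Longest individual shortest path plus one, and never fewer than two."""
    longest = max(table[c][g] for c, g in zip(current, goals))
    return max(2, longest + 1)


# Path graph 0 - 1 - 2 - 3, stored as a symmetric directed graph.
hallway = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hallway_table = all_pairs_distances(hallway)
```

With agents at nodes 0 and 1 and goals 3 and 2, the longest individual distance is 3, so the search for a sufficient flow would start at four layers.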

Another improvement is to build the graph efficiently. Due to its structure, we can add one layer in each iteration and change the sink accordingly, instead of rebuilding the whole graph. We can even save the graph for the next call of the heuristic. If more layers are needed, we add them. If fewer layers are needed, we attach the sink to the appropriate layer.

4.2 Properties of the Heuristic

For efﬁcient use in the OD algorithm, we need to

prove the following theorem.

Theorem 1. The ﬂow heuristic is monotone.

Proof. To show that the heuristic is monotone, we want to show that the inequality

$$h(\alpha_a) \le m(\alpha_a, \alpha_b) + h(\alpha_b)$$

holds for any state $\alpha_a$ and its descendant state $\alpha_b$. As was shown in (Pearl, 1984), it suffices to consider only direct descendants of state $\alpha_a$, i.e. $\alpha_b = \alpha_{a+1}$. We can compute $m(\alpha_a, \alpha_{a+1})$ as the difference of the g() values of the two states. Due to the properties of the OD algorithm, the following relation always holds for two subsequent states.

$$m(\alpha_a, \alpha_{a+1}) = g(\alpha_{a+1}) - g(\alpha_a) = 1$$

When state $\alpha_a$ changes to state $\alpha_{a+1}$, exactly one agent makes one step, and the value of the heuristic function can change in two possible ways. In the first case, the action is the one planned by the flow heuristic in state $\alpha_a$. Then the value of $h(\alpha_{a+1})$ is exactly one smaller than the value of $h(\alpha_a)$, and the inequality holds.

In the second case, the action of the agent is not one of the possible actions planned by the heuristic. Then the value can either increase or remain the same, so the inequality still holds.
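The two cases of the proof can be written out as a short chain of relations. This only restates the argument above; the shorthand $h_a = h(\alpha_a)$ and $h_{a+1} = h(\alpha_{a+1})$ is introduced here for brevity and does not appear in the original text.

```latex
\begin{align*}
\text{Case 1:}\quad & h_{a+1} = h_a - 1
  \;\Rightarrow\; h_a = 1 + h_{a+1} = m(\alpha_a, \alpha_{a+1}) + h_{a+1},\\
\text{Case 2:}\quad & h_{a+1} \ge h_a
  \;\Rightarrow\; h_a \le h_{a+1} < 1 + h_{a+1} = m(\alpha_a, \alpha_{a+1}) + h_{a+1}.
\end{align*}
```

In the first case the monotonicity condition holds with equality, in the second with strict inequality.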

By relaxing the multi-commodity flow to a single-commodity flow, we anonymize the agents. This means that the heuristic plans a path for each agent in such a way that the agents do not collide, but they are not navigated to their own goals. Instead, each agent is navigated to any goal.

So far we assumed that the input graph for the MAPF problem is directed. This was important for our construction of the time expanded graph for the multi-commodity flow problem. Otherwise, swaps would be allowed, which are prohibited by the definition of a MAPF solution. Now we show that when we relax the multi-commodity flow to a single-commodity flow to use it as a heuristic, it gives the same results for undirected graphs even without the swap-excluding graph construction proposed in (Yu and LaValle, 2012). This way we solve the flow problem over a smaller time expanded graph with fewer nodes and edges, which requires less computational time.

Theorem 2. The value of the ﬂow heuristic is the

same when agents are allowed to swap in time ex-

panded graph as when swaps are forbidden.

Proof. The single-commodity flow anonymizes both goals and agents. Therefore, every agent is interchangeable with any other. When the heuristic plans paths that make two agents swap their positions, this is equivalent to paths in which both agents stay in the same position. If we find a time expanded graph with the minimal number of layers admitting the desired flow in which two agents swap, we can find an equivalent flow in which these two agents stay in the same position instead. Such a correction can be seen in Figure 4.
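The correction used in the proof can be illustrated on explicit paths: whenever two anonymous agents swap between two consecutive time steps, exchanging the remainders of their paths makes both of them wait instead, while the set of nodes reached at the end stays the same. The following sketch assumes paths are given as equally long lists of nodes; the function name is hypothetical.

```python
def replace_swaps_with_waits(paths):
    """Rewrite anonymous paths so that no two agents swap positions.

    paths is a list of equally long node sequences, one per agent. The
    number of time steps and the multiset of final nodes are preserved.
    """
    paths = [list(p) for p in paths]
    steps = len(paths[0]) - 1
    for t in range(steps):
        for a in range(len(paths)):
            for b in range(a + 1, len(paths)):
                p, q = paths[a], paths[b]
                if p[t] != p[t + 1] and p[t] == q[t + 1] and p[t + 1] == q[t]:
                    # Exchange the remainders: each agent now waits at step t
                    # and then follows the other agent's former path.
                    p[t + 1:], q[t + 1:] = q[t + 1:], p[t + 1:]
    return paths


swapping = [[0, 1, 2], [1, 0, 0]]
corrected = replace_swaps_with_waits(swapping)
```

In the example, the agents swap nodes 0 and 1 in the first step. After the correction the first agent stays at node 0 throughout, and the second waits at node 1 and then continues to node 2, so the final nodes are still 0 and 2.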

It is important to remember that the heuristic gives us only a numerical value, not the planned paths. We introduce the correction of the paths only to show that these paths are equivalent and that the heuristic, as described, gives the same results on any type of graph.

Figure 4: A correction of paths for anonymized agents on an undirected graph.

5 EXPERIMENT

We performed an experimental comparison of the mentioned search-based techniques. Altogether we compare four combinations of techniques, all of which are built on OD. The list of the combinations:

• OD with the baseline heuristic (OD+baseline)

• OD with the ﬂow heuristic (OD+ﬂow)

• OD+baseline with ID (OD+ID+baseline)

• OD+ﬂow with ID (OD+ID+ﬂow)

The experiments were conducted over two types of graphs. The first one is a 7x7 grid map with obstacles in the middle. This type of graph was selected for its simple representation. The obstacles in the middle of the graph are there to ensure high interaction between agents. The interaction is further influenced by the starting positions of the agents and their goal positions. The goal positions are always in one cluster. The starting positions are either in a cluster or in random places. These two settings are referred to below as centered and scattered, respectively.

Since the algorithms are designed to work with any type of graph, we also added randomized directed strongly biconnected graphs as a second type of graph. We chose this type of graph because an instance on it is guaranteed to have a solution if there are at least two unoccupied nodes (Botea and Surynek, 2015). For these graphs, both starting and goal positions are randomized. Therefore, there are three sets of test instances.

For all of these instances, we create problems of varying difficulty by increasing the number of agents from 2 up to 12. Each type of problem (i.e. graph, starting positions, number of agents) is created multiple times. When testing, we are interested in two values. The first is the total number of visited states, which is exactly the number of times the heuristic is computed. The second is the time elapsed during computation. For each problem instance, there is a two-minute timeout. In the following graphs, only instances solved within the time limit are included. All of the following graphs have a logarithmic y-axis to better show differences even in smaller instances.

The first type is the grid graph with centered starting positions (see Figure 5). These instances enforce high interaction between agents, since the agents have to pass through the small gap as a group. This is the reason why adding ID does not improve the measured values as much as in other instances. We can see that OD+flow outperforms OD+baseline in both visited states and elapsed time. The other pair, with ID added, performs similarly in terms of visited states. In terms of computation time, the flow heuristic is slightly worse, since one call of the flow heuristic is more expensive to compute than one call of the baseline heuristic.


[Figure 5 plots. Panel title: Grid - centered. Top: number of visited states versus instance number. Bottom: elapsed time [s] versus instance number. Series: OD+baseline, OD+flow, OD+ID+baseline, OD+ID+flow.]

Figure 5: Results of experiments on 7x7 grid with obstacles and clustered starting positions. Top graph shows visited states for ordered instances. Bottom graph shows computational time.

[Figure 6 plots. Top: number of visited states versus instance number. Bottom: elapsed time [s] versus instance number. Series: OD+baseline, OD+flow, OD+ID+baseline, OD+ID+flow.]

Figure 6: Results of experiments on 7x7 grid with obstacles and randomized starting positions. Graphs are organized as in Figure 5.

The second type is the grid graph with random starting positions (see Figure 6). These instances do not enforce as much interaction between agents. This means that ID is much more effective, as can be seen. Both variants with the flow heuristic outperform their counterparts on hard instances. For easier instances, where the difference in the number of searched states is much smaller, the cheaper baseline heuristic outperforms the flow heuristic in computational time. In this example we can see that the flow heuristic is an improvement orthogonal to ID.

[Figure 7 plots. Top: number of visited states versus instance number. Bottom: elapsed time [s] versus instance number. Series: OD+baseline, OD+flow, OD+ID+baseline, OD+ID+flow.]

Figure 7: Results of experiments on strongly biconnected graph with randomized starting and goal positions. Graphs are organized as in Figure 5.

The last type is the randomized strongly biconnected graph with both starting and goal positions randomized (see Figure 7). The results are similar to the previous example, with the same explanation.

6 DISCUSSION

We described a new heuristic for search-based algorithms solving MAPF. The heuristic is based on network flow over a time expanded graph. It was obtained as a relaxation of multi-commodity flow, which can be used to solve MAPF but is an NP-hard problem. We showed that this heuristic is monotone and can therefore be used effectively with search algorithms. Further, we showed that this heuristic can be used for both directed and undirected graphs with a smaller construction of the time expanded graph than is commonly used.

It can be noted that a single call of the flow heuristic is harder to compute than a call of other heuristics. If the heuristic causes the search algorithm to expand fewer states, it can still be beneficial to the overall computing time. However, in small instances, where there is not much room for improvement in terms of searched states, the computational time may be worse. This also affects the usage of ID, since ID starts with smaller groups of agents and thus easier problems. Future research can focus on using both heuristics: the baseline heuristic for instances with a smaller number of agents and the flow heuristic for instances with many agents.


ACKNOWLEDGEMENT

This paper is based on results obtained from SVV

project number 260 333, project commissioned by the

New Energy and Industrial Technology Development

Organization Japan, and the joint program for coop-

eration of the Israeli and Czech ministries of science

number 8G15027.

REFERENCES

Botea, A. and Surynek, P. (2015). Multi-agent path ﬁnding

on strongly biconnected digraphs. In Bonet, B. and

Koenig, S., editors, Proceedings of the Twenty-Ninth

AAAI Conference on Artiﬁcial Intelligence, January

25-30, 2015, Austin, Texas, USA., pages 2024–2030.

AAAI Press.

Boyarski, E., Felner, A., Stern, R., Sharon, G., Tolpin,

D., Betzalel, O., and Shimony, S. E. (2015). ICBS:

improved conﬂict-based search algorithm for multi-

agent pathﬁnding. In (Yang and Wooldridge, 2015),

pages 740–746.

Dinitz, E. (1970). Algorithm for solution of a problem of

maximal ﬂow in a network with power estimation. So-

viet Math. Dokl., 11:1277–1280.

Even, S., Itai, A., and Shamir, A. (1976). On the complex-

ity of timetable and multicommodity ﬂow problems.

SIAM J. Comput., 5(4):691–703.

Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A formal

basis for the heuristic determination of minimum cost

paths. IEEE Trans. Systems Science and Cybernetics,

4(2):100–107.

Kautz, H. A. and Selman, B. (1999). Unifying sat-based

and graph-based planning. In Dean, T., editor, Pro-

ceedings of the Sixteenth International Joint Confer-

ence on Artiﬁcial Intelligence, IJCAI 99, Stockholm,

Sweden, July 31 - August 6, 1999. 2 Volumes, 1450

pages, pages 318–325. Morgan Kaufmann.

Kornhauser, D., Miller, G. L., and Spirakis, P. G. (1984).

Coordinating pebble motion on graphs, the diame-

ter of permutation groups, and applications. In 25th

Annual Symposium on Foundations of Computer Sci-

ence, West Palm Beach, Florida, USA, 24-26 October

1984, pages 241–250. IEEE Computer Society.

Ma, H. and Koenig, S. (2016). Optimal target assignment

and path ﬁnding for teams of agents. In Proceedings

of the 2016 International Conference on Autonomous

Agents & Multiagent Systems, Singapore, May 9-13,

2016, pages 1144–1152.

Pearl, J. (1984). Heuristics - intelligent search strategies for

computer problem solving. Addison-Wesley series in

artiﬁcial intelligence. Addison-Wesley.

Ratner, D. and Warmuth, M. K. (1990). Nxn puzzle

and related relocation problem. J. Symb. Comput.,

10(2):111–138.

Ryan, M. R. K. (2008). Exploiting subgraph structure in

multi-robot path planning. J. Artif. Intell. Res. (JAIR),

31:497–542.

Sharon, G., Stern, R., Felner, A., and Sturtevant, N. R.

(2012). Conﬂict-based search for optimal multi-agent

path ﬁnding. In Hoffmann, J. and Selman, B., editors,

Proceedings of the Twenty-Sixth AAAI Conference on

Artiﬁcial Intelligence, July 22-26, 2012, Toronto, On-

tario, Canada. AAAI Press.

Sharon, G., Stern, R., Goldenberg, M., and Felner, A.

(2013). The increasing cost tree search for optimal

multi-agent pathﬁnding. Artif. Intell., 195:470–495.

Silver, D. (2005). Cooperative pathﬁnding. In Proceedings

of the First Artiﬁcial Intelligence and Interactive Digi-

tal Entertainment Conference, June 1-5, 2005, Marina

del Rey, California, USA, pages 117–122.

Standley, T. S. (2010). Finding optimal solutions to cooper-

ative pathﬁnding problems. In Fox, M. and Poole, D.,

editors, Proceedings of the Twenty-Fourth AAAI Con-

ference on Artiﬁcial Intelligence, AAAI 2010, Atlanta,

Georgia, USA, July 11-15, 2010. AAAI Press.

Surynek, P. (2009). A novel approach to path planning

for multiple robots in bi-connected graphs. In 2009

IEEE International Conference on Robotics and Au-

tomation, ICRA 2009, Kobe, Japan, May 12-17, 2009,

pages 3613–3619. IEEE.

Surynek, P. (2015). Reduced time-expansion graphs and

goal decomposition for solving cooperative path ﬁnd-

ing sub-optimally. In (Yang and Wooldridge, 2015),

pages 1916–1922.

Yang, Q. and Wooldridge, M., editors (2015). Proceedings

of the Twenty-Fourth International Joint Conference

on Artiﬁcial Intelligence, IJCAI 2015, Buenos Aires,

Argentina, July 25-31, 2015. AAAI Press.

Yu, J. and LaValle, S. M. (2012). Multi-agent path planning and network flow. In Frazzoli, E., Lozano-Pérez, T., Roy, N., and Rus, D., editors, Algorithmic Foundations of Robotics X - Proceedings of the Tenth Workshop on the Algorithmic Foundations of Robotics, WAFR 2012, MIT, Cambridge, Massachusetts, USA, June 13-15 2012, volume 86 of Springer Tracts in Advanced Robotics, pages 157–173. Springer.

Yu, J. and LaValle, S. M. (2013a). Planning optimal paths

for multiple robots on graphs. In 2013 IEEE Interna-

tional Conference on Robotics and Automation, Karl-

sruhe, Germany, May 6-10, 2013, pages 3612–3617.

IEEE.

Yu, J. and LaValle, S. M. (2013b). Structure and intractabil-

ity of optimal multi-robot path planning on graphs. In

desJardins, M. and Littman, M. L., editors, Proceed-

ings of the Twenty-Seventh AAAI Conference on Arti-

ﬁcial Intelligence, July 14-18, 2013, Bellevue, Wash-

ington, USA. AAAI Press.
