Multi-Agent Path Finding Using Provisionally Booking Nodes for Pickup and Delivery Problems
Daiki Shimada, Yuki Miyashita^a and Toshiharu Sugawara^b
Waseda University, Shinjuku, Tokyo 1698555, Japan
Keywords:
Multi-Agent Pickup and Delivery, Cooperation, Path Finding.
Abstract:
We propose an efficient method for determining subsequent movements based on temporarily generated shortest paths in the multi-agent pickup and delivery (MAPD) problem. The MAPD problem involves multiple agents (such as carrier robots) continuously performing transportation tasks in a vast environment with obstacles while avoiding collisions with other agents. Our method is an extension of a decentralized path-finding algorithm, priority inheritance with backtracking (PIBT), and can be efficient in environments with narrow one-way paths and few detours. Our method, PIBT with provisional booking (PIBT-PB), not only secures the next node as in PIBT but also provisionally books some nodes in advance based on dynamic priorities between agents to detect possible conflicts earlier. It therefore reduces the number of wasted “turning back” and “waiting” actions in such environments. Our experiments show that PIBT-PB is more efficient than the baselines, PIBT and windowed PIBT, and that even in less restrictive environments it performs as efficiently as PIBT.
1 INTRODUCTION
Significant interest has been growing in multi-agent systems (MAS) designed to execute sophisticated collaborative and coordinated tasks. Examples of such applications include cooperative security monitoring using multiple patrolling robots, material transportation using carrier robots in warehouses (Ma et al., 2017), construction sites (Miyashita et al., 2023; Ankit et al., 2022), rescue environments (Jennings et al., 1997), and automated catering robots in restaurants (Wang, 2023). In these systems, agents must individually determine appropriate actions based on their recognition of the surroundings to fulfill their objectives while avoiding negative interference, such as collisions and resource conflicts. In particular, we address the multi-agent pickup and delivery (MAPD) problem (Ma et al., 2017) in a construction site context. Here, numerous agents (e.g., carrier robots) continuously pick up heavy and large materials and deliver them to the required destinations during the night, ready for the next day’s installation work by human builders. This process must be carried out while avoiding collisions and deadlocks. Therefore, it can be viewed as a repeated multi-agent path finding (MAPF) problem, in which multiple agents generate collision-free paths between their start and goal locations.

^a https://orcid.org/0000-0002-1676-9346
^b https://orcid.org/0000-0002-9271-4507
At a construction site, installation materials are stored in a few storage areas and carried to the specific locations where they will be installed. Another example of an MAPD situation is carrying materials to and from trucks in a loading bay. This suggests that agents in the MAPD problem tend to gather in a few areas, which restricts the paths they can select and easily leads to deadlock and congestion. These problems can be mitigated in automated warehouses because they are specifically designed to accommodate the movement of carrier agents, with wide paths and many detours of similar lengths. However, this is not the case for construction sites: agents must move along relatively narrow paths that do not allow them to pass each other, and the topological structure of the environment may change because of new walls and doors installed the previous day. Furthermore, agents may take detours to avoid head-on collisions, but some detours are relatively long. Therefore, agents have to detect possible collisions as early as possible to generate collision-free paths.
Numerous studies on MAPF and MAPD problems (Ma et al., 2017; Okumura et al., 2019a; Sharon et al., 2015; Yamauchi et al., 2022; Li et al., 2020) have been conducted because of their many applications, and our study focuses on decentralized path-finding algorithms, such as priority inheritance with backtracking (PIBT) (Okumura et al., 2019a) and windowed priority inheritance with backtracking (winPIBT) (Okumura et al., 2019b), because they guarantee reachability in a decentralized manner with fewer constraints on the arrangement of destinations. Agents using these algorithms can determine paths for their tasks individually based on their priorities, thus reducing the planning time by localizing the computation. Although PIBT is usually efficient, it may delay the detection of possible collisions because an agent communicates only with other agents within two nodes and can only secure its next position toward its destination. To address this issue, winPIBT exclusively secures several nodes within a fixed time window. A longer window enables agents to detect possible collisions earlier, but the overall efficiency may decrease because low-priority agents are prevented from moving by the secured nodes, even if no collisions would occur.

Shimada, D., Miyashita, Y. and Sugawara, T.
Multi-Agent Path Finding Using Provisionally Booking Nodes for Pickup and Delivery Problems.
DOI: 10.5220/0013158300003890
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025) - Volume 3, pages 528-537
ISBN: 978-989-758-737-5; ISSN: 2184-433X
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
Thus, we propose a method called PIBT with provisional booking (PIBT-PB). In this method, agents secure their immediate next nodes as in PIBT but refrain from securing subsequent nodes. Instead, they tentatively book several nodes in advance, which are referred to as provisional nodes. This strategy allows earlier detection of potential head-on collisions without obstructing other agents. The number of provisional nodes is flexible and can be adjusted based on the topological structure of the environment. We show that PIBT-PB maintains the reachability of all agents in a relaxed bi-connected area, as PIBT does, and demonstrate that it performs as efficiently as PIBT. Our experimental results indicate that PIBT-PB reduces the makespan (the total time required to complete all tasks, including planning time) compared with the baseline methods, PIBT and winPIBT, in our test environment. In addition, we discuss the benefits and limitations of the proposed approach.
2 RELATED WORK
Many studies have been conducted on MAPF and MAPD problems (Ma et al., 2017; Okumura et al., 2019a; Sharon et al., 2015; Standley, 2010; Goldenberg et al., 2014; Wagner and Choset, 2015; Silver, 2005; Yamauchi et al., 2022; Li et al., 2020). They can be roughly classified into centralized and decentralized approaches. In centralized approaches, information on the environment and all agents is collected by a single agent or server, which calculates collision-free paths for all agents and distributes them (Sharon et al., 2015; Luna and Bekris, 2011). For example, in conflict-based search (CBS) (Sharon et al., 2015), the path-planning process is divided into high- and low-level search subprocesses. In the low-level search, agents independently generate the shortest paths to their destinations and send them to the centralized server. The server then modifies the paths to eliminate possible collisions and distributes them to the individual agents. Although this approach can control all agents optimally in terms of travel path length, its computational cost increases with the number of agents. Moreover, it lacks flexibility: if one agent cannot move as planned for any reason, all agents are affected by the replanning of the entire plan.
In decentralized approaches (Ma et al., 2017; Okumura et al., 2019a; Ma et al., 2019; Yamauchi et al., 2022; Li et al., 2020; Farinelli et al., 2020; Miyashita et al., 2023), agents autonomously decide their own paths. Although these approaches are more flexible, agents have only local information, which raises issues such as the optimality of the generated paths and reachability. For example, in token passing (TP) (Ma et al., 2017), an agent accesses the token, a type of shared memory, to refer to its content, generates the shortest collision-free path, and stores the information on that path in the token. TP can guarantee reachability under a reasonable assumption for automated warehouses, but its performance degrades in our target environment because it has only a few endpoints. Li et al. (Li et al., 2020) proposed rolling-horizon collision resolution (RHCR), in which agents replan their paths at regular intervals while checking possible collisions within a certain window size. However, in congested situations, path generation becomes costly, and reachability cannot be guaranteed.
Meanwhile, PIBT enables agents to reach their destinations with only local communication, and reachability is guaranteed in an environment whose topological structure is relaxed bi-connected. However, its short-sighted behavior delays the detection of collisions, reducing efficiency. To overcome this drawback, winPIBT (Okumura et al., 2019b) secures a fixed number of nodes in advance to detect possible collisions earlier. However, this is not always safe, in the sense that agents may unnecessarily restrict the movement of other, lower-priority agents. Our proposed method can be considered an extension of PIBT in which some nodes are provisionally booked in advance if they are safe to book. Although we have already reported an abstract of this extension (Shimada et al., 2025), here we provide a detailed explanation of the algorithm and discuss its theoretical completeness and the results of an experimental evaluation.

Figure 1: Example environment (Env. 3).
Figure 2: Procedure of priority inheritance: (a) deadlock, (b) priority inheritance, (c) one step later.
3 PROBLEM DESCRIPTION
3.1 MAPD
Let $A = \{a_1, a_2, \ldots, a_N\}$ be the set of $N$ ($\in \mathbb{N}$) agents, where $\mathbb{N}$ is the set of non-negative integers. An MAPD environment can be expressed by a connected undirected graph $G = (V, E)$ embeddable in a two-dimensional Euclidean space, where $V$ and $E$ are the sets of nodes and of edges connecting nodes, respectively. We introduce discrete time $t \in \mathbb{N}$. An agent can move along edge $(u, v) \in E$ connecting two nodes $u, v \in V$. As $G$ is undirected, $(u, v) \in E$ whenever $(v, u) \in E$. We assume that every edge has length 1, meaning that any agent can move to an adjacent node in one time step (dummy nodes are added if necessary). An example of a grid-based environment is shown in Fig. 1, where red circles are agents that can move up, down, left, and right if there is space.
The location of agent $a_i$ at time $t$ is denoted by $v_i(t) \in V$. At each time $t$, $a_i$ can move to $v_i(t+1) \in N_{v_i(t)}$ or stay at the same node $v_i(t)$, where $N_u = \{v \in V \mid (u, v) \in E\}$ is the set of neighbor nodes of $u \in V$. A collision occurs when two agents occupy the same node or traverse the same edge in opposite directions simultaneously. Therefore, the following condition must be satisfied for all $a_i, a_j \in A$ ($i \neq j$):

$$v_i(t) \neq v_j(t) \;\land\; \big(v_i(t) \neq v_j(t+1) \lor v_i(t+1) \neq v_j(t)\big) \qquad (1)$$
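Meanwhile, synchronized circular movements without collisions are assumed to be possible; that is,

$$v_i(t+1) = v_j(t) \land v_j(t+1) = v_k(t) \land \cdots \land v_l(t+1) = v_i(t) \qquad (2)$$

is allowed for different agents $a_i, a_j, a_k, \ldots, a_l \in A$. Such a rotation must involve at least three agents, because a two-agent rotation is exactly the swap excluded by Condition (1).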
Let $\Gamma = \{\tau_1, \tau_2, \ldots\}$ be a finite set of tasks. In MAPD, task $\tau_j$ is represented by a pair of a pickup node $g_1$ and a delivery node $g_2$ ($g_1 \neq g_2$). When $\tau_j$ is assigned to $a_i$ at $t$, $\tau_j$ is removed from $\Gamma$, and $a_i$ travels from $v_i(t)$ to $g_1$ and then to $g_2$ by setting each of them as its destination in turn. After both destinations are visited, another task in $\Gamma$ is assigned to $a_i$ if $\Gamma \neq \emptyset$. This also means that all agents individually determine their next destinations when they have reached their current destinations. We denote the set of the current destinations of all agents as $D = \{d_1, \ldots, d_N\}$, where $d_i$ is the destination of $a_i$, and we define $d_i = \mathit{nil}$ if $a_i$ has no destination.
3.2 Priority Inheritance with Backtracking (PIBT)
In PIBT (Okumura et al., 2019a), each agent $a_i$ has a priority that is updated at each time step. The agent interacts with nearby agents through a recursive process involving priority inheritance (PI) and backtracking (BT) to determine its next move based on the priority order.
Definition 1. (Relaxed bi-connected graph) Graph $G = (V, E)$ is a relaxed bi-connected graph iff $G$ is a connected graph and there is a cyclic path of length at least 3 between any pair of neighbor nodes in $G$.
PIBT has been proven to ensure that all agents reach their destinations without deadlock, livelock, or collision if $G = (V, E)$ is a relaxed bi-connected graph and $|A| \le |V|$. We briefly explain PIBT. Agent $a_i \in A$ has a priority

$$p_i(t) = \varepsilon_i + \eta_i(t) \qquad (3)$$

for movement along its planned path. Here, $\varepsilon_i$ ($0 \le \varepsilon_i < 1$) is the unique base priority value initially assigned to $a_i$, and $\eta_i(t)$ ($\in \mathbb{N}$) is the time elapsed since $a_i$ last updated its destination. Agent $a_i$ resets $\eta_i(t)$ to 0 when it arrives at its destination. Because the values $\varepsilon_i$ are unique and lie in $[0, 1)$ while the values $\eta_i(t)$ are integers, no two agents ever have the same priority.
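The priority rule in Eq. (3) can be sketched in Python as below. This is an illustrative assumption rather than the authors' code: the class `AgentPriority` and the helper `make_priorities` are hypothetical, and the base values are spaced as $k/N$ to keep them unique in $[0, 1)$. Because the $\varepsilon_i$ differ by less than 1 and the $\eta_i(t)$ are integers, two priorities could only coincide if two $\varepsilon_i$ did, which this construction rules out.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentPriority:
    eps: float    # unique base value epsilon_i in [0, 1)
    eta: int = 0  # time steps elapsed since the destination was last updated

    @property
    def value(self) -> float:
        # Eq. (3): p_i(t) = epsilon_i + eta_i(t)
        return self.eps + self.eta

    def tick(self, reached_destination: bool) -> None:
        # eta is reset on arrival and otherwise grows by one per time step.
        self.eta = 0 if reached_destination else self.eta + 1


def make_priorities(n: int, seed: int = 0) -> list:
    # Hypothetical helper: distinct base values k/n (k = 0..n-1) in random order.
    rng = random.Random(seed)
    return [AgentPriority(k / n) for k in rng.sample(range(n), n)]
```

An agent with $\varepsilon_i = 0.5$ that has been travelling for two steps has priority 2.5 and drops back to 0.5 on arrival.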
First, agent $a_i$ ranks the nodes in $N_{v_i(t)} \cup \{v_i(t)\}$, usually at time $t$, according to the estimated distance from each $v \in N_{v_i(t)} \cup \{v_i(t)\}$ to its destination, using a path-finding method for MAS such as cooperative A* search (CA*) (Silver, 2005). Then, $a_i$ secures the next node in the order of priority as follows. Suppose that $a_i$ has the highest priority in its local area. If an agent $a_j$ with $p_j(t) < p_i(t)$ occupies the node that $a_i$ wants to move to next, $a_i$ pushes $a_j$, and $a_j$ must yield its current location by moving in some direction. However, this may result in a potential collision with other agents. In this case, $a_i$ passes its priority on to $a_j$ to avoid the collision.

Figure 3: Example procedure of priority inheritance and backtracking: (a) deadlock in PI, (b) PI after BT, (c) BT, (d) one step later.
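The ranking step described above can be sketched as follows. As a simplifying assumption, this Python sketch replaces CA* with a plain breadth-first search on a 4-connected grid whose free cells are given as a set of coordinates; the helper names are hypothetical. Because the graph is undirected, the distance from the goal to a cell equals the distance from that cell to the goal.

```python
from collections import deque


def grid_neighbors(free, v):
    """4-connected neighbours of cell v that are free."""
    x, y = v
    return [c for c in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)) if c in free]


def distances_to(free, goal):
    """BFS distance from every reachable free cell to `goal` (unit-length edges)."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        u = queue.popleft()
        for w in grid_neighbors(free, u):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def ranked_candidates(free, v, goal):
    """Order N_v together with v itself by estimated distance to `goal`, nearest first."""
    dist = distances_to(free, goal)
    candidates = grid_neighbors(free, v) + [v]
    # sorted() is stable, so ties keep neighbour order; unreachable cells go last.
    return sorted(candidates, key=lambda u: dist.get(u, float("inf")))
```

For instance, in a five-cell corridor with the goal at the right end, an agent in the middle ranks moving right first, then staying, then moving left.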
An example is shown in Fig. 2, where the arrows indicate the potential moves and $p_1(t) < p_2(t) < p_3(t)$. In Fig. 2a, $a_3$ attempts to secure $a_1$’s current node; thus, $a_1$ has to move left, but because $a_2$ tries to move to the same node, $a_1$ cannot determine its next node. However, through PI, $a_1$ inherits the priority of $a_3$ and thus has a higher priority than $a_2$ (Fig. 2b), so it can secure the left node. Meanwhile, $a_2$ does not move right but chooses to stay at its current node, which is its next-ranked node toward its destination, as shown in Fig. 2c.
Backtracking (BT) is a process in which the results of the priority inheritance (PI) process, which attempted to secure a node, are returned in reverse order. In particular, an agent that fails to secure a node returns the result directly to the agent from which it inherited its priority. Upon receiving this failure information, that agent attempts to move to its next-best node if possible. If no other agent is at that node, it secures the node; otherwise, it recursively executes PI and attempts to secure that node. It then repeats this operation. If the relevant agents cannot determine their next locations through the PI and BT processes, they decide that PI is stuck and choose to remain at their current nodes. Note that in environments that satisfy the condition of a relaxed bi-connected graph, it has been demonstrated that all agents can determine their next nodes by PI and BT (Okumura et al., 2019a).
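The PI and BT steps above can be combined into a compact recursive routine. The Python sketch below follows the commonly published structure of PIBT for one time step; it is a simplified illustration, not the authors' implementation. Assumptions: the graph is an adjacency dictionary, candidate nodes are ranked by BFS distance instead of CA*, and priority inheritance is modeled implicitly by planning a pushed agent immediately inside its pusher's call (so it is served before any other lower-priority agent) rather than by storing an inherited numeric priority.

```python
from collections import deque


def bfs(adj, goal):
    """Hop distance from every node to `goal` in an undirected graph."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def pibt_step(adj, pos, goals, prio):
    """Decide v_i(t+1) for every agent with one round of PI and BT."""
    n = len(pos)
    dist = [bfs(adj, g) for g in goals]
    occ_now = {v: i for i, v in enumerate(pos)}  # node -> agent at time t
    occ_next = {}                               # node -> agent at time t+1
    nxt = [None] * n

    def pibt(i):
        # Candidates: neighbours plus staying, nearest to the goal first.
        cands = sorted([*adj[pos[i]], pos[i]],
                       key=lambda v: dist[i].get(v, float("inf")))
        for v in cands:
            if v in occ_next:
                continue                      # already secured for t+1
            j = occ_now.get(v)
            if j is not None and nxt[j] == pos[i]:
                continue                      # would swap with a_j (Condition (1))
            nxt[i], occ_next[v] = v, i
            # PI: the occupant a_j is planned now, on behalf of a_i.
            if j is not None and j != i and nxt[j] is None and not pibt(j):
                continue                      # BT: a_j is stuck at v; try next candidate
            return True
        nxt[i], occ_next[pos[i]] = pos[i], i  # no candidate works: stay
        return False

    for i in sorted(range(n), key=lambda a: -prio[a]):
        if nxt[i] is None:
            pibt(i)
    return nxt
```

A pushed agent whose call returns `False` stays where it is, and the pusher moves on to its next candidate; this is the backtracking described above. On a four-node ring, a higher-priority agent that must pass through a node occupied by an agent already at its goal pushes that agent one step ahead instead of swapping with it.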
An example procedure of BT is shown in Fig. 3, in which agents with larger subscripts are assumed to have higher priorities. First, the highest-priority agent $a_6$ attempts to secure its next node, where $a_5$ is located. Figure 3a shows that the priority of $a_6$ is inherited along $a_5$, $a_1$, $a_3$, and $a_4$. However, because $a_4$ cannot secure its next node, $a_4$ and $a_3$ backtrack by returning failure results, as shown in Fig. 3b, and remain at their current nodes. Subsequently, $a_1$ can select another node to move to by pushing $a_2$, because $a_1$ has a higher priority than $a_2$ owing to PI. This success result is then delivered by backtracking along $a_2$, $a_1$, $a_5$, and $a_6$ (Fig. 3c). After that, all involved agents can decide their next nodes. Figure 3d shows their positions at the next time step, in which $a_3$ and $a_4$ cannot move but can stay at the same positions.

Figure 4: Inefficient path planning in PIBT: (a) moves at time t, (b) at time t + 1, (c) conflict and push, (d) push back again.
3.3 winPIBT (windowed PIBT)
The winPIBT (Okumura et al., 2019b) algorithm enhances PIBT by incorporating a time window, allowing agents to secure multiple nodes for consecutive time steps. This extension aims to mitigate the inefficiencies of PIBT, particularly those stemming from delayed collision detection.
An example of inefficiency in PIBT is illustrated in Fig. 4, where at time $t$, agents $a_1$ and $a_2$ move toward their destinations, indicated by flags of the same colors. Here, agent $a_2$ has a higher priority than $a_1$ ($p_2(t) > p_1(t)$), and the nodes the agents secure for their next moves are shown in the same colors as the agents. In Figs. 4a and 4b, both agents secure nodes and move according to the shortest paths that they individually generate. After that, they encounter each other and recognize the possible collision. Therefore, $a_2$ must push back the lower-priority agent $a_1$ several times, as shown in Figs. 4c and 4d. These moves by $a_1$ are wasteful and inefficient.
To address this problem, in winPIBT, information on the secured nodes is shared among all agents (or at least among nearby agents), and the highest-priority agents that have not yet determined their next nodes attempt to secure future nodes, whose number is specified by the fixed size of the time window $W$ ($> 0$). We illustrate how winPIBT mitigates this wasted-movement problem using Fig. 5, where $W = 3$. At $t$, $a_2$ secures the nodes for up to three time steps ahead (Fig. 5a). Then, $a_1$ attempts to secure three nodes but can secure only one neighbor node (Fig. 5b). If there were no other path, $a_1$ would stay at this node for three time steps; in this situation, however, there is another path along which the node reached after three steps is the same distance from, or closer to, its destination. Thus, $a_1$ secures three nodes along this alternative path at $t$ (Fig. 5c). Figure 5d shows the next reservation at $t + 3$, indicating that both agents can reach their destinations without unnecessary movement. Note that when $W = 1$, winPIBT is equivalent to PIBT.

Figure 5: Movements in winPIBT: (a) $a_2$ secures its path, (b) collision detected, (c) $a_1$ replans a path, (d) three steps later.
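The windowed securing illustrated in Fig. 5 relies on a shared space-time reservation table. The Python sketch below shows only that ingredient, under simplifying assumptions: `secure_window` and the table layout `{(node, time): agent}` are hypothetical, and winPIBT's priority inheritance, replanning, and disentangled condition are not modeled. An agent secures up to $W$ consecutive nodes of its path and stops at the first vertex or swap conflict with nodes already secured by another agent.

```python
def secure_window(agent, start, path, t0, W, table):
    """Secure up to W future (node, time) pairs along `path` in a shared table.

    path[k] is the node planned for time t0 + 1 + k, and `table` maps
    (node, time) to the agent holding it. Stops at the first vertex or swap
    conflict and returns how many steps were secured.
    """
    prev, secured = start, 0
    for k, v in enumerate(path[:W]):
        t = t0 + 1 + k
        holder = table.get((v, t))
        if holder is not None and holder != agent:
            break                             # vertex conflict at time t
        other = table.get((v, t - 1))
        if other is not None and other != agent and table.get((prev, t)) == other:
            break                             # swap conflict on edge (prev, v)
        table[(v, t)] = agent
        prev, secured = v, secured + 1
    return secured
```

In a one-lane corridor, if a higher-priority agent first secures three nodes heading left, an agent heading right along the same lane stops securing as soon as the head-on swap is reached. With $W = 1$, only the next node is secured.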
winPIBT introduces the disentangled condition, which intuitively requires that agents secure paths in advance while maintaining Condition (1). In particular, if $p_j(t_0) > p_i(t_0)$, agent $a_i$ cannot secure more nodes at $t_0$ than $a_j$ has already secured, and $a_i$ cannot secure a node that $a_j$ has secured before $a_j$ passes it. Thus, the agents must satisfy the following condition:

$$v_i(t) \neq v_j(t+k) \quad \text{for } t_0 \le t \le t_0 + w \text{ and } 0 \le k \le t^{\omega}_j, \qquad (4)$$

where $\pi_i = (v_i(t_0), \ldots, v_i(t_0 + w))$ is the sequence of nodes secured by $a_i$ ($w \le W$), $v_j(t + k)$ is the node at time $t + k$ secured by $a_j$, and $t^{\omega}_j$ is the last time for which $a_j$ has already secured a node.
Although winPIBT guarantees that all agents reach their destinations, the condition expressed by Eq. (4) indicates that agents are likely to block the moves of other, lower-priority agents even if no collision would occur, resulting in inefficiency. Figure 6a shows an example situation in which $p_1 < p_2 < p_3$ for agents $a_1$, $a_2$, and $a_3$, which move to the destinations of the same colors. Because $a_3$ secures several nodes in advance, including a crossing node, $a_2$ and $a_1$ must wait for $a_3$ to pass, even though no collision would occur. If the fixed time window is short, agents may detect possible collisions late. In contrast, a longer time window allows earlier collision detection but increases the likelihood of blocking other agents. Therefore, we aim to address this problem while still detecting possible collisions effectively.

Figure 6: Comparison between winPIBT and PIBT-PB: (a) a blocking situation, (b) secured and provisional nodes.
4 PROPOSED METHOD
4.1 Maximal Bi-Connected Component and Node Classification
Inefficient and wasteful behavior in PIBT occurs when a lower-priority agent $a_i$ encounters a higher-priority agent $a_j$ and $a_i$ has no option other than turning back. This situation occurs on narrow paths, and the cost of going backward is significant if the distance is long. Therefore, we first analyze the characteristics of a relaxed bi-connected graph.
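Proposition 1. If $G = (V, E)$ is a relaxed bi-connected graph, $|N_v| \ge 2$ for all $v \in V$.

Proof. Suppose that there exists $u \in V$ such that $N_u$ is a singleton, i.e., $N_u = \{u'\}$. Then every path leaving $u$ must pass through $u'$, so the pair $(u, u')$ does not lie on any cyclic path of length at least 3, contradicting Definition 1.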
From Proposition 1, we can classify the nodes in $V$ into one-way and crossing nodes.

Definition 2. Node $v \in V$ is called a one-way node if $|N_v| = 2$; otherwise, $v$ is called a crossing node.

We denote the sets of one-way and crossing nodes as $V_{ow} = \{v \mid |N_v| = 2\}$ and $V_{crs} = \{v \mid |N_v| \ge 3\}$, respectively. We also investigate the structure of a relaxed bi-connected graph.
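Definition 2 and Proposition 1 translate directly into a degree test. The Python sketch below assumes a grid environment given as a set of free cells (the helper names are hypothetical); it splits the nodes into $V_{ow}$ and $V_{crs}$ and rejects graphs containing nodes of degree below two, which Proposition 1 rules out. Note that a degree of at least two is necessary but not sufficient for a graph to be relaxed bi-connected.

```python
def grid_adjacency(free):
    """4-connected adjacency lists for a set of free grid cells."""
    return {
        (x, y): [c for c in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)) if c in free]
        for (x, y) in free
    }


def classify_nodes(adj):
    """Split V into one-way nodes (|N_v| = 2) and crossing nodes (|N_v| >= 3)."""
    too_few = sorted(v for v, nbrs in adj.items() if len(nbrs) < 2)
    if too_few:
        # Proposition 1: such nodes cannot exist in a relaxed bi-connected graph.
        raise ValueError(f"nodes with fewer than two neighbours: {too_few}")
    one_way = {v for v, nbrs in adj.items() if len(nbrs) == 2}
    crossing = {v for v, nbrs in adj.items() if len(nbrs) >= 3}
    return one_way, crossing
```

A ring of eight cells around an obstacle consists only of one-way nodes, whereas in a solid 3×2 block the two middle cells are crossing nodes.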
Definition 3. Graph $G = (V, E)$ is a bi-connected graph iff any pair of nodes $v_1, v_2 \in V$ has a cyclic path in $G$.
For graph $G = (V, E)$ and $V' \subseteq V$, we can naturally generate the subgraph $G' = (V', E')$, where $E' = \{(v_1, v_2) \in E \mid v_1, v_2 \in V'\}$.
Definition 4. Subgraph $G' = (V', E')$ generated from graph $G = (V, E)$ is a maximal bi-connected component iff $G'$ is bi-connected and, for every $v \in V \setminus V'$, the subgraph naturally generated from $V' \cup \{v\}$ is not bi-connected.
Proposition 2. (Structure of a relaxed bi-connected graph) Let $G = (V, E)$ be a connected graph, and let $G_1 = (V_1, E_1), \ldots, G_K = (V_K, E_K)$ be all the maximal bi-connected subgraphs of $G$. $G$ is a relaxed bi-connected graph iff $G$ consists only of its maximal bi-connected components, i.e., $V = V_1 \cup \cdots \cup V_K$ and $E = E_1 \cup \cdots \cup E_K$. Furthermore, any pairwise intersection of maximal bi-connected subgraphs contains at most one node (Miyashita et al., 2023), and such an intersection node has at least four neighboring nodes, at least two of which belong to each of the intersecting maximal bi-connected subgraphs.
Proof. The sufficient condition is straightforward. Suppose that $G$ is a relaxed bi-connected graph. If there exists $v \in V \setminus (V_1 \cup \cdots \cup V_K)$, then there is a neighbor $v' \in N_v$ with $v' \in V_1 \cup \cdots \cup V_K$ (otherwise, $G$ is not connected), and $v$ and $v'$ have a cyclic path. Thus, they belong to a common maximal bi-connected subgraph, which is a contradiction. The latter part is also evident: if two maximal bi-connected subgraphs intersected in two distinct nodes, the union of these subgraphs would form a larger bi-connected graph (Miyashita et al., 2023), contradicting their maximality. Furthermore, suppose that an intersection node $v$ has only one neighboring node $v'$ belonging to a maximal bi-connected subgraph $G_k$. Because $v', v \in V_k$, there is a cyclic path connecting these nodes in $G_k$, meaning that $v$ has another neighboring node $v'' \in V_k$, which is a contradiction. This also means that any intersection node of maximal bi-connected graphs has at least four neighboring nodes.
Proposition 2 shows that if a node $u$ belongs to the intersection of two maximal bi-connected subgraphs, then $u \in V_{crs}$. Thus, a connected sequence of one-way nodes belongs to a single maximal bi-connected component.
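The maximal bi-connected components of Proposition 2 can be computed with the classical Hopcroft-Tarjan depth-first search, which pushes edges onto a stack and closes a component whenever a vertex separates a subtree. The Python sketch below is a standard textbook version, not code from the paper; the function names are hypothetical, the graph is assumed to be simple (no parallel edges), and the recursion would need to be made iterative for very large maps. Nodes shared by two components are the intersection nodes discussed above.

```python
def maximal_biconnected_components(adj):
    """Node sets of the maximal bi-connected components (Hopcroft-Tarjan edge stack)."""
    index, low, edges, comps = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)          # discovery order 0, 1, 2, ...
        for w in adj[u]:
            if w == parent:
                continue
            if w not in index:                  # tree edge
                edges.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:          # u separates w's subtree: close a component
                    comp = set()
                    while True:
                        e = edges.pop()
                        comp.update(e)
                        if e == (u, w):
                            break
                    comps.append(comp)
            elif index[w] < index[u]:           # back edge to an ancestor
                edges.append((u, w))
                low[u] = min(low[u], index[w])

    for v in adj:
        if v not in index:
            dfs(v, None)
    return comps


def intersection_nodes(comps):
    """Nodes shared by two or more components (articulation points)."""
    seen, shared = set(), set()
    for comp in comps:
        shared |= seen & comp
        seen |= comp
    return shared
```

For two triangles sharing node 2, the sketch returns the components {0, 1, 2} and {2, 3, 4}, and node 2, which has four neighbors, is their only intersection node.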
4.2 PIBT with Provisional Booking
The central idea of PIBT-PB for MAPD in a relaxed bi-connected area is that an agent first secures its next node. If it has not inherited its priority from another agent, it then attempts to provisionally book further nodes if they are safe to book.

Algorithm 1: PIBT-PB at current time t_0 for at least securing the node at t_0 + 1.
1  p_i(t_0) is defined by Eq. 3. Agent a_i at v_i(t_0) will secure a node for t_0 + 1.
2  π^S_i[t] ∈ V: secured node by a_i for time t; a_i definitely visits it at t.
3  π^P_i[t] ∈ V: provisional node booked by a_i for t in advance.
4  procedure PIBT-PB(A, t_0):
5      // t_0 denotes the current time.
6      S ← A sorted by p_i(t_0) in descending order.
7      for a_i ∈ S do
8          if π^S_i[t_0 + 1] = nil then
9              if (π^P_i[t_0 + 1] ≠ nil) then    // a_i has no secured node at t_0 + 1.
10                 mPIBT(a_i, a_i, t_0)
11             else
12                 mPIBT(a_i, nil, t_0)
13             end
14         end
15     end
16 end
The algorithms are listed in Algs. 1, 2, and 3. We denote the array of nodes that agent $a_i$ has secured at $t$ as $\pi^S_i[t] \in V$ and the array of nodes that $a_i$ has provisionally booked at $t$ as $\pi^P_i[t] \in V$. In the proposed method (function PIBT-PB in Alg. 1), agents’ priorities are recalculated using Eq. 3 at every time step. Suppose that at the current time $t_0$, the highest-priority agent $a_i \in A$ has not yet secured its next node ($\pi^S_i[t_0 + 1] = \mathit{nil}$). It first attempts to secure the next node using a PIBT-like procedure, mPIBT. Subsequently, it provisionally books additional nodes using addProvisionalNodes to detect possible collisions earlier. We now describe in detail how agents secure and provisionally book nodes.
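The control flow of Alg. 1 is short enough to sketch directly. In the Python sketch below, which is an illustration under assumptions rather than the authors' code, $\pi^S_i$ and $\pi^P_i$ are represented as per-agent dictionaries from time to node, and Alg. 2 is passed in as a callable `mpibt` with a hypothetical signature so that the dispatch logic of Lines 6 to 12 can be shown on its own.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional


def pibt_pb_step(agents: Iterable[str],
                 priority: Dict[str, float],
                 secured: Dict[str, Dict[int, Hashable]],
                 provisional: Dict[str, Dict[int, Hashable]],
                 t0: int,
                 mpibt: Callable[[str, Optional[str], int], object]) -> None:
    """Outer loop of Alg. 1 for the current time t0.

    secured[a][t] plays the role of pi^S_a[t] and provisional[a][t] of pi^P_a[t];
    mpibt is a caller-supplied stand-in for Alg. 2 (hypothetical signature).
    """
    # Line 6: visit agents in descending order of p_i(t0).
    for a in sorted(agents, key=lambda x: priority[x], reverse=True):
        if secured[a].get(t0 + 1) is None:           # Line 8: next node not yet secured
            if provisional[a].get(t0 + 1) is not None:
                mpibt(a, a, t0)                      # Line 10: booked earlier, re-check conflicts
            else:
                mpibt(a, None, t0)                   # Line 12: full securing and booking
```

An agent whose next node was already secured while it was pushed inside a higher-priority agent's mPIBT call is skipped by the Line 8 check, which is why the sorted order needs to be computed only once per time step.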
4.2.1 Securing the Next Node
In PIBT-PB (Alg. 1), if $a_i$ has already provisionally booked its next node (Line 9 in Alg. 1), it invokes mPIBT($a_i$, $a_i$, $t_0$) simply to check for conflicts. Otherwise, $a_i$ invokes mPIBT($a_i$, nil, $t_0$) (Line 12).
In Alg. 2, when $a_j \neq \mathit{nil}$, $a_i$’s priority is inherited from $a_j$ (trivially when $a_i = a_j$); if $a_j \neq a_i$, $a_i$ has been unintentionally pushed by another agent and therefore releases its provisional nodes (Line 3). If $a_i$ has no provisionally booked node at $t_0 + 1$, it calculates the shortest path to $d_i$ starting from $\pi^S_i[t_0]$ ($= v_i(t_0)$) using CA* (or another similar algorithm), treating the nodes already secured for $t_0 + 1$ and the provisional nodes booked by higher-priority agents thus far as obstacles (Line 7). This path is temporarily stored in array $\Pi$. If a lower-priority agent $a_k$ currently occupies $\pi^P_i[t_0 + 1]$ (Line 11), $a_i$ pushes it by invoking mPIBT($a_k$, $a_i$, $t_0$) (Line 12); if this fails, $a_i$ recomputes its path with CA* (Lines 14 and 15). Note that an agent $a_k$ with $p_k(t_0) < p_i(t_0)$ under Eq. 3 may now have a higher priority through PI; in this case, $a_k$ has already secured a node at $t_0 + 1$ but has no provisional nodes. If CA* cannot generate a path, the loop ends and $a_i$ stays at its current node (Lines 31 to 33). Otherwise, $a_i$ continues with the newly generated path $\Pi$ and $\pi^P_i[t_0 + 1]$ ($= \Pi[t_0 + 1]$).

Algorithm 2: mPIBT at time t_0 to decide the next node to move to.
1  procedure mPIBT(a_i, a_j, t_0):
2      if (a_j ≠ nil ∧ a_j ≠ a_i) then
3          π^P_i ← ∅, p_i(t_0) ← p_j(t_0)
4      end
5      if (π^P_i = ∅) then
6          // CA*(a_i) searches the shortest path to d_i by setting the secured and provisional nodes as obstacles.
7          Π ← CA*(a_i), π^P_i[t_0 + 1] ← Π[t_0 + 1]
8      end
9      while (π^P_i ≠ ∅) do
10         // If there is a potential conflict, PI occurs.
11         if (∃a_k ∈ A s.t. (p_k(t_0) < p_i(t_0)) ∧ (π^P_i[t_0 + 1] = π^S_k[t_0])) then
12             if mPIBT(a_k, a_i, t_0) is nil then
13                 // As a_k secured a node even if mPIBT returns nil, the following CA* returns another path if one exists.
14                 Π ← CA*(a_i)
15                 π^P_i[t_0 + 1] ← Π[t_0 + 1]
16                 continue
17             end
18         end
19         π^S_i[t_0 + 1] ← π^P_i[t_0 + 1]
20         if a_j = a_i then
21             return valid
22         end
23         if (∃a_k ∈ A s.t. π^P_k[t_0 + 1] = π^S_i[t_0 + 1]) then
24             π^P_k ← ∅
25         end
26         if (a_j = nil) ∧ (π^S_i[t_0 + 1] ≠ d_i) then
27             addProvisionalNodes(a_i, Π, t_0)
28         end
29         return valid
30     end
31     π^S_i[t_0 + 1] ← π^S_i[t_0]
32     // Stay if a_i finds no node to move to next.
33     return nil
34 end
If there is no such conflict, $a_i$ secures the provisional node (Line 19) and returns “valid” if $a_i = a_j$. Subsequently, if there exists $a_k$ such that $\pi^P_k[t_0 + 1] = \pi^S_i[t_0 + 1]$, $a_k$ releases its provisional nodes because $p_k(t_0) < p_i(t_0)$ (Lines 23 and 24). Furthermore, if $a_i$ has not been pushed by another agent and its secured node is not its destination, it invokes addProvisionalNodes to book additional provisional nodes if possible (Lines 26 and 27). If $a_i$ cannot find any neighboring node to move to, it remains at its current node (Lines 31 to 33).
Algorithm 3: To find provisional nodes from t_0 + 2 onward.
1  procedure addProvisionalNodes(a_i, Π, t_0):
2      for (t ∈ {t_0 + 1, ...}) do
3          // Try to provisionally book π^P_i[t_0 + 2] and the nodes after it.
4          // If a_i reached d_i at t or the next node is crossing, exit from the for-loop.
5          if (π^P_i[t] = d_i) ∨ (Π[t + 1] ∈ V_crs) then
6              return valid
7          end
8          // If a possible collision is detected,
9          if (∃a_k ∈ A s.t. (Π[t] = π^P_k[t]) ∨ (Π[t] = π^P_k[t + 1] ∧ Π[t + 1] = π^P_k[t])) then
10             // the lower-priority agent releases all provisional nodes.
11             if p_k(t_0) < p_i(t_0) then
12                 π^P_k ← ∅, π^P_i[t + 1] ← Π[t + 1]
13             else
14                 π^P_i ← ∅
15                 return nil
16             end
17         end
18     end
19 end
4.2.2 Provisional Booking
In the function addProvisionalNodes (Alg. 3), agent $a_i$ attempts to provisionally book nodes along $\Pi$ until it has provisionally booked its destination $d_i$ or the next node is a crossing node (Line 5). If $a_i$ detects a potential collision with another agent’s provisional nodes (Line 9), the provisional nodes of the lower-priority agent are released. If $a_i$ is the higher-priority agent, the next node $\Pi[t + 1]$ is provisionally booked and stored in $\pi^P_i[t + 1]$; otherwise, $a_i$ releases all its provisional nodes and returns nil (Lines 14 and 15).
Figure 6b illustrates how agents using PIBT-PB avoid the unnecessary blocking of other agents shown in Fig. 6a. Initially, $a_3$ secures the right node and provisionally books the next node to the right (depicted as the two red nodes in Fig. 6b) en route to its destination using the mPIBT and addProvisionalNodes functions. Next, $a_2$ secures the upper node without booking any provisional nodes. Subsequently, $a_1$ secures the left node and provisionally books five nodes to the left, including its destination node, based on the shortest path to the destination.
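The booking loop of Alg. 3 can be sketched in Python as follows. This is our reading under stated assumptions, not the authors' implementation: paths and bookings are dictionaries keyed by time, the function name is hypothetical, the conflict test compares the node about to be booked ($\Pi[t + 1]$) with other agents’ bookings at $t + 1$ and checks for a swap between $t$ and $t + 1$, and $\Pi[t + 1]$ is booked whenever no higher-priority conflict is found (in Alg. 3 as listed, booking appears only in the conflict branch of Line 12).

```python
from typing import Dict, Hashable, Set


def add_provisional_nodes(i: str,
                          path: Dict[int, Hashable],
                          t0: int,
                          dest: Hashable,
                          crossing: Set[Hashable],
                          provisional: Dict[str, Dict[int, Hashable]],
                          priority: Dict[str, float]) -> bool:
    """Extend a_i's provisional booking along `path` (cf. Alg. 3); True means "valid"."""
    mine = provisional[i]
    t = t0 + 1                        # path[t0 + 1] is assumed already secured by mPIBT
    while True:
        # Line 5: stop at the destination, at the end of the path, or before a crossing node.
        if path[t] == dest or t + 1 not in path or path[t + 1] in crossing:
            return True
        nxt = path[t + 1]
        for k, theirs in provisional.items():
            if k == i:
                continue
            vertex = theirs.get(t + 1) == nxt
            swap = theirs.get(t) == nxt and theirs.get(t + 1) == path[t]
            if vertex or swap:
                if priority[k] < priority[i]:
                    theirs.clear()    # lower-priority a_k releases all its bookings
                else:
                    mine.clear()      # a_i yields and releases its own bookings
                    return False
        mine[t + 1] = nxt             # provisionally book the next one-way node
        t += 1
```

With no competing bookings, an agent whose path runs A, B, C, D, E (B already secured, E its destination) books C, D, and E; if D is a crossing node, it books only C. When a higher-priority agent has booked the opposite direction along the same run, the function releases the caller’s bookings and returns `False`.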
4.3 Property of PIBT-PB
Each agent has a unique priority. Hence, exactly one agent $a_i$ has the highest priority. This agent keeps the top priority until it arrives at its destination. Furthermore, in mPIBT of PIBT-PB, $a_i$ can always secure the highest-ranked $v \in N_{v_i(t)}$ at $t$ based on the shortest path to its destination because (1) the pair $(v_i(t), v)$ has a cyclic path, and (2) when $a_i$ calculates the shortest path, no other agent has yet secured a node for the next time step, and the provisional nodes booked by other agents are not considered obstacles. Therefore, the following proposition follows from Prop. 2.

Proposition 3. An agent with the highest priority can reach its destination in finite time if the environment $G$ is a relaxed bi-connected graph.
When an agent reaches its destination, its prior-
ity becomes the lowest. Furthermore, if p
i
(t) > p
j
(t)
for a
i
, a
j
A, the sam e inequality holds at t + 1 un-
less a
i
reaches its destination at t + 1. In addition, if
there is an age nt that cannot reac h its design ation, its
priority will monotonically increase and become the
highest, thus allowing it to fin ally reach the destina-
tion. Therefore, the following theorem holds.
Theorem 1. When environment $G$ is a relaxed bi-connected graph, all agents can reach their destinations in a finite time.
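The argument for Theorem 1 uses three properties of the priorities: they are unique, the order between two agents is preserved from $t$ to $t+1$ unless the higher one reaches its destination, and an agent that has not yet arrived keeps gaining priority. A minimal Python sketch of a PIBT-style rule with these properties follows, assuming (as Section 6 states) that priority is the time elapsed since the current destination was set, plus a hypothetical per-agent tie-breaker `eps` in $[0, 1)$. `AgentPriority` and `highest` are illustrative names, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AgentPriority:
    eps: float        # distinct tie-breaker in [0, 1) per agent (assumption)
    elapsed: int = 0  # steps since the current destination was set

    def key(self) -> Tuple[int, float]:
        # Compare elapsed time first, then the tie-breaker, so keys are unique.
        return (self.elapsed, self.eps)

    def step(self, reached_destination: bool) -> None:
        if reached_destination:
            self.elapsed = 0   # new destination: drop back to the lowest tier
        else:
            self.elapsed += 1  # still travelling: priority keeps growing


def highest(agents: List[AgentPriority]) -> int:
    """Index of the agent that currently has top priority."""
    return max(range(len(agents)), key=lambda i: agents[i].key())


agents = [AgentPriority(eps=0.3), AgentPriority(eps=0.7)]
for _ in range(3):                      # neither agent arrives
    for a in agents:
        a.step(reached_destination=False)
print(highest(agents))                  # 1: equal elapsed time, larger eps
agents[1].step(reached_destination=True)
agents[0].step(reached_destination=False)
print(highest(agents))                  # 0: agent 1 arrived and was reset
```

Since every travelling agent gains exactly one unit per step, the relative order of two travelling agents never flips, and an agent that keeps failing to arrive eventually outranks every agent that has been reset.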
When all nodes are crossing nodes, PIBT-PB operates in the same manner as PIBT, because the process for determining agents' subsequent nodes is then fundamentally equivalent to PIBT without provisional node booking. This means that agents using PIBT-PB follow the same paths as those using PIBT. If the environment has some one-way nodes, agents can book provisional nodes and detect possible head-on collisions earlier than with PIBT.
Similar to winPIBT, PIBT-PB requires all agents to have access to shared memory containing information about the locations, times, and agents of all secured and provisional nodes. However, in our proposed approach, agent $a_i$ provisionally books only one-way nodes. Consequently, $a_i$ uses the node information solely for the connected one-way path that terminates at a provisionally unbooked crossing node. This approach reduces the cost of accessing shared memory by organizing it according to the graph structure (Proposition 2), because the provisional nodes are contained within a specific maximal bi-connected component of $G$.
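The shared memory described above can be pictured as a table mapping each node to the agent, time, and kind (secured or provisional) of its booking. The Python sketch below is a simplified, hypothetical version: it keeps at most one booking per node, and it encodes one possible reading of the release rule in Section 6, namely that when a higher-priority agent claims a node held provisionally by another agent, the holder gives up all of its provisional nodes. Replanning around the approaching agent is not shown, and `BookingTable`, `Booking`, and `Kind` are illustrative names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

Node = Tuple[int, int]


class Kind(Enum):
    SECURED = "secured"          # next node, fixed for the coming step
    PROVISIONAL = "provisional"  # booked ahead; may be given up


@dataclass
class Booking:
    agent: int
    time: int
    kind: Kind


@dataclass
class BookingTable:
    """Hypothetical shared memory: at most one booking per node (simplification)."""
    bookings: Dict[Node, Booking] = field(default_factory=dict)

    def release_provisional(self, agent: int) -> List[Node]:
        """Remove every provisional booking held by `agent`."""
        mine = [n for n, b in self.bookings.items()
                if b.agent == agent and b.kind is Kind.PROVISIONAL]
        for n in mine:
            del self.bookings[n]
        return mine

    def book(self, node: Node, agent: int, time: int, kind: Kind,
             priority: Dict[int, float]) -> bool:
        """Try to book `node`; return False if it stays with another agent."""
        current = self.bookings.get(node)
        if current is not None and current.agent != agent:
            # Secured nodes and equal/higher-priority holders are respected.
            if current.kind is Kind.SECURED or priority[agent] <= priority[current.agent]:
                return False
            # A higher-priority agent approaches: the holder drops all of its
            # provisional nodes (it would then replan, which is not shown).
            self.release_provisional(current.agent)
        self.bookings[node] = Booking(agent, time, kind)
        return True


table = BookingTable()
prio = {1: 5.0, 2: 9.0}                                  # agent 2 outranks agent 1
table.book((0, 1), 1, 1, Kind.SECURED, prio)
table.book((0, 2), 1, 2, Kind.PROVISIONAL, prio)
table.book((0, 3), 1, 3, Kind.PROVISIONAL, prio)
print(table.book((0, 3), 2, 1, Kind.SECURED, prio))      # True
print(sorted(table.bookings))                             # [(0, 1), (0, 3)]
print(table.book((0, 3), 1, 2, Kind.PROVISIONAL, prio))  # False
```

Keeping secured and provisional bookings in one table lets a single lookup answer both "may I enter this node next step?" and "is someone planning to come through here?", while only provisional entries are ever revoked.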
5 EXPERIMENTAL EVALUATION
5.1 Experimental Environment
We evaluated the proposed method by conducting experiments in three environments, shown in Fig. 7 (Environment 1, Env. 1), Fig. 8 (Environment 2, Env. 2), and Fig. 1 (Env. 3), and compared the performance of agents using PIBT-PB with that of agents using the baselines, PIBT and winPIBT. Env. 1 is a warehouse model that is often used in experiments in previous studies. Pickup and delivery nodes are selected randomly when a task is generated. In this type of environment, there is no clear difference in performance between PIBT and winPIBT; however, we aim to demonstrate the superior performance of our method. Env. 2 is a simple model designed to illustrate the differences between PIBT and PIBT-PB. Although only the four corners are one-way nodes, PIBT-PB outperforms PIBT. In Env. 2, green and blue nodes denote pickup and delivery points, respectively. Thus, each agent first moves to a green node and then proceeds to a blue node, located either on the same side or on the opposite side. Env. 3 (Fig. 1) simulates a construction site where agents pick up materials at one of the storage areas (green nodes) and deliver them to a node in the installation areas (blue nodes). The pickup and delivery locations are unevenly distributed; thus, congestion may occur near the pickup and delivery areas if the number of agents is large.
Figure 7: Environment 1 (randomly assigned destinations).
Figure 8: Environment 2.
Multi-Agent Path Finding Using Provisionally Booking Nodes for Pickup and Delivery Problems
The initial positions of the agents were randomly assigned in each environment, and 1000 MAPD tasks were allocated in Envs. 1 and 3 (600 in Env. 2). Each task $\tau$ comprises pickup and delivery points, $\tau = (g_1, g_2)$. The assigned agent sets $g_1$ and then $g_2$ as its destinations. Upon task completion, the agent is assigned another task from $\Gamma$, the set of MAPD tasks. The efficiency metric for MAPD is the makespan, which is the number of steps from the start of the experiment to the completion of all tasks; a lower makespan indicates better performance. The window size $W\ (>1)$ of winPIBT was tuned to minimize the makespan and therefore differed between environments. Experiments were conducted by varying the number of agents $N$ to assess its impact on the performance of PIBT-PB and the baselines. The results presented below are the averages of 20 independent trials.
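The task cycle and the makespan metric can be illustrated with a toy simulation. In this hypothetical Python sketch, agents live on a 1-D corridor, move one cell per step toward their current destination, and ignore collisions entirely, so it shows only how a task $\tau = (g_1, g_2)$ is followed, how a new task is drawn from $\Gamma$ (in FIFO order, an assumption), and how the makespan is counted. It is not a model of PIBT-PB or of the experimental environments.

```python
from collections import deque
from typing import Deque, List, Tuple

Task = Tuple[int, int]  # (g1, g2): pickup and delivery cells on a 1-D corridor (toy)


def makespan(starts: List[int], tasks: List[Task]) -> int:
    """Steps until every task in Γ is delivered.

    Toy model: agents move one cell per step toward their current destination
    and collisions are ignored, so only the task bookkeeping is illustrated.
    """
    if tasks and not starts:
        raise ValueError("tasks need at least one agent")
    gamma: Deque[Task] = deque(tasks)           # Γ, handed out in FIFO order
    pos = list(starts)
    goals: List[List[int]] = [[] for _ in pos]  # remaining destinations per agent

    def settle(i: int) -> None:
        # Pop destinations reached at the current cell; fetch a new task when idle.
        while True:
            if goals[i] and pos[i] == goals[i][0]:
                goals[i].pop(0)                 # g1 reached -> head to g2, etc.
            elif not goals[i] and gamma:
                goals[i] = list(gamma.popleft())
            else:
                return

    for i in range(len(pos)):
        settle(i)
    t = 0
    while any(goals):
        t += 1
        for i in range(len(pos)):
            if goals[i]:
                target = goals[i][0]
                if target > pos[i]:
                    pos[i] += 1
                elif target < pos[i]:
                    pos[i] -= 1
            settle(i)
    return t


print(makespan([0], [(2, 5), (5, 1)]))              # 9
print(makespan([0, 10], [(2, 5), (8, 9), (5, 1)]))  # 11
```

Replacing the one-cell move with a collision-avoiding planner such as PIBT or PIBT-PB would leave the task bookkeeping and the makespan count unchanged.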
5.2 Experimental Results
The results are presented in Fig. 9. Across all three environments, PIBT-PB outperformed the other methods, achieving shorter makespans regardless of the number of agents. As shown in Fig. 9a, PIBT and winPIBT displayed nearly identical performance in Env. 1, where the number of agents ranged from 20 to 100 and the winPIBT window sizes were set to $W = 5$ and $W = 10$. The value $W = 10$ is larger than those used in the other environments; thus, agents using winPIBT could find possible conflicts much earlier than agents using PIBT. However, they also restricted other agents' movements. Because these advantages and disadvantages cancel each other out, the makespans for winPIBT were almost identical to those for PIBT. In contrast, PIBT-PB did not book any crossing nodes as provisional nodes in advance and therefore did not needlessly restrict other agents' movements, while still detecting possible conflicts effectively. For example, the makespan of PIBT-PB (1040.8) was approximately 14.3% lower than that of PIBT (1214.7) when $N = 100$.
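The relative reduction quoted above for $N = 100$ can be checked directly from the two reported makespans:

$$\frac{1214.7 - 1040.8}{1214.7} = \frac{173.9}{1214.7} \approx 0.143 = 14.3\%$$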
The makespans in Env. 2 are plotted in Fig. 9b, in which we varied the number of agents from 10 to 50. With 48 nodes in the moving area (white nodes), $N = 50$ means that the environment is extremely crowded. Figure 9b shows that agents using PIBT-PB achieved shorter makespans than those using the baselines, even in such a simple environment. Because pickup and delivery nodes could be located at the four one-way corner nodes, agents could provisionally book the final steps of their paths into these corners. Even in a grid-like environment such as Env. 2, agents using winPIBT restricted other agents by securing some nodes on the opposite side, so winPIBT performed best with $W = 3$, a relatively small window size.
In Env. 3, PIBT-PB demonstrated superior performance, whereas PIBT and winPIBT ($W = 5$) showed nearly identical results, with winPIBT having a slight edge. In this setting, winPIBT agents frequently blocked other agents, causing them to wait at intersections in the central areas. As in Env. 1, the benefits and drawbacks of winPIBT relative to PIBT balanced each other, resulting in comparable efficiency. In contrast, PIBT-PB agents avoided unnecessary blockages at intersections where agents could hinder each other's movements, while still detecting head-on collisions early.
5.3 Discussion
Our experiments revealed that PIBT-PB solved MAPD problems more efficiently than traditional priority-based methods, including PIBT and winPIBT. Unlike PIBT, winPIBT agents improved path-planning efficiency by using time windows for early collision detection. However, winPIBT also imposed strict movement constraints on other agents, often hindering them at intersections by securing additional nodes. To mitigate this, we avoided securing extra nodes at intersections, where doing so would restrict other agents' movements. The critical scenarios are potential collisions on narrow one-way paths, where agents cannot pass each other and must backtrack, wasting time. Therefore, PIBT-PB agents book as many provisional nodes as possible on one-way routes to detect future collisions. PIBT-PB is expected to be particularly effective in maze-like environments with limited or lengthy alternative routes, as well as in warehouse settings.
6 CONCLUSION
We proposed PIBT with provisional booking (PIBT-PB), which extends PIBT to detect possible collisions earlier. First, agents using PIBT-PB secure the next node as in PIBT and then attempt to provisionally book additional one-way nodes in advance; crossing nodes are not booked because booking them often blocks other agents' actions. Subsequently, if no collision is detected, the agent proceeds through the provisional nodes step by step. If a higher-priority agent approaches, the agent releases its booked nodes and generates an alternative path by treating the approaching agent as an additional obstacle. In all cases, agents reach their destinations because the first step follows a method that is essentially identical to PIBT.
One direction for future work is to improve the dynamic prioritization used to control agents. In this study, the priority depends on the time elapsed since the agent's destination was last updated. Therefore, an agent with a nearby destination may have to take a detour owing to its priority level. To avoid this situation, we propose using information such as the distance between the agent and its destination, as well as the structural information of one-way paths in the environment. We also need to adapt our method to flexible environments, such as construction sites. These environments undergo changes in which paths are created or eliminated owing to the construction of new walls, doors, and poles, and to changes in material storage areas.
Figure 9: Experimental Results of Makespan. Panels: (a) Env. 1, (b) Env. 2, (c) Env. 3.
REFERENCES
Ankit, K., Tony, L. A., Jana, S., and Ghose, D. (2022). Multi-agent Cooperative Framework for Autonomous Wall Construction. In Saraswat, M., Sharma, H., Balachandran, K., Kim, J. H., and Bansal, J. C., editors, Congress on Intelligent Systems, pages 877–894, Singapore. Springer Nature Singapore.
Farinelli, A., Contini, A., and Zorzi, D. (2020). Decentralized task assignment for multi-item pickup and delivery in logistic scenarios. In AAMAS.
Goldenberg, M., Felner, A., Stern, R., Sharon, G., Sturtevant, N., Holte, R., and Schaeffer, J. (2014). Enhanced partial expansion A*. J. Artif. Intell. Res., 50:141–187.
Jennings, J., Whelan, G., and Evans, W. (1997). Cooperative search and rescue with a team of mobile robots. In 1997 8th International Conference on Advanced Robotics. Proceedings. ICAR'97, pages 193–200.
Li, J., Tinka, A., Kiesel, S., Durham, J. W., Kumar, T. K. S., and Koenig, S. (2020). Lifelong Multi-Agent Path Finding in Large-Scale Warehouses. In AAMAS.
Luna, R. and Bekris, K. E. (2011). Push and swap: Fast cooperative path-finding with completeness guarantees. In IJCAI, pages 294–300.
Ma, H., Hönig, W., Kumar, T., Ayanian, N., and Koenig, S. (2019). Lifelong path planning with kinematic constraints for multi-agent pickup and delivery. In AAAI.
Ma, H., Li, J., Kumar, T. S., and Koenig, S. (2017). Lifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks. In Proc. of the 16th Conf. on Autonomous Agents and MultiAgent Systems, AAMAS '17, pages 837–845.
Miyashita, Y., Yamauchi, T., and Sugawara, T. (2023). Distributed Planning with Asynchronous Execution with Local Navigation for Multi-agent Pickup and Delivery Problem. In AAMAS.
Okumura, K., Machida, M., Défago, X., and Tamura, Y. (2019a). Priority Inheritance with Backtracking for Iterative Multi-agent Path Finding. In Proc. of the Twenty-Eighth International Joint Conf. on Artificial Intelligence, IJCAI-19.
Okumura, K., Tamura, Y., and Défago, X. (2019b). winPIBT: Expanded Prioritized Algorithm for Iterative Multi-agent Path Finding. CoRR, abs/1905.10149.
Sharon, G., Stern, R., Felner, A., and Sturtevant, N. R. (2015). Conflict-based search for optimal multi-agent pathfinding. Artif. Intell., 219:40–66.
Shimada, D., Miyashita, Y., and Sugawara, T. (2025). Path Finding with Flexible Provisional Booking in Multi-agent Pickup and Delivery Problems. In Arisaka, R., Sanchez-Anguix, V., Stein, S., Aydoğan, R., van der Torre, L., and Ito, T., editors, PRIMA 2024: Principles and Practice of Multi-Agent Systems, pages 74–80, Cham. Springer Nature Switzerland.
Silver, D. (2005). Cooperative pathfinding. In Proc. of the AAAI Conf. on Artificial Intelligence and Interactive Digital Entertainment, volume 1, pages 117–122.
Standley, T. S. (2010). Finding optimal solutions to cooperative pathfinding problems. In AAAI, volume 1, pages 28–29.
Wagner, G. and Choset, H. (2015). Subdimensional expansion for multirobot path planning. Artif. Intell., 219:1–24.
Wang, Z. (2023). Research on optimal route planning and delivery strategy of multiple robots using HCA algorithm in a restaurant. In 2023 IEEE 2nd Int. Conf. on Electrical Engineering, Big Data and Algorithms (EEBDA), pages 1740–1746.
Yamauchi, T., Miyashita, Y., and Sugawara, T. (2022). Standby-Based Deadlock Avoidance Method for Multi-Agent Pickup and Delivery Tasks. In AAMAS.