Multi-Agent Path Finding Using Provisionally Booking Nodes for Pickup and Delivery Problems

Daiki Shimada, Yuki Miyashita and Toshiharu Sugawara

Waseda University, Shinjuku, Tokyo 1698555, Japan

Keywords:

Multi-Agent Pickup and Delivery, Cooperation, Path Finding.

Abstract:

We propose an efficient method for determining subsequent movements based on temporarily generated shortest paths in the multi-agent pickup and delivery (MAPD) problem. The MAPD problem involves multiple agents (such as carrier robots) continuously performing transportation tasks in a vast environment with obstacles while avoiding collisions with other agents. Our method is an extension of a decentralized path-finding algorithm, priority inheritance with backtracking (PIBT), and can be efficient in environments with narrow one-way paths and few detours. Our method, PIBT with provisional booking (PIBT-PB), not only secures the next node as in PIBT but also provisionally books some nodes in advance based on dynamic priorities between agents to detect possible conflicts earlier. It therefore reduces the number of "turning back" and wasted "waiting" actions in such environments. Our experiments show that PIBT-PB is more efficient than the baselines, PIBT and windowed PIBT, and that even in less restrictive environments it performs as efficiently as PIBT.

1 INTRODUCTION

Significant interest has been growing in the field of multi-agent systems (MAS) designed for executing sophisticated collaborative and coordinated tasks. Examples of such applications include cooperative security monitoring using multiple patrolling robots, material transportation using carrier robots in warehouses (Ma et al., 2017), construction sites (Miyashita et al., 2023; Ankit et al., 2022), rescue environments (Jennings et al., 1997), and automated catering robots in restaurants (Wang, 2023). In these systems, agents must individually determine appropriate actions based on their recognition of the surroundings to fulfill their objectives while avoiding negative interference, such as collision and resource conflict. In particular, we address the multi-agent pickup and delivery (MAPD) problem (Ma et al., 2017) in a construction site context. Here, numerous agents (e.g., carrier robots) continuously pick up heavy and large materials and deliver them to the required destinations during the night, ready for the next day's installation work by human builders. This process is carried out while avoiding collisions and deadlocks. Therefore, it can be viewed as a repeated multi-agent path finding (MAPF) problem, where multiple agents generate collision-free paths between their start and goal locations.

https://orcid.org/0000-0002-1676-9346

https://orcid.org/0000-0002-9271-4507

In a construction site, installation materials are stored in a few storage areas and carried to specific locations where the materials will be installed. Another example of an MAPD situation is where materials are carried to and from trucks in the loading bay. This suggests that agents in the MAPD problem tend to gather in a few areas, restricting the paths that they can select and easily leading to deadlock and congestion. These problems can be mitigated in automated warehouses because they are specially designed to accommodate the movement of carrier agents, with wide paths and many detours of similar lengths. However, this is not the case for construction sites; agents must move along relatively narrow paths that do not allow them to pass each other. The topological structure of the environment may change because of new walls and doors installed the previous day. Furthermore, agents may take detours to avoid head-on collisions, but some detours are relatively long. Therefore, an agent has to detect a possible collision as early as possible to generate collision-free paths.

Numerous studies on MAPF and MAPD problems (Ma et al., 2017; Okumura et al., 2019a; Sharon et al., 2015; Yamauchi et al., 2022; Li et al., 2020) have been conducted because of their many applications. Our study focuses on decentralized path-finding algorithms, such as priority inheritance with backtracking (PIBT) (Okumura et al., 2019a) and windowed priority inheritance with backtracking (winPIBT) (Okumura et al., 2019b), because they guarantee reachability in a decentralized manner with fewer constraints on the arrangement of destinations. Agents using these algorithms can determine paths for their tasks individually based on their priorities, thus reducing the planning time by localizing the computation. Although PIBT is usually efficient, it may delay the detection of possible collisions because the agent communicates with other agents two nodes away and can only secure the next position toward its destination. To address this issue, winPIBT exclusively secures some nodes with a fixed time window. A longer window enables agents to detect possible collisions earlier, but the overall efficiency may decrease because the low-priority agents are restricted from moving by the secured nodes, even if no collisions occur.

Shimada, D., Miyashita, Y. and Sugawara, T.

Multi-Agent Path Finding Using Provisionally Booking Nodes for Pickup and Delivery Problems.

DOI: 10.5220/0013158300003890

In Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025) - Volume 3, pages 528-537

ISBN: 978-989-758-737-5; ISSN: 2184-433X

Thus, we propose a method called PIBT with provisional booking (PIBT-PB). In this method, agents secure their immediate next nodes as in PIBT, but refrain from securing subsequent nodes. Instead, they tentatively book several nodes in advance, which are referred to as provisional nodes. This strategy allows for earlier detection of potential head-on collisions and prevents the obstruction of other agents. The number of provisional nodes is flexible and can be adjusted based on the topological structure of the environment. We show that PIBT-PB maintains the reachability of all agents in the relaxed bi-connected area, as in PIBT, and demonstrate that it performs as efficiently as PIBT. Our experimental results indicate that PIBT-PB reduces the makespan (the total time required to complete all tasks, including planning time) compared to the baseline methods, PIBT and winPIBT, in our test environment. In addition, we discuss the benefits and limitations of the proposed approach.

2 RELATED WORK

Many studies have been conducted on MAPF and MAPD problems (Ma et al., 2017; Okumura et al., 2019a; Sharon et al., 2015; Standley, 2010; Goldenberg et al., 2014; Wagner and Choset, 2015; Silver, 2005; Yamauchi et al., 2022; Li et al., 2020). They can be roughly classified into centralized and decentralized approaches. In centralized approaches, the information on the environment and all agents is collected by a single agent/server, which calculates collision-free paths and distributes them to all agents (Sharon et al., 2015; Luna and Bekris, 2011). For example, in conflict-based search (CBS) (Sharon et al., 2015), the path planning process is divided into high- and low-level search subprocesses. In the low-level search, agents independently generate the shortest paths to their destinations and send them to the centralized server. The server then modifies all paths to eliminate possible collisions and distributes them to individual agents. Although this approach is likely to control all agents optimally in terms of travel path length, the computational cost increases as the number of agents increases. Moreover, flexibility is a concern: if one agent cannot move as planned for any reason, all agents are affected by the replanning of the entire control.

In a decentralized approach (Ma et al., 2017; Okumura et al., 2019a; Ma et al., 2019; Yamauchi et al., 2022; Li et al., 2020; Farinelli et al., 2020; Miyashita et al., 2023), agents autonomously decide their own paths. Although this approach is more flexible, agents have only local information, which raises issues such as the optimality of generated paths and reachability. For example, in token passing (TP) (Ma et al., 2017), the agent accesses the token, a type of shared memory, to refer to its content, generates the shortest collision-free path, and stores the information on that path in the token. TP can guarantee reachability under an assumption that is reasonable for an automated warehouse, but its performance degrades in our target environment because it has only a few endpoints. Li et al. (2020) proposed rolling-horizon collision resolution (RHCR), in which agents replan their paths at regular intervals while checking possible collisions within a certain window size. However, in congested situations, path generation becomes costly and reachability cannot be guaranteed.

Meanwhile, PIBT can reach the destination with only local communication, and reachability is guaranteed in an environment whose topological structure is relaxed bi-connected. However, its short-sighted algorithm delays the detection of collisions, reducing the efficiency. To overcome this drawback, winPIBT (Okumura et al., 2019b) secures a fixed number of nodes in advance to detect possible collisions earlier. However, this is not always safe, meaning that agents may unnecessarily restrict the movement of other lower-priority agents. Our proposed method can be considered an extension of PIBT, where some nodes are provisionally booked in advance if the nodes are safe to book. Although we already reported the abstract of this extension (Shimada et al., 2025), here we provide a detailed explanation of the algorithm and discuss its theoretical completeness and the results of the experimental evaluation.

Figure 1: Example environment (Env. 3).

Figure 2: Procedure of Priority Inheritance. (a) Deadlock. (b) Priority inheritance.

3 PROBLEM DESCRIPTION

3.1 MAPD

Let $A = \{a_1, a_2, \ldots, a_N\}$ be the set of $N$ ($\in \mathbb{N}$) agents, where $\mathbb{N}$ is the set of non-negative integers. A MAPD environment can be expressed by a connected undirected graph $G = (V, E)$ embeddable in a two-dimensional Euclidean space, where $V$ and $E$ are the sets of nodes and of edges connecting nodes, respectively. We introduce the discrete time $t \in \mathbb{N}$. An agent can move along edge $(u, v) \in E$ connecting two nodes $v, u \in V$. As $G$ is undirected, if $(v, u) \in E$, then $(u, v) \in E$. We assume that the length of all edges is 1, meaning that any agent can move to an adjacent node in one time step (by adding dummy nodes if necessary). An example of the grid-based environment is shown in Fig. 1, where red circles are agents that can move up, down, left, and right if there is space.

The location of agent $a_i$ at time $t$ is denoted by $v_i(t) \in V$. At each time $t$, $a_i$ can move to $v_i(t + 1) \in N_{v_i(t)}$ or stay at the same node $v_i(t)$, where $N_u = \{v \in V \mid (u, v) \in E\}$ is the set of the neighbor nodes of $u \in V$. A collision occurs when two agents stay at one node or cross the same edge simultaneously. Therefore, the following condition must be satisfied for $\forall a_i, a_j \in A$ ($i \neq j$):

$$v_i(t) \neq v_j(t) \land \bigl(v_i(t) \neq v_j(t + 1) \lor v_i(t + 1) \neq v_j(t)\bigr) \tag{1}$$

Meanwhile, synchronized circular movements without collisions are assumed to be possible, that is,

$$v_i(t + 1) = v_j(t) \land v_j(t + 1) = v_k(t) \land \cdots \land v_l(t + 1) = v_i(t) \tag{2}$$

is allowed for different agents $a_i, a_j, \ldots, a_l \in A$.
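Conditions (1) and (2) can be checked mechanically for a single time step. The following is a minimal sketch, not part of the proposed method; the function name `find_collisions` and the dictionary-based encoding of agent positions are assumptions made for illustration. It reports vertex and swap (edge) collisions and accepts synchronized rotations of three or more agents.

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def find_collisions(now: Dict[str, Node], nxt: Dict[str, Node]) -> List[Tuple[str, str, str]]:
    """Return (kind, agent_i, agent_j) for every pair violating condition (1).

    `now` maps each agent to v(t) and `nxt` maps it to v(t + 1). `now` is
    assumed to be collision-free already, so only the move into t + 1 is checked.
    """
    agents = sorted(now)
    found = []
    for idx, ai in enumerate(agents):
        for aj in agents[idx + 1:]:
            if nxt[ai] == nxt[aj]:
                # Two agents would stay at one node at t + 1.
                found.append(("vertex", ai, aj))
            elif nxt[ai] == now[aj] and nxt[aj] == now[ai]:
                # Two agents would cross the same edge in opposite directions.
                found.append(("swap", ai, aj))
    return found


# A rotation of three agents around a triangle satisfies condition (2).
rotation = find_collisions(
    {"a1": "A", "a2": "B", "a3": "C"},
    {"a1": "B", "a2": "C", "a3": "A"},
)
print(rotation)
```

A rotation of only two agents is the same as a swap, so it is reported as a collision, which matches the second clause of condition (1).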

Let $\Gamma = \{\tau_1, \tau_2, \ldots\}$ be a finite set of tasks. In MAPD, task $\tau_j$ is represented by a pair of pickup node and delivery node, $(g_j^{p}, g_j^{d})$ ($g_j^{p} \neq g_j^{d}$). When $\tau_j$ is assigned to $a_i$ at $t$, $\tau_j$ is removed from $\Gamma$ and $a_i$ travels from $v_i(t)$ to $g_j^{p}$ and then $g_j^{d}$ by setting its destination in turn. After both destinations are visited, another task in $\Gamma$ is assigned to $a_i$ if $\Gamma \neq \emptyset$. This also means that all agents individually determine their next destinations when they have reached the current destinations. We denote the set of the current destinations of all agents as $D = \{d_1, \ldots, d_N\}$, where $d_i$ is the destination of $a_i$, and we define $d_i = \mathrm{nil}$ if $a_i$ has no destination.
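The destination-update rule described above can be sketched as follows. This is an illustrative sketch only: the `Agent` class, the `update_destination` function, and the use of a FIFO queue for $\Gamma$ are assumptions, since the text does not fix a task-assignment order.

```python
from collections import deque
from typing import Deque, Hashable, Optional, Tuple

Node = Hashable
Task = Tuple[Node, Node]  # (pickup node, delivery node)


class Agent:
    def __init__(self, name: str, node: Node) -> None:
        self.name = name
        self.node = node
        self.pending: Deque[Node] = deque()  # remaining destinations of the current task

    @property
    def destination(self) -> Optional[Node]:
        # None plays the role of d_i = nil.
        return self.pending[0] if self.pending else None


def update_destination(agent: Agent, tasks: Deque[Task]) -> Optional[Node]:
    """Advance to the next destination, taking a new task when the current one is done."""
    if agent.pending and agent.node == agent.pending[0]:
        agent.pending.popleft()  # current destination reached
    if not agent.pending and tasks:
        pickup, delivery = tasks.popleft()  # the task is removed from the task set
        agent.pending.extend([pickup, delivery])
    return agent.destination
```

Calling `update_destination` once per time step, after the agent has moved, yields the pickup node first, then the delivery node, and finally `None` when no task remains.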

3.2 Priority Inheritance with Backtracking (PIBT)

According to PIBT (Okumura et al., 2019a), each agent $a_i$ has a priority that is updated at each time step. The agent interacts with nearby agents through a recursive process involving priority inheritance (PI) and backtracking (BT) to determine its next move based on the priority order.

Definition 1. (Relaxed bi-connected graph) Graph $G = (V, E)$ is a relaxed bi-connected graph iff $G$ is a connected graph, and there is a cyclic path of length at least 3 between any pair of neighbor nodes in $G$.
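Definition 1 can be tested directly on a small graph. The sketch below assumes a simple undirected graph given as an adjacency dictionary; the function names are hypothetical. It uses the observation that two neighbor nodes lie on a cycle of length at least 3 exactly when they remain connected after their shared edge is ignored.

```python
from collections import deque
from typing import Dict, Hashable, List

Node = Hashable
Graph = Dict[Node, List[Node]]


def on_cycle(adj: Graph, u: Node, v: Node) -> bool:
    """True if neighbors u and v are joined by a path that avoids edge (u, v)."""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if x == u and y == v:
                continue  # ignore the direct edge between u and v
            if y == v:
                return True
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def is_connected(adj: Graph) -> bool:
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(adj)


def is_relaxed_biconnected(adj: Graph) -> bool:
    """Check Definition 1: connected, and every edge lies on a cycle."""
    return is_connected(adj) and all(on_cycle(adj, u, v) for u in adj for v in adj[u])
```

For example, two triangles that share a single node form a relaxed bi-connected graph even though removing the shared node disconnects it, whereas a simple path of three nodes does not qualify.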

PIBT has been proven to ensure that all agents can reach their destinations without deadlock, livelock, or collision if $G = (V, E)$ is a relaxed bi-connected graph and $|A| \le |V|$. We briefly explain PIBT. Agent $a_i \in A$ has a priority

$$p_i(t) = \varepsilon_i + \eta_i(t) \tag{3}$$

for the movement along its planned path. Here, $\varepsilon_i$ ($0 \le \varepsilon_i < 1$) is the base value of the unique priority assigned to $a_i$ initially, and $\eta_i(t)$ ($\in \mathbb{N}$) is the elapsed time after $a_i$ updates its destination. Agent $a_i$ sets $\eta_i(t)$ to 0 when arriving at its destination. Because the base values $\varepsilon_i$ are unique and smaller than 1 while $\eta_i(t)$ is an integer, agents always have different priorities.
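The priority of Eq. (3) amounts to a small amount of bookkeeping per agent. The class below is a hypothetical illustration (the name `AgentPriority` and its methods are not from the paper) of how the integer part grows with waiting time and is reset on arrival, while the fractional base value breaks ties.

```python
from dataclasses import dataclass


@dataclass
class AgentPriority:
    epsilon: float  # unique base value, 0 <= epsilon < 1
    eta: int = 0    # time steps elapsed since the destination was last updated

    def value(self) -> float:
        """Priority p(t) = epsilon + eta(t), as in Eq. (3)."""
        return self.epsilon + self.eta

    def tick(self, arrived: bool) -> None:
        """Advance one time step; eta is reset to 0 on arrival at the destination."""
        self.eta = 0 if arrived else self.eta + 1
```

An agent that keeps failing to reach its destination accumulates `eta` and eventually outranks every agent that has recently arrived, which is the property the reachability argument relies on.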

Figure 3: Example Procedure of Priority Inheritance and Backtracking. (a) Deadlock in PI. (b) PI after BT.

First, agent $a_i$ ranks the nodes in $N_{v_i(t)} \cup \{v_i(t)\}$, usually at time $t$, according to the estimated distance to its destination from $v \in N_{v_i(t)} \cup \{v_i(t)\}$ using some path-finding method for MAS, such as cooperative A* search (CA*) (Silver, 2005). Then, $a_i$ secures the next node in the order of priority as follows. Suppose that $a_i$ has the highest priority in its local area. If agent $a_j$ s.t. $p_j(t) < p_i(t)$ occupies the node to which $a_i$ wants to move next, $a_i$ pushes $a_j$, and $a_j$ must yield its current location by moving in a certain direction. However, this may result in a potential collision with other agents. In this case, $a_i$ passes its priority on to $a_j$ to avoid it.

An example is shown in Fig. 2, where the arrows indicate the potential moves and $p_j(t) < p_k(t)$. In Fig. 2a, $a_i$ attempts to secure $a_j$'s current node, and thus, $a_j$ has to move left, but as $a_k$ tries to move to the same node, $a_j$ cannot determine the next node. However, by PI, the priority of $a_i$ is inherited by $a_j$; thus, $a_j$ has a higher priority than $a_k$ (Fig. 2b) and can secure the left node. Meanwhile, $a_k$ does not move right but chooses to "stay" at its current node, which is the next ranked node toward its destination, as shown in Fig. 2c.

Backtracking (BT) is a process where the results of the priority inheritance (PI) process, which attempted to secure a node, are returned in reverse order. Particularly, an agent that fails to secure a node returns the result directly to the agent from which it inherited its priority. Upon receiving this information of failure, that agent attempts to move to the next best node if possible. If there are no other agents at that node, it secures that node; otherwise, it recursively executes PI and attempts to secure that node. It then repeats this operation. If all relevant agents cannot determine the next location through the PI and BT processes, they decide that the PI is stuck and choose to remain at the current nodes. Note that in environments that meet the condition of the relaxed bi-connected graph, it is demonstrated that all agents can determine their next nodes by PI and BT (Okumura et al., 2019a).
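The interplay of PI and BT in a single time step can be sketched as a short recursive procedure. The code below is a simplified illustration under several assumptions, not the implementation evaluated in this paper: distances to the destination come from a plain breadth-first search rather than CA*, ties are broken by adjacency-list order, and the names `bfs_dist`, `pibt_step`, and `plan` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional

Node = Hashable


def bfs_dist(adj: Dict[Node, List[Node]], goal: Node) -> Dict[Node, int]:
    """Hop distance from every node to `goal` (stand-in for the path-finding method)."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def pibt_step(adj, pos, dist, priority) -> Dict[str, Node]:
    """One PIBT-style time step: returns the next node of every agent.

    pos: agent -> current node, dist: agent -> {node -> distance to its goal},
    priority: agent -> priority value at this time step.
    """
    occupant = {v: a for a, v in pos.items()}
    nxt: Dict[str, Node] = {}

    def plan(a: str, pusher: Optional[str]) -> bool:
        # Rank the neighbor nodes and the current node by distance to the goal.
        candidates = sorted(list(adj[pos[a]]) + [pos[a]], key=lambda v: dist[a][v])
        for v in candidates:
            if v in nxt.values():
                continue  # already secured by another agent
            if pusher is not None and v == pos[pusher]:
                continue  # would swap places with the pushing agent
            nxt[a] = v
            b = occupant.get(v)
            if b is not None and b != a and b not in nxt:
                # Priority inheritance: b plans with a's priority.
                if not plan(b, a):
                    continue  # backtracking reported failure; try the next node
            return True
        nxt[a] = pos[a]  # stuck: stay and report failure to the pusher
        return False

    for a in sorted(pos, key=lambda x: priority[x], reverse=True):
        if a not in nxt:
            plan(a, None)
    return nxt
```

On a four-node ring with a high-priority agent at node 0 heading for node 2 and a low-priority agent resting on node 1, the low-priority agent is pushed forward to node 2 so that the high-priority agent can enter node 1. On a dead-end edge, the pushed agent reports failure and both agents stay.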

An example procedure of BT is shown in Fig. 3, in which the priority of each agent is assumed to be greater in descending order of subscript numbers. First, the highest-priority agent attempts to secure the next node, where another agent is located. Figure 3a shows that the priority of this agent is inherited along a chain of three agents. However, because the last agent in the chain could not secure a next node, two agents backtrack by returning the failure results, as shown in Fig. 3b, and remain at their current nodes. Subsequently, the agent that received the failure result can select another node to move to by pushing a different agent, because it has a higher priority than that agent owing to PI. This result of success is then delivered by backtracking along the chain of agents (Fig. 3c). After that, all involved agents can decide their next nodes. Figure 3d shows their positions at the next time step, wherein two of the agents cannot move, but they can stay at the same positions.

Figure 4: Inefficient path planning in PIBT. (a) Moves at time t. (b) At time t + 1.

3.3 winPIBT (windowed PIBT)

The winPIBT (Okumura et al., 2019b) algorithm enhances PIBT by incorporating a time window, allowing agents to secure multiple nodes for consecutive time steps. This extension aims to mitigate the inefficiencies in PIBT, particularly those stemming from delayed collision detection.

An example of inefficiency in PIBT is illustrated in Fig. 4 at time $t$, where agents $a_i$ and $a_j$ move to their destinations indicated by flags of the same colors. Here, agent $a_i$ has a higher priority than $a_j$ ($p_i(t) > p_j(t)$), and the agents' secured nodes for the next moves are shown in the same colors as the agents. In Figs. 4a and 4b, both agents secure nodes and move according to the shortest paths that they individually generate. After that, they encounter each other and recognize the possible collision. Therefore, $a_i$ must push back the lower-priority agent $a_j$ a few times, as shown in Figs. 4c and 4d. These moves are wasteful and inefficient.

To address this problem, in winPIBT, the information on the secured nodes is shared among all agents (or at least among nearby agents), and the highest-priority agents that have not yet determined the next node attempt to secure some future nodes, whose number is specified by the fixed size of the time window, $W$ ($> 0$). We illustrate how winPIBT mitigates this wasted-movement problem using Fig. 5, where $W = 3$. At $t$, $a_i$ secures the nodes for up to three time steps ahead (Fig. 5a). Then, $a_j$ attempts to secure three nodes but can secure only one neighbor node (Fig. 5b). If there were no other path, $a_j$ would stay at this node for three time steps, but in this situation, there is another path, and the node after three steps will be at the same distance from or closer to its destination. Thus, it secures three nodes along an alternative path at $t$ (Fig. 5c). Figure 5d shows the next reservation at $t + 3$, indicating that both agents can reach their destinations without unnecessary movement. Note that when $W = 1$, winPIBT is equivalent to PIBT.

Figure 5: Movements in winPIBT. (a) $a_i$ secures path. (b) Collision detected. (c) $a_j$ replans a path. (d) 3 steps later.

winPIBT introduces the disentangled condition, which intuitively requires that agents secure paths in advance while holding Condition (1). This condition requires that the number of nodes that agent $a_i$ secures at $t_c$ be equal to or less than the number of nodes already secured by agent $a_j$ ($p_j(t_c) > p_i(t_c)$), and that $a_i$ not secure a node that has been secured by $a_j$ in order to pass through it before $a_j$. Thus, they must satisfy the following condition,

$$v_i(t) \neq v_j(t + k) \quad \text{for } t_c \le \forall t \le t_c + w \text{ and } 0 \le k \le t_j^{\mathrm{last}} \tag{4}$$

where $\pi_i = (v_i(t_c), \ldots, v_i(t_c + w))$ is the sequence of nodes secured by $a_i$ ($w \le W$), $v_j(t + k)$ is the node at time $t + k$ secured by $a_j$, and $t_j^{\mathrm{last}}$ is the last time for which $a_j$ has already secured a node.

Although winPIBT guarantees reachability to destinations for all agents, the condition expressed by Eq. (4) indicates that agents are likely to block the moves of other agents with lower priorities even if no collision occurs, resulting in inefficiency. Figure 6a shows an example situation in which agent $a_i$ has a higher priority than agents $a_j$ and $a_k$, and the three agents move to the destinations with the same colors. Because $a_i$ first secures some nodes, including a crossing node, $a_j$ and $a_k$ must wait for $a_i$ to pass, even if no collision occurs. If the fixed length of the time window is short, agents may be late in detecting possible collisions. In contrast, a longer time window allows earlier collision detection but increases the likelihood of blocking other agents. Therefore, we aim to address this problem while effectively detecting possible collisions.

Figure 6: Comparison between winPIBT and PIBT-PB. (a) A blocking situation. (b) Secured and provisional nodes.

4 PROPOSED METHOD

4.1 Maximal Bi-Connected Component and Node Classification

First, we analyze the characteristics of a bi-connected graph. Inefficient and wasteful behavior in PIBT occurs when a lower-priority agent $a_j$ encounters a higher-priority agent $a_i$ and $a_j$ has no other way except turning back. This situation occurs in a narrow path, and the cost of going backward is significant if the distance is long.

We begin with a basic property of the relaxed bi-connected graph.

Proposition 1. If $G = (V, E)$ is a relaxed bi-connected graph, $|N_v| \ge 2$ for $\forall v \in V$.

Proof. Suppose that $\exists u \in V$ s.t. $N_u$ is a singleton set, i.e., $N_u = \{u'\}$. Then the pair of neighbor nodes $(u, u')$ clearly does not have a cyclic path, which contradicts Definition 1.

From Proposition 1, we can classify the nodes in $V$ into one-way and crossing nodes.

Definition 2. Node $v \in V$ is called a one-way node if $|N_v| = 2$; otherwise, $v$ is called a crossing node.

We denote the sets of one-way and crossing nodes as $V_{\mathrm{one}} = \{v \mid |N_v| = 2\}$ and $V_{\mathrm{crs}} = \{v \mid |N_v| \ge 3\}$, respectively. We also investigate the structure of a relaxed bi-connected graph.
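The classification in Definition 2 depends only on node degree. A minimal sketch follows, assuming an adjacency-dictionary representation; the function name `classify_nodes` is hypothetical. It also rejects nodes with fewer than two neighbors, which by Proposition 1 cannot occur in a relaxed bi-connected graph.

```python
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable


def classify_nodes(adj: Dict[Node, List[Node]]) -> Tuple[Set[Node], Set[Node]]:
    """Split nodes into one-way nodes (degree 2) and crossing nodes (degree >= 3)."""
    one_way: Set[Node] = set()
    crossing: Set[Node] = set()
    for v, neighbors in adj.items():
        degree = len(set(neighbors))
        if degree < 2:
            # By Proposition 1 this cannot happen in a relaxed bi-connected graph.
            raise ValueError(f"node {v!r} has fewer than two neighbors")
        if degree == 2:
            one_way.add(v)
        else:
            crossing.add(v)
    return one_way, crossing
```

For two triangles that share one node, the shared node is the only crossing node and the four remaining nodes are one-way nodes.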


Definition 3. Graph $G = (V, E)$ is a bi-connected graph iff any pair of nodes $v_1, v_2 \in V$ has a cyclic path in $G$.

For graph $G = (V, E)$ and $V' \subset V$, we can naturally generate subgraph $G' = (V', E')$, where $E' = \{(v_1, v_2) \in E \mid v_1, v_2 \in V'\}$.

Definition 4. Subgraph $G' = (V', E')$ generated from graph $G = (V, E)$ is a maximal bi-connected component iff $G'$ is bi-connected and, $\forall v \in V \setminus V'$, the subgraph naturally generated from $V' \cup \{v\}$ is not bi-connected.

Proposition 2. (Structure of a relaxed bi-connected graph) Let $G = (V, E)$ be a connected graph and let $G_1 = (V_1, E_1), \ldots, G_M = (V_M, E_M)$ be all maximal bi-connected subgraphs of $G$. $G = (V, E)$ is a relaxed bi-connected graph iff $G$ consists only of maximal bi-connected components, i.e., $V = V_1 \cup \cdots \cup V_M$ and $E = E_1 \cup \cdots \cup E_M$. Furthermore, any pairwise intersection of maximal bi-connected subgraphs has at most one node (Miyashita et al., 2023), and this intersection node has at least four neighboring nodes, with two of them belonging to the same maximal bi-connected component.

Proof. The sufficient condition is straightforward. Suppose that $G$ is a relaxed bi-connected graph. If there exists $v \in V \setminus (V_1 \cup \cdots \cup V_M)$ such that $\exists v' \in N_v$ and $v' \in V_1 \cup \cdots \cup V_M$ (if not, $G$ is not connected), then $v$ and $v'$ have a cyclic path. Thus, they belong to a maximal bi-connected subgraph, which leads to a contradiction. The latter part is also evident because if the intersection of two maximal bi-connected subgraphs has two distinct nodes, the union of these subgraphs forms a larger bi-connected graph (Miyashita et al., 2023), which contradicts that these subgraphs are maximal. Furthermore, suppose there is only one neighboring node $v'$ of an intersection node $v$ belonging to a maximal bi-connected subgraph $G_m$. Because $v', v \in V_m$, there is a cyclic path connecting these nodes, meaning that $v$ has another neighboring node $v'' \in V_m$, which leads to a contradiction. This also means that any intersection node of maximal bi-connected subgraphs has at least four neighboring nodes.

Proposition 2 shows that if a node $u$ belongs to the intersection of two maximal bi-connected subgraphs, then $u \in V_{\mathrm{crs}}$. Thus, a sequence of one-way nodes belongs to a single maximal bi-connected component.

4.2 PIBT with Provisional Booking

The central idea of PIBT-PB for MAPD in a relaxed bi-connected area is that an agent first secures the next node. If it does not inherit the priority from other agents, it attempts to book further nodes provisionally if the nodes are safe.

Algorithm 1: PIBT-PB at current time $t_c$ for at least securing the node at $t_c + 1$.

 1  $p_i(t_c)$ is defined by Eq. 3. Agent $a_i$ at $v_i(t_c)$ will secure a node for $t_c + 1$.
 2  $\pi_i^{s}[t] \in V$: Secured node by $a_i$ for time $t$; $a_i$ definitely visits it at $t$.
 3  $\pi_i^{p}[t] \in V$: Provisional node booked by $a_i$ for $t$ in advance.
 4  procedure PIBT-PB($A$, $t_c$)
 5      // $t_c$ is the current time.
 6      $S \leftarrow A$, sorted by $p_i(t_c)$ in descending order.
 7      for $a_i \in S$ do
 8          if $\pi_i^{s}[t_c + 1] = \mathrm{nil}$ then
 9              if ($\pi_i^{p}[t_c + 1] \neq \mathrm{nil}$) then    // $a_i$ has no secured node at $t_c + 1$.
10                  mPIBT($a_i$, $a_i$, $t_c$)
11              else
12                  mPIBT($a_i$, nil, $t_c$)
13              end
14          end
15      end
16  end

The algorithms are listed in Algs. 1, 2, and 3. We denote the node that agent $a_i$ has secured for time $t$ as $\pi_i^{s}[t] \in V$, and the node that $a_i$ has provisionally booked for time $t$ as $\pi_i^{p}[t] \in V$. In the proposed method (function PIBT-PB in Alg. 1), agents' priorities are calculated using Eq. 3 at every time step. Suppose that at current time $t_c$ the highest-priority agent $a_i \in A$ has not yet secured its next node ($\pi_i^{s}[t_c + 1] = \mathrm{nil}$). It first attempts to secure the next node using a PIBT-like method, mPIBT. Subsequently, it provisionally books additional nodes using addProvisionalNodes to detect a possible collision earlier. We describe how agents secure and provisionally book nodes in detail.

4.2.1 Securing next Node

In PIBT-PB (Alg. 1), if $a_i$ has already provisionally booked the next node (Line 9 in Alg. 1), it invokes mPIBT($a_i$, $a_i$, $t_c$) to simply check the conflict. Otherwise, $a_i$ invokes mPIBT($a_i$, nil, $t_c$) (Line 12).

In Alg. 2, when $a_j \neq \mathrm{nil}$, $a_i$'s priority is inherited from $a_j$ (including the case $a_j = a_i$) and $a_i$ releases the provisional nodes because it is unintentionally pushed by another agent (Line 3). If $a_i$ has no provisionally booked node at $t_c + 1$, it calculates the shortest path to $d_i$ starting from $\pi_i^{s}[t_c]$ ($= v_i(t_c)$) using CA* (or another similar algorithm) by setting the already secured nodes at $t_c + 1$ and the provisional nodes booked by the higher-priority agents thus far as obstacles (Line 7). This path is temporarily stored in array $\Pi$. Note that agent $a_k$ s.t. $p_k(t_c) < p_i(t_c)$ initially by Eq. 3 may now have a higher priority by PI. In this case, $a_k$ has already secured a node at $t_c + 1$ but has no provisional nodes. If CA* cannot generate the path, it returns $\emptyset$ and jumps to Line 33. Otherwise, $a_i$ continues with another generated path $\Pi$ and $\pi_i^{p}[t_c + 1]$ ($= \Pi[t_c + 1]$).

Algorithm 2: mPIBT at time $t_c$ to decide the next node to move.

 1  procedure mPIBT($a_i$, $a_j$, $t_c$)
 2      if ($a_j \neq \mathrm{nil} \land a_j \neq a_i$) then
 3          $\pi_i^{p} \leftarrow \emptyset$, $p_i(t_c) \leftarrow p_j(t_c)$
 4      end
 5      if ($\pi_i^{p} = \emptyset$) then
 6          // CA*($a_i$) searches the shortest path to $d_i$ by setting the secured and provisional nodes as obstacles.
 7          $\Pi \leftarrow$ CA*($a_i$), $\pi_i^{p}[t_c + 1] \leftarrow \Pi[t_c + 1]$
 8      end
 9      while ($\pi_i^{p} \neq \emptyset$) do
10          // If there is a potential conflict, PI occurs.
11          if ($\exists a_k \in A$ s.t. ($p_k(t_c) < p_i(t_c)$) $\land$ ($\pi_i^{p}[t_c + 1] = \pi_k^{s}[t_c]$)) then
12              if mPIBT($a_k$, $a_i$, $t_c$) is nil then
13                  // As $a_k$ secured a node even if mPIBT returns nil, the following CA* returns another path if one exists.
14                  $\Pi \leftarrow$ CA*($a_i$)
15                  $\pi_i^{p}[t_c + 1] \leftarrow \Pi[t_c + 1]$
16                  continue
17              end
18          end
19          $\pi_i^{s}[t_c + 1] \leftarrow \pi_i^{p}[t_c + 1]$
20          if $a_j = a_i$ then
21              return valid
22          end
23          if ($\exists a_k \in A$ s.t. $\pi_k^{p}[t_c + 1] = \pi_i^{s}[t_c + 1]$) then
24              $\pi_k^{p} \leftarrow \emptyset$
25          end
26          if ($a_j = \mathrm{nil}$) $\land$ ($\pi_i^{s}[t_c + 1] \neq d_i$) then
27              addProvisionalNodes($a_i$, $\Pi$, $t_c$)
28          end
29          return valid
30      end
31      $\pi_i^{s}[t_c + 1] \leftarrow \pi_i^{s}[t_c]$
32      // stay if $a_i$ finds no node to move to next.
33      return nil
34  end

If this is not the case, $a_i$ secures the provisional node (Line 19) and returns "valid" if $a_j = a_i$. Subsequently, if $\exists a_k$ s.t. $\pi_k^{p}[t_c + 1] = \pi_i^{s}[t_c + 1]$, $a_k$ releases its provisional nodes because $p_k(t_c) < p_i(t_c)$. Furthermore, if $a_i$ is not pushed by another agent, it invokes addProvisionalNodes to additionally book some provisional nodes if possible. If $a_i$ cannot find any neighboring node, it remains at the current node (Line 33).

Algorithm 3: To find some provisional nodes after $t_c + 2$.

 1  procedure addProvisionalNodes($a_i$, $\Pi$, $t_c$)
 2      for ($t \in \{t_c + 1, \ldots\}$) do
 3          // Try to provisionally book $\pi_i^{p}[t_c + 2]$ and after that.
 4          // If $a_i$ reached $d_i$ at $t$ or the next node is crossing, exit from the for-loop.
 5          if ($\pi_i^{p}[t] = d_i$) $\lor$ ($\Pi[t + 1] \in V_{\mathrm{crs}}$) then
 6              return valid
 7          end
 8          // If a possible collision is detected,
 9          if ($\exists a_k \in A$ s.t. ($\Pi[t] = \pi_k^{p}[t]$) $\lor$ ($\Pi[t] = \pi_k^{p}[t + 1] \land \Pi[t + 1] = \pi_k^{p}[t]$)) then
10              // The lower-priority agent releases all provisional nodes.
11              if $p_k(t_c) < p_i(t_c)$ then
12                  $\pi_k^{p} \leftarrow \emptyset$, $\pi_i^{p}[t + 1] \leftarrow \Pi[t + 1]$
13              else
14                  $\pi_i^{p} \leftarrow \emptyset$
15                  return nil
16              end
17          end
18      end
19  end

4.2.2 Provisional Booking

In the function addProvisionalNodes (Alg. 3), agent $a_i$ attempts to provisionally book some nodes until $a_i$ has secured or provisionally booked its destination $d_i$ or has reached the node before a crossing node (Line 5). If $a_i$ detects a potential collision (Line 9), the provisional nodes of the lower-priority agent will be released. If $a_i$ is the higher-priority agent, the next node in $\Pi[t + 1]$ is provisionally booked and stored to $\pi_i^{p}[t + 1]$.

Figure 6b illustrates the situation in which agents utilizing PIBT-PB avoid the unnecessary blocking of other agents shown in Fig. 6a. Initially, $a_i$ secures the right node and provisionally books another right node (depicted as the two red nodes in Fig. 6b) en route to its destination by using the mPIBT and addProvisionalNodes functions. Next, $a_j$ secures an upper node without booking any provisional nodes. Subsequently, $a_k$ secures the left node and provisionally books five left nodes, including its destination node, based on the shortest path to the destination.
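The booking loop described above can be illustrated with a simplified sketch. It is not a transcription of Alg. 3: time is expressed as an offset from $t_c + 1$, a node is booked whenever no conflict is found (an assumption), and the function name, argument names, and the dictionary-based booking table are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Node = Hashable


def add_provisional_nodes(
    agent: str,
    path: List[Node],                         # path[k]: planned node at offset k (path[0] is already secured)
    dest: Node,
    crossing: Set[Node],                      # set of crossing nodes
    provisional: Dict[str, Dict[int, Node]],  # agent -> {offset -> provisionally booked node}
    priority: Dict[str, float],
) -> bool:
    """Book nodes along `path` until the destination or the node before a crossing node."""
    mine = provisional.setdefault(agent, {})
    for k in range(len(path) - 1):
        if path[k] == dest or path[k + 1] in crossing:
            return True  # stop at the destination or before a crossing node
        nxt = path[k + 1]
        for other, booked in provisional.items():
            if other == agent or not booked:
                continue
            same_node = booked.get(k + 1) == nxt
            swap = booked.get(k + 1) == path[k] and booked.get(k) == nxt
            if same_node or swap:
                if priority[other] < priority[agent]:
                    booked.clear()  # the lower-priority agent releases its bookings
                else:
                    mine.clear()    # this agent is the lower-priority one
                    return False
        mine[k + 1] = nxt
    return True
```

In a corridor where a higher-priority agent has already booked nodes in one direction, a lower-priority agent entering from the other end detects the head-on conflict several steps ahead and releases its own bookings instead of walking into the corridor.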

4.3 Property of PIBT-PB

Each agent has a unique priority. Hence, one agent $a_i$ always has the highest priority. This agent continues to have top priority until it arrives at its destination. Furthermore, from mPIBT in PIBT-PB, $a_i$ can always secure the highest-ranked $v \in N_{v_i(t)}$ at $t$ based on the shortest path to its destination because (1) the pair $(v_i(t), v)$ has a cyclic path, and (2) when $a_i$ calculates the shortest path, other agents do not secure the next node, and the provisional nodes booked by other agents are not considered as obstacles. Therefore, the following proposition is evident from Prop. 2.

Proposition 3. An agent with the highest priority can reach its destination in a finite time if the environment $G$ is a relaxed bi-connected graph.

When an agent reaches its destination, its priority becomes the lowest. Furthermore, if $p_i(t) > p_j(t)$ for $a_i, a_j \in A$, the same inequality holds at $t + 1$ unless $a_i$ reaches its destination at $t + 1$. In addition, if there is an agent that cannot reach its destination, its priority will monotonically increase and become the highest, thus allowing it to finally reach the destination. Therefore, the following theorem holds.

Theorem 1. When environment $G$ is a relaxed bi-connected graph, all agents can reach their destinations in a finite time.

When all nodes are crossing nodes, PIBT-PB operates in the same manner as PIBT, as the process for determining agents' subsequent nodes is fundamentally equivalent to PIBT without the use of provisional node booking. This means that agents using PIBT-PB follow the same paths as those using PIBT. If the environment has some one-way nodes, agents can book provisional nodes and detect possible head-on collisions earlier than with PIBT.

Similar to winPIBT, PIBT-PB requires all agents to have access to shared memory containing information about the locations, times, and agents of all secured/provisional nodes. However, in our proposed approach, agent $a_i$ provisionally books only one-way nodes. Consequently, $a_i$ uses the node information solely for the connected one-way path that terminates at a provisionally unbooked crossing node. This approach reduces the cost of accessing shared memory by organizing it according to the graph structure (Proposition 2), because the provisional nodes are contained within a specific maximal bi-connected component of G.
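One way to organize shared memory by graph structure is to group one-way nodes into connected corridors, so that an agent reads only the bookings of the corridor it is about to enter. The sketch below is an assumption about how this could be done, not the authors' data layout: the function `corridor_ids`, the degree-based definition of a one-way node, and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def corridor_ids(graph: Graph) -> Dict[str, int]:
    """Label each one-way node with the id of its connected one-way path."""
    # Assumption: nodes with at most two neighbours are one-way nodes.
    one_way = {v for v, nbrs in graph.items() if len(nbrs) <= 2}
    ids: Dict[str, int] = {}
    next_id = 0
    for start in sorted(one_way):
        if start in ids:
            continue
        # Breadth-first search that never steps onto a crossing node.
        ids[start] = next_id
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w in one_way and w not in ids:
                    ids[w] = next_id
                    queue.append(w)
        next_id += 1
    return ids


# Hypothetical graph: three dead-end corridors meeting at crossing node x.
demo_graph: Graph = {
    "a1": {"a2"},
    "a2": {"a1", "x"},
    "b1": {"x", "b2"},
    "b2": {"b1"},
    "c1": {"x"},
    "x": {"a2", "b1", "c1"},
}
demo_ids = corridor_ids(demo_graph)
print(demo_ids)
```

A booking table keyed by these ids would let an agent entering corridor a1-a2 ignore every booking made in the other corridors.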

5 EXPERIMENTAL EVALUATION

5.1 Experimental Environment

We evaluated the proposed method by conducting experiments in the three environments shown in Fig. 7 (Environment 1, Env. 1), Fig. 8 (Environment 2, Env. 2), and Fig. 1 (Env. 3), and compared the performance of agents using PIBT-PB with that of agents using the baselines, PIBT and winPIBT. Env. 1 is a model of a warehouse, which is often used for experiments in other papers. Pickup and delivery nodes are selected randomly when a task is generated. In this type of environment, no clear difference in performance between PIBT and winPIBT can be identified; nevertheless, we aim to demonstrate the superior performance of our method. Env. 2 is a simple model designed to illustrate the differences between PIBT and PIBT-PB. Although only the four corners are one-way nodes, PIBT-PB outperforms PIBT. In Env. 2, green and blue nodes denote pickup and delivery points, respectively. Thus, an agent first moves to a green node and then proceeds to a blue node, located on either the same or the opposite side. Env. 3 (Fig. 1) simulates a construction site where agents pick up materials at one of the storage areas (green nodes) and deliver them to a node in the installation areas (blue nodes). The pickup and delivery locations are unbalanced; thus, congestion may occur near the pickup and delivery areas if the number of agents is large.

Figure 7: Environment 1 (randomly assigned destinations).

Figure 8: Environment 2.

The initial positions of the agents were randomly assigned in each environment. We allocated 1000 MAPD tasks in Envs. 1 and 3 and 600 in Env. 2. Each task τ comprises a pickup point and a delivery point; the assigned agent sets the pickup point and then the delivery point as its destination. Upon task completion, another task is assigned from Γ, the set of MAPD tasks. The efficiency metric for MAPD is the makespan, which measures the number of steps from the start of the experiment to the completion of all tasks; a lower makespan indicates better performance. The window size W (> 1) of winPIBT was tuned to give the shortest makespan and therefore varied by environment. Experiments were conducted by varying the number of agents N to assess its impact on the performance of PIBT-PB and the baselines. The results presented below are averages over 20 independent trials.
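The metric described above can be computed as in the following sketch. The function names and the sample completion steps are hypothetical and are not data from the experiments; the sketch only assumes that each trial records the time step at which every task was completed.

```python
from statistics import mean
from typing import List


def makespan(completion_steps: List[int]) -> int:
    # The makespan is the step at which the last task was completed.
    return max(completion_steps)


def average_makespan(trials: List[List[int]]) -> float:
    # Average the per-trial makespans over all independent trials.
    return mean([makespan(steps) for steps in trials])


# Hypothetical completion steps for three tasks in each of two trials.
sample_trials = [[12, 30, 25], [18, 22, 40]]
print(average_makespan(sample_trials))
```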

5.2 Experimental Results

The results are presented in Fig. 9. Across all three environments, PIBT-PB outperformed the other methods, achieving shorter makespans regardless of the number of agents. As shown in Fig. 9a, PIBT and winPIBT displayed nearly identical performance in Env. 1, where the number of agents ranged from 20 to 100 and the winPIBT window sizes were set to W = 5 and 10. The value W = 10 was larger than the values used in the other environments. Thus, winPIBT agents could find possible conflicts much earlier than agents using PIBT. However, they also restricted other agents’ movements. Because these advantages and disadvantages canceled each other out, the makespans for winPIBT were almost identical to those of PIBT. In contrast, PIBT-PB did not book any crossing nodes as provisional nodes in advance and therefore did not needlessly restrict other agents’ movements, while it could still detect possible conflicts effectively. For example, the makespan of PIBT-PB (1040.8) was approximately 14.3% lower than that of PIBT (1214.7) when N = 100.
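The quoted reduction follows directly from the two makespans reported for N = 100; the short calculation below is added only as a check of that arithmetic.

```latex
\[
\frac{1214.7 - 1040.8}{1214.7} = \frac{173.9}{1214.7} \approx 0.143 = 14.3\%
\]
```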

The makespans in Env. 2 are plotted in Fig. 9b, in which we varied the number of agents from 10 to 50. With 48 nodes in the moving area (white nodes), N = 50 means the environment is extremely crowded. Figure 9b shows that the agents using PIBT-PB exhibited shorter makespans than those using the baselines, even in such a simple environment. Because the pickup and delivery nodes could be set to the four corner nodes, agents could reach them in the final step by provisional booking. Even in a grid-like environment such as Env. 2, agents using winPIBT restricted other agents by securing some nodes on the opposite side, making the performance of winPIBT optimal when W = 3, a relatively small window size.

In Env. 3, PIBT-PB demonstrated superior performance, whereas PIBT and winPIBT (W = 5) showed nearly identical results, with winPIBT having a slight edge. In this setting, winPIBT agents frequently blocked other agents, causing them to wait at intersection points in the central areas. As in Env. 1, the benefits and drawbacks of winPIBT compared to PIBT balanced each other, resulting in comparable efficiency. However, PIBT-PB agents successfully avoided unnecessary blockages at intersections where agents could potentially hinder each other’s movements, thus enhancing the efficiency of detecting head-on collisions.

5.3 Discussion

Our experiments revealed that PIBT-PB improved the efficiency of solving MAPD problems compared with conventional priority-based methods, including PIBT and winPIBT. Unlike PIBT, winPIBT agents increased path-planning efficiency by using time windows for early collision detection. However, winPIBT also imposed strict movement constraints on other agents, often hindering them at intersections by securing additional nodes. To mitigate this, we avoided securing extra nodes at intersections, where movement might be restricted. The critical scenarios to prevent are potential collisions in narrow one-way paths, where agents cannot pass each other and must backtrack, wasting time. Therefore, PIBT-PB agents book as many provisional nodes as possible on one-way routes to detect future collisions. PIBT-PB is expected to be particularly effective in maze-like environments with limited or lengthy alternative routes and in warehouse settings.

6 CONCLUSION

We proposed PIBT with provisional booking (PIBT-PB), which extends PIBT to detect possible collisions earlier. First, agents using PIBT-PB secure the next node as in PIBT and then attempt to provisionally book additional nodes in advance, restricted to one-way nodes because booking a crossing node often blocks other agents’ actions. Subsequently, if no collision is detected, the agent proceeds through the provisional nodes step by step. If a higher-priority agent approaches, the agent releases its booked nodes and generates an alternative path by treating the approaching agent as an additional obstacle. Nonetheless, agents always reach their destinations, because the first step follows a method that is essentially identical to PIBT.

One direction for future work is to improve the dynamic prioritization used for controlling agents. In this study, the prioritization depends on the time elapsed since the destination was updated and restarted. Therefore, an agent with a nearby destination may have to take a detour owing to its priority level. To avoid this situation, we propose utilizing information such as the distance between the agent and the destination, as well as structural information about one-way paths in the environment. We also need to adapt our method to accommodate flexible environments, such as construction sites. These environments undergo changes in which paths are generated or eliminated owing to the construction of new walls, doors, and poles and to changes in material storage areas.

Figure 9: Experimental Results of Makespan. (a) Env. 1. (b) Env. 2.

REFERENCES

Ankit, K., Tony, L. A., Jana, S., and Ghose, D. (2022). Multi-agent Cooperative Framework for Autonomous Wall Construction. In Saraswat, M., Sharma, H., Balachandran, K., Kim, J. H., and Bansal, J. C., editors, Congress on Intelligent Systems, pages 877–894, Singapore. Springer Nature Singapore.

Farinelli, A., Contini, A., and Zorzi, D. (2020). Decentralized task assignment for multi-item pickup and delivery in logistic scenarios. In AAMAS.

Goldenberg, M., Felner, A., Stern, R., Sharon, G., Sturtevant, N., Holte, R., and Schaeffer, J. (2014). Enhanced partial expansion A*. J. Artif. Intell. Res., 50:141–187.

Jennings, J., Whelan, G., and Evans, W. (1997). Cooperative search and rescue with a team of mobile robots. In 1997 8th International Conference on Advanced Robotics. Proceedings. ICAR’97, pages 193–200.

Li, J., Tinka, A., Kiesel, S., Durham, J. W., Kumar, T. K. S., and Koenig, S. (2020). Lifelong Multi-Agent Path Finding in Large-Scale Warehouses. In AAMAS.

Luna, R. and Bekris, K. E. (2011). Push and swap: Fast cooperative path-finding with completeness guarantees. In IJCAI, pages 294–300.

Ma, H., Honig, W., Kumar, T., Ayanian, N., and Koenig, S. (2019). Lifelong path planning with kinematic constraints for multi-agent pickup and delivery. In AAAI.

Ma, H., Li, J., Kumar, T. S., and Koenig, S. (2017). Lifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks. In Proc. of the 16th Conf. on Autonomous Agents and MultiAgent Systems, AAMAS ’17, pages 837–845.

Miyashita, Y., Yamauchi, T., and Sugawara, T. (2023). Distributed Planning with Asynchronous Execution with Local Navigation for Multi-agent Pickup and Delivery Problem. In AAMAS.

Okumura, K., Machida, M., Défago, X., and Tamura, Y. (2019a). Priority Inheritance with Backtracking for Iterative Multi-agent Path Finding. In Proc. of the Twenty-Eighth International Joint Conf. on Artificial Intelligence, IJCAI-19.

Okumura, K., Tamura, Y., and Défago, X. (2019b). winPIBT: Expanded Prioritized Algorithm for Iterative Multi-agent Path Finding. CoRR, abs/1905.10149.

Sharon, G., Stern, R., Felner, A., and Sturtevant, N. R. (2015). Conflict-based search for optimal multi-agent pathfinding. Artif. Intell., 219:40–66.

Shimada, D., Miyashita, Y., and Sugawara, T. (2025). Path Finding with Flexible Provisional Booking in Multi-agent Pickup and Delivery Problems. In Arisaka, R., Sanchez-Anguix, V., Stein, S., Aydoğan, R., van der Torre, L., and Ito, T., editors, PRIMA 2024: Principles and Practice of Multi-Agent Systems, pages 74–80, Cham. Springer Nature Switzerland.

Silver, D. (2005). Cooperative pathfinding. In Proc. of the AAAI Conf. on Artificial Intelligence and Interactive Digital Entertainment, volume 1, pages 117–122.

Standley, T. S. (2010). Finding optimal solutions to cooperative pathfinding problems. In AAAI, volume 1, pages 28–29.

Wagner, G. and Choset, H. (2015). Subdimensional expansion for multirobot path planning. Artif. Intell., 219:1–24.

Wang, Z. (2023). Research on optimal route planning and delivery strategy of multiple robots using HCA algorithm in a restaurant. In 2023 IEEE 2nd Int. Conf. on Electrical Engineering, Big Data and Algorithms (EEBDA), pages 1740–1746.

Yamauchi, T., Miyashita, Y., and Sugawara, T. (2022). Standby-Based Deadlock Avoidance Method for Multi-Agent Pickup and Delivery Tasks. In AAMAS.
