Improvement of PIBT-based Solution Method for Lifelong MAPD
Problems to Extend Applicable Graphs
Toshihiro Matsui
Nagoya Institute of Technology, Gokiso-cho Showa-ku Nagoya Aichi 466-8555, Japan
https://orcid.org/0000-0001-8557-8167
Keywords:
Multiagent Pathfinding Problem, Lifelong Multiagent Pickup-and-Delivery Problem, PIBT, Swap Operation.
Abstract:
We address an extension of priority inheritance with backtracking (PIBT) for lifelong multiagent pickup-and-delivery (MAPD) problems that integrates a swap operation into the original algorithm to adapt it to specific extended classes of problems. The multiagent pathfinding (MAPF) problem has been widely studied as a basis for various practical multiagent systems. PIBT is a scalable and on-demand solution method for continuous MAPF problems, where each agent determines its next move in each time step by locally solving agent-move collisions. Since it can be applied only to limited cases such as biconnected graphs, several extensions using additional techniques have been suggested. However, there are opportunities to extend the PIBT process with techniques that can be integrated into the solution process itself. As a first step, we extend a solution method based on PIBT for lifelong MAPD problems, a fundamental class of continuous problems, by integrating a specific swap task. We address detailed techniques, including the additional management of priorities, subgoals, and states of agents. We also experimentally evaluate the proposed approach with several problem settings.
1 INTRODUCTION
We address an extension of priority inheritance with backtracking (PIBT) (Okumura et al., 2022; Okumura et al., 2019) for lifelong multiagent pickup-and-delivery (MAPD) problems that integrates a swap operation into the original algorithm to adapt it to specific extended classes of problems. The multiagent pathfinding (MAPF) problem has been widely studied as a basis for various practical multiagent systems, including robot navigation, autonomous carriers in warehouses and construction sites, autonomous taxiing of airplanes, and video games (Ma et al., 2017). This problem is a combinatorial optimization problem of finding a set of agents' paths, where all the agents must move from their start locations to their goal locations without colliding with each other. The set of paths should be optimized with respect to given criteria.
Several types of solution methods for MAPF prob-
lems, including optimal and quasi-optimal methods,
have been developed. A major optimal approach is
based on variants of Conflict Based Search (Sharon
et al., 2015), which performs two layers of search.
There are several optimal and quasi-optimal extended
variations (Ma et al., 2019; Barer et al., 2014) that
address the mitigation of the relatively high computa-
tional cost of the optimal search method.
A different greedy approach individually finds and reserves a single quasi-optimal path in a time-space
graph for each agent according to an order on all
the agents (Silver, 2005). There are also different
approaches, including push, swap, and rotate opera-
tions among agents (De Wilde et al., 2014; Luna and
Bekris, 2011), and general optimization methods.
The MAPF problem has been extended to the con-
tinuous MAPF problem, where each agent updates its
sequence of subgoals, and a MAPF method is repeat-
edly performed for the sequences. The lifelong multiagent pickup-and-delivery (MAPD) problem is an important class of con-
tinuous MAPF problems, where each agent repeat-
edly performs pick-up and delivery tasks (Ma et al.,
2017). While a scalable quasi-optimal approach for
this problem is based on a theorem regarding the end-
points of agents’ paths (Ma et al., 2017), there are sev-
eral challenges to improving the performance of solu-
tion methods (Li et al., 2021; Yamauchi et al., 2022).
We focus on PIBT, which is a solution method for continuous MAPF problems, where each agent determines its next move in each time step by locally solving collisions of agents' moves. PIBT manages the priorities of agents and performs a dedicated backtracking procedure. Although it can be applied only to
limited cases such as biconnected graphs, the method
can work with narrow aisles and dense populations
of agents. There are several extensions of PIBT us-
ing additional techniques (Okumura et al., 2019), in-
cluding methods addressing more general cases of
graphs (Okumura et al., 2022; Okumura, 2023). How-
ever, these methods basically employ external exten-
sions where PIBT can be considered a module. There
are opportunities to extend PIBT with techniques that can be integrated into the solution process itself.
This consideration is important for understanding the detailed properties of the original solution method and for uncovering informative insights to improve the solution method or its heuristics. As the first step,
we extend a solution method based on PIBT for lifelong MAPD problems, a fundamental class of continuous problems, by integrating a specific swap task.
We add a high-level layer of tasks to PIBT to man-
age individual cooperation tasks of groups of agents.
Namely, we employ PIBT as a processing engine and
introduce a context of a swap task of an individual
group of agents. The tasks are independently con-
structed in a bottom-up manner, and their conflict sit-
uations are solved using their priority values. As the
first study, we present swap tasks of agents for a class of problems that naturally extends the class handled by the original PIBT. Executing such bottom-up tasks of agents on top of a fundamental solution method used as an engine is the major aim of this study.
We address detailed techniques, including addi-
tional management of priorities, subgoals and states
of agents. We also experimentally evaluate the pro-
posed approach with several problem settings.
In the next section, we present the background of
our study, including multiagent pathfinding problems,
lifelong pickup-and-delivery problems, and the solu-
tion method PIBT. The details of our proposed ap-
proaches are described in Section 3. We first consider
some important graph structures of maps and then
establish a set of operations regarding specific swap
tasks to address dead-end aisles. We experimentally
verify our approach in Section 4 and conclude in Sec-
tion 5.
2 BACKGROUND
We note that several segments in the following subsections partly share their background material with the literature (Matsui, 2024b), although the aim of this study is completely different from that of the previous work.
Figure 1: Warehouse map and decomposed structures. (Warehouse maps that are solvable and unsolvable with PIBT, and their decomposition into intersections, aisles, and squares; 'O' denotes an obstacle and 'T' a pickup-and-delivery location.)
2.1 MAPF and Lifelong MAPD
The multiagent pathfinding (MAPF) problem is an
optimization problem for finding a set of paths of
multiple agents where there are no collisions between
the paths. A MAPF problem consists of a graph G = (V, E) representing a two-dimensional map, a set of agents A, and a set of pairs of vertices that represent the start and goal locations of the individual agents. All agents must move from their start locations to their goal locations without colliding with each other, and the set of agents' paths, including stay/wait actions, should be optimized with respect to given criteria. There are two types of collisions to be avoided: two agents must not stay at the same vertex at the same time (a vertex collision) and must not move on the same edge at the same time from both of its ends (a swapping collision). In a fundamental setting, a graph representing a four-connected grid-like map containing obstacles is employed, and time steps are discrete. The continuous MAPF problem is an extended class of MAPF problems where each agent updates its sequence of subgoals, and a solution method for MAPF is repeatedly performed for the sequences.
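As an illustration only, the following Python sketch checks the two collision types between two agents' paths, assuming paths are given as lists of vertices indexed by time step; the function names and the path representation are our own and not part of any MAPF library.

def has_vertex_collision(path_a, path_b):
    # Vertex collision: both agents occupy the same vertex at the same time step.
    for t in range(min(len(path_a), len(path_b))):
        if path_a[t] == path_b[t]:
            return True
    return False

def has_swapping_collision(path_a, path_b):
    # Swapping collision: the agents traverse the same edge in opposite
    # directions between time steps t and t+1.
    for t in range(min(len(path_a), len(path_b)) - 1):
        if path_a[t] == path_b[t + 1] and path_a[t + 1] == path_b[t]:
            return True
    return False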
The lifelong multiagent pickup-and-delivery
(MAPD) problem (Ma et al., 2017) is a specific
class of continuous MAPF problems, where multiple
pickup-and-delivery tasks are repeatedly allocated to
agents. Figure 1 shows examples of warehouse maps
containing pickup-and-delivery locations. The tasks
can be repeatedly generated in arbitrary time steps.
A set of currently generated tasks is denoted by T. Task τ_i ∈ T has its pickup and delivery locations (s_i, g_i), where s_i, g_i ∈ V. An agent who is allocated task τ_i first moves from its current location to pickup location s_i and then moves to delivery location g_i to complete the task. The problem consists of task allocation and continuous MAPF problems. At least partially greedy approaches are commonly employed to allocate tasks generated on demand, and MAPF
 1  UNDECIDED ← A(t)                            // agents list
 2  OCCUPIED ← ∅                                // vertices list
 3  update priorities p_i(t) for all agents a_i
 4  while UNDECIDED ≠ ∅ do
 5      a ← the agent with the highest priority in UNDECIDED
 6      PIBT(a, ⊥)                              // ⊥ denotes empty
 7  end while

 9  function PIBT(a_i, a_j)
10      UNDECIDED ← UNDECIDED \ {a_i}
11      C_i ← ({v | (v_i(t), v) ∈ E} ∪ {v_i(t)})
12            \ ({v_j(t)} ∪ OCCUPIED)
13      while C_i ≠ ∅ do
14          v_i ← arg max_{v ∈ C_i} f_i(v)      // most preferred move
15          OCCUPIED ← OCCUPIED ∪ {v_i}
16          if a_k s.t. v_i = v_k(t) ∧ a_k ∈ UNDECIDED exists then
17              if PIBT(a_k, a_i) is valid then
18                  v_i(t + 1) ← v_i
19                  return valid                // move with push
20              else
21                  C_i ← C_i \ OCCUPIED
22              end if
23          else
24              v_i(t + 1) ← v_i
25              return valid                    // move/stay without push
26          end if
27      end while
28      v_i(t + 1) ← v_i(t)
29      return invalid                          // stay by failing to move
30  end function

v_i(t): location of agent a_i at time step t

Figure 2: PIBT at time step t (Okumura et al., 2022).
solvers are applied to the pathfinding.
A fundamental approach is based on well-formed MAPD problems that take into account endpoint vertices, which can be pickup, delivery, or parking locations of agents (Čáp et al., 2015; Ma et al., 2017). However, this approach requires extra aisle space in maps and imposes relatively large redundancy on the parallelism of task execution, including agents' movements.
We focus on a different type of solution method,
PIBT (Okumura et al., 2022), that can be applied to
narrow maps with dense populations of agents, al-
though this method is also a greedy approach with
several restrictions as mentioned below.
2.2 PIBT
PIBT is a scalable solution method for the (contin-
uous) MAPF problem (Okumura et al., 2022). The
method performs push operations among agents ac-
cording to the priority of the agents. In each time step,
each agent decides its next move/stay. When an agent
cannot push other agents on all vertices neighboring
its current location, a backtracking is performed to
find other push chains.
In the pseudo code (Fig. 2), it is assumed that each agent a_i has its goal location, and the preference value of a location v based on the goal is represented by f_i(v) (line 14). The priority p_i(t) of agent a_i consists of the elapsed time for the current goal and a small tie-break value based on a_i's identifier. The agent a having the locally highest priority initiates a recursive push process (line 6). Agent a_i selects its most preferred move from the remaining candidates and pushes its neighboring agent to clear a vertex if necessary (lines 14-17). The pushed agent a_j tries to move to one of its neighboring vertices and also pushes a_j's neighboring agent if necessary. If all the agents pushed by agent a_i can move, or there is no agent obstructing a_i, a chain of moves is determined (line 18). As a result, the locations of a set of agents in a cycle might rotate. If one of the pushed agents cannot move, backtracking is performed (line 29) so that its parent agent can try to move in a different direction. An agent that cannot move in this process stays at its current location (line 28).
PIBT can solve problems represented by several types of graphs, including biconnected ones, that always allow the rotation of agents' locations. The method can work with narrow aisles and dense populations of agents, even if all non-obstacle vertices are occupied by agents. However, it easily gets stuck in the case of maps with dead ends.
In the case of continuous problems, each agent has a list of subgoals and continues to move toward the first subgoal while its priority increases. After the agent reaches the first subgoal, the subgoal is removed from the list and the priority of the agent is reset. For MAPD problems, we employ a baseline greedy task-allocation method in which each agent having no task selects a task whose pickup location is nearest to its current location.
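A minimal sketch of the baseline behavior described above, under our own simplifying assumptions (a precomputed distance function dist and a simple task record with a pickup attribute); it is an illustration, not the authors' implementation.

def allocate_task(agent_loc, free_tasks, dist):
    # Greedy allocation: pick the unassigned task whose pickup location
    # is nearest to the agent's current location.
    if not free_tasks:
        return None
    return min(free_tasks, key=lambda task: dist(agent_loc, task.pickup))

def update_priority(elapsed, agent_id, num_agents):
    # PIBT-style priority: elapsed time steps since the current subgoal was
    # assigned, plus a small identifier-based tie-break value; the priority
    # is reset (elapsed = 0) when the first subgoal is reached.
    return elapsed + agent_id / num_agents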
3 SPECIFIC SWAP TASK
We improve PIBT for an extended case where un-
branched narrow aisles with single dead-ends (DE
aisles) are added to a basic map represented by a bi-
connected graph that can be well handled by the orig-
inal algorithm. This is the minimal extension to in-
troduce a specific case of swap operation (De Wilde
et al., 2014; Luna and Bekris, 2011) based on PIBT.
When an agent is blocking another agent in a DE
aisle, both agents can retreat from the aisle to swap
their locations. Specifically, in the case of PIBT, a
reasonable action is to perform the swap operation of
the agents on a biconnected component of a graph by
simply employing PIBT itself (Fig. 5). Although this
is intuitively simple, our aim is to clarify the details of
several important extensions in this kind of algorithm
for future study to address more general cases.
Figure 3: Tasks on PIBT. (Pickup-and-delivery tasks with subgoals (s_i, g_i), retreat tasks with subgoal r_i, swap tasks with their one-push sequences and move-direction constraints/preferences, and the idle state are managed on top of PIBT for continuous MAPF.)
3.1 Decomposition of Map Structure
We address the extended case where the undirected
graphs of maps consist of a biconnected component
and several parts of DE aisles. To concentrate on a set
of important operations naturally extending PIBT to
adapt to this case, we do not consider cases without cycles or with isthmuses (De Wilde et al., 2014); we will address such general cases, which require several additional techniques, in a future study. While we
employ graphs representing maps in a four-connected
grid world as common settings, our approach can be
extended for non-grid maps.
Except for obstacle vertices, we decompose the
parts of a graph into the following structures: 1) Aisle
including DE aisle, 2) Intersection vertex, including
end vertices of a square, connecting to aisles, and 3)
Other part of square (Fig. 1). Since PIBT works effec-
tively for parts with sufficient space such as squares,
we distinguish the narrow parts from others and im-
prove the original algorithm by adding several opera-
tions that consider such narrow parts.
An aisle consists of vertices whose degrees are
one (DE aisle) or two. In the case with squares, cor-
ner vertices whose degrees are two are excluded. An
intersection vertex’s degree is greater than two. The
definition of a square depends on the graph. For a four-
connected grid map, a square is a cluster of minimum
cycles of the neighboring four vertices, although we
distinguish intersection vertices from them.
Here, we employ the following simple preprocess-
ing to extract the parts: 1) Vertices whose degrees
are one or two are marked as candidate vertices of
aisles. 2) Vertices whose degrees are greater than two
are marked as vertices of intersections. 3) Candidate
vertices of aisles are excluded as a part of a square if
they are contained in one of the minimum cycles. 4)
Decomposed parts and their corresponding vertices, except those of squares, are labeled with individual values so that they can refer to each other in later steps. The map structure
and the map data are shared by all the agents.
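A sketch of the preprocessing above for a four-connected grid, assuming an adjacency-list graph of non-obstacle vertices given as (x, y) coordinates; the labels and the minimum-cycle test are simplified illustrations, and the final labeling of individual parts (step 4) is omitted.

def decompose(adj):
    # adj: dict mapping each non-obstacle vertex to its neighboring vertices.
    labels = {}
    # 1)-2) classify by degree: candidates of aisles vs. intersections.
    for v, nbrs in adj.items():
        labels[v] = 'aisle' if len(nbrs) <= 2 else 'intersection'

    def in_minimum_cycle(v):
        # A square vertex belongs to a minimum cycle of four neighboring
        # grid vertices (simplified test for a four-connected grid).
        (x, y) = v
        for dx, dy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
            cell = [v, (x + dx, y), (x, y + dy), (x + dx, y + dy)]
            if all(u in adj for u in cell):
                return True
        return False

    # 3) exclude aisle candidates that are contained in a minimum cycle.
    for v in adj:
        if labels[v] == 'aisle' and in_minimum_cycle(v):
            labels[v] = 'square'
    return labels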
3.2 Integrating Specific Swap Operation
We introduce a set of operations for a specific swap task into a version of PIBT that solves lifelong MAPD problems. Since this baseline version is already integrated with a task-assignment process for pickup-and-delivery tasks, we add another extension for swap-task assignment, as shown in Figs. 3-5 and Tbls. 1-3. The extended pseudo code is shown in Fig. 9 in the appendix.
Figure 4: Sub-modes and sequence in swap task. (An initiator asks a target and swept agents to join a swap task; the initiator enters the Ini-restraint sub-mode, the target the Retreat and then Restraint sub-modes, and the swept agents the Swept and then Restraint sub-modes, until completion or cancellation. A one-push sequence, with an OP-Retreat sub-mode for the target, is available only for the highest swap task.)
Figure 5: Swap task. (Example trace with agents a_0-a_4: a_4 initiates a swap task with target a_2 and swept agent a_3. At t = 0-1, a_4 is in the Init-Restraint sub-mode holding p_2(t) while a_2 retreats holding p_4(t), and a_3 is swept and then restrained; at t = 2-4, the priorities are exchanged back and a_2 and a_3 remain restrained; at t = 5, the task is completed. Each p_i(t) denotes a_i's original priority and uniformly increases at every time step.)
Table 1: Constraint/preference of move direction f_i(v).
  Ds     P   Baseline: move on the shortest path to the first subgoal.
  Dad    P   Avoid DE aisles that do not contain the first subgoal.
  Dadrc  C   Avoid the resolving DE aisle in a restraint mode.
  Dadrp  P   Avoid the resolving DE aisle in a restraint mode for higher swap tasks.
  Dado   C   Avoid other DE aisles if the initiator of the topmost swap task.
  Dl     C   Move on the limited path in a one-push sequence.
  Dap    P   Option: avoid the aisle on the first pusher's path (Matsui, 2024b).
  P/C: Preference/Constraint.
  Priority: Dl > Dado > Dadrp > Dadrc > Dad > Dap > Ds.

Table 2: Completion/cancel of swap task.
  Ce  Cmpl.   The initiator entered the resolving DE aisle.
  Co  Cancel  The task is to be overwritten.
  Cp  Cancel  A restrained agent was pushed into the resolving DE aisle.
  Ch  Cancel  The initiator in one of the other DE aisles became the highest.

Table 3: Acceptable number of agents.
  Co, Ce            Dadrc, Dad, Ds                       N_b
  Cp, Co, Ce        Dadrp, Dadrc, Dad, Ds                N_t
  Ch, Cp, Co, Ce    Dl, Dado, Dadrp, Dadrc, Dad, Ds      N_s
  N_b: the number of vertices in the biconnected component.
  N_t: (the number of non-obstacle vertices) − (the number of vertices in the longest pair of two DE aisles).
  N_s: (the number of non-obstacle vertices) − (the number of vertices in the longest DE aisle).
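The bounds in Table 3 can be computed directly from the decomposition; the following sketch is our own illustrative helper, assuming the DE-aisle lengths and the vertex counts are already available.

def acceptable_agents(num_free_vertices, num_bicon_vertices, de_aisle_lengths):
    # N_b: vertices in the biconnected component.
    n_b = num_bicon_vertices
    lengths = sorted(de_aisle_lengths, reverse=True)
    # N_t: non-obstacle vertices minus the longest pair of two DE aisles.
    n_t = num_free_vertices - sum(lengths[:2])
    # N_s: non-obstacle vertices minus the longest DE aisle (theoretical limit).
    n_s = num_free_vertices - (lengths[0] if lengths else 0)
    return n_b, n_t, n_s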
The life cycle of a swap task consists of several sub-modes that basically represent the initiation, retreat, restraint, and completion/cancel steps
(Fig. 4). The agents related to a swap task are cate-
gorized into three types: initiator, target, and swept
agents (Fig. 5). The control of a swap task basically
consists of the state transition of sub-modes (Fig. 4), a
priority inversion between the initiator and the target,
additional preferences/constraints on the evaluation of agents' moves f_i(v) (Tbl. 1), and several cancel rules
to resolve conflicted tasks (Tbl. 2). Several possible
combinations of the rules affect the acceptable num-
ber of agents (Tbl. 3).
In the following, we describe several details of our
approach. We first address the members of cooperative swap tasks (Section 3.2.1). Then the basic flow of the task, including related contexts, is presented (Sections 3.2.2-3.2.7). Finally, additional rules are intro-
duced to extend applicable cases of the basic method
(Sections 3.2.8 and 3.2.9).
3.2.1 Initiator, Target and Swept Agents
When agent a_i, whose first subgoal is in a DE aisle, detects a possible deadlock situation, a swap task is initiated by agent a_i. This situation is detected in the process of PIBT when agent a_i cannot push its next agent located in the DE aisle before arriving at a_i's first subgoal. Although such detection might be inexact due to perturbations of a system that depends on PIBT, we accept this as a margin of the bottom-up approach.
An agent can be an initiator primarily in the following two cases. 1) An agent who is at an intersection and entering a DE aisle containing its first subgoal. 2) An agent who is not being pushed and is moving in a DE aisle containing its first subgoal. (TA) It is possible to further restrict the former case with the condition that the agent is not being pushed.
Target agent a_j is an agent in a push chain that is asked to retreat from a DE aisle by initiator agent a_i when target agent a_j is blocking the first subgoal of a_i. In addition, the other agents between initiator agent a_i and target a_j in the push chain are marked as swept agents that are swept out along with target agent a_j (Fig. 5).
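A sketch of how an initiator might identify its target and swept agents along a DE aisle; the helpers aisle_vertices (vertices ordered from the intersection toward the dead end) and occupant (the agent at a vertex, or None) are hypothetical and only illustrate the roles, not the exact detection inside the PIBT recursion.

def find_target_and_swept(initiator, aisle_vertices, subgoal, occupant):
    # Collect the agents blocking the initiator's first subgoal: the last
    # agent in the chain is the target, the others are swept agents.
    chain = []
    for v in aisle_vertices:
        a = occupant(v)
        if a is not None and a is not initiator:
            chain.append(a)
        if v == subgoal:
            break
    if not chain:
        return None, []            # the subgoal is not blocked
    return chain[-1], chain[:-1]   # (target, swept agents)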
3.2.2 Initiation of Swap Task
To maintain the consistency of priority values among agents, we allow each agent a_i to initiate a swap task only if agent a_i has a priority value higher than those of the target, all swept agents, and all of their initiator (controller, in general cases) agents, if any. For a_i itself, it must not have a swap task initiated by an agent with a higher priority value, while a_i can overwrite its own swap task. In addition, a_i cannot initiate a swap task during a specific critical section in a one-push sequence discussed in Section 3.2.9. If an agent cannot find target and swept agents satisfying the conditions above, the agent cannot initiate the swap task until its possible turn.
The initiation operation differs partially for target and swept agents. For a target agent a_j, a new retreat task with a new subgoal r_i is inserted. Basically, the retreat task must be done before a_j's pickup-and-delivery task if a_j has such a task. (We slightly optimized this so that a_j's subgoal is processed first if a_j is already located at its subgoal; lines 48 and 49 in Fig. 9.) The new subgoal is the intersection adjacent to the DE aisle. Moreover, the priority values of the initiator and the target are exchanged so that the priority of the target is higher than that of the initiator. We allocate a retreat task only to a target agent to clarify the roles of agents (Figs. 4 and 5).
3.2.3 Context of Swap Task
All member agents of a swap task initiated by agent a_i, including the initiator a_i, the target, and the swept agents, record 1) the identifiers of the initiator and target agents and 2) the vertex of the retreat intersection, which is identical to the subgoal r_i of the target agent's retreat task. The one of the initiator and target agents with the higher priority is distinguished as 3) the controller of the swap task, which can push the other members. An initiator also has 4) a set of identifiers of all members of its swap task. With this information, the initiator can notify its members of the completion/cancel of its swap task, and other members can ask their initiator to cancel the task if necessary (Fig. 4). Each agent can be a member of at most one swap task. Therefore, the initiation and completion/cancel of each swap task must be atomic. Note that an initiated task can be overwritten by another initiator with a higher priority or by the original initiator itself. In this case, the former task must be canceled to remove its shared information before the initiation of the new task, and this must also be atomic. Although this is slightly complicated, such procedures can be composed without contradiction. We implemented this process with procedures of individual agents that are called by related agents to update their status at appropriate timings in the main process of PIBT (in the task initiation/termination lines in Fig. 9).
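The shared context described above can be represented by a small record per swap task; the following dataclass is our own illustrative structure, and discard_swap_task is a hypothetical agent method (neither is taken from the paper's implementation).

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SwapTaskContext:
    initiator: int                          # 1) identifier of the initiator agent
    target: int                             #    identifier of the target agent
    retreat_vertex: int                     # 2) intersection r_i adjacent to the DE aisle
    controller: Optional[int] = None        # 3) member currently holding the higher priority
    members: List[int] = field(default_factory=list)  # 4) kept by the initiator only

    def notify_completion(self, agents):
        # The initiator notifies all members so that they discard this task atomically.
        for a in self.members:
            agents[a].discard_swap_task(self)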
3.2.4 Retreat, Restraint and Completion Phases
After the initiation, the target and swept agents immediately change their sub-modes to the retreat and swept modes, while the initiator changes to a specific restraint mode (Fig. 4, and t = 0 in Fig. 5). These additional operations are performed in the process of PIBT (Fig. 9). The role exchange between the initiator and target agents depends on priority management, but we separately describe its details in the next section.
As mentioned above, we allow the agents to initiate their possible swap tasks arbitrarily in any time step, and an existing swap task might be overwritten. Therefore, a swap task might be discarded without completion, although such situations converge due to the consistent priority values among agents.
When a target agent arrives at the subgoal r_i of its retreat task, it completes the task and the priority values of the target and the initiator are exchanged. We note that the priority value of the retreating target increases at each time step in the manner of PIBT, while the priority is not reset after the retreat task. Then, the corresponding initiator agent recovers its dominance, at least over its member agents (t = 2 in Fig. 5).
In addition, when each of the target and swept agents arrives at the retreat intersection identical to the subgoal r_i of the target agent's retreat task, the agent changes to the restrained mode. The agents are then inhibited from reentering the DE aisle from which they have retreated. During this period, the corresponding initiator agent can push the target and swept agents, except into the DE aisle of its first subgoal, in the manner of PIBT.
When an initiator agent enters the DE aisle containing its first subgoal, the initiator notifies its members of the completion of the swap task (t = 5, a_4 in Fig. 5). Then, each corresponding target and swept agent exits from the restrained mode and discards the swap task asked by the initiator. If other non-member agents enter a DE aisle during a swap task due to parallel moves of agents, a new swap task will drive those agents away.
3.2.5 Priority Management for Swap
As mentioned above, we allow each agent a_i to initiate a swap task only if agent a_i has a priority value higher than those of the target, all swept agents, and all of their initiator (controller) agents, if any. For a_i itself, it must not have a swap task initiated by an agent with a higher priority value, while a_i can overwrite its own swap task. In addition, a_i cannot initiate a swap task during a specific critical section in a one-push sequence shown in Section 3.2.9. We permit agents to repeatedly ask to swap in arbitrary time steps if necessary. Before an agent initiates a new swap task, its old swap task is canceled if one exists.
As a result of the initiation of a swap task, the target agent that is to retreat must have a priority higher than that of its initiator agent. The operation must also not affect other agents. For this priority management, we employ a priority inversion technique between the initiator and the target of a swap task in this study. (In an appendix section, we mention another solution that depends on monotonically increasing priority values.) Although this is an intuitive idea, we found that the priority inversion raises several complicated issues in handling agents' information. Since the controller agent with the highest priority in a swap task switches, we must always carefully identify the controller agent to evaluate the exact priority value of a swap task. The inversion must also be applied in all cases of canceling swap tasks. The inversion is immediately shared by all members to be decided in the same push chain in initiation cases, but it can affect decided/undecided agents in other cases. This requires an additional mutex in the one-push sequence shown in Section 3.2.9. We also note again that the priority value of a target agent increases during its retreat task, while the priority is not reset at the end of the task, so that the priority value is returned to its original owner (the initiator). The same applies to the other priority value.
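A sketch of the priority inversion and the initiation condition under our own simplified agent record (a dict holding a 'priority' value); the real implementation must additionally track the controller and apply the inversion on every cancel path, as discussed above.

def invert_priorities(initiator, target):
    # Exchange the priority values of the initiator and the target so that the
    # target temporarily dominates; applying this again at the end of the retreat
    # task returns each value to its original owner without resetting it.
    initiator['priority'], target['priority'] = target['priority'], initiator['priority']

def can_initiate(initiator, target, swept, controllers):
    # The initiator must dominate the target, all swept agents, and their
    # controllers (Section 3.2.2); a simplified check over priority values.
    others = [target] + swept + controllers
    return all(initiator['priority'] > a['priority'] for a in others)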
3.2.6 Subgoal to Retreat
For target agent a_j of a swap task, a retreat task with a subgoal vertex r_i is inserted as a_j's first task. The subgoal vertex r_i to retreat to is the intersection adjacent to the DE aisle from which a_j is retreating. We prohibit retreating agents, including swept agents, from being allocated new pickup-and-delivery tasks if they do not already have pickup-and-delivery tasks. Therefore, the retreat task always takes priority over the pickup-and-delivery tasks of the target and swept agents.
Even if the first subgoal of the target agent's pickup-and-delivery task is outside of its current aisle, we always insert a new retreat task because it relates to several other controls of the swap/retreat task. However, if a target agent is newly asked to swap by another agent with a higher priority value, the current swap/retreat task is canceled by asking its initiator before it is overwritten by that of the new swap task.
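A minimal sketch of inserting the retreat task in front of an agent's subgoal list, assuming subgoals are kept as a plain Python list per agent (illustrative only).

def insert_retreat_task(target_agent_subgoals, retreat_vertex):
    # The retreat subgoal r_i (the intersection adjacent to the DE aisle) is
    # placed before any pickup-and-delivery subgoals, so the retreat is always
    # processed first; new pickup-and-delivery tasks are not allocated while
    # the agent is retreating.
    target_agent_subgoals.insert(0, retreat_vertex)
    return target_agent_subgoals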
3.2.7 Limitation of Reentering DE Aisle
After the target and swept agents of a swap task move to the intersection adjacent to the corresponding DE aisle, they must not reenter the DE aisle. The agents change their sub-modes to the restraint mode to inhibit such reentering moves (Figs. 4 and 5). Although it is possible to simply confine all restrained agents inside the biconnected component of a graph, this only works well with the number of agents up to the number of vertices within the biconnected component, N_b (Tbl. 3). Instead, it is reasonable to inhibit only reentering a DE aisle related to the current swap task.
Figure 6: Necessity of extended rules ((i) necessity of Dadrp and Cp; (ii) necessity of Dado).
Here, we control the movement direction of agents. Such a restriction can be represented by a preference value (Dad) for each movement direction or by a hard constraint that excludes such a move (Dadrc), as shown in Tbl. 1.
In general, an agent at an intersection should not enter any DE aisle that does not contain its first subgoal, regardless of its mode. This is represented by the preference value of the movement direction as a basic extension (Dad); this preference value must be evaluated prior to the original values of f_i(v) but must not be evaluated prior to the other extended constraints and preference values. Most importantly, in the case of restrained agents, the inhibited reentering move is eliminated from their movement choices (Dadrc). The restrained mode of a target/swept agent is held until the completion/cancel of the swap task.
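A sketch of how such constraints and preferences could filter and order the candidate moves C_i before the original f_i(v) is applied; the predicate names are our own, and the actual rule set and its evaluation order follow Tbl. 1.

def order_candidates(candidates, agent, f_i,
                     is_inhibited_reentry, is_other_de_aisle_entry):
    # Hard constraint (e.g., Dadrc): a restrained agent must not reenter the
    # DE aisle of its own swap task, so such moves are removed entirely.
    allowed = [v for v in candidates
               if not (agent['restrained'] and is_inhibited_reentry(agent, v))]
    # Preference (e.g., Dad): entering another DE aisle that does not contain
    # the first subgoal is penalized; ties are broken by the original f_i(v).
    def key(v):
        return (is_other_de_aisle_entry(agent, v), -f_i(v))
    return sorted(allowed, key=key)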
3.2.8 More Extended Rules
However, the set of rules above works well only with up to N_b agents (Tbl. 3). With more than N_b agents, the parallel moves of agents due to PIBT can cause a deadlock situation. In the case of t = 4 shown in Fig. 6 (i), target agent a_2 tries to retreat, and another target a_0 with a lower priority value blocks a_2 by avoiding a_0's own inhibited DE aisle. To resolve this situation, we modify the rule as follows: if restrained agent a_i is pushed at an intersection and the first agent in the push chain has a priority value higher than that of a_i's controller agent, the limitation on reentering a_i's inhibited DE aisle is treated as a preference value (Dadrp) rather than a hard constraint. As a result, a_i is pushed into its inhibited DE aisle and asks its initiator to discard the swap task (Cp). Since we allow parallel execution of swap tasks, this rule is necessary to discard a lower-priority task in a race condition. The rules above are summarized in Tbls. 1-3.
By adding the rules Dadrp and Cp, the solution process works with up to N_t agents (Tbl. 3). This limitation ensures that a pair consisting of the initiator and the target of the topmost swap task can stay in the biconnected component.
Figure 7: One push sequence. (Example trace with initiator a_3, target a_0, and swept agents a_1 and a_2: a_0 retreats during t = 0-2, asks for a one push at t = 3, and moves in the OP-Retreat sub-mode during t = 4-5 while a_3 performs the one push; a_3 becomes restrained at t = 6, and the swap task is to be completed at t = 8.)
With more than N_t agents, the initiator of the topmost swap task can be pushed into a DE aisle. In this situation, if one of the member agents blocks its inhibited DE aisle and there is no room for the initiator to go back to the biconnected component, the system gets stuck (Fig. 6 (ii)). We can add a rule so that the initiator of the topmost swap task always avoids DE aisles except for the one containing its subgoal (Dado). However, two issues still remain. First, an initiator in a DE aisle might be promoted to the initiator of the topmost swap task. For this case, we force the initiator to cancel its swap task and to retry from the current situation (Ch), and the topmost agent eventually completes its task. The second issue, shown below, is a self-lock situation in the topmost swap task.
3.2.9 One Push Sequence
We introduce the final extension, a special mode in a swap task with the highest priority. The rule Dado prevents the initiator of the topmost swap task from entering other DE aisles during the retreat phase of its swap task. Instead, there can be a type of deadlock situation when the number of agents is greater than N_t. This situation always takes the same form: the initiator of the topmost swap task blocks up to two DE aisles containing at least one unoccupied vertex by staying at an intersection connected to those DE aisles, while the other DE aisles except the resolving one are already occupied. As a result, the corresponding target agent of the topmost swap task cannot push and gets stuck in a DE aisle (t = 3 in Fig. 7).
To solve this problem, we introduce a special one-push sequence in which the stuck target agent asks its initiator to perform a one push so that the target can retreat from the blocked intersection (Fig. 4 and t = 3-5 in Fig. 7). Since the initiator is in the biconnected component, it can always move. After that, the target can push a set of agents into an unoccupied DE aisle. Here, the priority inversion between the initiator and the target agent is applied twice to exchange the control privilege of their swap task. We note that this priority inversion can be performed between an initiator deciding its one-push action and a target that does not immediately decide its next action. (Here, we do not prefer to reorder the agents in the queue processed by the PIBT procedure within a single time step.) Therefore, any chance for a non-member agent to invert the one push must be inhibited with a mutex that protects this critical section. We simply force the remaining agents to stay at their current locations in this time step so that the target cannot be interrupted. In the next time step, the target has the highest priority and can correctly push before the other agents act.
In addition, we introduce a special retreat mode following the one push. In the push chain of the PIBT process, each agent basically moves according to its preference value f_i(v), and this might invert the one-push action. (This only causes redundant moves of agents, which are not preferred, since at least one agent is pushed into an unoccupied aisle.) To avoid this situation, we force the agents pushed by the target to move on the shortest path from the target's current location to the intersection that has been released by the initiator (Dl) (t = 3-5 in Fig. 7). Note that this restriction also affects the initiator's one-push action by inhibiting a move in an inverted direction. More importantly, the target agent waiting for the one push does not move. With this rule, the target successfully completes its retreat mode, and the solution process works well with the number of agents up to N_s (Tbl. 3), which is the theoretical limit.
3.3 Correctness
We briefly sketch the correctness of our method for
appropriate settings.
Proposition 1. A swap task that is initiated by an
agent with the highest priority always completes.
Proof. All the tasks except for one with the highest
priority can be canceled when they conflict with an-
other task with a higher priority value. The mutex for the special critical section at the end of a one-push sequence protects the role exchange between an initiator and a target from interruption by non-members' pushes. Therefore, a swap task that has the highest priority from its initiation is always completed.
Proposition 2. All agents have chances to be the one with the highest priority.
Proof. A swap task with a temporary priority inversion is performed under the priority of its original initiator agent, and the initiator's priority value increases according to the manner of PIBT. Therefore, the priority value of each agent still monotonically increases until the agent reaches the first subgoal of its pickup-and-delivery or stay task. All swap tasks are eventually completed/canceled without resetting priority values, and the highest one is always completed. Therefore, all tasks, including swap tasks, eventually complete. An agent that completes one of the other tasks resets its priority. Therefore, all agents have chances to act, and all allocated tasks eventually complete.
We also note that the presented rules were composed step by step in a lazy manner to find the issues to be addressed, and there are other solutions and opportunities to reduce some redundancy. Regarding completeness, there can at least be deadlock situations if a system is incorrectly configured with an inappropriate number of agents. The time complexity of the additional part of the PIBT process relates to the interaction/maintenance among agents' states and is almost linear in the number of agents.
For the acceptable number of agents shown in Table 3, the following intuitive proposition holds.
Proposition 3. In a map where DE aisles are added to a basic map represented by a biconnected graph, any swap task can be done in an appropriate sequence if the number of agents is not greater than N_s, where N_s = (the number of non-obstacle vertices) − (the number of vertices in the longest DE aisle).
Proof. If a group of agents performing a single swap task can empty its resolving DE aisle, and if its initiator agent remains in the biconnected component of the graph of the map, the initiator agent can rotate the agents in the biconnected component so that they move into the DE aisle. For this reason, the number of unoccupied vertices must not be less than the number of vertices in the longest DE aisle.
Therefore, the verification with N_s agents is a goal in this study.
4 EVALUATION
4.1 Settings
We experimentally verified several details of our extended techniques, since we currently concentrate on an extension of PIBT for a specific case of lifelong MAPD problems.
Figure 8: Maps for benchmark problems (14D-1L, 4D-5L-1G, 4D-5L-1G-2, 6D-1G, 6D-1G-2, 4D-5L-3G, 24D-5L-3G). Results for 4D-5L-1G-2 and 6D-1G(-2) are omitted due to space limitations.
While there are several related scalable complete solution methods, including PIBT+ (Okumura et al., 2022) and LaCAM (Okumura, 2023), there appear to be additional opportunities to adjust/tune those methods for lifelong MAPD problems. The Token-Passing algorithm for lifelong MAPD problems is only available for well-formed problems with the number of agents at most ((the number of endpoints) − 1) (Ma et al., 2017; Matsui, 2024a). (The theoretically safe bounds are (13, 3, 3, 5, 5, 1, 1) for (14D-1L, 4D-5L-1G, *-2, 6D-1G, *-2, 4D-5L-3G, 24D-5L-3G) in Fig. 8, where '1' denotes non-well-formed settings.) To mitigate this limitation, we focused on PIBT, which is available with large numbers of agents in narrow maps regardless of endpoints, and extended it to address specific maps with dead ends. Here we concentrated on the extension of PIBT with additional cooperative swap tasks among agents. TP-based solution methods cannot be applied, or can be applied only with few agents, in most problem settings of our study, and such settings are the beneficial cases for PIBT variants. The dense settings of agents in our experiments appear to be impractical for general optimal search methods.
Within the context of PIBT, there are opportunities
to employ efficient techniques (Okumura et al., 2019;
Yamauchi et al., 2022). Since our current major in-
terest is the experimental verification of the correct-
ness of our extended rules, the performance compar-
ison with those methods will be separately addressed
in our future study.
For benchmark problems, we employed the maps
shown in Fig. 8, which might not be handled by the original PIBT but can be handled with our approach. We varied the number of agents up to the theoretical limit. For MAPD problems, NpT tasks were randomly gener-
ated with a uniform distribution at every time step
with up to 500 tasks in total.
We compared the following solution methods. BASE: our baseline implementation of the extended PIBT; when an agent is at an intersection connected to a DE aisle, the agent can initiate a swap task if necessary, regardless of whether it is being pushed. TA: when an agent is at an intersection connected to a DE aisle and the agent is not being pushed, the agent can initiate a swap task if necessary (Section 3.2.1). RP: an optional strategy where each agent avoids the shortest path preferred by the first agent in its push chain at each intersection (Matsui, 2024b) (Dap in Tbl. 1). (We note that the aim of our study is completely different from that of the previous work, which addressed strategies to reduce redundant moves of agents by considering some knowledge of map structures; we merely borrowed one of those strategies to vary the movements of agents for verification.)
Table 4: Makespan and service time. Each cell is MS/ST.

Map 14D-1L (#agents: 10, 20, 53 (N_t), 54 (N_s)):
  NpT=1   BASE    1063/254   990/225    3828/1677  4831/2197
          TA      1078/260   1005/224   4248/1891  5030/2305
          RP      1043/244   967/207    3857/1692  4771/2177
          TA+RP   1039/241   996/216    4251/1907  5052/2313
  NpT=10  BASE    1078/470   972/419    3861/1869  4644/2316
          TA      1048/460   965/417    4136/2017  4994/2484
          RP      1023/447   960/414    3886/1898  4714/2331
          TA+RP   1029/449   952/409    4144/2030  4992/2486

Map 4D-5L-1G (#agents: 10, 20, 54 (N_t), 59 (N_s)):
  NpT=1   BASE    2540/948   2473/935   5816/2819  8578/4311
          TA      2540/948   2484/934   5865/2831  8718/4386
          RP      2529/947   2454/918   5741/2780  8625/4324
          TA+RP   2529/947   2444/915   5782/2797  8764/4421
  NpT=10  BASE    2540/1171  2471/1150  5799/2996  8605/4499
          TA      2540/1171  2477/1152  5761/2967  8711/4576
          RP      2519/1161  2464/1138  5793/2981  8590/4528
          TA+RP   2519/1161  2470/1143  5774/2992  8789/4621

Map 4D-5L-3G (#agents: 10, 20, 54 (N_t), 59 (N_s)):
  NpT=1   BASE    1963/666   1822/590   4010/1750  6054/2790
          TA      1963/666   1821/583   4092/1796  6039/2812
          RP      1850/602   1786/581   4016/1764  6005/2791
          TA+RP   1850/602   1787/582   4037/1771  5963/2759
  NpT=10  BASE    1936/875   1833/801   3975/1927  5986/2966
          TA      1936/875   1844/804   4048/2001  6012/2962
          RP      1847/828   1789/775   3925/1922  5914/2942
          TA+RP   1847/828   1775/771   3987/1940  5890/2926

Map 24D-5L-3G (#agents: 100, 200, 300 (N_t), 305 (N_s)):
  NpT=1   BASE    840/198    1695/693   7904/4515  11287/6610
          TA      842/198    1704/696   7785/4437  11271/6627
          RP      795/177    1431/580   7875/4393  11278/6523
          TA+RP   811/184    1406/573   7899/4426  11160/6447
  NpT=10  BASE    731/320    1453/724   7499/4398  11087/6644
          TA      743/324    1453/731   7485/4381  10867/6479
          RP      692/307    1229/633   7548/4335  10856/6336
          TA+RP   694/308    1228/638   7648/4379  10908/6364

MS: makespan, ST: service time.
As common metrics, we evaluated the makespan (MS), i.e., the number of time steps to complete all tasks, and the service time (ST), i.e., the number of time steps to complete each task. We also evaluated the number of initiated swap tasks and related metrics.
ten executions with random initial locations of agents
were averaged for each problem instance. The ex-
periments were performed on a computer with g++
(GCC) 8.5.0 -O3, Linux 4.18, Intel (R) Core (TM)
i9-9900 CPU @ 3.10 GHz, and 64 GB memory.
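For reference, a sketch of how the two metrics can be computed from per-task records, assuming each record stores the time steps at which the task was generated and completed (the record layout is our own assumption).

def makespan_and_service_time(task_records):
    # Makespan (MS): the number of time steps until all tasks are completed.
    # Service time (ST): the average number of time steps to complete each
    # task, measured here from its generation.
    makespan = max(r['completed'] for r in task_records)
    service = sum(r['completed'] - r['generated'] for r in task_records) / len(task_records)
    return makespan, service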
4.2 Results
The solution methods completed correctly in all problem settings. The results revealed that the swap tasks worked well with intersections connected to multiple DE aisles (14D-1L and 4D-5L-1G-2), with DE aisles containing multiple pickup-and-delivery locations (4D/24D-5L-3G), and with a square (6D-1G(-2)).
Table 4 shows the makespan and service time. Here, our major interest is not the performance comparison among the solution methods with different minor strategies but the confirmation of their completion.
Table 5: Initiated swap tasks.

Map 14D-1L (#agents: 10, 20, 53 (N_t), 54 (N_s)); cells are IN, with OP for the largest setting only.
  NpT=1   BASE    117.3   255.5   1191.9   1431.4 (OP 3.2)
          TA      114.5   211.1   612.5    683.9  (OP 1.5)
          RP      122.5   257.5   1239.6   1422.2 (OP 3.5)
          TA+RP   113.5   208.7   629.5    681.7  (OP 1.9)
  NpT=10  BASE    126     262.7   1209.9   1395.6 (OP 2.7)
          TA      107.9   205.2   613.1    668.8  (OP 1.9)
          RP      117.1   254.9   1242.6   1434.8 (OP 2.7)
          TA+RP   110.9   210.4   627.3    683.2  (OP 1.8)

Map 24D-5L-3G (#agents: 100, 200, 300 (N_t), 305 (N_s)):
  NpT=1   BASE    408.3   979.7   2669.4   2524.7 (OP 9.4)
          TA      398.8   916.7   2075.3   1829.6 (OP 7.1)
          RP      395.9   938     2686.6   2479.3 (OP 9)
          TA+RP   401.2   898.3   2167.2   1778.2 (OP 6.2)
  NpT=10  BASE    374.6   963.3   2693.5   2518   (OP 9.3)
          TA      370     920.6   2064.6   1807.1 (OP 6)
          RP      374     944.7   2705.7   2453.8 (OP 8.6)
          TA+RP   374.5   880.4   2211.7   1859.4 (OP 6.4)

IN: initiations, OP: one pushes (#agents > N_t).
Table 6: Ratio of completed/re-initiated swap tasks. Each cell is CM/RI.

Map 14D-1L (#agents: 10, 20, 53 (N_t), 54 (N_s)):
  NpT=1   BASE    0.92/0.70    0.79/0.56    0.40/0.47   0.37/0.48
          TA      0.971/0.16   0.940/0.311  0.79/0.18   0.783/0.17
          RP      0.92/0.66    0.81/0.57    0.38/0.44   0.37/0.48
          TA+RP   0.968/0.49   0.944/0.19   0.77/0.15   0.780/0.18
  NpT=10  BASE    0.92/0.72    0.81/0.60    0.39/0.47   0.37/0.48
          TA      0.970/0.40   0.95/0.28    0.79/0.15   0.79/0.16
          RP      0.93/0.61    0.81/0.58    0.38/0.44   0.36/0.48
          TA+RP   0.972/0.29   0.92/0.26    0.76/0.16   0.77/0.17

Map 24D-5L-3G (#agents: 100, 200, 300 (N_t), 305 (N_s)):
  NpT=1   BASE    0.86/0.45    0.79/0.35    0.63/0.31   0.62/0.33
          TA      0.86/0.42    0.816/0.34   0.72/0.27   0.735/0.27
          RP      0.86/0.38    0.80/0.36    0.63/0.32   0.63/0.36
          TA+RP   0.87/0.37    0.819/0.35   0.71/0.28   0.737/0.30
  NpT=10  BASE    0.872/0.37   0.79/0.36    0.63/0.30   0.62/0.34
          TA      0.873/0.40   0.80/0.33    0.72/0.27   0.74/0.28
          RP      0.870/0.40   0.79/0.36    0.63/0.32   0.62/0.35
          TA+RP   0.872/0.45   0.81/0.35    0.71/0.30   0.72/0.31

CM: ratio of completed swap tasks, RI: ratio of tasks re-initiated for the same target and DE aisle.
For different problem settings, the methods were differently affected by the perturbation in their greedy solution process containing swap tasks. From the results of 4D-5L-1G and 4D-5L-3G, a larger number of pickup-and-delivery locations appeared to simply increase the parallelism of the tasks in these settings. Although the methods worked well with the theoretically densest populations of agents, there exists an appropriate number of agents, which is a common issue.
Table 5 shows the number of initiated swap tasks and the number of one-push sequences. In these problem settings, TA relatively reduced the number of swap tasks by excluding agents being pushed from the candidates for initiator agents. However, in several different settings we could not find such a significant difference. A few one-push sequences were performed when the number of agents was greater than N_t.
Table 6 shows the ratio of completed swap tasks,
and the ratio of swap tasks, which are canceled and re-
initiated for the same DE aisle, to all canceled tasks.
The ratio of completed swap tasks tends to decrease
with the density of populations. In these problem set-
tings, TA relatively increased the completion ratio and relatively decreased the overwritten ratio, while these differences were not so significant in other settings.
With our experimental implementation, the av-
eraged execution time of the solution process was
within 9 seconds in the case of 24D-5L-3G, NpT=1,
305 agents and averaged makespan of 11278 time
steps. As the first result, we successfully confirmed
the completion of solution methods in several funda-
mental settings and revealed several characteristics re-
garding the swap tasks, while there are opportunities
to improve the solution method.
5 CONCLUSIONS
We improved a solution method based on PIBT for
lifelong MAPD problems by integrating a specific
swap task. We presented detailed techniques for such
an extension, including additional management of pri-
orities, subgoals and states of agents. We also experi-
mentally verified the proposed approach with several
problem settings. While we concentrated on the ex-
tension of a specific swap task that can be naturally
integrated with the original PIBT algorithm as our
first study, we also investigated several important de-
tailed properties of the original solution method that
are necessary to extend this solution method.
In our future study, we will address more general cases with further extensions and evaluate them against related solution methods, including an investigation of graphs without cycles and with isthmuses, a comparison with scalable/incomplete methods based on top-down approaches, and the application of solution methods with real-time and bottom-up properties to practical domains.
ACKNOWLEDGEMENTS
This study was supported in part by The Public
Foundation of Chubu Science and Technology Center
(thirty-third grant for artificial intelligence research)
and JSPS KAKENHI Grant Number 22H03647.
REFERENCES
Barer, M., Sharon, G., Stern, R., and Felner, A. (2014).
Suboptimal Variants of the Conflict-Based Search Al-
gorithm for the Multi-Agent Pathfinding Problem. In
Proceedings of the Annual Symposium on Combinato-
rial Search, pages 19–27.
De Wilde, B., Ter Mors, A. W., and Witteveen, C. (2014).
Push and rotate: A complete multi-agent pathfinding
algorithm. J. Artif. Int. Res., 51(1):443–492.
Li, J., Tinka, A., Kiesel, S., Durham, J. W., Kumar, T. K. S.,
and Koenig, S. (2021). Lifelong Multi-Agent Path
Finding in Large-Scale Warehouses. In Proceedings
of the Thirty-Fifth AAAI Conference on Artificial In-
telligence, pages 11272–11281.
Luna, R. and Bekris, K. E. (2011). Push and swap: Fast
cooperative path-finding with completeness guaran-
tees. In Proceedings of the Twenty-Second Interna-
tional Joint Conference on Artificial Intelligence, vol-
ume 1, pages 294–300.
Ma, H., Harabor, D., Stuckey, P. J., Li, J., and Koenig, S.
(2019). Searching with consistent prioritization for
multi-agent path finding. In Proceedings of the Thirty-
Third AAAI Conference on Artificial Intelligence and
Thirty-First Innovative Applications of Artificial In-
telligence Conference and Ninth AAAI Symposium on
Educational Advances in Artificial Intelligence, pages
7643–7650.
Ma, H., Li, J., Kumar, T. S., and Koenig, S. (2017). Lifelong
Multi-Agent Path Finding for Online Pickup and De-
livery Tasks. In Proceedings of the Sixteenth Confer-
ence on Autonomous Agents and MultiAgent Systems,
pages 837–845.
Matsui, T. (2024a). Integration of Efficient Techniques
Based on Endpoints in Solution Method for Lifelong
Multiagent Pickup and Delivery Problem. Systems,
12(4-112).
Matsui, T. (2024b). Investigation of Heuristics for PIBT
Solving Continuous MAPF Problem in Narrow Ware-
house. In Proceedings of the Sixteenth International
Conference on Agents and Artificial Intelligence, vol-
ume 1, pages 341–350.
Okumura, K. (2023). LaCAM: search-based algorithm
for quick multi-agent pathfinding. In Proceedings of
the Thirty-Seventh AAAI Conference on Artificial In-
telligence and Thirty-Fifth Conference on Innovative
Applications of Artificial Intelligence and Thirteenth
Symposium on Educational Advances in Artificial In-
telligence, pages 11655–11662.
Okumura, K., Machida, M., Défago, X., and Tamura, Y.
(2022). Priority Inheritance with Backtracking for It-
erative Multi-Agent Path Finding. Artificial Intelli-
gence, 310.
Okumura, K., Tamura, Y., and Défago, X. (2019). winPIBT:
Expanded Prioritized Algorithm for Iterative Multi-
agent Path Finding. CoRR, abs/1905.10149.
Sharon, G., Stern, R., Felner, A., and Sturtevant, N. R.
(2015). Conflict-Based Search for Optimal Multi-
Agent Pathfinding. Artificial Intelligence, 219:40–66.
Silver, D. (2005). Cooperative Pathfinding. In Proceedings
of the AAAI Conference on Artificial Intelligence and
Interactive Digital Entertainment, pages 117–122.
Čáp, M., Vokřínek, J., and Kleiner, A. (2015). Complete Decentralized Method for On-Line Multi-Robot Trajectory Planning in Well-Formed Infrastructures. In Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling, pages 324–332.
Yamauchi, T., Miyashita, Y., and Sugawara, T. (2022).
Standby-based deadlock avoidance method for multi-
agent pickup and delivery tasks. In Proceedings
of the Twenty-First International Conference on Au-
tonomous Agents and Multiagent Systems, pages
1427–1435.
APPENDIX
Pseudo Code of Extended PIBT Algorithm
The pseudo code of our extended version of the PIBT algorithm is shown in Fig. 9. Since the original version of the pseudo code is described in a compact form, we first expanded an if-block (lines 16-23 in Fig. 2) into two internal blocks (lines 24-42 in Fig. 9). Additional parameters a_f, a_s, and p_d, and the return value a_t of function PIBT propagate additional information through its recursion process (lines 6, 13, 28, 34, 40, 44, and 56).
a_f represents the first pusher in a push chain and is implicitly referred to in several extended rules for f_i(v) (lines 6, 13, 14, 19-20, 22, and 28).
To initiate each swap task, we utilized the recursion process of PIBT in a slightly technical manner. On a top-down path of the recursion, the information of a candidate a_s for an initiator agent and an associated priority value p_d is propagated (lines 6, 13, 15, 16, and 26-29). When agent a_i having a candidate initiator a_s cannot move, a_i enables a swap task initiated by a_s by setting target a_t = a_i (lines 47-50). Then a_s partially initiates its swap task for a_i (line 51). Namely, the initiation process by a_s is performed on a return path of the recursion. Similarly, the related swept agents are also initiated on the same return path (lines 52 and 53). Here, we decomposed the communication among the member agents of each swap task, including the cancellation of existing tasks, by considering correct timings. Finally, the initiation is completed at the level of a_s (line 29).
The completion/cancellation of swap tasks is checked at several appropriate timings (lines 17, 31, 37, 43, and the implicit cancel communication among agents). Subgoals and sub-modes of agents are updated at the timing of their moves if necessary (lines 10 and 11). In addition, the special rules for the one-push sequences are also embedded (lines 7-8, 32-33, and 38-39), including the mutex of the sequence.
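As a reading aid, the following self-contained toy sketch (not the authors' code) illustrates only how the additional arguments a_f, a_s, and p_d travel down the recursion and how the swap-task target a_t travels back up on the return path. The corridor model, priority table, and helper names are hypothetical, and the parent argument a_j as well as all detailed rules of Fig. 9 (candidate sets, constraints Ch/Ce/Cp, one-push handling) are omitted.

    from typing import Optional, Tuple

    # Toy model: agents 0..3 stand in a corridor; each tries to push its neighbour.
    NEXT = {0: 1, 1: 2, 2: 3, 3: None}          # who blocks whom (None: nobody ahead)
    PRIORITY = {0: 4.0, 1: 3.0, 2: 2.0, 3: 1.0}
    CAN_INITIATE = {0: True, 1: False, 2: False, 3: False}
    STUCK = {3}                                  # agent 3 cannot move at all

    def pibt(a_i: int, a_f: Optional[int], a_s: Optional[int],
             p_d: float) -> Tuple[bool, Optional[int]]:
        a_f = a_i if a_f is None else a_f        # first pusher of the chain
        if a_s is not None:
            p_d = max(p_d, PRIORITY[a_i])        # dominant priority so far
        if CAN_INITIATE[a_i]:
            a_s, p_d = a_i, PRIORITY[a_i]        # a_i becomes the initiator candidate
        if a_i in STUCK:
            # return path: a blocked agent enables a swap task for the candidate a_s
            a_t = a_i if a_s is not None else None
            print(f"agent {a_i} is stuck -> target a_t={a_t} for initiator a_s={a_s}")
            return False, a_t
        blocker = NEXT[a_i]
        if blocker is not None:
            ok, a_t = pibt(blocker, a_f, a_s, p_d)   # top-down propagation
            if a_s == a_i and a_t is not None:
                print(f"initiator {a_i} completes the initiation for target {a_t}")
                a_t = None                           # consumed at the initiator's level
            elif a_t is not None:
                print(f"agent {a_i} is registered as a swept agent")
            if not ok:
                return False, a_t                    # stay: the chain could not be pushed
        print(f"agent {a_i} moves")
        return True, None

    if __name__ == "__main__":
        pibt(0, None, None, 0.0)

Running the sketch prints that the stuck agent becomes the target, the intermediate agents become swept agents, and the initiator completes the initiation at its own level of the recursion, mirroring the roles of lines 28-29 and 47-53 in Fig. 9.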
 1  UNDECIDED ← A(t)                              // agents list
 2  OCCUPIED ← ∅                                  // vertices list
 3  update priorities p_i(t) for all agents a_i
 4  while UNDECIDED ≠ ∅ do
 5      a ← the agent with the highest priority in UNDECIDED
 6      PIBT(a, ⊥, ⊥, ⊥, ⊥)                       // ⊥ denotes empty
 7      if inOPretreat(target(a)) then make the rest of the agents
 8          in UNDECIDED stay. end if             // mutex to protect role exchange
 9  end while
10  Manage sub-modes that can be done in an update phase of
11      agents' locations and subgoals.

13  function PIBT(a_i, a_j, a_f, a_s, p_d)
14      if a_f = ⊥ then a_f ← a_i else a_f ← a_f end if, a_t ← ⊥
15      if a_s ≠ ⊥ then p_d ← max(p_d, p_i(t), p_controller(a_i)(t))
16      else p_d ← p_d end if
17      Apply Ch.
18      UNDECIDED ← UNDECIDED \ {a_i}
19      C_i ← {v ∈ {u | (v_i(t), u) ∈ E} ∪ {v_i(t)} | f_i(v) ≠ ⊥}
20            \ ({v_j(t)} ∪ OCCUPIED)             // with new constraints for f_i(v)
21      while C_i ≠ ∅ do
22          v*_i ← arg max_{v ∈ C_i} f_i(v)       // with new preferences for f_i(v)
23          OCCUPIED ← OCCUPIED ∪ {v*_i}
24          if a_k s.t. v*_i = v_k(t) exists then
25              if a_k ∈ UNDECIDED then
26                  if a_i can be an initiator then a_s ← a_i, p_d ← p_i(t)
27                  else a_s ← a_s end if
28                  (r, a_t) ← PIBT(a_k, a_i, a_f, a_s, p_d)
29                  if a_s = a_i then a_t ← ⊥ end if   // complete initiation
30                  if r is valid then
31                      v_i(t+1) ← v*_i, apply Ce and Cp.
32                      if inOnePush(a_i) then ask target(a_i) to
33                          one-push retreat. end if
34                      return (valid, ⊥)          // move with push
35                  else C_i ← C_i \ OCCUPIED end if
36              else
37                  v_i(t+1) ← v*_i, apply Ce and Cp if v_i(t+1) ≠ v_i(t).
38                  if v_i(t+1) = v_i(t) ∧ inTopMostSwapTask(a_i)
39                      ∧ target(a_i) = a_i then ask initiator(a_i) to one-push. end if
40                  return (valid, ⊥)              // move/stay without push
41              end if
42          else
43              v_i(t+1) ← v*_i, apply Ce and Cp.
44              return (valid, ⊥)                  // move without push
45          end if
46      end while
47      if a_s ≠ ⊥ ∧ a_t = ⊥ ∧ p_s(t) = p_d ∧ v_i(t) = sg(a_s)
48          ∧ ¬(hasMAPDtask(a_i) ∧ v_i(t) = sg(a_i)
49          ∧ (v_i(t) = s_i ∨ v_i(t) = g_i))       // complete MAPD subgoal first
50      then a_t ← a_i,
51          a_s partially initiates swap task for a_s and target a_i.
52      else if a_t ≠ ⊥ then
53          a_s partially initiates swap task for a_s and swept agent a_i.
54      end if
55      v_i(t+1) ← v_i(t)
56      return (invalid, a_t)                      // stay by failing to move
57  end function

a_f: first pusher for new f_i(v), a_s: initiator candidate
a_t: target to initiate swap task in return path of recursion
p_d: dominant priority, sg(a_i): first subgoal of a_i
v_i(t): location of agent a_i at time step t

Task initiation: lines 6, 13-16, 28-29, 34, 40, 44, 47-54, 56
Task termination: lines 17, 31, 37, 43, and implicit cancel
f_i(v): lines 6, 13-14, 19-20, 22, 28
Subgoal, sub-mode: lines 10-11
One push: lines 7-8, 32-33, 38-39

Figure 9: Extension to PIBT (time step t).
Monotonically Increasing Priority Values
To maintain the consistency of priority values among agents, we allow each agent a_i to initiate a swap task only if a_i has a priority value higher than those of the target and all swept agents. As a result of the initiation of a swap task, the target agent to retreat must have a priority higher than that of its initiator agent. After the target agent retreats, it asks its initiator to be its controller again. The operation must also not affect other agents. Therefore, the target agent should be inserted between the initiator agent and the agent with the minimum priority higher than that of the initiator agent. In addition, we permit agents to ask for retreats multiple times in arbitrary time steps if necessary. When an agent initiates a new swap task, its old swap task is overwritten if one exists.
We can employ the following hierarchical priority value p_i of agent a_i, where the time step t is omitted:

    p_i = pe_i + cp · pa_i + ca · pn_i / pd_i,    (1)

where pe_i ≫ cp · pa_i ≫ ca · pn_i / pd_i. pe_i is the elapsed time from the update of the first subgoal of the agent. pa_i is an additional value to break ties among agents. pn_i and pd_i are integer values, and pn_i / pd_i is employed to adjust the priority values between two agents. These values are initialized as pe_i = 0, pn_i = 0, and pd_i = 1.
pa_i is initially based on agent a_i's identifier. pe_i is reset to zero after agent a_i arrives at its first subgoal location; otherwise, pe_i is incremented in each time step. When agent a_i raises agent a_j's priority above its own, the following update is performed after p_j ← p_i:

    pn_i ← 3 pn_i + 1    (2)
    pd_i ← 3 pd_i        (3)
    pn_j ← 3 pn_i + 2    (4)
    pd_j ← 3 pd_i        (5)

where the right-hand sides refer to the values of pn_i and pd_i before the update. The update also slightly increases agent a_i's priority so that different agents do not obtain the same priority value.
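The following runnable sketch illustrates Eq. (1) and the raise operation of Eqs. (2)-(5). The weights CP and CA and the reading of p_j ← p_i as copying the upper components are illustrative assumptions, not values taken from the paper.

    from dataclasses import dataclass

    CP, CA = 1e-3, 1e-6          # illustrative weights with pe >> CP*pa >> CA*pn/pd

    @dataclass
    class Priority:
        pe: int = 0              # elapsed time since the first subgoal was updated
        pa: int = 0              # tie-breaker, initially the agent's identifier
        pn: int = 0              # numerator of the adjustment fraction
        pd: int = 1              # denominator of the adjustment fraction

        def value(self) -> float:
            # Eq. (1): p_i = pe_i + cp * pa_i + ca * pn_i / pd_i
            return self.pe + CP * self.pa + CA * self.pn / self.pd

    def raise_priority(p_i: Priority, p_j: Priority) -> None:
        """a_i raises a_j just above itself (Eqs. (2)-(5))."""
        n, d = p_i.pn, p_i.pd                # values before the update
        p_j.pe, p_j.pa = p_i.pe, p_i.pa      # assumed reading of p_j <- p_i
        p_i.pn, p_i.pd = 3 * n + 1, 3 * d    # Eqs. (2), (3)
        p_j.pn, p_j.pd = 3 * n + 2, 3 * d    # Eqs. (4), (5)

    if __name__ == "__main__":
        initiator, target = Priority(pe=5, pa=2), Priority(pe=1, pa=7)
        raise_priority(initiator, target)
        # the target now sits strictly between its initiator and the next larger fraction
        assert initiator.value() < target.value() < Priority(pe=5, pa=2, pn=1, pd=1).value()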
In an actual implementation, pn_i and pd_i frequently exceed the precision of the variables. To avoid such situations, we iteratively reorder the agents by pa_i + ca · pn_i / pd_i and update pa_i according to the ordering. Then pn_i and pd_i are reset as pn_i = 0 and pd_i = 1. The required frequency of this update depends on the precision of the variables and the number of initiated swap tasks. In a decentralized implementation, the reset of the priority values must be synchronized across the entire system.
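A possible form of this renormalization, continuing the Priority sketch above (the rank-based reassignment of pa_i is an assumed detail):

    def renormalize(priorities: list[Priority]) -> None:
        # reorder agents by the current tie-breaker plus adjustment fraction,
        # rewrite pa_i from the resulting rank, and reset the fraction
        ordered = sorted(priorities, key=lambda p: p.pa + CA * p.pn / p.pd)
        for rank, p in enumerate(ordered):
            p.pa = rank
            p.pn, p.pd = 0, 1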
In fact, we developed the proposed algorithm with this type of priority value and finally replaced it with the priority inversion. Employing monotonically increasing values is a standard approach to controlling systems with multiple components. The frequency of the resets under uint64_t variables was sufficiently acceptable in our preliminary experiment.