A Multitier Approach for Dynamic and Partially Observable Multiagent
Path-Finding
Anıl Doğru¹, Amin Deldari Alamdari¹, Duru Balpınarlı² and Reyhan Aydoğan¹,³,⁴
¹Department of Computer Science, Özyeğin University, İstanbul, Turkey
²Department of Industrial Engineering, Özyeğin University, İstanbul, Turkey
³Department of Artificial Intelligence and Data Engineering, Özyeğin University, İstanbul, Turkey
⁴Delft University of Technology, Delft, The Netherlands
Keywords:
Multiagent Path-Finding, Uncertainty, Ant Colony Optimization, Consensus.
Abstract:
This paper introduces a novel Dynamic and Partially Observable Multiagent Path-Finding (DPO-MAPF) prob-
lem and presents a multitier solution approach accordingly. Unlike traditional MAPF problems with static
obstacles, DPO-MAPF involves dynamically moving obstacles that are partially observable and exhibit un-
predictable behavior. Our multitier solution approach combines centralized planning with decentralized exe-
cution. In the first tier, we apply state-of-the-art centralized and offline path planning techniques to navigate
around static, known obstacles (e.g., walls, buildings, mountains). In the second tier, we propose a decentral-
ized and online conflict resolution mechanism to handle the uncertainties introduced by partially observable
and dynamically moving obstacles (e.g., humans, vehicles, animals, and so on). This resolution employs a
metaheuristic-based revision process guided by a consensus protocol to ensure fair and efficient path alloca-
tion among agents. Extensive simulations validate the proposed framework, demonstrating its effectiveness in
finding valid solutions while ensuring fairness and adaptability in dynamic and uncertain environments.
1 INTRODUCTION
As defined by (Stern et al., 2021), Multiagent Path-
Finding (MAPF) involves the allocation of conflict-
free paths to rational agents, facilitating their travel
from initial to target locations and is crucial for tack-
ling various real-world challenges, including logistics
and robot rescue operations. Due to its practical appli-
cations, it has attracted significant attention from re-
searchers, leading to the exploration of various prob-
lem variants over time. While some researchers fo-
cus on MAPF with fixed (i.e., static) obstacles for
the agents to avoid along their paths (Sharon et al.,
2015), others have introduced uncertainty by incorpo-
rating partial observability where agents lack all rel-
evant information regarding the presence/location of
those static obstacles (Shofer et al., 2023).
Besides, uncertainty in real-world scenarios arises
from the stochastic or unpredictable behaviors of
other entities in the environment. For instance, birds
in a multi-drone path-finding problem contribute to
uncertainty since agents may not be able to predict
their presence or movement patterns, nor can they
communicate with them. Similarly, the movement
of pedestrians and animals or sudden blockages on the road, such as car accidents or building collapses,
pose dynamic and unpredictable challenges for au-
tonomous driving. These entities and events not only create uncertainty but also result in dynamically
unavailable regions in the environment, acting as ob-
stacles that can potentially lead to conflict. To the
best of our knowledge, no prior work has comprehensively addressed the MAPF problem involving partially ob-
servable and dynamically moving obstacles that also
exhibit unpredictable behaviors.
In this study, we extend the classical MAPF prob-
lem by introducing partially observable and dynamic
obstacles (e.g., humans, vehicles, animals, and non-
rational entities) that move unpredictably over time,
while agents have limited vision to detect them. In our
formulation, agents cannot communicate with these
obstacles at any point and have no prior knowledge of
their existence, positions, or velocities until they enter
their field of view (i.e., partial observability). Conse-
quently, our formulation introduces the challenge of
navigating through both static and dynamic obstacles
while coping with limited knowledge about the environment. We refer to this problem as the Dynamic and
Partially Observable Multiagent Path-Finding (DPO-
MAPF) problem.
Regarding the potential solution approaches, two
categories emerge: centralized and decentralized.
Centralized approaches employ a single authority that
uses all available knowledge to optimize paths for all
agents, which can be effective for global optimiza-
tion but faces scalability challenges as the number of
agents increases. Decentralized approaches, on the other hand, grant autonomy to individual agents, offering scalability benefits but potentially struggling to find optimal solutions in complex environments. Hence,
we propose a multitier strategy combining both ap-
proaches to leverage their strengths for effectively
tackling the DPO-MAPF problem.
In both tiers, the primary objective remains guid-
ing agents from their initial locations to their goal
destinations while minimizing total travel time and
avoiding collisions. In the first tier, we employ a cen-
tralized and offline pathfinding strategy, particularly
in regions where static obstacles (e.g., walls, moun-
tains, buildings) are pre-known and globally observ-
able. This tier can integrate well-known centralized
algorithms, such as (Sharon et al., 2015; Li et al.,
2021b; Okumura et al., 2022; Lam et al., 2022).
To address the uncertainty introduced by the DPO-MAPF problem, the second tier applies a decentral-
ized and online conflict resolution approach, which
includes a revising strategy along with a consensus
protocol. The second tier primarily aims for decen-
tralized execution of the centralized plan in a DPO-
MAPF environment. If any conflicts arising from the
uncertainty are detected in the current plan, the revis-
ing strategy employing a metaheuristic method (i.e.,
Ant Colony Optimization) resolves them. Further-
more, a consensus protocol inspired by (Eran et al.,
2021) manages conflict resolution among agents by
prioritizing them based on specific criteria, such as
urgency in reaching their goals, objective functions,
fairness among them, or specific task requirements.
Our proposed strategy encourages collaboration,
information sharing, fairness, and self-organization
among agents in dynamic and uncertain environ-
ments. Consequently, this study seeks to develop
adaptive and responsive path-finding techniques that
incorporate both offline and online decision-making
processes, effectively addressing the challenges posed
by static and dynamic obstacles and enhancing multi-
agent coordination in real-world scenarios.
The rest of the paper is organized as follows: Sec-
tion 2 reviews related work, and Section 3 defines the
problem. Section 4 presents the mathematical formal-
ization, and Section 5 outlines the proposed multitier
approach. Section 6 introduces the proposed decentralized and online conflict resolution. Section 7 details
the experimental setup and the performance evalua-
tion results. Finally, Section 8 discusses the main con-
tributions and future work directions.
2 RELATED WORK
The MAPF problem has been widely studied, gen-
erally categorized into centralized and decentralized
approaches. Centralized approaches utilize compre-
hensive knowledge and coordination, employing op-
timal solvers that consider all agents’ paths for effi-
cient conflict resolution and optimization. The A*
algorithm is foundational in MAPF solvers, known
for its efficiency in finding shortest paths (Ryan,
2008; Standley, 2010). Building on A*, (Sharon
et al., 2015) introduce Conflict-Based Search (CBS),
which coordinates multiple agents and is widely used
in MAPF. CBS operates as a two-level search al-
gorithm, resolving conflicts by branching into sub-
problems and replanning paths when needed. Sub-
sequent enhancements like Meta-Agent CBS (MA-
CBS) improve performance in conflict-heavy scenar-
ios by coupling agents into meta-agents based on con-
flict frequency (Sharon et al., 2015). (Lam et al.,
2022) advance CBS through the Branch-and-Cut-and-
Price (BCP) algorithm, integrating mixed-integer pro-
gramming (MIP) for better performance. (Li et al.,
2021a) further enhance CBS with CBSH2-RTC, ef-
fectively resolving symmetric conflicts and reducing
node expansions, significantly improving scalability.
As the number of agents increases, the state space expands exponentially, leading to the utilization of suboptimal solvers, which prove beneficial for finding paths quickly. (Semiz and Polat, 2021) introduce
the Incremental MAPF (I-MAPF) problem and pro-
pose CBS-D*-lite, which combines CBS with D*-
lite. This approach efficiently updates only affected
agents’ paths in dynamic environments without re-
planning from scratch, enhancing adaptability and
computational efficiency over traditional CBS, al-
though it may not yield optimal solutions in rapidly
changing scenarios. (Li et al., 2021b) present Ex-
plicit Estimation CBS (EECBS), which employs on-
line learning to obtain inadmissible heuristics of each
high-level node and uses Explicit Estimation Search
to select which high-level node to expand next.
Table 1: Comparison of the Most Promising Multiagent Path-Finding Approaches.

Approach | Centralization | Optimality | Completeness | Environment | Uncertainty | Observability | Solver Type
CBS (Sharon et al., 2015) | Centralized | Optimal | Complete | Static | No | Fully | Rule-based
CBSH2-RTC (Li et al., 2021a) | Centralized | Optimal | Complete | Static | No | Fully | Rule-based
EECBS (Li et al., 2021b) | Centralized | Suboptimal | Complete | Static | No | Fully | Rule-based
PIBT (Okumura et al., 2022) | (De)centralized | Suboptimal | Incomplete | Static | Yes | Partially | Priority/Rule-based
PRIMAL (Sartoretti et al., 2019) | Decentralized | Suboptimal | Incomplete | Static | Yes | Partially | Learning-based
BCP (Lam et al., 2022) | Centralized | Optimal | Complete | Static | No | Fully | Systematic
I-MAPF (Semiz and Polat, 2021) | Centralized | Optimal | Complete | Static | Yes | Partially | Systematic
Our Approach | Mixed | Suboptimal | Complete | Dynamic | Yes | Partially | Multitier
(Okumura et al., 2022) introduce Priority Inheritance with
Backtracking (PIBT), a two-level decoupled approach
that uses dynamic prioritization to manage agent
movements, ensuring reachability in large, real-time
environments. Despite its scalability and efficiency,
PIBT may struggle in dense settings.
Although CBS and its variants excel at providing
promising solutions, they struggle in highly dynamic
environments with unpredictable movements. Most studies that assume dynamic settings typically focus on partial observability (Shofer et al., 2023) or treat changes in the number of agents in a cooperative and collaborative team as the dynamic element (Wan et al., 2018). The static assumption
in many MAPF models can lead to inefficiencies and
suboptimal performance when dealing with such dy-
namic elements. Additionally, their centralized nature
demands high computational resources and full ob-
servability, often impractical in real applications. In
contrast, this study addresses dynamic environments
with partially observable and unpredictable entities,
offering a more realistic perspective.
On the other hand, decentralized approaches
in multi-agent pathfinding have gained significant
attention, particularly for managing dynamic and
large-scale environments by distributing path plan-
ning tasks among local decision-makers. These ap-
proaches offer a flexible and scalable solution when
centralized coordination becomes impractical or com-
putationally intensive. Due to limited knowledge, de-
centralized approaches do not guarantee optimality
but offer reasonable solutions in a shorter time. Con-
sequently, various techniques have been developed to
address the challenges of decentralized multi-agent
pathfinding (DMAPF) and coordination in dynamic
environments. DMAPF requires multiple agents to
coordinate their movements to avoid collisions and
achieve their individual goals (Verbari et al., 2019).
(Desaraju and How, 2012) utilize rapidly explor-
ing random tree (RRT), a popular technique used in
DMAPF to quickly explore the search space and find
a feasible path for each agent.
Moreover, decentralized approaches offer diverse
applications for MAPF beyond traditional methods.
(Morag et al., 2023) introduce a learning-based mo-
tion planning framework where each agent constructs
a graph of boundary value problems based on con-
straints. (Peng, 2023) combines heuristic search, em-
pirical rules, and multiagent reinforcement learning
with real-time and heuristic planners. (Netter and
Vamvoudakis, 2023) present a decentralized motion
planning algorithm that leverages game theory for
online path updates in dynamic environments, miti-
gating the freezing robot problem while considering
kinodynamic constraints. (Kasaura et al., 2023) pro-
pose periodic MAPP, using constraint relaxation and
optimization for periodic agent appearances. Lastly,
(Keskin et al., 2024) suggest a decentralized negotiation strategy using token protocols and path-aware/heatmap-based strategies in grid environments.
Given the inherent complexities and dynamic na-
ture of MAPF, bioinspired algorithms have emerged
as a promising approach to enhance adaptability and
collision avoidance in complex and dynamic settings
(Aljalaud et al., 2023). For example, (Dai et al.,
2019) introduce an improved Ant Colony Algorithm
(ACO) for MAPF, combining features of the A* algo-
rithm and the MAX-MIN Ant System to handle intri-
cate environments. (Huang et al., 2021) optimize the
ACO algorithm by refining parameters and addressing
vertex conflicts. Their approach introduces adaptive
pheromone intensity, reduction factors, and an initial
pheromone distribution, which help to prevent the al-
gorithm from getting stuck in local optima.
Finally, Table 1 summarizes the cutting-edge approaches mentioned above for the MAPF problem. In contrast to these studies, our study
proposes a novel multitier approach, leveraging the
advantages of centralized and decentralized processes
and integrating ACO into a decentralized and online
framework to address the challenges of DPO-MAPF.
3 PROBLEM DEFINITION
The MAPF problem involves the allocation of conflict-free paths to agents, guiding them from their initial locations to their goal locations. In the context of the MAPF problem, we consider $m$ agents represented by $a = \{a_1, a_2, \ldots, a_m\}$ navigating within a graph $G = (V, E)$. The initial and goal locations for
an agent $a_i$ are denoted as $s_i \in V$ and $g_i \in V$, respectively, and an agent can traverse from vertex $v \in V$ to vertex $v' \in V$ if there is an edge such that $(v, v') \in E$. Time is considered discrete, and at each time step, an agent can occupy only one vertex of the graph and execute a single action for moving to an adjacent vertex. The path of every agent $a_i$ is represented by a sequence of vertices ($v_i \in V$) identified at each time step $t$ and denoted as $\pi_i$ (single-agent plan). At a given time step, agents cannot be located in the same vertex, $\pi_i^t \neq \pi_j^t \; \forall i \neq j$, and cannot traverse the same edge, $(\pi_i^t = \pi_j^{t+1}) \Rightarrow (\pi_i^{t+1} \neq \pi_j^t) \; \forall i \neq j$, where $\pi_i^t$ is the location of $a_i$ at $t$ (Stern, 2019). A valid solution for the MAPF problem is a set of single-agent plans without any conflict with the obstacles and each other.
As in the classical MAPF problem, the environ-
ment contains static obstacles, such as walls, build-
ings, or permanent structures, that remain in fixed
and known locations throughout the path-finding pro-
cess. Our DPO-MAPF problem also has partially ob-
servable and dynamically moving obstacles with un-
known positions and behaviors. Let us assume that $n$ dynamic obstacles represented by $o = \{o_1, o_2, \ldots, o_n\}$ move within the graph $G = (V, E)$. Like agents, each dynamic obstacle $o_j$ has an initial location ($s_{o_j} \in V$) and can traverse from vertex $v \in V$ to vertex $v' \in V$ if there is an edge such that $(v, v') \in E$. The dynamic obstacles can only be positioned in one of the graph's vertices at each time step (see the red cells in Figure 1) and execute a single action at each time step $t$. The agents cannot be located in the same vertex or traverse the same edge as dynamic obstacles either ($\pi^t_{a_i} \neq \pi^t_{o_j}$ and $(\pi^t_{a_i} = \pi^{t+1}_{o_j}) \Rightarrow (\pi^{t+1}_{a_i} \neq \pi^t_{o_j})$, $\forall a_i \in a, \forall o_j \in o$, where $\pi^t_{a_i}$ is the location of $a_i$ and $\pi^t_{o_j}$ is the location of $o_j$ at $t$). It is worth noting that the agents
cannot communicate with these dynamic obstacles.
Agents also disappear when they reach the goal po-
sition to prevent post-goal collisions, as addressed in
(Sharon et al., 2015). Additionally, the agents have
limited vision to detect dynamic obstacles. Agents
can observe dynamic obstacles only if they fall within
their field of view. This means that agents are not pro-
vided with any information about the existence, po-
sition and velocity of dynamic obstacles before the
agents’ observation. Finally, the maximum total time
step to complete the mission is limited.
To address this, we developed a simulation framework for the DPO-MAPF task, visualized in Figure 1. To ensure the reproducibility and transparency of our study, the corresponding GitHub repository is publicly available: https://github.com/aniltrue/DPO_MAPF_ICAART_2025. This framework enables grid-like DPO-MAPF scenarios, including different densities of dynamic and static obstacles, map sizes, and numbers of agents.

Figure 1: Illustration of the Simulation Framework.
Each agent can move between neighboring cells
based on available actions such as “Up”, “Down”,
“Right”, “Left”, and “Wait”. Note that the grid en-
vironment, commonly used for path-finding prob-
lems, can be equivalently represented as an undirected
graph where the connectivity between cells (i.e., ver-
tices) is defined by their adjacency in the grid. This
structured representation simplifies certain operations
while maintaining the general properties of graph-
based representations, thus ensuring that the classical
MAPF problem defined on graphs is consistent with
our grid-based implementation. Besides, inspired by
(Eran et al., 2021), each agent’s field of view is rep-
resented by a fixed-sized square (i.e., 5 × 5), with the
agent always positioned at the center of this square.
Consequently, this simulation framework facilitates
the generation of environments, evaluation of pro-
posed approaches, and visualization of agent move-
ments within the DPO-MAPF environment.
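For concreteness, the following minimal Python sketch illustrates this grid-to-graph correspondence and the field-of-view check; the helper names are hypothetical and assume 4-connectivity with a self-loop that models the "Wait" action.

```python
from itertools import product

def grid_to_graph(width, height, static_obstacles):
    """Build an undirected 4-connected adjacency structure from a grid.

    static_obstacles: set of (x, y) cells that are blocked.
    Returns a dict mapping each free cell to its traversable neighbors.
    """
    free = {(x, y) for x, y in product(range(width), range(height))
            if (x, y) not in static_obstacles}
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # Up, Down, Right, Left
    graph = {}
    for (x, y) in free:
        neighbors = [(x + dx, y + dy) for dx, dy in moves
                     if (x + dx, y + dy) in free]
        graph[(x, y)] = neighbors + [(x, y)]  # self-loop models "Wait"
    return graph

def in_field_of_view(agent_pos, obstacle_pos, fov_size=5):
    """True if an obstacle falls inside the agent-centered square FOV."""
    half = fov_size // 2
    return (abs(agent_pos[0] - obstacle_pos[0]) <= half and
            abs(agent_pos[1] - obstacle_pos[1]) <= half)
```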
4 LP FORMALIZATION
The linear programming (LP) approach has already
been used to find optimal solutions for classical
MAPF problems. However, it struggles with scala-
bility issues due to the NP-hard nature of the MAPF
problem, highlighting the necessity for alternative ap-
proaches such as heuristic algorithms.
In our context, we introduce an LP formalization
of the DPO-MAPF problem for validation during the
scenario-making process. Since LP models must be
deterministic and solvable by commercial solvers, the
uncertainty introduced by partial observability makes
the LP approach less suitable for DPO-MAPF. In
other words, the mathematical model requires com-
plete information about the environment, including
both dynamic and static obstacles. Nonetheless, it still
provides a solid baseline for evaluation purposes.
The parameters and decision variables are sum-
marized below, followed by a formulation of the
DPO-MAPF problem:
Indices:
  $i$ = index representing a node in the graph
  $t$ = discrete time step
  $m$ = index of an agent

Sets:
  $T$ = the set of time steps, $[0, T_{max}]$
  $I$ = the set of all nodes in the graph
  $M$ = the set of agents, indexed from 1 to $m$
  $E$ = the set of all edges, where each edge is a pair $(i, i')$ indicating a direct connection between nodes $i$ and $i'$
  $\mathcal{N}(i)$ = the set of neighboring nodes of node $i$, defined as $\mathcal{N}(i) = \{i' \in I \mid (i, i') \in E\}$

Parameters:
  $o_{it}$ = indicates whether node $i$ is occupied by an obstacle at time step $t$
  $s_{im}$ = indicates whether node $i$ is the starting position of agent $m$
  $g_{im}$ = indicates whether node $i$ is the goal position of agent $m$
  $t^m_{max}$ = the maximum available time step for agent $m$ to complete its travel
  $d_{ii'}$ = the distance between node $i$ and node $i'$ in the graph

Decision Variables:
  $X_{itm}$ = indicates whether agent $m$ is located at node $i$ at time step $t$
  $A_{tm}$ = indicates whether agent $m$ has reached its goal position at time step $t$
  $Y_{ii'tm}$ = indicates whether agent $m$ moves from node $i'$ to node $i$ at time step $t$
Mathematical Model:

minimize $\sum_{m \in M} \sum_{t \in T} \sum_{i \in I} \sum_{i' \in \mathcal{N}(i)} d_{ii'} \cdot Y_{ii'tm}$  (1)

subject to:

$X_{itm} \leq 1 - o_{it}, \quad \forall m, t, i$  (2)
$X_{i0m} = s_{im}, \quad \forall m, i$  (3)
$\sum_{i \in I} X_{itm} = 1 - A_{tm}, \quad \forall m, t$  (4)
$\sum_{t \in T} X_{itm} \geq g_{im}, \quad \forall m, i$  (5)
$X_{itm} \leq \sum_{i' \in \mathcal{N}(i)} X_{i'(t-1)m}, \quad \forall m, i, t \in T \setminus \{0\}$  (6)
$\sum_{t \in T} \sum_{i \in I} X_{itm} (1 - g_{im}) \leq t^m_{max}, \quad \forall m$  (7)
$A_{(t+1)m} \leq A_{tm} + \sum_{i \in I} X_{itm} \cdot g_{im}, \quad \forall m, t$  (8)
$Y_{ii'tm} \leq X_{i'(t-1)m}, \quad \forall m, i, t \in T \setminus \{0\}, i' \in \mathcal{N}(i)$  (9)
$Y_{ii'tm} \leq X_{itm}, \quad \forall m, t, i, i' \in \mathcal{N}(i)$  (10)
$Y_{ii'tm} \geq X_{itm} + X_{i'(t-1)m} - 1, \quad \forall m, i, t \in T \setminus \{0\}, i' \in \mathcal{N}(i)$  (11)
$X_{itm} \in \{0, 1\}, \quad \forall m, t, i$  (12)
$Y_{ii'tm} \in \{0, 1\}, \quad \forall m, t, i, i'$  (13)
$A_{tm} \in \{0, 1\}, \quad \forall m, t$  (14)
The objective function (1) aims to minimize the total travel distance of all agents across the graph, from their starting positions to their respective goal positions, while ensuring collision-free paths.
Constraint (2) prevents agents from entering nodes
blocked by any obstacle type at any time step. Con-
straint (3) guarantees that agents start their journey
from their predefined starting points. Constraint (4)
reinforces the condition that an agent is either present
at a node or inactive once it reaches its goal, prevent-
ing collisions. Constraint (5) indicates whether agents
have reached their predefined goals. Constraint (6) re-
stricts agents to move only to adjacent nodes, ensur-
ing valid movements within the graph. Constraint (7)
sets a maximum time limit for each agent to complete
its path. Constraint (8) specifies that an agent must
actively move toward its goal if not yet reached, but
once at the goal, it may remain stationary. Constraints
(9), (10), and (11) ensure the flow variable accurately rep-
resents agent movements between nodes only when
both nodes are occupied consecutively. Finally, Con-
straints (12), (13), and (14) enforce that all decision vari-
ables are binary.
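As an illustration, the following is a minimal sketch of how the core of this model could be assembled with the gurobipy API. The sets and parameters above are assumed to be provided as Python lists and dictionaries, only a subset of the constraints is shown, and the neighbor sets N[i] are assumed to include i itself so that waiting is a valid move.

```python
import gurobipy as gp
from gurobipy import GRB

def build_dpo_mapf_model(I, T, M, N, d, o, s, g, t_max):
    """Partial sketch of the model (Eqs. 1-7, 12-14). Assumes:
    I: node list, T: list of consecutive time steps starting at 0,
    M: agent list, N[i]: neighbors of node i (including i itself),
    d[i, j], o[i, t], s[i, m], g[i, m], t_max[m]: parameter dicts."""
    mdl = gp.Model("dpo_mapf")
    X = mdl.addVars(I, T, M, vtype=GRB.BINARY, name="X")   # Eq. (12)
    A = mdl.addVars(T, M, vtype=GRB.BINARY, name="A")      # Eq. (14)
    arcs = [(i, j, t, m) for i in I for j in N[i] for t in T for m in M]
    Y = mdl.addVars(arcs, vtype=GRB.BINARY, name="Y")      # Eq. (13)

    # Objective (1): total travel distance over all agent moves.
    mdl.setObjective(gp.quicksum(d[i, j] * Y[i, j, t, m]
                                 for (i, j, t, m) in arcs), GRB.MINIMIZE)

    for m in M:
        for i in I:
            mdl.addConstr(X[i, 0, m] == s[i, m])                       # (3)
            mdl.addConstr(gp.quicksum(X[i, t, m] for t in T)
                          >= g[i, m])                                  # (5)
            for t in T:
                mdl.addConstr(X[i, t, m] <= 1 - o[i, t])               # (2)
            for t in T[1:]:
                # (6): an agent at i at time t must come from a neighbor of i.
                mdl.addConstr(X[i, t, m] <=
                              gp.quicksum(X[j, t - 1, m] for j in N[i]))
        for t in T:
            mdl.addConstr(gp.quicksum(X[i, t, m] for i in I)
                          == 1 - A[t, m])                              # (4)
        mdl.addConstr(gp.quicksum(X[i, t, m] * (1 - g[i, m])
                                  for i in I for t in T) <= t_max[m])  # (7)
    # Constraints (8)-(11) linking A and Y are omitted for brevity.
    return mdl
```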
5 PROPOSED MULTITIER
RESOLUTION APPROACH
In the context of MAPF solvers, valid paths for agents
are determined by either a single decision-maker (i.e.,
centralized) or multiple decision-makers (i.e., decen-
tralized) during travel (i.e., online) or before travel
(i.e., offline). Each approach has advantages and
disadvantages, and the choice must be made care-
fully, considering the requirements, limitations, and
attributes of the problem. The primary aim of this
study is to introduce a practical approach to tackle
the challenges of DPO-MAPF as outlined above. Our
DPO-MAPF problem involves both static and dy-
namic obstacles. While static obstacles are known be-
forehand and can be efficiently handled using central-
ized and offline solvers, dynamic obstacles introduce
partial observability, requiring a decentralized and on-
line approach. To handle both dynamic and static ob-
stacles, we introduce a multitier approach comprising
the following key components and leveraging the ad-
vantages of various MAPF solver approaches:
1. Centralized and Offline Path Finding. The first
tier, illustrated at the top of Figure 2, generates
initial paths for each agent by considering only
static obstacles (i.e., ignoring uncertainty) in a
centralized and offline manner. This tier can em-
ploy either an optimal centralized and offline al-
gorithm or a suboptimal one, each with advan-
tages and disadvantages depending on the prob-
lem specifications and requirements. It is impor-
tant to emphasize that the proposed approach is
designed to be independent of the specific choice
of MAPF solver.
2. Decentralized and Online Conflict Resolution. This tier independently executes each agent's path generated in the first tier until potential collisions caused by uncertainty are detected. When such conflicts arise, a revising strategy following a consensus protocol is employed to adjust the agents' paths, dynamically preempting potential collisions in real time, as shown at the bottom of Figure 2; a control-flow sketch is given after the figure.
Figure 2: Illustration of the Proposed Multitier Approach.
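The following Python sketch outlines the resulting control flow. It is a structural illustration only, with hypothetical interfaces (centralized_solver, consensus, revise, and the env/agent objects) standing in for the concrete components detailed in Section 6.

```python
def run_multitier(env, agents, centralized_solver, consensus, revise):
    """Hypothetical control loop for the multitier approach: a centralized
    offline plan (tier 1) is executed in a decentralized, online fashion
    (tier 2), revising paths whenever observed conflicts arise."""
    # Tier 1: offline plan against static obstacles only.
    paths = centralized_solver.plan(env.static_map, agents)

    t = 0
    while not all(a.at_goal() for a in agents) and t < env.t_max:
        for agent in agents:
            observed = env.observe(agent)          # limited field of view
            if agent.detects_conflict(paths[agent.id], observed):
                winner = consensus.prioritize(agents, paths)
                if agent is not winner:
                    # Tier 2: online, decentralized revision (e.g., ACO).
                    paths[agent.id] = revise(agent, env, observed, paths)
        env.step(paths, t)                         # all agents move once
        t += 1
    return paths
```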
6 CONFLICT RESOLUTION
Dynamic obstacles are unknown in advance and only
become apparent when observed, which makes them
challenging to manage in centralized approaches. To
address this limitation, we propose integrating a de-
centralized and online conflict resolution method. In
our proposed approach, agents independently handle
any conflict while following their path. Recall the
field of view concept where agents can dynamically
observe a part of the graph on their route. When an
obstacle-based conflict is foreseen, the agent revises
its path accordingly via a revising strategy.
The proposed revising strategy first applies the "WAIT" action to update the current path of the cor-
Algorithm 1: Proposed ACO for Revising Strategy.
Input: Graph $G = (V, E)$, unavailable edges
Output: Conflict-free path for the agent
 1  $\tau_{ij} \leftarrow 1 \; \forall (i, j) \in E$;
 2  $\beta \leftarrow \beta_{initial}$;
 3  while termination condition not met do
 4      for each ant do
 5          repeat
 6              Calculate transition probabilities for available edges (Eq. 15);
 7              Randomly select an edge;
 8              Traverse on the chosen edge;
 9          until ant reaches the goal or exceeds travelling time limit;
10      Determine the incumbent path;
11      Calculate reinforcement levels (Eq. 16);
12      Update pheromones (Eq. 17);
13      Decrease $\beta$ linearly from $\beta_{initial}$ to $\beta_{final}$;
14      if early-stopping condition is met then
15          Break the loop;
16  return the incumbent conflict-free path;
responding agent. However, the waiting tactic alone may not resolve certain situations, such as corridor
cases (Sharon et al., 2015) and the irrational behav-
ior of dynamic entities. For such cases, a new path
must be reconstructed for that agent during travel.
To achieve this, we adapt the Ant Colony Optimiza-
tion (ACO) algorithm to generate a new conflict-free
path. ACO has proven effective for graph-based op-
timization (Stützle and Dorigo, 2004), particularly in
minimizing total travel time in complex path-finding
tasks (Dai et al., 2019; Huang et al., 2021). However,
applying the standard ACO directly to dynamic and
multiagent environments requires adaptation, including improvements in the exploration-exploitation balance and hyperparameter tuning (Algorithm 1).
In our version, each agent independently exe-
cutes ACO from their current position to their goal.
Pheromone levels (τ), representing historical desir-
ability, are initially set to 1.0 for all edges (Line 1).
The algorithm stops when it reaches either the itera-
tion limit (e.g., 150) or an early-stopping condition,
where the threshold is set to 1/3 of the iteration limit
(Lines 3-15). In each iteration, multiple ants (e.g.,
75) are deployed, exploring within their travel time
limits (Lines 5-9). For an adaptation of ACO to DPO-
MAPF, a mechanism is essential to prevent conflicts
by prohibiting traversal through occupied or reserved
vertices and edges. For this purpose, the agent must
integrate environmental knowledge by marking cer-
tain edges as “unavailable”, considering (i) static ob-
stacles, (ii) positions of other agents at the current and
subsequent time steps, and (iii) current (t) and next
(t + 1) positions of observed dynamic obstacles.
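A possible implementation of this edge-blocking step is sketched below; the helper names (agent plans as vertex lists, an observed-obstacle accessor position(t)) are hypothetical, and blocking is expressed on vertices, with any edge entering a blocked vertex treated as unavailable.

```python
def blocked_vertices(static_obstacles, agent_plans, observed_dynamic, t, agent_id):
    """Collect vertices an ant must not enter around time t; any edge
    leading into a blocked vertex is then marked "unavailable"."""
    blocked = set(static_obstacles)                       # (i) static obstacles
    for other_id, plan in agent_plans.items():            # (ii) other agents
        if other_id != agent_id:
            blocked.add(plan[min(t, len(plan) - 1)])      # current position
            blocked.add(plan[min(t + 1, len(plan) - 1)])  # next position
    for obstacle in observed_dynamic:                     # (iii) dynamic obstacles
        blocked.add(obstacle.position(t))                 # observed at t
        blocked.add(obstacle.position(t + 1))             # assumed at t + 1
    return blocked
```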
The ants randomly select edges based on transition probabilities defined by Equation 15 (Line 6), while unavailable edges are always assigned a probability of zero. For the remaining edges, the transition probability is influenced by two factors: the current pheromone level, $\tau_{ij}(t)$, and heuristic information, $\eta_{ij}$, which corresponds to the inverse Manhattan distance to the goal. The balance between exploration and exploitation is controlled by the parameters $\alpha$ and $\beta$. At the beginning, exploration is prioritized by setting $\beta$ to a high value (e.g., $\beta_{initial} = 5.0$, Line 2). As the algorithm progresses, $\beta$ decreases linearly to $\beta_{final}$ (e.g., 0.5), encouraging exploitation of known paths (Line 13). To maintain stable probabilities, $\alpha$ is fixed at 1.0 throughout the process.
$$P_{ij}(t) = \frac{\tau_{ij}(t)^{\alpha} \times \eta_{ij}^{\beta}}{\sum_{l \in \mathcal{N}(i)} \tau_{il}(t)^{\alpha} \times \eta_{il}^{\beta}} \quad (15)$$
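A direct translation of Equation 15 into Python might look as follows (a sketch, assuming the pheromone and heuristic values are stored in edge-keyed dictionaries and that $\eta$ is the precomputed inverse Manhattan distance to the goal):

```python
import random

def choose_next_vertex(current, graph, tau, eta, alpha, beta, blocked):
    """Sample the next vertex according to Eq. 15; blocked vertices
    (unavailable edges) receive probability zero by exclusion."""
    candidates = [v for v in graph[current] if v not in blocked]
    if not candidates:
        return None  # dead end: this ant's walk is abandoned
    weights = [(tau[(current, v)] ** alpha) * (eta[(current, v)] ** beta)
               for v in candidates]
    total = sum(weights)
    return random.choices(candidates, weights=[w / total for w in weights])[0]
```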
Once all ants complete their journeys, the best path found so far (i.e., the incumbent path) is determined (Line 10). Pheromone levels are then updated based on the ants' performance. Edges traversed by ants receive pheromone reinforcement, with the magnitude determined by the ratio of the maximum travel capacity to the travel cost (Equation 16, Line 11). In contrast, non-traversed edges decay at a predefined evaporation rate, $\rho$ (e.g., 0.1), gradually reducing their influence over time (Equation 17, Line 12).

$$\Delta\tau^k_{ij}(t) = \begin{cases} t^m_{max} / L_k & \text{if ant } k \text{ uses edge } (i, j) \\ 0 & \text{otherwise} \end{cases} \quad (16)$$

$$\tau_{ij}(t + 1) = (1 - \rho) \times \tau_{ij}(t) + \sum_k \Delta\tau^k_{ij}(t) \quad (17)$$
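The corresponding pheromone update (Lines 11-13 of Algorithm 1) could be sketched as below, assuming unit edge costs so that an ant's travel cost $L_k$ is simply its path length:

```python
def update_pheromones(tau, ant_paths, t_max, rho=0.1):
    """Sketch of the pheromone update (Eqs. 16-17): traversed edges are
    reinforced by t_max / L_k (L_k = travel cost of ant k); all edges
    then evaporate at rate rho."""
    deposits = {}
    for path in ant_paths:                       # each path is a vertex list
        cost = len(path) - 1                     # unit edge costs assumed
        for u, v in zip(path, path[1:]):
            deposits[(u, v)] = deposits.get((u, v), 0.0) + t_max / cost
    for edge in tau:
        tau[edge] = (1 - rho) * tau[edge] + deposits.get(edge, 0.0)

def beta_schedule(iteration, n_iterations, beta_initial=5.0, beta_final=0.5):
    """Linear decay of beta from beta_initial to beta_final (Line 13)."""
    frac = iteration / max(1, n_iterations - 1)
    return beta_initial + (beta_final - beta_initial) * frac
```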
This enhanced ACO approach ensures robust path
revision by dynamically adapting to changing envi-
ronments. If the initial revision fails, the agent con-
tinues along its current path, monitoring for new op-
portunities to resolve conflicts as conditions evolve.
Conflicts can arise not only from dynamic obsta-
cles but also from interactions with other agents. In
such cases, while the prioritized agent determined by
a consensus protocol continues along its path, the oth-
ers are expected to concede by revising their paths. To
determine which agent receives priority, we introduce
two consensus protocols:
Figure 3: Fair Token Protocol.

Fair Token Protocol: Initially, all agents have an equal number of tokens, and the agent who holds more tokens is prioritized, as illustrated in Figure 3. The prioritized agent then loses one token while the other agents gain one token, ensuring fairness among the agents, since the obtained tokens are considered in subsequent conflicts. The protocol first considers urgent con-
ditions (e.g., reaching the end of travelling time
limit or failure in revising strategy). To evalu-
ate urgency, hypothetical revisions are applied to
all agents' paths to assess whether an urgent condition would arise. Based on this evaluation, the agent facing an urgent condition is prioritized. When urgency is detected for more than one agent, priority is assigned arbitrarily. If no urgency is detected,
the token difference is considered. When agents
hold the same number of tokens, the difference
in cost between their current and their hypothet-
ical revised paths is used as a tiebreaker. This
token-based protocol, inspired by (Eran et al.,
2021), ensures feasibility and fairness. Unlike
their negotiation-based strategy, we implement a
rule-based token allocation approach.
Probability-Based Protocol: This protocol uses a probabilistic approach to determine the likelihood of other agents conceding to a particular agent by considering factors such as the number of agents involved ($N$), the remaining travel time limit ($C_m = t^m_{max} - t^m_{current}$), and the remaining distance to the goal of agent $m$, $D_m$. The risk factor associated with failing to reach the goal if agent $m$ concedes is computed using Equation 18. The priority score helps agents efficiently resolve conflicts, as indicated in Equation 19. Subsequently, $P^m_{concede}$ is determined by considering the normalized priority score and the risk factor, as outlined in Equation 20, and gives the probability of other agents conceding to agent $m$. Consequently, this protocol prioritizes the agents randomly based on the assigned probabilities; a minimal sketch of this computation is given after the equations below.
$$R_m = \hat{D}_m \cdot (1 - \hat{C}_m) \quad (18)$$

$$PS_m = C_m - D_m \quad (19)$$

$$P^m_{concede} = \frac{\hat{PS}_m \cdot R_m}{\sum_{l=1}^{L} \hat{PS}_l \cdot R_l} \quad (20)$$
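As referenced above, the following sketch illustrates the Probability-Based computation (Eqs. 18-20). Min-max normalization is our assumption for the hatted quantities, and the agent interface (t_max, distance_to_goal) is hypothetical.

```python
def _minmax(values):
    """Min-max normalization to [0, 1]; an assumed reading of the hats."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1
    return {k: (v - lo) / span for k, v in values.items()}

def concession_probabilities(agents, t_current):
    """Probability that the other agents concede to each agent (Eq. 20)."""
    C = {a.id: a.t_max - t_current for a in agents}       # remaining time budget
    D = {a.id: a.distance_to_goal() for a in agents}      # remaining distance
    C_hat, D_hat = _minmax(C), _minmax(D)
    PS = _minmax({a.id: C[a.id] - D[a.id] for a in agents})      # Eq. (19)
    R = {a.id: D_hat[a.id] * (1 - C_hat[a.id]) for a in agents}  # Eq. (18)
    score = {a.id: PS[a.id] * R[a.id] for a in agents}
    total = sum(score.values()) or 1
    return {aid: s / total for aid, s in score.items()}          # Eq. (20)
```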
7 EXPERIMENTAL EVALUATION
As DPO-MAPF is a novel challenge, no benchmark
algorithms are available for direct comparison. In-
stead, we investigate to what extent we have solved this problem. For this purpose, we assess various aspects
of the proposed algorithm across multiple scenarios
and configurations to demonstrate the impact and per-
formance of key components.
7.1 Experimental Setup
To facilitate experiments, we developed a custom map
generator to create diverse MAPF simulation setups.
This generator produces environments tailored to dif-
ferent scenarios using unique random seeds, allowing
for variations in static and dynamic obstacle densi-
ties, the number of agents, and map sizes. We de-
signed 132 different map configurations with dimen-
sions of 10 × 10, 15 × 15, 20 × 20, and 25 × 25, ac-
commodating 3 to 12 agents and dynamic obstacles.
These configurations included static obstacle densi-
ties of 5%, 10%, 15%, and 20%, with three random
seeds assigned to each scenario. The maximum to-
tal time step was set to twice the Manhattan distance
between each agent’s initial and target locations.
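For instance, the per-agent time limits follow directly from the start and goal cells (a small sketch; the coordinate convention is ours):

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def time_limits(starts, goals):
    """Maximum total time step per agent: twice the Manhattan distance
    between its initial and target locations, as in our setup."""
    return {m: 2 * manhattan(starts[m], goals[m]) for m in starts}
```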
For the evaluation, the first tier uses either an
optimal (CBSH2-RTC (Li et al., 2021a)) or a sub-
optimal (EECBS (Li et al., 2021b)) solver. Addi-
tionally, we run 9,504 scenarios, including different
algorithm configurations with three different random
seeds due to the non-deterministic nature of ACO.
We also utilize five evaluation metrics: success rate,
runtime, concession difference, Earth Mover’s Dis-
tance (EMD), and optimality gap. The success rate
assesses how effectively the algorithms handle uncer-
tainty, while runtime measures the processing time.
The number of concessions made by each agent at the
end of a scenario is recorded, and the maximum con-
cession difference among agents is obtained, reflect-
ing how fairly the protocols prioritize agents during
conflicts. Finally, the EMD quantifies changes in a
path after applying the revision process (Rubner et al.,
1998; Wiedemann et al., 2021). The average EMD
for all agents at the end of each scenario is consid-
ered, making it a key metric for assessing the impact
of the revision process. The optimality gap measures
the difference in objective value between the found solution and the optimal solution.
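For the path-change metric, one plausible computation of the EMD between an original and a revised path is sketched below. It assumes uniformly weighted points and pads the shorter path with its final vertex (agents remain at their goal), which reduces EMD to an optimal one-to-one assignment under a Manhattan ground distance; this is a sketch, not the exact implementation used in our framework.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def path_emd(path_a, path_b):
    """EMD between two paths, each treated as a uniformly weighted
    set of grid points; the shorter path is padded with its last vertex."""
    n = max(len(path_a), len(path_b))
    a = np.array(list(path_a) + [path_a[-1]] * (n - len(path_a)))
    b = np.array(list(path_b) + [path_b[-1]] * (n - len(path_b)))
    cost = cdist(a, b, metric="cityblock")      # Manhattan ground distance
    rows, cols = linear_sum_assignment(cost)    # optimal transport plan
    return cost[rows, cols].sum() / n
```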
For each randomly generated scenario, we use a
commercial solver (e.g., GUROBI) to obtain optimal
solutions and validate the existence of feasible solu-
tions for a more reliable evaluation. Through our LP
formalization, we calculate the optimality gap and en-
sure that only operationally feasible and solvable sce-
narios are included in the evaluation. However, the
Certainty Assumption (Winston, 2022) does not hold
due to the inherent uncertainty in DPO-MAPF. There-
fore, dynamic obstacles are treated as fully observable
and known in advance for the LP solution.
Furthermore, Section 7.2 and Section 7.3 evaluate
the distinct configurations of the proposed multitier
approach. It is worth noting that ideal results aim for
a higher success rate, a lower optimality gap, a shorter
runtime and a lower concession difference. For each
evaluation metric, normality tests and statistical group
comparisons are applied to identify significant differ-
ences. If a significant difference is observed within
a group, corresponding pairwise statistical tests are
conducted for post-hoc analysis, with detailed expla-
nations provided in the Appendix.
7.2 Evaluation of Multitier Components
This section evaluates the performance of the pro-
posed multitier approach by analyzing the impact of
the utilization of a centralized solver (i.e., first tier)
and various consensus protocols (i.e., second tier).
Besides the primary protocols, we include a random strategy that arbitrarily prioritizes agents. Furthermore, we
compare the performance of optimal (CBSH2-RTC)
and sub-optimal (EECBS) solvers in the first tier.
Figure 4: Success Rate of Multitier Components.
Figure 5: Optimality Gap of Multitier Components.
Figures 4 and 5 illustrate the performance of the
proposed approach across different configurations in
terms of success rate and optimality gap, respec-
tively. The Probability-Based protocol with the opti-
mal solver achieves the highest success rate (46.5%),
while the same protocol with the sub-optimal solver
records the lowest success rate (42.9%). Although
the optimal solver generally shows a slightly higher
success rate, no statistically significant difference is
observed in the group test (p = .313 > .05). Ad-
ditionally, the Probability-Based protocol with the
sub-optimal solver exhibits the worst optimality gap
(4.5%), while the Random protocol with the sub-
optimal solver shows the best optimality gap (4.2%),
again with no statistically significant difference in the
group test (p = .994 > .05).
Figure 6: Runtime (sec) of Multitier Components.
In terms of runtime, the Fair-Token protocol with
the sub-optimal solver has a statistically significantly longer runtime, as confirmed by the group test (p = .003 < .05) and displayed in Figure 6. This is be-
cause, unlike other protocols that only revise the
paths of non-prioritized agents, the Fair-Token pro-
tocol evaluates potential urgency by considering hy-
pothetical revisions for all agents, including the prior-
itized one. This additional evaluation step increases
the runtime.
Figure 7: Average EMD of Multitier Components.
Moreover, Figure 7 presents the average EMD,
which measures the magnitude of path changes for
each configuration. No significant differences are observed, as indicated by the group test (p = .168 > .05). This anal-
ysis indicates that consensus protocols and the cen-
tralized solver have no substantial impact on the suc-
cess rate, optimality gap, or runtime, as they do not
directly affect the revision of the paths taken by the
agents in DPO-MAPF.
Furthermore, Figure 8 shows the maximum con-
cession difference among agents at the end of a sce-
nario. The Fair-Token protocol, with both optimal
Figure 8: Max. Concession Diff. of Multitier Components.
and suboptimal solvers, achieves the greatest fairness
(0.43 and 0.42, respectively), while the Probability-
Based protocol with the both solvers records the worst
fairness (0.71 for the optimal solver and 0.80 for the
suboptimal one). Statistically significant differences
are observed across all combinations in the group test
(p .000 < .05). The Fair-Token protocol, specifi-
cally designed to promote fairness, results in the low-
est concession difference overall. On the other hand,
the Probability-Based protocol tends to frequently
prioritize a single agent with the highest risk of fail-
ing to complete its path, potentially leading to unfair
situations.
7.3 Evaluation of Revising Strategy
The Fair-Token protocol with the optimal solver
(CBSH2-RTC) is the best performing configuration
for our multitier approach based on the evaluation re-
sults. Using this configuration, this section evaluates
the performance of the proposed revising strategy by ex-
amining different aspects, including:
No-Revising: A baseline scenario where no re-
vising strategy is applied.
Only Waiting: A configuration that employs only
the “Waiting” strategy as a revising method.
Basic ACO: Our basic variant of ACO adapted to
DPO-MAPF, introducing “unavailable edges” (ar-
eas ants cannot traverse) and limiting ants’ travel
based on the maximum time step of the agent.
Enhanced ACO: Our enhanced variant of Basic
ACO incorporating the following proposed im-
provements: (i) applying the “Waiting” strategy
before ACO, (ii) implementing an early-stopping
condition, and (iii) dynamic $\beta$ adjustment.
Figure 9 shows the success rate of the revising
strategies, indicating that Enhanced ACO statistically
significantly outperforms No-Revising, Only Waiting, and Basic ACO (44.7% vs. 18.9%, 23.5%, and 39.9%, respectively). Note that the group test shows a statistically significant difference among revising strategies
Figure 9: Success Rate for Revising Strategies.
(p = .000 < .05). Basic ACO also statistically outperforms No-Revising and Only Waiting, indicating that
the traditional Only Waiting strategy struggles to han-
dle the non-rational behavior of dynamic obstacles,
underscoring the importance of path reconstruction.
Figure 10: Optimality Gap for Revising Strategies.
Furthermore, Figure 10 presents the optimality
gap among the strategies, where the difference in the
group test is statistically significant (p = .000 < .05). Despite the increase in success rate, Enhanced ACO has the worst optimality gap (4.47%), indicating a trade-off in the proposed enhancements. Note that the optimality gap can only be calculated for successfully
completed scenarios, meaning that the gap between
strategies becomes less relevant when the difference
in success rates is statistically significant.
Figure 11: Runtime (sec) for Revising Strategies.
Moreover, Figure 11 illustrates the runtime for
each revising strategy, with a statistically significant
difference in the group test (p = .000 < .05). The Ba-
sic ACO strategy yields the slowest runtime (1.92 sec)
while No-Revising and Only Waiting achieve the fastest one (0.02 sec). This result denotes that the ACO
Figure 12: Average EMD for Revising Strategies.
process increases the runtime to resolve conflicts but
achieves a higher success rate. However, the early-stopping condition and the waiting-strategy enhancement successfully reduce the redundant pro-
cessing time without compromising the success rate.
Figure 12 presents the average EMD for each re-
vising strategy, with a statistically significant differ-
ence in the group test (p = .000 < .05). Regarding average EMD, Enhanced ACO revises paths more
extensively than the other versions (7.08 vs. 6.33 and
1.03). The results suggest that more substantial up-
dates tend to increase the success rate, the optimality
gap, and the runtime. The correlations between av-
erage EMD and optimality gap (0.106), success rate
(0.550), and runtime (0.504) are also observed.
Notably, the LP formulation overlooks the uncer-
tainty challenges of the DPO-MAPF problem. This
means that both the lower bound (i.e., optimal value)
and the feasibility check provided by LP for randomly
generated scenarios may not be achievable due to
the lack of situational awareness. While this limita-
tion complicates analysis and reasoning, the LP solu-
tion still serves as a valuable baseline for evaluation.
Our proposed approach might already be achieving
the best possible performance under ideal conditions,
but this remains uncertain due to the inherent unpre-
dictability of DPO-MAPF.
8 CONCLUSION
Uncertainty in real-world scenarios arises from the
stochastic or unpredictable behaviors of other enti-
ties and events in the environment. To better account
for such uncertainties, this study extends the tradi-
tional MAPF problem by introducing partially ob-
servable and dynamically moving obstacles, referred
to as DPO-MAPF. To address this challenge, we pro-
pose a multitier solution approach. In the first tier,
a centralized and offline path planning algorithm is
employed to find conflict-free paths for each agent,
considering only static and fully observable obstacles.
The second tier involves a decentralized and online
conflict resolution process that employs a heuristic-
based revising strategy (i.e., ACO) and a consensus
protocol to resolve conflicts arising from the uncer-
tainty challenges in DPO-MAPF.
For evaluation, we developed a grid-based frame-
work, which enables the random generation of sce-
narios with varying map sizes, agent counts, and ob-
stacle densities. Additionally, a linear programming
approach using a commercial solver was developed to
validate the feasibility of the randomly generated sce-
narios. The evaluation results show that the proposed
revising strategy improves the success rate under the
uncertainty of DPO-MAPF, although it slightly in-
creases the optimality gap and extends the runtime.
Furthermore, the results indicate that the magnitude
of revision is correlated with success rate, runtime,
and optimality gap. While the first tier employs state-
of-the-art approaches (CBSH2-RTC and EECBS), us-
ing an optimal solver yields better outcomes. For
consensus protocols, the Fair Token protocol provides the greatest fairness among agents without compromising performance. In summary, the decentralized and online conflict resolution approach enables agents to reach their goals under uncertainty and enhances fairness
among agents, which is crucial in certain real-world
applications.
Our problem definition closely mirrors real-life
scenarios by incorporating uncertainty, making it ap-
plicable for applications such as autonomous vehicle
navigation, drone fleet coordination, and warehouse
robotics. Future work could focus on refining the re-
vising strategy, including (i) experimenting with dif-
ferent field-of-view ranges, (ii) utilizing initial paths
instead of constructing them from scratch, and (iii)
optimizing the balance between success rate, optimal-
ity gap, and runtime to develop more robust solutions
for DPO-MAPF. Further analysis should expand the variety of scenarios (e.g., larger map sizes, a higher number of agents, or existing benchmark scenarios from (Stern et al., 2021)) and identify alternative approaches to LP for validation and optimal values, given its scalability and uncertainty issues.
Additionally, this study establishes a benchmark for
performance comparison in future research, as DPO-
MAPF is a newly defined problem with no alternative
algorithms available thus far.
ACKNOWLEDGEMENTS
We sincerely thank Prof. Burcu Balçık and Assoc. Prof. Mehmet Önal for their valuable insights and guidance, which have greatly contributed to the development of this work.
REFERENCES
Aljalaud, F., Kurdi, H., and Youcef-Toumi, K. (2023). Bio-
inspired multi-uav path planning heuristics: A review.
Mathematics, 11(10):2356.
Dai, X., Long, S., Zhang, Z., and Gong, D. (2019). Mo-
bile robot path planning based on ant colony algorithm
with a* heuristic method. Frontiers in neurorobotics,
13:15.
Desaraju, V. R. and How, J. P. (2012). Decentralized path
planning for multi-agent teams with complex con-
straints. Auton. Robots, 32(4):385–403.
Eran, C., Keskin, M. O., Cantürk, F., and Aydoğan, R.
(2021). A decentralized token-based negotiation ap-
proach for multi-agent path finding. In European
Conference on Multi-Agent Systems, pages 264–280.
Springer.
Huang, S., Yang, D., Zhong, C., Yan, S., and Zhang, L.
(2021). An improved ant colony optimization algo-
rithm for multi-agent path planning. In 2021 Interna-
tional Conference on Networking Systems of AI (IN-
SAI), pages 95–100. IEEE.
Kasaura, K., Yonetani, R., and Nishimura, M. (2023). Pe-
riodic multi-agent path planning. In AAAI Conference
on Artificial Intelligence.
Keskin, M. O., Cantürk, F., Eran, C., and Aydoğan, R.
(2024). Decentralized multi-agent path finding frame-
work and strategies based on automated negotia-
tion. Autonomous Agents and Multi-Agent Systems,
38(1):1–30.
Lam, E., Le Bodic, P., Harabor, D., and Stuckey, P. J.
(2022). Branch-and-cut-and-price for multi-agent
path finding. Computers & Operations Research,
144:105809.
Li, J., Harabor, D., Stuckey, P. J., Ma, H., Gange, G., and
Koenig, S. (2021a). Pairwise symmetry reasoning for
multi-agent path finding search. Artificial Intelligence,
301:103574.
Li, J., Ruml, W., and Koenig, S. (2021b). Eecbs: A
bounded-suboptimal search for multi-agent path find-
ing. In Proceedings of the AAAI conference on artifi-
cial intelligence, volume 35, pages 12353–12362.
Morag, J., Stern, R., and Felner, A. (2023). Adapting
to planning failures in lifelong multi-agent path find-
ing. In Proceedings of the International Symposium
on Combinatorial Search, volume 16, pages 47–55.
Netter, J. and Vamvoudakis, K. G. (2023). Decentral-
ized multi-agent motion planning in dynamic environ-
ments. In 2023 American Control Conference (ACC),
pages 1655–1660.
Okumura, K., Machida, M., Défago, X., and Tamura, Y.
(2022). Priority inheritance with backtracking for iter-
ative multi-agent path finding. Artificial Intelligence,
310:103752.
Peng, S. (2023). Multi-robot path planning combin-
ing heuristics and multi-agent reinforcement learning.
arXiv preprint arXiv:2306.01270.
Rubner, Y., Tomasi, C., and Guibas, L. (1998). A metric
for distributions with applications to image databases.
In Sixth International Conference on Computer Vision
(IEEE Cat. No.98CH36271), pages 59–66.
Ryan, M. R. (2008). Exploiting subgraph structure in multi-
robot path planning. Journal of Artificial Intelligence
Research, 31:497–542.
Sartoretti, G., Kerr, J., Shi, Y., Wagner, G., Kumar,
T. S., Koenig, S., and Choset, H. (2019). Primal:
Pathfinding via reinforcement and imitation multi-
agent learning. IEEE Robotics and Automation Let-
ters, 4(3):2378–2385.
Semiz, F. and Polat, F. (2021). Incremental multi-agent
path finding. Future Generation Computer Systems,
116:220–233.
Sharon, G., Stern, R., Felner, A., and Sturtevant, N. R.
(2015). Conflict-based search for optimal multi-agent
pathfinding. Artificial Intelligence, 219:40–66.
Shofer, B., Shani, G., and Stern, R. (2023). Multi agent path
finding under obstacle uncertainty. In Proceedings of
the International Conference on Automated Planning
and Scheduling, volume 33, pages 402–410.
Standley, T. (2010). Finding optimal solutions to coopera-
tive pathfinding problems. In Proceedings of the AAAI
conference on artificial intelligence, volume 24, pages
173–178.
Stern, R. (2019). Multi-agent path finding–an overview. Ar-
tificial Intelligence: 5th RAAI Summer School, Dol-
goprudny, Russia, July 4–7, 2019, Tutorial Lectures,
pages 96–115.
Stern, R., Sturtevant, N., Felner, A., Koenig, S., Ma, H.,
Walker, T., Li, J., Atzmon, D., Cohen, L., Kumar,
T., Barták, R., and Boyarski, E. (2021). Multi-agent
pathfinding: Definitions, variants, and benchmarks.
Proceedings of the International Symposium on Com-
binatorial Search, 10:151–158.
Stützle, T. and Dorigo, M. (2004). Ant Colony Optimiza-
tion. The MIT Press.
Verbari, P., Bascetta, L., and Prandini, M. (2019). Multi-
agent trajectory planning: A decentralized iterative
algorithm based on single-agent dynamic rrt*. 2019
American Control Conference (ACC), pages 1977–
1982.
Wan, Q., Gu, C., Sun, S., Chen, M., Huang, H., and Jia,
X. (2018). Lifelong multi-agent path finding in a dy-
namic environment. In 2018 15th International Con-
ference on Control, Automation, Robotics and Vision
(ICARCV), pages 875–882.
Wiedemann, T., Vlaicu, C., Josifovski, J., and Viseras, A.
(2021). Robotic information gathering with reinforce-
ment learning assisted by domain knowledge: An ap-
plication to gas source localization. IEEE Access,
9:13159–13172.
Winston, W. (2022). Operations Research: Applications
and Algorithms. Cengage Learning.
APPENDIX
The components of the proposed approach (mentioned in Section 7.2) are evaluated through a series of statistical analyses to en-
sure their reliability and consistency during the evaluation
process. Firstly, normality tests using the Shapiro-Wilk test
were conducted, and all evaluation metrics (average EMD,
optimality gap, runtime, and max. concession difference)
failed across all groups. Consequently, non-parametric tests
were applied for further comparison.
For success rate, Cochran's Q test was used and indicated no significant difference (p = .313 > .05). Similarly, the average EMD was evaluated using the Friedman test, which showed no significant difference (p = .168 > .05). Lastly, the optimality gap was analyzed with the Kruskal-Wallis test, yielding no significant difference (p = .994 > .05). Additionally, for both runtime and max. concession difference, the Friedman test was applied and showed significant differences for both metrics (p = .003 < .05 for runtime, p = .000 < .05 for max. concession difference), prompting pairwise comparisons using the Wilcoxon
signed rank test (Table 2).
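The pipeline described above can be reproduced with standard SciPy routines. The sketch below covers the Shapiro-Wilk check, the paired/unpaired group test, and the pairwise Wilcoxon follow-up; Cochran's Q, the Conover-Iman test, and McNemar's test come from other packages (e.g., statsmodels) and are omitted here. It assumes metric_by_config maps configuration names to equally long result arrays.

```python
from scipy import stats

def compare_groups(metric_by_config, paired=True):
    """Normality check, non-parametric group test, then pairwise
    Wilcoxon signed-rank tests if the group test is significant."""
    names = list(metric_by_config.keys())
    groups = list(metric_by_config.values())

    # Shapiro-Wilk per group; any failure motivates non-parametric tests.
    all_normal = all(stats.shapiro(g).pvalue > .05 for g in groups)

    # Friedman for paired (repeated-measures) data, Kruskal-Wallis otherwise.
    group_test = stats.friedmanchisquare if paired else stats.kruskal
    p_group = group_test(*groups).pvalue

    pairwise = {}
    if p_group < .05:
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                p = stats.wilcoxon(groups[i], groups[j]).pvalue
                pairwise[(names[i], names[j])] = p
    return all_normal, p_group, pairwise
```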
Table 2: Pairwise Comparisons (p-values) for Runtime and Max. Concession Difference.

Comparison | Runtime | Max. Concession Difference
Random (EECBS) vs Fair Token (EECBS) | .030 | .002
Probability-Based (CBSH2-RTC) vs Fair Token (EECBS) | .019 | .000
Fair Token (EECBS) vs Random (CBSH2-RTC) | .005 | .012
Fair Token (EECBS) vs Probability-Based (EECBS) | .008 | .000
Random (EECBS) vs Probability-Based (CBSH2-RTC) | .330 | .179
Random (EECBS) vs Random (CBSH2-RTC) | .085 | .957
Random (EECBS) vs Fair Token (CBSH2-RTC) | .829 | .003
Probability-Based (CBSH2-RTC) vs Random (CBSH2-RTC) | .543 | .092
Probability-Based (CBSH2-RTC) vs Probability-Based (EECBS) | .510 | .519
Probability-Based (CBSH2-RTC) vs Fair Token (CBSH2-RTC) | .274 | .000
Fair Token (EECBS) vs Fair Token (CBSH2-RTC) | .340 | .900
Random (CBSH2-RTC) vs Probability-Based (EECBS) | .232 | .074
Random (CBSH2-RTC) vs Fair Token (CBSH2-RTC) | .178 | .003
Probability-Based (EECBS) vs Fair Token (CBSH2-RTC) | .513 | .001
Similarly, the revising strategies (mentioned in Section 7.3) are evaluated through a series of statistical analyses. All evaluation metrics (optimality gap, runtime, average EMD) failed the normality test (i.e., Shapiro-Wilk), again leading to the use of non-parametric tests, illustrated in Table 3. The Friedman test was applied for runtime and average EMD, both showing significant differences (p = .000 < .05), followed by the Wilcoxon signed-rank test for pairwise comparisons. For the optimality gap, the Kruskal-Wallis test indicated significant differences (p = .000 < .05), prompting the Conover-Iman test for pairwise comparisons. Finally, success rate was analyzed using Cochran's Q test, which also showed significant differences (p = .000 < .05), followed by McNemar's test for pairwise comparisons.
Table 3: Pairwise Comparisons (p-values) for Revising Strategies.

Comparison | Optimality Gap | Average EMD | Runtime | Success Rate
Enhanced ACO vs Basic ACO | .000 | .014 | .000 | .023
Enhanced ACO vs Only Waiting | .000 | .000 | .000 | .000
Basic ACO vs Only Waiting | .402 | .000 | .000 | .000
Basic ACO vs No-Revising | - | - | .000 | .000
Enhanced ACO vs No-Revising | - | - | .000 | .000
Only Waiting vs No-Revising | - | - | .000 | .005