Integrating Late Variable Binding with SP-MCTS for Efficient Plan
Execution in BDI Agents
Frantisek Vidensky, Frantisek Zboril and Petr Veigend
Department of Intelligent Systems, Brno University of Technology, Bozetechova 2, Brno, Czech Republic
{ividensky, zborilf, veigend}@fit.vut.cz
ORCID: 0000-0003-1808-441X (Vidensky), 0000-0001-7861-8220 (Zboril), 0000-0003-3995-1527 (Veigend)
These two authors contributed equally to this work
Keywords:
BDI Agents, Agent Interpretation, AgentSpeak(L), Monte Carlo Tree Search.
Abstract:
This paper investigates the Late binding strategy as an enhancement to the SP-MCTS algorithm for intention
selection and variable binding in BDI (Belief-Desire-Intention) agents. Unlike the Early binding strategy,
which selects variable substitutions prematurely, Late binding defers these decisions until necessary, aggre-
gating all substitutions for a plan into a single node. This approach reduces the search tree size and enhances
adaptability in dynamic environments by maintaining flexibility during plan execution. We implemented the
Late binding strategy within the FRAg system to validate our approach and conducted experiments in a static
maze task environment. Experimental results demonstrate that the Late binding strategy consistently outper-
forms Early binding, achieving up to 150% higher rewards, particularly for the lowest parameter values of the
SP-MCTS algorithm in resource-constrained scenarios. These results confirm that it is feasible to integrate
Late binding into intention selection methods, opening opportunities to explore its use in approaches with
lower computational demands than the SP-MCTS algorithm.
1 INTRODUCTION
The Belief-Desire-Intention (BDI) model (Rao and
Georgeff, 1995) represents a dominant paradigm in
agent development. Inspired by Bratman’s theory of
intentions (Bratman, 1987), BDI agents mimic human
cognitive processes, enabling sophisticated reasoning,
planning, and decision-making capabilities. This pro-
cess, referred to as practical reasoning, involves se-
lecting goals and determining the means to achieve
them (Wooldridge, 1999).
In BDI-based agent programming languages
(d’Inverno et al., 1998; Winikoff, 2005; Pokahr et al.,
2005; Rao, 1996; Bordini et al., 2007), the agent’s
behaviour is defined by three key mental attitudes:
beliefs, desires, and intentions. Beliefs represent the
agent’s information about its environment and itself.
Desires capture the states the agent aims to achieve,
while intentions embody the agent’s commitments
to specific actions or plans for achieving its desires.
Plans are the means by which the agent modifies its
environment to achieve its goals. A plan is comprised
of steps that may include primitive actions directly al-
tering the environment or subgoals addressed by other
plans.
The execution of a BDI agent adheres to a re-
peated deliberation cycle. This cycle involves up-
dating the agent’s beliefs and goals to reflect the cur-
rent environment, selecting plans for achieving active
goals, and executing the next step of the chosen plans.
For each top-level goal, a plan is selected, forming the
root of an intention, and its steps are sequentially ex-
ecuted. If a step corresponds to a subgoal, a sub-plan
is chosen and added to the intention, and the process
continues recurrently.
Most researchers aiming to improve practical rea-
soning focus on addressing the intention selection
problem. Practical reasoning, as realized in BDI
agents, is operationalized through the deliberation cy-
cle. The intention selection problem refers to the chal-
lenge of determining which intention to progress dur-
ing the current deliberation cycle.
In many BDI architectures, intentions are exe-
cuted in an interleaved manner (Winikoff, 2005; Bor-
dini et al., 2007), enabling concurrent processing but
introducing potential conflicts when steps in one in-
tention block others. Researchers have proposed var-
ious strategies to address these conflicts, including
Summary Information-based techniques (Thangara-
jah et al., 2003; Thangarajah et al., 2011), which rea-
son about pre- and post-conditions, Coverage-based
approaches (Thangarajah et al., 2012; Waters et al.,
2014; Waters et al., 2015), which prioritize inten-
tions most at risk due to environmental changes, and
stochastic methods like Single-Player Monte Carlo
Tree Search (SP-MCTS) (Yao et al., 2014; Yao and
Logan, 2016), which optimize intention selection
through simulation.
Our research group has taken a different approach,
focusing on selecting variable substitutions in the
well-known BDI programming language, AgentS-
peak(L) (Rao, 1996). Its most widely used interpreter,
Jason (Bordini et al., 2007), employs the Early bind-
ing strategy for selecting variable substitutions by de-
fault. This means that substitutions are selected during the evaluation of context conditions or when test goals are performed. In contrast, our proposed Late
binding strategy (Zboril et al., 2022; Vidensky et al.,
2023) defers the selection of variable substitutions
until necessary, such as during the execution of ac-
tions. Until that point, the agent maintains a structure
called the context, which contains all valid substitu-
tions. This structure is dynamically updated as the
plan is executed.
Both strategies have been implemented in the Flexibly Reasoning BDI Agent (FRAg) system (https://github.com/VUT-FIT-INTSYS/FRAg),
and experimental evaluations (Vidensky et al., 2024)
demonstrated that, despite increased computational
overhead, the Late binding strategy outperformed the
Early binding strategy in most scenarios. These re-
sults highlight the potential of Late binding for im-
proving adaptability in dynamic settings.
Our previous work (Vidensky et al., 2025) ex-
tended the FRAg system by implementing a failure-
handling mechanism inspired by the CAN (Sardina
and Padgham, 2011) system. Experiments revealed
that when combined with the Late binding strategy,
the failure handling mechanism achieved better re-
sults than the Early binding strategy. Moreover, these
results showed that the Late binding strategy can
be effectively integrated with existing approaches.
Building on these findings, we incorporated an SP-
MCTS-based approach (Yao and Logan, 2016) for
intention selection into the FRAg system. This ap-
proach is considered a state-of-the-art solution for ad-
dressing the intention selection problem.
In this paper, we explore the integration of the
Late binding strategy into the SP-MCTS algorithm
for intention selection in BDI agents. Section 2 in-
troduces the Late binding strategy and its advantages
over Early binding. Section 3 describes the SP-MCTS
algorithm and its adaptation for action-level intention
selection. Section 4 discusses the limitations of SP-
MCTS, while Section 5 analyses the potential of Late
binding to address computational challenges in inten-
tion selection. The experimental evaluation is pre-
sented in Section 6, and the paper concludes with a
summary of findings and future research directions in
Section 7.
2 LATE VARIABLE BINDING
In BDI agent systems, variable substitution plays
a crucial role in ensuring effective and adaptive
decision-making. Traditionally, BDI agents employ
an Early binding strategy, where variable substitu-
tions are determined when evaluating plan context
conditions or test goals (Rao, 1996). While straight-
forward, this approach can lead to failures in dynamic
environments, as it lacks the flexibility to adapt to
changes occurring after plan selection.
The Late binding strategy, introduced in (Zboril
et al., 2022), with its operational semantics detailed
in (Vidensky et al., 2023), defers variable substitu-
tions until they are strictly necessary, such as during
the execution of actions. Instead of binding variables
at an early stage, the strategy maintains all poten-
tial substitutions within a structure called the context.
This context is dynamically updated during plan ex-
ecution, discarding substitutions that no longer sat-
isfy the agent’s belief base or runtime conditions. By
preserving valid options throughout the plan’s lifecy-
cle, the agent can adapt its behaviour to environmental
changes without restarting plans unnecessarily.
The context is established when a plan is selected
as part of an agent’s intention. Unlike early binding
systems, where variable substitutions are applied im-
mediately, late binding systems maintain all possible
variable substitutions. The context is represented as
a set of possible substitutions called Possible Unifier
Set (PUS) (Zboril et al., 2022), which encompasses
all substitutions consistent with the agent’s current be-
lief base and the context conditions of the selected
plan.
During plan execution, the context is continuously
updated to reflect changes in the environment or the
agent’s beliefs. This process, known as the restriction
(Zboril et al., 2022), systematically removes invalid
substitutions; more precisely, it reduces the PUS so that it contains only valid substitutions. The restriction operation is applied in the following cases:
Test Goals: When encountering a test goal, the
context is restricted to retain only substitutions for
which the tested predicate is true in the current be-
lief base. This involves a broad unification operation (Zboril et al., 2022), which identifies all substitutions that unify the test goal with the agent's beliefs.
Figure 1: A comparison of Early and Late Binding Strategies. The left side illustrates Early binding, where intentions consist of stacks of plans with partially instantiated variables. In contrast, the right side illustrates Late binding, which retains uninstantiated plan bodies while maintaining variable substitutions separately within a context. Taken from (Vidensky et al., 2025).
Actions: When performing an action, the context is restricted to include only substitutions consistent with the action, by binding free variables to specific atoms; a small illustrative sketch of both cases follows this list.
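The following Python sketch models both restriction cases on a generic PUS. It is an illustrative model only: the representation of beliefs and substitutions and the helper names are our assumptions, not the FRAg implementation.

# Illustrative sketch of a Possible Unifier Set (PUS) and the restriction
# operation. Beliefs are (functor, args) tuples; variables are Prolog-style
# uppercase strings; a substitution is a dict from variable to value.

def unify(pattern, ground, subst):
    """Extend 'subst' so that 'pattern' matches the ground arguments,
    or return None if they cannot be unified."""
    extended = dict(subst)
    for p, g in zip(pattern, ground):
        if isinstance(p, str) and p[:1].isupper():      # variable
            if p in extended and extended[p] != g:
                return None
            extended[p] = g
        elif p != g:                                     # constant mismatch
            return None
    return extended

def restrict_by_test_goal(pus, goal, beliefs):
    """Broad unification: every belief matching the test goal contributes one
    (possibly extended) substitution; all other substitutions are discarded."""
    functor, args = goal
    restricted = []
    for subst in pus:
        for bel_functor, bel_args in beliefs:
            if bel_functor == functor and len(bel_args) == len(args):
                result = unify(args, bel_args, subst)
                if result is not None:
                    restricted.append(result)
    return restricted

def restrict_by_action(pus, used_bindings):
    """Keep only substitutions consistent with the bindings that were
    actually used when the action was executed."""
    return [s for s in pus
            if all(s.get(var, val) == val for var, val in used_bindings.items())]

# Example: restrict_by_test_goal([{}], ("wants", ("Buyer", "Card", "Max_Price")),
#                                [("wants", ("betty", "cd1", 60))])
# yields [{'Buyer': 'betty', 'Card': 'cd1', 'Max_Price': 60}].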
By deferring the selection of variable substitutions
until strictly necessary, late binding minimizes the
risk of premature decisions that could lead to plan
failure. This method allows the agent to dynamically
adapt to environmental changes without restarting its
plans.
This dynamic updating mechanism significantly
enhances the robustness of BDI agents by reducing
plan failures and improving adaptability in dynamic
environments. As illustrated in Figure 1, early bind-
ing relies on partially instantiated plans, making in-
tentions more rigid and prone to failure in volatile
conditions. Conversely, late binding dynamically se-
lects substitutions, enabling agents to respond effec-
tively to changes without compromising the progress
of their intentions.
3 ACTION-LEVEL INTENTION
SELECTION WITH SP-MCTS
The Single-Player Monte Carlo Tree Search (SP-
MCTS) algorithm (Schadd et al., 2008) has been
adapted for use in BDI agents as a method for opti-
mizing intention scheduling (Yao et al., 2014). This
adaptation involves applying SP-MCTS to the goal-
plan tree (GPT) (Thangarajah et al., 2011) structure
instead of traditional game states. The goal-plan tree
is a hierarchical representation of the agent’s decision
space. Goals are represented as parent nodes, with the plans that can achieve them as their child nodes. Subgoals within
these plans expand into additional child nodes, creat-
ing a layered structure that captures the dependencies
between goals and plans. By simulating various se-
quences of plan execution, SP-MCTS evaluates their
effectiveness in achieving goals, making it a suitable
approach for complex decision-making scenarios in
dynamic environments.
SP-MCTS operates in iterative cycles, each con-
sisting of four key steps:
Selection: Starting from the root node, the algorithm selects a child node using an adapted version of the Upper Confidence Bounds for Trees (UCT) formula (Kocsis and Szepesvári, 2006). This modification balances exploration (searching unexplored nodes) and exploitation (focusing on promising nodes) in the context of BDI decision-making; the single-player form of the selection value is recalled after this list.
Expansion: A new node is added to the tree by
selecting an unexplored action or goal from the
current node.
Simulation: A simulated sequence of actions
is performed from the newly added node, using
heuristic-based action selection to estimate out-
comes.
Backpropagation: The simulation results are
propagated back through the selected path in the
tree, updating the values of visited nodes.
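For orientation, the single-player UCT value introduced for SP-MCTS in (Schadd et al., 2008), which the selection step adapts, adds to the usual UCT score a term reflecting the spread of the rewards observed below a child node; up to the choice of constants it has the form

\bar{X}_i + C \sqrt{\frac{\ln n_p}{n_i}} + \sqrt{\frac{\sum_k r_k^2 - n_i \bar{X}_i^2 + D}{n_i}},

where \bar{X}_i is the average reward of child i, n_i and n_p are the visit counts of the child and of its parent, r_k are the rewards backpropagated through child i, and C and D are tunable constants (D keeps the third term large for rarely visited nodes). The exact constants and reward definition used in the SA algorithm may differ from this generic form.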
This cycle allows SP-MCTS to iteratively refine
its decision-making process, building a tree that cap-
tures the agent’s possible future states and their asso-
ciated outcomes. The algorithm uses two key param-
eters: α, which determines the number of node ex-
pansions (iterations), and β, which specifies the num-
ber of simulations conducted at each node. These pa-
rameters enable fine-tuning of the algorithm’s perfor-
mance, balancing computational effort and the quality
of decision-making.
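To make the interplay of the four steps and of the two parameters concrete, the following self-contained Python sketch is included. It is not the FRAg or SA implementation: the node expansion function, the heuristic simulation policy, and the reward model are deliberately left as placeholders (expand and simulate), and the node statistics feed the single-player selection value recalled above.

import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state            # abstract GPT / environment state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0              # sum of simulation rewards
        self.total_sq = 0.0           # sum of squared rewards

def sp_uct(node, c=1.0, d=100.0):
    """Single-player UCT score: mean reward + exploration + reward-spread term."""
    mean = node.total / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    spread = math.sqrt((node.total_sq - node.visits * mean ** 2 + d) / node.visits)
    return mean + explore + spread

def sp_mcts(root, alpha, beta, expand, simulate):
    """Run alpha iterations; expand(state) yields successor states and
    simulate(state) returns the reward of one heuristic rollout."""
    for _ in range(alpha):
        # 1. Selection: descend while the current node is fully expanded.
        node = root
        while node.children and all(c.visits > 0 for c in node.children):
            node = max(node.children, key=sp_uct)
        # 2. Expansion: create the children of the selected node if needed.
        if not node.children:
            node.children = [Node(s, parent=node) for s in expand(node.state)]
        unvisited = [c for c in node.children if c.visits == 0]
        leaf = random.choice(unvisited) if unvisited else node
        # 3. Simulation: beta heuristic rollouts from the chosen node.
        rewards = [simulate(leaf.state) for _ in range(beta)]
        # 4. Backpropagation: update statistics along the path to the root.
        for r in rewards:
            n = leaf
            while n is not None:
                n.visits += 1
                n.total += r
                n.total_sq += r * r
                n = n.parent
    # Recommend the root child with the best average reward.
    return max(root.children,
               key=lambda c: c.total / c.visits if c.visits else float("-inf"))

In the SA setting, expand would enumerate the next applicable actions, plans, or goals in the goal-plan tree, and simulate would perform the heuristic-based rollout described above.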
While the original SP-MCTS approach applied to
entire plans, its adaptation in the Action-Level Inten-
tion Selection (SA) algorithm (Yao and Logan, 2016)
introduces significant enhancements. Specifically, SA
modifies the goal-plan tree (GPT) to include not only
goals and plans but also primitive actions within those
plans. This refinement enables action-level interleav-
ing, addressing conflicts between steps of different
plans more effectively.
By simulating these action sequences within the
enhanced goal-plan tree, the SA algorithm identifies
potential conflicts and selects the most promising ac-
tions to achieve goals efficiently. Each node in the
tree represents a unique state of the environment and
the agent’s progress. Simulations evaluate the poten-
tial outcomes of different action sequences, and the
results are backpropagated to refine the tree. This
enables the SA algorithm to dynamically adapt its
decision-making process to changing environmental
conditions.
These advancements underline the versatility of
SP-MCTS in addressing conflicts and enhancing in-
tention selection in BDI agents. Recent research has
further explored the use of SP-MCTS in multi-agent
environments (Dann et al., 2020; Dann et al., 2021;
Dann et al., 2022). These extensions demonstrate the
adaptability of SP-MCTS and its potential for broader
applications in dynamic and collaborative scenarios.
4 LIMITATIONS AND
CHALLENGES OF THE SA
ALGORITHM
While innovative and effective in addressing conflicts
at the action level in single-agent environments, the
SA algorithm presents several limitations and chal-
lenges that must be considered for practical imple-
mentation.
One notable limitation is the necessity of ex-
plicitly defining pre- and post-conditions for actions.
These conditions are crucial for evaluating the fea-
sibility and consequences of action sequences dur-
ing simulations. However, in programming environ-
ments like AgentSpeak(L), which do not natively re-
quire such specifications, this represents a significant
departure from standard practices and could increase
the development effort.
Another limitation lies in the reliance on proposi-
tional logic for goal-plan trees. While this approach
is computationally efficient, it limits expressiveness in
scenarios requiring more complex reasoning. Extend-
ing the algorithm to support predicate logic would allow for richer representations but would also introduce significant computational overhead, making its scalability more challenging.
Finally, the computational overhead associated
with the algorithm, particularly the effort required to
build and traverse the SP-MCTS search tree, is non-
negligible. Although the authors claim that the com-
plexity is manageable in their scenarios, dynamic en-
vironments requiring frequent updates may amplify
this overhead, necessitating trade-offs between com-
putational efficiency and the optimality of intention
selection.
5 GOAL-PLAN TREE FOR LATE
VARIABLE BINDING
STRATEGY
Using the Goal-Plan Tree (GPT) structure for pro-
grams based on predicates can be computationally ex-
pensive, as it requires creating a separate node for
each possible variable substitution. This approach,
corresponding to Early binding, increases the number
of nodes in the GPT, leading to higher computational
costs in the SP-MCTS algorithm.
In contrast, the Late binding strategy aggregates
all substitutions for a plan into a single node, reducing
the number of nodes that must be traversed and sim-
ulated. This approach enhances computational scala-
bility by minimizing redundant evaluations of invalid
substitutions and makes the strategy particularly suit-
able for scenarios with many possible substitutions.
During simulation, substitutions are dynamically
selected from the context, which maintains these sub-
stitutions. This approach ensures efficient use of the
computational budget defined by β, allowing the algo-
rithm to explore multiple possibilities while reducing
redundant simulations.
To illustrate this concept, consider a simple exam-
ple of an agent participating in a trading market for
collectable cards. The agent’s goal is to find a match-
ing offer to satisfy a given demand. A plan for the
agent, written in AgentSpeak(L), can be expressed as
follows:
+!sell : wants(Buyer, Card, Max_Price)
    <- ?offers(Seller, Card, Price);
       Price <= Max_Price;
       sell(Seller, Buyer, Card, Price);
       !sell.
The environment includes one seller, adam, and
two buyers, betty and clara, interested in the same
card. The agent’s belief base (BB) is defined as:
offers(adam, cd1, 85).
wants(betty, cd1, 60).
wants(clara, cd1, 90).
The example GPT can be seen in Figure 2. The
trees are simplified as there is only one plan; under the
goal node, child nodes represent the variable bindings
created while evaluating the context condition. The
plan nodes are truncated at the action that may fail,
excluding subsequent actions for simplicity.
Figure 2 illustrates the difference between Early
and Late binding strategies using the goal-plan tree.
In the Early binding strategy (Figure 2a), each substi-
tution generates a separate node, resulting in a larger
tree. For example, the agent must evaluate substitu-
tions for both betty and clara, even though betty's plan will fail due to price constraints. This leads to unnecessary computational effort during simulations.
Figure 2: Illustration of GPT construction under Early and Late binding strategies. (a) Early binding: separate nodes for each substitution. (b) Late binding: a unified node with all substitutions.
In contrast, the Late binding strategy (Figure 2b)
aggregates all potential substitutions into a single
node. During plan execution, a restriction operation
systematically refines the context by removing sub-
stitutions that no longer satisfy the belief base or plan
conditions. For example, when the price constraint
fails, the invalid substitution for betty is discarded.
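A small, self-contained Python sketch traces how the context of this example evolves; beliefs and substitutions are encoded as plain Python values purely for illustration and do not reflect the FRAg data structures.

# Illustrative trace of the context (PUS) for the card-selling example.
beliefs = {
    "offers": [("adam", "cd1", 85)],
    "wants":  [("betty", "cd1", 60), ("clara", "cd1", 90)],
}

# Context after evaluating the context condition wants(Buyer, Card, Max_Price):
# one substitution per matching belief, with no commitment yet.
pus = [{"Buyer": b, "Card": c, "Max_Price": m} for (b, c, m) in beliefs["wants"]]

# Test goal ?offers(Seller, Card, Price): broad unification extends every
# substitution whose Card value matches an offer in the belief base.
pus = [dict(s, Seller=sel, Price=p)
       for s in pus
       for (sel, card, p) in beliefs["offers"] if card == s["Card"]]

# Relational step Price <= Max_Price: the restriction removes the substitution
# for betty (85 <= 60 fails) and keeps the one for clara (85 <= 90 holds).
pus = [s for s in pus if s["Price"] <= s["Max_Price"]]

print(pus)   # [{'Buyer': 'clara', 'Card': 'cd1', 'Max_Price': 90,
             #   'Seller': 'adam', 'Price': 85}]

Under Early binding, each element of the initial PUS would instead become a separate node of the goal-plan tree and be expanded and simulated on its own; under Late binding, the whole list is carried in a single node and is only narrowed in place by the restriction operation.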
The Late binding strategy offers several advan-
tages:
Reduced Tree Size: By aggregating substitu-
tions, Late binding decreases the number of nodes
in the GPT, improving computational efficiency.
Dynamic Flexibility: Substitutions are evaluated
only when necessary, allowing the agent to adapt
to environmental changes.
Improved Robustness: By deferring decisions,
Late binding reduces the likelihood of plan fail-
ures caused by premature substitutions.
We hypothesize that the Late binding strategy will
achieve better results than the Early binding strategy
under identical parameter settings for the SA algo-
rithm. By reducing the number of nodes that need to
be traversed, the algorithm can explore more aggre-
gated nodes within the same number of steps, poten-
tially yielding an optimal outcome. This hypothesis
will be validated through experimental evaluation.
We further anticipate that leveraging the Late
binding strategy for SP-MCTS will prove particularly
beneficial in scenarios involving a large number of
possible variable substitutions. In such cases, Early
binding requires the generation of separate nodes for
each substitution, leading to exponential growth in
the size of the goal-plan tree due to the combinato-
rial nature of substitutions across multiple variables.
Late binding addresses this by dynamically narrow-
ing down substitutions during plan execution, avoid-
ing unnecessary computational overhead and invalid
paths. This dynamic and adaptive behaviour aligns
with the requirements of complex and dynamic envi-
ronments, where decision-making must remain flexi-
ble and efficient. Consequently, Late binding is ex-
pected to enhance the scalability and robustness of
SP-MCTS in addressing such challenges effectively.
6 EXPERIMENTAL EVALUATION
This section presents the experimental evaluation of
the SP-MCTS algorithm, focusing on its performance
with the Late binding strategy. The primary objective
of these experiments is to show how reducing the number of nodes expanded and simulated by the algorithm improves the quality of the achieved results. The evaluation
was conducted within a static maze environment, us-
ing varying values for parameters α (number of itera-
tions) and β (number of simulations per iteration).
6.1 Environment Description
The maze environment is designed as a grid-based
world, where an agent performs tasks involving the
collection of materials. Materials such as gold, silver,
or bronze can be found in various positions across the
grid, and the agent’s goal is to collect specific combi-
nations of these materials in a defined sequence. The
environment applies specific rules for material degra-
dation, where materials degrade to lower-value ver-
sions, and evaluates the agent’s performance based on
its ability to complete tasks.
Grid Layout. The environment consists of a grid
with fixed dimensions (6x6). Each position is
uniquely identified by its coordinates [X, Y] and can
be associated with a material. The materials present
on the grid are:
Gold: The highest-value material, which de-
grades into silver when picked.
Silver: An intermediate material which degrades
into bronze when picked.
Bronze: The lowest-value material, which degrades into dust when picked.
Dust: A neutral material with no value, represent-
ing the final stage of degradation.
Material Degradation. Each time the agent picks a
material, it degrades according to the following rules:
gold → silver → bronze → dust.
For example, if the agent picks a gold material, it
degrades to silver and remains in the same position,
waiting for further collection. Once a material de-
grades to dust, it can no longer be collected or con-
tributed to task completion.
Tasks and Goals. The environment provides pre-
defined tasks, each represented as a combination of
three materials in a specific order. For example, a
task might require the agent to collect three instances
of gold (gold, gold, gold) or a combination of
gold, silver and bronze (gold, silver, bronze).
After collecting three materials, the agent’s collected
combination is evaluated against the predefined tasks.
The agent is rewarded with a point if the combination
matches any task.
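The degradation and reward rules described above can be summarised in a short illustrative Python sketch; the grid fragment, the task list, and the helper names are examples only, not the actual FRAg environment code.

# Illustrative model of material degradation and task evaluation in the maze.
DEGRADES_TO = {"gold": "silver", "silver": "bronze", "bronze": "dust", "dust": "dust"}

# Only the two example tasks mentioned in the text; the order of materials matters.
TASKS = [("gold", "gold", "gold"), ("gold", "silver", "bronze")]

def pick(grid, position):
    """Return the material at 'position' and degrade it in place.
    (Picking dust is not prevented in this simplified sketch.)"""
    material = grid[position]
    grid[position] = DEGRADES_TO[material]
    return material

def evaluate_bag(bag):
    """After three picks the bag is compared against the predefined tasks;
    a matching combination is worth one point and the bag is reset."""
    reward = 1 if tuple(bag) in TASKS else 0
    bag.clear()
    return reward

# Example round on a tiny fragment of the grid:
grid = {(1, 1): "gold", (2, 1): "silver", (3, 1): "bronze"}
bag = []
for pos in [(1, 1), (2, 1), (3, 1)]:
    bag.append(pick(grid, pos))
print(evaluate_bag(bag))   # 1, because (gold, silver, bronze) matches a task
print(grid[(1, 1)])        # 'silver': the gold degraded after being picked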
Agent Interaction. The agent interacts with the en-
vironment using two primary actions:
go(Direction): Moves the agent to an adjacent
position in the specified direction (up, down, left,
right).
pick: Picks the material at the agent’s current po-
sition, adding it to the agent’s collection bag and
triggering material degradation.
The agent’s behaviour is guided by its belief base,
which includes information about its position, per-
ceived materials, and the current task.
Evaluation and Restart. When the agent collects
three materials, its bag is evaluated. If the collected
combination matches a predefined task, the agent is
rewarded with one point. Regardless of success or
failure, the agent’s bag is reset, and a new round be-
gins. This process continues until the agent completes
all tasks or exhausts the available materials.
The maze environment tests the agent’s ability
to adapt strategies by balancing navigation, resource
management, and task adherence for optimal perfor-
mance.
Suitability of the Material Collection Task. The
material collection task was chosen for its ability
to highlight the differences between Early and Late
binding strategies effectively. Unlike Early binding,
which prematurely commits to a specific task, Late
binding dynamically narrows down the set of viable
tasks based on current conditions, maintaining flexi-
bility throughout the decision-making process. This is
particularly evident when the agent collects material
and evaluates which tasks remain achievable based
on the collected prefix. Instead of adopting a sin-
gle task immediately, Late binding refines the task
space to those still compatible with the agent’s cur-
rent progress and belief base.
6.2 Evaluation Results
The evaluation results for the Early and Late bind-
ing strategies are summarized in Table 1. The table
highlights the rewards achieved for various parameter
combinations. All experiments were conducted with a limit of 60 agent steps per simulation. The Late strategy consistently outperforms the Early strategy, particularly for smaller values of α and β.
Table 1: Rewards achieved by Early and Late variable binding strategies for different parameter settings of the SP-MCTS algorithm (α and β).
α     β     Early   Late
5     5     2       5
10    5     3       5
10    10    5       6
15    10    5       5
15    15    5       6
20    15    4       6
20    20    6       7
25    20    6       7
6.2.1 Analysis
The experimental results demonstrate that the Late
binding strategy consistently outperforms the Early
binding strategy across most tested parameter config-
urations. This advantage is particularly evident un-
der constrained resource settings. For example, with
α = 5 and β = 5, the Late binding strategy achieved
a reward of 5 points, compared to only 2 points for
the Early binding strategy, an improvement of 150%.
This trend highlights the Late binding strategy’s abil-
ity to explore the decision space more effectively
within limited computational budgets.
While some anomalies were observed, such as oc-
casional performance drops at higher parameter val-
ues, these can be attributed to the stochastic nature
of SP-MCTS and the limited number of experimen-
tal runs. For example, a suboptimal random choice
during simulation or an unfinished task within the al-
located steps may have contributed to these results.
Increasing the number of experimental repetitions
would provide a more robust statistical basis for these
findings.
Overall, the Late binding strategy demonstrates
a clear advantage in converging on higher-quality solutions and in navigating the search space efficiently.
These results strongly support the hypothesis that
Late binding enhances performance in scenarios with
restricted resources and dynamic environments.
7 CONCLUSIONS
This paper presented an extension to the FRAg system that integrates the Late binding strategy into the SP-MCTS algorithm, target-
ing variable binding and intention selection in BDI
agents. The experimental results demonstrated that
the Late binding strategy significantly outperforms
the Early binding strategy, particularly under condi-
tions with limited computational resources or small
parameter values for α (iterations) and β (simula-
tions).
The Late binding strategy’s ability to aggregate
multiple variable substitutions into a single node re-
duces the size of the search tree and facilitates a more focused exploration of the decision space. This ap-
proach is expected to improve adaptability in dynamic
environments by maintaining a broader range of op-
tions for variable substitutions throughout the execu-
tion of the plan, allowing agents to respond more ef-
fectively to changes.
While SP-MCTS represents a state-of-the-art ap-
proach to intention selection, its computational de-
mands can make it unsuitable for scenarios re-
quiring rapid decision-making or environments with
highly constrained computational resources. Fu-
ture research could explore integrating Summary
Information-based and Coverage-based approaches to
mitigate these limitations, potentially offering a more
balanced trade-off between efficiency and adaptabil-
ity in such contexts.
Additionally, a comparative analysis of the FRAg
system against other BDI frameworks in more com-
plex task environments would provide valuable in-
sights into its practical advantages and areas for im-
provement. This direction could further validate the
proposed Late binding strategy’s robustness and ap-
plicability across diverse scenarios.
ACKNOWLEDGEMENTS
This work has been supported by the internal
BUT project FIT-S-23-8151. Computational re-
sources were provided by the e-INFRA CZ project
(ID:90254), supported by the Ministry of Education,
Youth and Sports of the Czech Republic.
REFERENCES
Bordini, R. H., Hübner, J. F., and Wooldridge, M. (2007).
Programming multi-agent systems in AgentSpeak us-
ing Jason, volume 8. John Wiley & Sons.
Bratman, M. (1987). Intention, plans, and practical reason.
Harvard University Press.
Dann, M., Thangarajah, J., Yao, Y., and Logan, B. (2020).
Intention-aware multiagent scheduling. In Proceed-
ings of the 19th International Conference on Au-
tonomous Agents and MultiAgent Systems, AAMAS
’20, page 285–293, Richland, SC. International Foun-
dation for Autonomous Agents and Multiagent Sys-
tems.
Dann, M., Yao, Y., Alechina, N., Logan, B., and Thangara-
jah, J. (2022). Multi-agent intention progression with
reward machines. In Raedt, L. D., editor, Proceed-
ings of the Thirty-First International Joint Conference
on Artificial Intelligence, IJCAI-22, pages 215–222.
International Joint Conferences on Artificial Intelli-
gence Organization. Main Track.
Dann, M., Yao, Y., Logan, B., and Thangarajah, J.
(2021). Multi-agent intention progression with black-
box agents. In Zhou, Z.-H., editor, Proceedings of the
Thirtieth International Joint Conference on Artificial
Intelligence, IJCAI-21, pages 132–138. International
Joint Conferences on Artificial Intelligence Organiza-
tion. Main Track.
d’Inverno, M., Kinny, D., Luck, M., and Wooldridge, M.
(1998). A formal specification of dmars. In Intelligent
Agents IV Agent Theories, Architectures, and Lan-
guages: 4th International Workshop, ATAL’97 Provi-
dence, Rhode Island, USA, July 24–26, 1997 Proceed-
ings 4, pages 155–176. Springer.
Kocsis, L. and Szepesvári, C. (2006). Bandit based monte-carlo planning. In Fürnkranz, J., Scheffer, T., and Spiliopoulou, M., editors, Machine Learning: ECML 2006, pages 282–293, Berlin, Heidelberg. Springer Berlin Heidelberg.
Pokahr, A., Braubach, L., and Lamersdorf, W. (2005).
Jadex: A BDI Reasoning Engine, pages 149–174.
Springer US, Boston, MA.
Rao, A. S. (1996). Agentspeak(l): Bdi agents speak out
in a logical computable language. In Van de Velde,
W. and Perram, J. W., editors, Agents Breaking Away,
pages 42–55, Berlin, Heidelberg. Springer Berlin Hei-
delberg.
Rao, A. S. and Georgeff, M. P. (1995). Bdi-agents: From
theory to practice. In Lesser, V. and Gasser, L., editors,
Proceedings of the First International Conference on
Multiagent Systems (ICMAS), volume 95, pages 312–
319, Menlo Park, CA. AAAI Press.
Sardina, S. and Padgham, L. (2011). A bdi agent pro-
gramming language with failure handling, declarative
goals, and planning. Autonomous Agents and Multi-
Agent Systems, 23:18–70.
Schadd, M. P. D., Winands, M. H. M., van den Herik,
H. J., Chaslot, G. M. J. B., and Uiterwijk, J. W. H. M.
(2008). Single-player monte-carlo tree search. In
van den Herik, H. J., Xu, X., Ma, Z., and Winands,
M. H. M., editors, Computers and Games, pages 1–
12, Berlin, Heidelberg. Springer Berlin Heidelberg.
Thangarajah, J., Padgham, L., and Winikoff, M. (2003).
Detecting & avoiding interference between goals in
intelligent agents. In Proceedings of the 18th Inter-
national Joint Conference on Artificial Intelligence,
IJCAI’03, page 721–726, San Francisco, CA, USA.
Morgan Kaufmann Publishers Inc.
Thangarajah, J., Padgham, L., and Winikoff, M. (2011).
Computationally effective reasoning about goal inter-
actions. Journal of Automated Reasoning, 47(1):17–
56.
Thangarajah, J., Sardina, S., and Padgham, L. (2012). Mea-
suring plan coverage and overlap for agent reasoning.
In Proceedings of the 11th International Conference
on Autonomous Agents and Multiagent Systems - Vol-
ume 2, AAMAS ’12, page 1049–1056, Richland, SC.
International Foundation for Autonomous Agents and
Multiagent Systems.
Vidensky, F., Zboril, F., Beran, J., Koci, R., and Zboril, F. V.
(2024). Comparing variable handling strategies in bdi
agents: Experimental study. In Proceedings of the
16th International Conference on Agents and Artifi-
cial Intelligence - Volume 1: ICAART, pages 25–36.
INSTICC, SciTePress.
Vidensky, F., Zboril, F., Koci, R., and Zboril, F. V. (2023).
Operational semantic of an agentspeak(l) interpreter
using late bindings. In Proceedings of the 15th In-
ternational Conference on Agents and Artificial In-
telligence - Volume 1: ICAART, pages 173–180. IN-
STICC, SciTePress.
Vidensky, F., Zboril, F., Koci, R., and Zboril, F. V. (2025).
Advanced evaluation of variable binding strategies in
bdi agents with integrated failure handling. Lecture
Notes in Artificial Intelligence. Accepted for publica-
tion.
Waters, M., Padgham, L., and Sardina, S. (2014). Evaluat-
ing coverage based intention selection. In Proceedings
of the 2014 International Conference on Autonomous
Agents and Multi-Agent Systems, AAMAS ’14, page
957–964, Richland, SC. International Foundation for
Autonomous Agents and Multiagent Systems.
Waters, M., Padgham, L., and Sardina, S. (2015). Improv-
ing domain-independent intention selection in bdi sys-
tems. Autonomous Agents and Multi-Agent Systems,
29(4):683–717.
Winikoff, M. (2005). Jack™ Intelligent Agents: An Indus-
trial Strength Platform, pages 175–193. Springer US,
Boston, MA.
Wooldridge, M. (1999). Intelligent Agents, page 27–77.
MIT Press, Cambridge, MA, USA.
Yao, Y. and Logan, B. (2016). Action-level intention selec-
tion for bdi agents. In Proceedings of the 2016 Inter-
national Conference on Autonomous Agents & Multi-
agent Systems, AAMAS ’16, page 1227–1236, Rich-
land, SC. International Foundation for Autonomous
Agents and Multiagent Systems.
Yao, Y., Logan, B., and Thangarajah, J. (2014). Sp-mcts-
based intention scheduling for bdi agents. In Proceed-
ings of the Twenty-First European Conference on Ar-
tificial Intelligence, ECAI’14, page 1133–1134, NLD.
IOS Press.
Zboril, F., Vidensky, F., Koci, R., and Zboril, V. F. (2022).
Late bindings in agentspeak(l). In Proceedings of the
14th International Conference on Agents and Artifi-
cial Intelligence - Volume 3: ICAART, pages 715–724.
INSTICC, SciTePress.