Research Proposal in Probabilistic Planning Search
Yazmin S. Villegas-Hernandez and Federico Guedea-Elizalde
Design and Technology Innovation Center, Tecnologico de Monterrey, Monterrey, Mexico
Keywords:
Bayesian Networks, Planning, and Planning Search.
Abstract:
In planning search, there are different approaches to guide the search, all of them focused on finding a plan (solution) in less time. Most of these approaches do not use admissible heuristics, but they obtain good results in time. For example, the heuristic-search planning approach can generate plans faster than other approaches, but the plans generated by heuristic planners are sub-optimal, or the search may reach dead ends (states from which the goals become unreachable). We present an approach to guide the search in a probabilistic way in order to avoid the problems of the non-admissible approaches. We extend the ideas of Bayesian networks and Bayesian inference to our work, and we present our way of making Bayesian inferences in order to guide the search better. The results of our experiments with different well-known benchmarks are presented. The benchmarks used in our experiments are: Driverlog, Zenotravel, Satellite, Rovers, and Freecell.
1 INTRODUCTION
In planning, there are four main approaches to in-
crease the efficiency of planning systems.
First, Blum and Furst developed a novel algorithm called Graphplan (Blum and Furst, 1997). This algorithm reduces the branching factor by searching in a special data structure, and it searches for layered (parallel) plans. Graphplan has three limitations. First, it applies only to the STRIPS language (Fikes and Nilsson, 1994). Second, the planner performs poorly without extra ad hoc reasoning capabilities. Third, and most important, the quality of its plans does not match the speed with which it finds them.
Many planning systems use the Graphplan algorithm, or their own version of it, such as SGP (Weld et al., 1998), Blackbox (Kautz and Selman, 1999), IPP (Koehler, 1999), Medic (Ernst et al., 1997), STAN (Fox and Long, 2011), FF (Hoffmann and Nebel, 2011), and others. Furthermore, there is the LPG (Local Search for Planning Graphs) planner (Gerevini et al., 2003), which is the only one that does not use the Graphplan algorithm itself but still follows a planning-graph approach: LPG works with heuristics that exploit the structure of the planning graph.
Second, Kautz and Selman developed a novel method for planning called planning as satisfiability (SAT) (Kautz et al., 1992), which transforms the planning problem into a propositional satisfiability problem for which efficient solvers are known. SATPLAN04 (Kautz, 2004) accepts the STRIPS language as well as the PDDL language (Fox and Long, 2003), but SatPlan does not handle any non-STRIPS features other than types, such as derived effects and conditional actions.
Third, Bonet and Geffner developed a new approach based on heuristic-search planning (HSP) (Bonet and Geffner, 2001). In this approach, a heuristic is chosen from a set of different heuristics in order to guide the search through the state space. Unfortunately, the heuristics used in this algorithm are not fully admissible; indeed, these heuristics do not work for all domains.
Hoffmann and Nebel, using and improving HSP ideas, developed the FF (Fast-Forward) system (Hoffmann and Nebel, 2011), one of the fastest planners for the STRIPS language. The FF heuristic implements a relaxed Graphplan algorithm to obtain the minimum distance between a state and the goal state, but this relaxation is not admissible because the relaxed Graphplan does not consider the delete list (the delete effects of all operators). The search algorithm, called Enforced Hill-Climbing (EHC), only does a local search, which can lead to dead ends (states from which the goals become unreachable).
Fourth, Bonet and Geffner used and improved
ideas of planning with incomplete information (Gene-
sereth and Nourbakhsh, 1993) to develop another
approach based on planning with incomplete infor-
mation as heuristic search in belief space (Geffner,
2011).
Planning with incomplete information is distin-
guished from classical planning in the type and
amount of information available at planning and exe-
cution time. In planning with incomplete information,
the initial state is not known, but sensor information
may be available at execution time. By contrast, in classical planning the initial state is known and there is no sensing to obtain knowledge or feedback about the states at execution time.
Planning with incomplete information can be formulated as a problem of search in belief space, where belief states are either sets of states or, more generally, probability distributions over states. Bonet and Geffner made this formulation of planning with incomplete information explicit in order to extend it to tasks like contingent planning, where the standard heuristic search algorithms do not apply. This planner has two limitations. First, the search algorithm is not optimal like A*. Second, the complexity of a number of preprocessing and runtime operations in GPT (General Planning Tool) (Bonet and Geffner, 2011) scales with the size of the state space.
The heuristic-search planning approach obtains good results in execution time, but the plans generated by heuristic planners are sub-optimal, or the search may reach dead ends (states from which the goals become unreachable). Our approach is different from heuristic search: we developed a probabilistic method to guide the search efficiently, in order to avoid dead ends and also obtain optimal plans.
The paper is organized as follows. First, we ex-
plain classical planning, Bayesian inference and the
probabilistic planning search. Then we report results
of the experiments and some conclusions are drawn.
2 CLASSICAL PLANNING
FRAMEWORK
Classical Planning (Russell, 2009) considers only environments that are fully observable, deterministic, finite, static (change happens only when the agent acts), and discrete (in time, actions, objects, and effects).
Classical Planning (Nilsson, 2010) is defined as a tuple $\langle S, S_0, S_G, A \rangle$, where $S$ is a finite state space, $S_0$ is an initial situation given by a single state $S_0 \in S$, $S_G$ is a goal situation given by a non-empty set of states $S_G \subseteq S$, and $A$ is a set of finite actions. The actions $A(s) \subseteq A$ are those applicable in each state $s \in S$, and every action $a \in A(s)$ maps each state $s \in S$ into another state $s_a = f(a, s)$ in a deterministic way. There are also positive action costs $c(a, s)$ for doing an action $a \in A(s)$ in a state $s \in S$.
A plan (solution) for a deterministic problem is a sequence of actions $\{a_0, a_1, \ldots, a_k\}$ that generates a state trajectory $\{s_0,\ s_1 = f(a_0, s_0),\ \ldots,\ s_{k+1} = f(a_k, s_k)\}$ such that each action $a_i \in A(s_i)$ is applicable in state $s_i \in S$ and $s_{k+1} \in S_G$ is a goal state. A plan $\pi$ is optimal when the total cost $\sum_{i=0}^{k} c(a_i, s_i) \geq 0$ of doing each action $a_i \in A(s_i)$ in a state $s_i \in S$ is minimal.
Classical Planning uses a transition function $f(s_i, s_{i+1}, a_i)$ to guide the search (from a state $s_i$ to another state $s_{i+1}$ using an action $a_i$) until a solution is reached. The most successful planners, such as FF and HSP, use heuristics to obtain a solution in less time, but these planners do not achieve the same quality in plan length as they do in time. In order to deliver better results in length, we propose a novel approach to guide the search in Planning using Bayesian inference.
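To make the tuple concrete, the following sketch encodes states as sets of atoms in a STRIPS-like way and checks plan validity and cost. The Action structure and the atom encoding are our own illustrative assumptions, not part of the formal definition above.

```python
# A minimal sketch of the classical planning tuple <S, S_0, S_G, A>.
# The STRIPS-style encoding (states as frozensets of atoms) and the
# Action structure are illustrative assumptions.
from dataclasses import dataclass
from typing import FrozenSet, List

State = FrozenSet[str]

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]
    cost: float = 1.0  # positive action cost c(a, s)

def applicable(a: Action, s: State) -> bool:
    """a is in A(s): its preconditions hold in state s."""
    return a.preconditions <= s

def apply_action(a: Action, s: State) -> State:
    """Deterministic transition s_a = f(a, s)."""
    return (s - a.del_effects) | a.add_effects

def is_plan(plan: List[Action], s0: State, goal: FrozenSet[str]) -> bool:
    """Each a_i must be applicable in s_i, and s_{k+1} must be a goal state."""
    s = s0
    for a in plan:
        if not applicable(a, s):
            return False
        s = apply_action(a, s)
    return goal <= s

def plan_cost(plan: List[Action]) -> float:
    """Total cost sum_i c(a_i, s_i); a plan is optimal when this is minimal."""
    return sum(a.cost for a in plan)
```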
3 BAYESIAN NETWORK AND
INFERENCE
Bayesian networks (Russell, 2009) are directed acyclic graphs whose events (nodes) are represented by random variables that can take several states, each with a probability of happening.
Figure 1 depicts a Bayesian network (Russell, 2009), where an event has an effect on several events, and one or more events can have an effect on one event.
Figure 1: Bayesian network.
Bayesian Inference (O’Hagan et al., 2004) is a method of inference that updates the probability of a hypothesis (belief) using Bayes’ rule as additional evidence is acquired.
ResearchProposalinProbabilisticPlanningSearch
587
The process of inference has two steps:
1. Update of the belief given an evidence
2. Propagation of the evidence through the network
Morawski defined a novel model of Bayesian network, which consists of a tree with one father and several children arranged hierarchically (Morawski, 1989). The tree has a father-children relationship, where the father sends information to his sons ($\pi$) and a son to his father ($\lambda$). Figure 2 depicts the Morawski model of a Bayesian network.
Figure 2: Evidence propagation in a Bayesian network.
In this cause-effect relationship between father and children we have two kinds of evidence, diagnostic and causal, both estimated using the chain rule given conditional independence. Furthermore, we have a node belief which combines the causal and diagnostic evidence.
Diagnostic evidence is estimated, in equation 1, using the $\lambda$ information of an event. In this expression, $Children(f_i)$ stands for the children of the event (father node) $f_i$, and $\lambda_k(f_i)$ stands for the probability of an event $k$ happening given the event (father node) $f_i$, which is sent from the event $k$ to the event $f_i$.

$$\lambda(f_i) = \prod_{k \in Children(f_i)} \lambda_k(f_i) \qquad (1)$$
Causal evidence, defined in equation 2, uses the $\pi$ information of an event and is estimated using conditional independence and the chain rule. In this expression, $s_i$ stands for a state of the event $S$, $States(f)$ stands for the states of the event $f$, $P(s_i \mid f_k)$ stands for the probability of state $s_i$ happening given his father(s) $f_k$, $\pi_S(f_k)$ stands for the probability of the event $f_k$ happening, sent from the event $f_k$ to the event $S$, and $father(S)$ stands for the fathers of the event $S$.

$$\pi(s_i) = \sum_{k \in States(f)} P(s_i \mid f_k) \prod_{f \in father(S)} \pi_S(f_k) \qquad (2)$$
The belief of an event $e_i$ is estimated, in equation 3, using the $\lambda$ and $\pi$ information (from his sons and father), where $\alpha$ is a normalization constant that keeps the joint probabilities in balance, i.e., keeps the sum of all beliefs (in a Bayesian network) equal to 1.

$$\beta(e_i) = \alpha\, \pi(e_i)\, \lambda(e_i) \qquad (3)$$
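As a concrete reading of equations 1–3, the sketch below propagates $\pi$ and $\lambda$ messages through a single father-children node and normalizes the belief. The probability tables and messages are illustrative assumptions, not data from this paper.

```python
# A sketch of the two-step inference of Section 3 on a single node of
# a father-children tree. All numbers below are hypothetical.
import numpy as np

def diagnostic_evidence(child_messages: list) -> np.ndarray:
    """Eq. (1): product of the lambda messages sent by the children."""
    out = np.ones_like(child_messages[0])
    for m in child_messages:
        out = out * m
    return out

def causal_evidence(cpt: np.ndarray, father_message: np.ndarray) -> np.ndarray:
    """Eq. (2) for a single father: sum over the father's states of
    P(s_i | f_k) * pi_S(f_k); cpt[i, k] = P(s_i | f_k)."""
    return cpt @ father_message

def belief(pi_vec: np.ndarray, lam_vec: np.ndarray) -> np.ndarray:
    """Eq. (3): beta(e) = alpha * pi(e) * lambda(e), with alpha chosen
    so the beliefs sum to 1."""
    b = pi_vec * lam_vec
    return b / b.sum()

# Hypothetical binary node with one father and two children.
cpt = np.array([[0.9, 0.3],
                [0.1, 0.7]])                   # P(s_i | f_k)
father_msg = np.array([0.6, 0.4])              # pi message from the father
child_msgs = [np.array([0.8, 0.5]),
              np.array([0.4, 0.9])]            # lambda messages from sons
print(belief(causal_evidence(cpt, father_msg), diagnostic_evidence(child_msgs)))
```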
Bayesian inference is used to infer between conditional events, which can be helpful in a planning search problem to reach an optimal solution in a probabilistic way.
4 PROPOSAL PROBABILISTIC
PLANNING
We propose a novel Probabilistic Planning Search, which introduces the use of probability to guide the search through the state space in a non-relaxed way. Furthermore, we propose a model for planning with probabilistic search where a goal is achieved taking into account the probability of reaching a goal state.
The model for probabilistic planning is characterized by a quintuple $\langle S, S_0, S_G, A, P \rangle$, where $S$ is a finite state space, $S_0$ is a non-empty set of possible initial states $S_0 \subseteq S$, $S_G$ is a non-empty set of goal states, $A$ is a set of actions with $A(s) \subseteq A$ denoting the actions applicable in state $s$, and $P$ is a probability transition function such that $P(s_k \mid a_i, s_i)$ denotes the probability of passing from a state $s_i$ to the state $s_k$ using an action $a_i$.
In our approach we establish a basic form of search in the Bayesian network using Morawski's model. We define the causal and diagnostic evidence based on the number of children states and on the distance from a state to the goal, respectively. Intuitively, the causal evidence reduces the branching factor because it favors states with fewer children, and the diagnostic evidence reduces plan length because it takes into account the minimum cost of reaching the goal. The resulting probability transition function is used to guide the search, where the maximum probability is chosen in each state, as sketched below.
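The loop below is only a minimal sketch of this maximum-probability selection rule, under the assumption of hashable states; successors and transition_prob are hypothetical placeholders for the model of Section 4.1. Note that the experiments in Section 5 actually use backward breadth-first search rather than this greedy loop.

```python
# A minimal sketch of the maximum-probability selection rule: in each
# state, move to the successor with the highest P(s_k | a_i, s_i).
# successors() and transition_prob() are hypothetical placeholders.
from typing import Callable, List, Optional

def greedy_probabilistic_search(
    s0,
    is_goal: Callable,
    successors: Callable,        # s -> iterable of (action, next_state)
    transition_prob: Callable,   # (s, a, s_next) -> P(s_next | a, s)
    max_steps: int = 10_000,
) -> Optional[List]:
    plan, s, visited = [], s0, {s0}
    for _ in range(max_steps):
        if is_goal(s):
            return plan
        candidates = [(transition_prob(s, a, sn), a, sn)
                      for a, sn in successors(s) if sn not in visited]
        if not candidates:
            return None          # dead end: no unvisited successor
        _, a, s = max(candidates, key=lambda t: t[0])
        plan.append(a)
        visited.add(s)
    return None
```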
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
588
4.1 Bayesian Inference
The state space, where nodes are states, can be modeled as a Bayesian network (with a father-children structure) in which events are replaced by states $s \in S$ and arrows (which establish the cause-effect relationship between events) are replaced by actions $a \in A(s)$.

Morawski used the belief equation of an event $e_i$ to estimate the probability of this event given other events that occurred in the past. In the same way, we use this belief equation to estimate the probability of finding the goal state $g \in S$ given an action $a \in A(s)$ chosen along a shorter way. In this probabilistic model, we estimate the probability transition function as equation 4.

$$P(s_k \mid a_i, s_i) = \alpha\, \lambda_{children(s_i, a_i)}(s_k)\, \pi_{s_i}(s_k) \qquad (4)$$
In this expression, $\lambda_{children(s_i, a_i)}(s_k)$ stands for the probability of the son states $children(s_i, a_i)$ happening given the state $s_k$, $\pi_{s_i}(s_k)$ stands for the probability of $s_k$ happening, and $\alpha$ is a normalization constant that keeps the sum of the total probability equal to 1.
The probability transition function estimates the shortest branching to reach a goal state. It uses causal and diagnostic evidence, which can be estimated from cost equations for reaching the goal state and for branching.

The main costs used to update the probability transition function are the maximum and total costs. To estimate them, we use the Bellman-Ford recursion (Vector, 2003), equation 5, which computes the shortest length cost of reaching a goal state $g \in S$ from a state $s \in S$. In this expression, $s_{prec}$ stands for the precondition states (or prior states).

$$c(g;s) := \begin{cases} 0 & g \in s \\ \min_{a \in A(s)} \left[ 1 + c(s_{prec};s) \right] & \text{otherwise} \end{cases} \qquad (5)$$
The total cost of the branching factor, in equation 6, is defined as the sum of the costs $c(g_i;s)$ over each goal state $g_i \in G$, where $G \subseteq S$ stands for the set of goal states.

$$c_{total}(G;s) := \sum_{g_i \in G} c(g_i;s) \qquad (6)$$
The maximum cost of the branching factor, in equation 7, is defined as the maximum of the costs $c(g_i;s)$ over each goal state $g_i \in G$.

$$c_{max}(G;s) := \max_{g_i \in G} \left[ c(g_i;s) \right] \qquad (7)$$
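A sketch of how these cost estimates could be computed follows: with unit action costs, the recursion of equation 5 reduces to a backward breadth-first search from the goal states, over which equations 6 and 7 are simple aggregates. The predecessors function and the graph encoding are illustrative assumptions.

```python
# A sketch of eqs. (5)-(7). With unit action costs, the Bellman-Ford
# recursion of eq. (5) reduces to a backward breadth-first search from
# the goal states. predecessors(s) is an assumed accessor yielding the
# states from which s can be reached by one action.
from collections import deque
from typing import Callable, Dict, Iterable, List

def goal_cost(predecessors: Callable, goal_states: Iterable) -> Dict:
    """c(g; s): shortest number of actions from s to a state satisfying g."""
    goal_states = list(goal_states)
    cost = {g: 0 for g in goal_states}
    frontier = deque(goal_states)
    while frontier:
        s = frontier.popleft()
        for p in predecessors(s):
            if p not in cost:          # first visit gives the minimum cost
                cost[p] = cost[s] + 1
                frontier.append(p)
    return cost

def c_total(costs_per_goal: List[Dict], s) -> int:
    """Eq. (6): sum of c(g_i; s) over all goal states g_i in G."""
    return sum(c[s] for c in costs_per_goal)

def c_max(costs_per_goal: List[Dict], s) -> int:
    """Eq. (7): maximum of c(g_i; s) over all goal states g_i in G."""
    return max(c[s] for c in costs_per_goal)
```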
The estimate of $\lambda_{s_i}(\neg s_k)$, the probability of the state $s_i$ not happening (along a shorter way) given the goal state $s_k$, is defined in equation 8. We divide the maximum cost of the branching factor by the total cost of the branching factor because this represents the largest-search-cost possibility divided by all possibilities.

$$\lambda_{s_i}(\neg s_k) = \frac{c_{max}(s_k;s_i)}{c_{total}(s_k;s_i)} \qquad (8)$$
The complement of $\lambda_{s_i}(\neg s_k)$ is $\lambda_{s_i}(s_k)$, estimated as equation 9.

$$\lambda_{s_i}(s_k) = 1 - \lambda_{s_i}(\neg s_k) \qquad (9)$$
The estimate of $\pi_{s_i}(\neg s_k)$, the probability of the state $s_k$ not happening along a shorter way, is given in equation 10. In this expression, $|Children(s_i)|$ stands for the number of son states of the state $s_i$.

$$\pi_{s_i}(\neg s_k) = \frac{1}{|Children(s_i)|} \qquad (10)$$
The complement of $\pi_{s_i}(\neg s_k)$ is $\pi_{s_i}(s_k)$, estimated in equation 11.

$$\pi_{s_i}(s_k) = 1 - \frac{1}{|Children(s_i)|} \qquad (11)$$
Equations (9) and (11) are used to estimate the probability of the shortest branching through the state space (modeled as a Bayesian network); a sketch combining these equations appears below. Using our approach, we would therefore expect shorter plans to be found than with a heuristic approach. We discuss the results of our approach in different domains compared with other planner approaches in the next section.
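The sketch below combines equations 8–11 into the normalized transition probability of equation 4; the per-successor inputs (cost values and child counts) are assumed to come from the cost computations sketched earlier.

```python
# A sketch combining eqs. (8)-(11) into the normalized transition
# probability of eq. (4). The candidate tuples are assumed inputs.
from typing import List, Tuple

def lambda_pos(c_max_val: float, c_total_val: float) -> float:
    """Eq. (9): 1 - lambda(not s_k), where lambda(not s_k) is the
    cost ratio c_max / c_total of eq. (8)."""
    return 1.0 - c_max_val / c_total_val

def pi_pos(n_children: int) -> float:
    """Eq. (11): 1 - 1/|Children(s_i)|, the complement of eq. (10)."""
    return 1.0 - 1.0 / n_children

def transition_probs(candidates: List[Tuple]) -> List[Tuple]:
    """Eq. (4): P(s_k | a_i, s_i) = alpha * lambda * pi, with alpha
    normalizing over all candidate successors of s_i.
    candidates: (s_k, c_max_val, c_total_val, n_children) tuples."""
    raw = [(s_k, lambda_pos(cm, ct) * pi_pos(n))
           for s_k, cm, ct, n in candidates]
    alpha = 1.0 / sum(p for _, p in raw)
    return [(s_k, alpha * p) for s_k, p in raw]
```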
5 EXPERIMENTS & RESULTS
We compared the performance of our approach with other planner approaches on some well-known benchmark problems (STRIPS-like) from the AIPS-2002 competition set. We considered Blackbox, which combines the Graphplan and SAT approaches; HSP, which uses the heuristic approach; Metric-ff, which uses and improves ideas from the HSP approach; and LPG, a planning-graph approach (different from the Graphplan algorithm, but using the same graph-search idea). All experiments with our probabilistic approach used backward search with the breadth-first search algorithm.
For each benchmark, we estimated for each approach the mean percentage error (MPE) between that approach's plan-length results and the minimum plan length over all approaches on each instance.
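As we read it, the metric can be sketched as follows, computing for one approach the mean relative error of its plan lengths against the per-instance minimum; the exact formula is our assumption.

```python
# A sketch of the MPE as we read it: for one approach, average the
# relative gap between its plan length and the best plan length found
# by any approach on each instance it solved. The exact definition is
# our assumption.
from typing import Dict

def mean_percentage_error(lengths: Dict[str, int], best: Dict[str, int]) -> float:
    errors = [(lengths[i] - best[i]) / best[i] for i in lengths]
    return sum(errors) / len(errors)
```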
5.1 Driverlog Benchmark
The Driverlog domain is a variation of Logistics in which the trucks need drivers; furthermore, some paths can only be traversed by truck, and others only on foot.
Figure 3 depicts the plan-length curves (in logarithmic scale with base 10) on Driverlog instances, whose size scales from left to right.
On these Driverlog instances, our approach solved most of the problems with fewer steps in the plan (solution). Our approach dominates both LPG and HSP. Only in a few instances do Blackbox and Metric-ff (in speed version) obtain better plan-length results.

In terms of mean percentage error (MPE), our approach has less error than the other approaches. Comparing the best results of each approach, our approach has an MPE of 0.050, Blackbox 0.060, Metric-ff 0.163, HSP 0.297, and LPG 0.436.
5.2 Zenotravel Benchmark
The Zenotravel domain is a transportation problem,
where objects must be transported by airplanes, and
each airplane has an associated fuel level.
Figure 4 depicts the plan-length curves (in logarithmic scale with base 10) on Zenotravel instances, whose size scales from left to right.
The Metric-ff approach and our approach solved the Zenotravel instances in fewer steps than the others, but Metric-ff could not solve one instance in this domain, and the Blackbox approach could not solve any instance. Only in three instances does our approach have equal or better results than the Metric-ff approach.
In terms of mean percentage error (MPE), Metric-ff and our approach have less error than the other approaches. Comparing the best results of each approach, Metric-ff has an MPE of 0.118, our approach 0.224, HSP 0.443, and LPG 0.679; Blackbox could not solve any instance in this domain.

Figure 3: Plan-length curves on Driverlog instances.
5.3 Satellite Benchmark
In the Satellite domain, a number of satellites must take observations using instruments. The actions a satellite can perform are turning in the right direction, switching instruments on or off, calibrating instruments, and taking images.
Figure 5 depicts the plan-length curves (in logarithmic scale with base 10) on Satellite instances, whose size scales from left to right.
In terms of mean percentage error (MPE), Blackbox, Metric-ff, and our approach have less error than the other approaches. Comparing the best results of each approach, Blackbox has an MPE of 0.057, Metric-ff 0.092, our approach 0.269, LPG 0.289, and HSP 0.593.
5.4 Rovers Benchmark
In the Rovers domain, a number of planetary rovers must analyze samples and take images. The actions of a rover include navigating, taking samples, calibrating the camera, taking images, and communicating the data.
Figure 6 depicts the plan-length curves (in logarithmic scale with base 10) on Rovers instances, whose size scales from left to right.
In this domain, Metric-ff, HSP, and our approach perform better than the other approaches. Metric-ff and HSP weakly dominate our approach, having equal or better results.

In terms of mean percentage error (MPE), Metric-ff, HSP, and our approach have less error than the other approaches. Comparing the best results of each approach, Metric-ff has an MPE of 0.016, HSP 0.067, our approach 0.270, Blackbox 0.347, and LPG 0.357.
Figure 4: Plan-length curves on Zenotravel instances.
ResearchProposalinProbabilisticPlanningSearch
591
5.5 Freecell Benchmark
The Freecell domain is a well-known domain of discrete puzzles, in which all cards are randomly dealt into a number of piles. The objective of the game is to move all cards onto four piles, one per suit, with the cards arranged upwards from ace to king.
Figure 7 depicts the plan-length curves (in logarithmic scale with base 10) on Freecell instances, whose size scales from left to right.
In this domain, neither LPG nor HSP finds a solution for all tested instances, and it is not clear which approach is best.
In terms of mean percentage error (MPE), Metric-ff and our approach have less error than the other approaches. Comparing the best results of each approach, Metric-ff has an MPE of 0.083, our approach 0.166, HSP 0.38, LPG 0.517, and Blackbox 0.526.
5.6 Overall Results
Metric-ff and our approach perform best over all the domains tested; accordingly, they have the lowest mean percentage error in plan-length results overall.

Our probabilistic approach has better results than Blackbox (which combines the Graphplan and SAT approaches), LPG (a planning-graph approach), and HSP (a heuristic-search approach), but it does not achieve better overall results than Metric-ff (a heuristic approach that uses a relaxed Graphplan algorithm as a heuristic and a novel Enforced Hill-Climbing search algorithm).
Our approach, like the other approaches, has difficulty with complex problems (i.e., domains with many actions, predicates, objects, and goals). In our experiments, the complexity level of each well-known benchmark increases across instances, and we can see that the level of complexity depends on the benchmark used.

Figure 5: Plan-length curves on Satellite instances.
6 CONCLUSIONS
We have presented and modeled a novel probabilistic planning search approach. This probabilistic approach obtains good results in comparison with the other approaches. The most important contribution is the “probabilistic search approach” idea.
We extended the Bayesian network idea to the state space, which we modeled as a father-children network structure. Using this network we could make inferences using our Bayesian inference equations.
Furthermore, we explicitly defined probabilistic functions in order to define the Bayesian inference equations, which are based on well-known rules such as Bayes’ rule and conditional independence.
In our experiments with different benchmarks we found that our approach obtains good results in terms of plan length.
Our future work focuses on the search algorithm that works best with our Bayesian inferences in order to obtain optimal results, which means plans of shorter length found in less time.
The benchmarks have different variables (states), which influence the data distribution, and this in turn affects our Bayesian inferences. In order to obtain better results, in future work we will study the data distribution of different well-known benchmarks and determine the Bayesian inference equations that work best with each.
ACKNOWLEDGEMENTS
This research was supported by ITESM under the Re-
search Chair CAT-100.
Figure 6: Plan-length curves on Rovers instances.
ResearchProposalinProbabilisticPlanningSearch
593
Figure 7: Plan-length curves on Freecell instances.
REFERENCES
Blum, A. L. and Furst, M. L. (1997). Fast planning through planning graph analysis. Artificial Intelligence, 90(1):281–300.
Bonet, B. and Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence, 129(1):5–33.
Bonet, B. and Geffner, H. (2011). mGPT: A probabilistic planner based on heuristic search. arXiv preprint arXiv:1109.2153.
Ernst, M. D., Millstein, T. D., and Weld, D. S. (1997). Automatic SAT-compilation of planning problems. In IJCAI, volume 97, pages 1169–1176. Citeseer.
Fikes, R. E. and Nilsson, N. J. (1994). STRIPS, a retrospective. Artificial Intelligence in Perspective, 227.
Fox, M. and Long, D. (2003). PDDL2.1: An extension to PDDL for expressing temporal planning domains. J. Artif. Intell. Res. (JAIR), 20:61–124.
Fox, M. and Long, D. (2011). Efficient implementation of the plan graph in STAN. arXiv preprint arXiv:1105.5457.
Geffner, H. (2011). Planning with incomplete information. In Model Checking and Artificial Intelligence, pages 1–11. Springer.
Genesereth, M. and Nourbakhsh, I. (1993). Time-saving tips for problem solving with incomplete information. In Proceedings of the 11th National Conference on Artificial Intelligence.
Gerevini, A., Saetti, A., and Serina, I. (2003). Planning through stochastic local search and temporal action graphs in LPG. J. Artif. Intell. Res. (JAIR), 20:239–290.
Hoffmann, J. and Nebel, B. (2011). The FF planning system: Fast plan generation through heuristic search. arXiv preprint arXiv:1106.0675.
Kautz, H. (2004). SATPLAN04: Planning as satisfiability. Working Notes of the Fourth International Planning Competition (IPC-2004), pages 44–45.
Kautz, H. and Selman, B. (1999). Unifying SAT-based and graph-based planning. In IJCAI, volume 99, pages 318–325.
Kautz, H. A., Selman, B., et al. (1992). Planning as satisfiability. In ECAI, volume 92, pages 359–363.
Koehler, J. (1999). Handling of conditional effects and negative goals in IPP.
Morawski, P. (1989). Understanding Bayesian belief networks. AI Expert, 4(5):44–48.
Nilsson, N. J. (2010). The Quest for Artificial Intelligence. Cambridge University Press, Cambridge.
O’Hagan, A., Forster, J., and Kendall, M. G. (2004). Bayesian Inference. Arnold, London.
Russell, S. (2009). Artificial Intelligence: A Modern Approach.
Vector, D. (2003). Routing. Perkins, E. Royer.
Weld, D. S., Anderson, C. R., and Smith, D. E. (1998). Extending Graphplan to handle uncertainty & sensing actions. In AAAI/IAAI, pages 897–904.