ENHANCING LOCAL-SEARCH BASED SAT SOLVERS WITH
LEARNING CAPABILITY
Ole-Christoffer Granmo
University of Agder, Grimstad, Norway
Noureddine Bouhmala
Vestfold University College, Vestfold, Norway
Keywords:
Satisfiability problem, GSAT, Learning automata, Combinatorial optimization, Stochastic learning.
Abstract:
The Satisfiability (SAT) problem is a widely studied combinatorial optimization problem with numerous ap-
plications, including time tabling, frequency assignment, and register allocation. Among the simplest and
most effective algorithms for solving SAT problems are stochastic local-search based algorithms that mix
greedy hill-climbing (exploitation) with random non-greedy steps (exploration). This paper demonstrates how
the greedy and random components of the well-known GSAT Random Walk (GSATRW) algorithm can be
enhanced with Learning Automata (LA) based stochastic learning. The LA enhancements are designed so
that the actions the LA initially choose mimic the behavior of GSATRW. However, as the LA explicitly
interact with the SAT problem at hand, they learn the effect of the actions that are chosen, which allows
the LA to gradually and dynamically shift from random exploration to goal-directed exploitation. Random-
ized and structured problems from various domains, including SAT-encoded Logistics Problems, and Block
World Planning Problems, demonstrate that our LA enhancements significantly improve the performance of
GSATRW, thus laying the foundation for novel LA-based SAT solvers.
1 INTRODUCTION
The conflict between exploration and exploitation is a
well-known problem in machine learning and other
areas of artificial intelligence. Learning Automata
(LA) (Tsetlin, 1973; Thathachar and Sastry, 2004)
capture the essence of this conflict, and have thus
occupied researchers for over forty years. Initially, LA were used to model biological systems; in recent decades, however, they have also attracted considerable interest because they can learn the optimal action when operating in unknown stochastic environments, combining rapid and accurate convergence with low computational complexity.
Recent applications of LA include allocation of
polling resources in web monitoring (Granmo et al.,
2007), allocation of limited sampling resources in bi-
nomial estimation (Granmo et al., 2007), and opti-
mization of throughput in MPLS traffic engineering
(Oommen et al., 2007).
LA solutions have furthermore found application
within combinatorial optimization. In (Oommen and
Ma, 1988; Gale et al., 1990) a so-called Object Mi-
gration Automaton is used for solving the classical
equipartitioning problem. An order of magnitude
faster convergence is reported compared to the best
known algorithms at that time. A similar approach has also been developed for the Graph Partitioning Problem (Oommen and Croix, 1996). Furthermore, the list organization problem has successfully been addressed by LA schemes, which have been found to converge to the optimal arrangement with probability arbitrarily close to unity (Oommen and Hansen, 1987). Finally, in (Granmo and Bouhmala, 2007) we
significantly improved the performance of traditional
Random Walk (RW) for solving Satisfiability (SAT)
problems, by embedding RW into the LA framework.
Inspired by the success of the above solution
schemes, we will in this paper propose a novel scheme
for solving SAT problems based on enhancing the
classical GSAT Random Walk (GSATRW) algorithm
with LA based learning capability.
1.1 The Satisfiability (SAT) Problem
The SAT problem was among the first problems
shown to be NP-complete and involves determining
whether an expression in propositional logic is true
in some model (Cook, 1971). Thus, solving SAT
problems efficiently is crucial for inference in propo-
sitional logic. Further, other NP-complete problems,
such as constraint satisfaction and graph coloring, can
be encoded as SAT problems. Indeed, a large number
of problems that occur in knowledge-representation,
learning, VLSI-design, and other areas of artificial
intelligence, are essentially SAT problems. Accord-
ingly, improving the efficiency and accuracy of SAT
solvers may benefit all of these areas.
Most SAT solvers use the Conjunctive Normal
Form (CNF) representation for propositional logic ex-
pressions. In CNF, an expression is represented as
a conjunction of clauses, with each clause being a
disjunction of literals, and a literal being a Boolean
variable or its negation. For example, the expression
$P \vee \bar{Q}$ consists of one single clause, containing the two literals $P$ and $\bar{Q}$. The literal $P$ is simply a Boolean variable, while $\bar{Q}$ denotes the negation of the Boolean variable $Q$.
More formally, a SAT problem can be specified as follows. A propositional expression $\Phi = \bigwedge_{j=1}^{m} C_j$ with $m$ clauses and $n$ Boolean variables is given. Each Boolean variable $x_i$, $i \in \{1, \ldots, n\}$, takes one of the two values, True or False. Each clause $C_j$, $j \in \{1, \ldots, m\}$, in turn, is a disjunction of Boolean variables and has the form:

$$C_j = \bigvee_{k \in I_j} x_k \vee \bigvee_{l \in \bar{I}_j} \bar{x}_l,$$

where $I_j, \bar{I}_j \subseteq \{1, \ldots, n\}$, $I_j \cap \bar{I}_j = \emptyset$, and $\bar{x}_l$ denotes the negation of $x_l$.

The task is to determine whether there exists an assignment of truth values to the variables under which $\Phi$ evaluates to True. Such an assignment, if it exists, is called a satisfying assignment for $\Phi$, and $\Phi$ is called satisfiable. Otherwise, $\Phi$ is said to be unsatisfiable.
E.g., according to propositional logic, the expression $P \vee \bar{Q}$ becomes True for any truth assignment where $P$ is True or $Q$ is False. Accordingly, the expression $P \vee \bar{Q}$ is satisfiable.
Note that since we have two choices for each of the $n$ Boolean variables involved in a propositional expression $\Phi$, the size of the search space $S$ becomes $|S| = 2^n$. That is, the size of the search space grows exponentially with the number of variables.
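To make the CNF representation and the size of the search space concrete, the following Python sketch (ours, for illustration; the formula and helper names are hypothetical, not taken from the paper) evaluates a small formula against all $2^n$ assignments by brute force:

    from itertools import product

    # A small CNF formula as a list of clauses; each clause lists signed
    # variable indices (DIMACS style): 1 means x1, -2 means the negation
    # of x2. This encodes (x1 OR NOT x2) AND (x2 OR x3).
    formula = [[1, -2], [2, 3]]
    n = 3

    def satisfies(assignment, clauses):
        # An assignment satisfies the formula if every clause contains at
        # least one literal that evaluates to True.
        return all(any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
                   for clause in clauses)

    # Enumerating all 2^n assignments is exactly the exponential blow-up
    # that stochastic local search is designed to avoid.
    models = [a for a in product([False, True], repeat=n) if satisfies(a, formula)]
    print(len(models), "of", 2 ** n, "assignments are satisfying")

Local-search solvers such as GSATRW instead start from a single assignment and repair it by flipping variables.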
1.2 Paper Organization
Among the simplest and most effective algorithms
for solving SAT problems are stochastic local-search
based algorithms that mix greedy hill-climbing (ex-
ploitation) with random non-greedy steps (explo-
ration). This paper proposes how the greedy and ran-
dom components of such local-search algorithms can
be enhanced with LA-based stochastic learning.
In Section 2, we propose how the well-known
GSATRW (Selman et al., 1994) scheme can be en-
hanced with learning capability using the basic con-
cepts of LA. Then, in Section 3, we report the results
obtained from testing the resulting new LA-based ap-
proach on an extensive test suite of problem instances.
Finally, in Section 4 we present a summary of our
work and provide ideas for further research.
2 SOLVING SAT PROBLEMS
USING LEARNING AUTOMATA
This section demonstrates how the greedy and random components of stochastic local-search (SLS) based algorithms can be enhanced with LA-based stochastic learning. We start by defining the basic building block of our scheme, the Learning SAT Automaton, before we propose how several such LA can form a game designed to solve SAT problems.
2.1 A Learning SAT Automaton
Generally stated, a learning automaton performs a
sequence of actions on an environment. The envi-
ronment can be seen as a generic unknown medium
that responds to each action with some sort of reward
or penalty, perhaps stochastically. Based on the re-
sponses from the environment, the aim of the learn-
ing automaton is to find the action that minimizes the
expected number of penalties received. Figure 1 illus-
trates the interaction between the learning automaton
and the environment. Because we treat the environ-
ment as unknown, we will here only consider the def-
inition of the learning automaton.
A learning automaton can be defined in terms of a
quintuple (Narendra and Thathachar, 1989):
$$\{\Phi, \alpha, \beta, F(\cdot, \cdot), G(\cdot, \cdot)\}.$$

$\Phi = \{\phi_1, \phi_2, \ldots, \phi_s\}$ is the set of internal automaton states, $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ is the set of automaton actions, and $\beta = \{\beta_1, \beta_2, \ldots, \beta_m\}$ is the set of inputs that can be given to the automaton. An output function $\alpha_t = G[\phi_t]$ determines the next action performed by the automaton given the current automaton state.
[Figure 1: A learning automaton interacting with an environment. The automaton, with states $\{\phi_1, \ldots, \phi_s\}$, selects actions from $\{\alpha_1, \ldots, \alpha_r\}$ via $\alpha_t = G(\phi_t)$; the environment responds with an input from $\{\beta_1, \ldots, \beta_m\}$, which drives the state update $\phi_{t+1} = F(\phi_t, \beta_t)$.]
Finally, a transition function $\phi_{t+1} = F[\phi_t, \beta_t]$ determines the new automaton state from:
1. The current automaton state.
2. The response of the environment to the action per-
formed by the automaton.
Based on the above generic framework, the crucial is-
sue is to design automata that can learn the optimal ac-
tion when interacting with the environment. Several
designs have been proposed in the literature, and the
reader is referred to (Narendra and Thathachar, 1989;
Thathachar and Sastry, 2004) for an extensive treat-
ment.
We here target SAT problems, and our goal is to design a team of Learning Automata that seeks a solution. To achieve this goal, we build upon the work of Tsetlin and the linear two-action automaton (Tsetlin, 1973; Narendra and Thathachar, 1989), as described in the following.
First of all, for each variable in the SAT problem that is to be solved, we construct an automaton with
States: $\Phi = \{-N, -(N-1), \ldots, -1, 0, \ldots, N-2, N-1\}$.
Actions: $\alpha = \{\text{True}, \text{False}\}$.
Inputs: $\beta = \{\text{reward}, \text{penalty}\}$.
Figure 2 specifies the $G$ and $F$ matrices.

[Figure 2: The state transitions and action selection of the Learning SAT Automaton: states $-N, \ldots, -1$ select action False and states $0, \ldots, N-1$ select action True, with Reward and Penalty transitions between neighboring states.]

The $G$ matrix can be summarized as follows. If the automaton
state is non-negative, then action True will be chosen by the automaton. If, on the other hand, the state is negative, then action False will be chosen. Note that since
we initially do not know which action is optimal, we
set the initial state of the Learning SAT Automaton
randomly to either ’-1’ or ’0’.
The state transition matrix F determines how
learning proceeds. As seen in the graph representa-
tion of F found in the figure, providing a reward in-
put to the automaton strengthens the currently cho-
sen action, essentially by making it less likely that the
other action will be chosen in the future. Correspond-
ingly, a penalty input weakens the currently selected
action by making it more likely that the other action
will be chosen later on. In other words, the automaton
attempts to incorporate past responses when deciding
on a sequence of actions.
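As a concrete illustration, the following Python class is our minimal sketch of such a Learning SAT Automaton; it implements the $G$ and $F$ matrices of Figure 2 under our reading of the paper, and is not the authors' implementation:

    import random

    class LearningSATAutomaton:
        # States -N, ..., -1 select action False; states 0, ..., N-1
        # select action True (the G matrix).

        def __init__(self, N=2):
            self.N = N
            # The initial state is drawn randomly from {-1, 0}, so a single
            # transition suffices to change the action early in the search.
            self.state = random.choice([-1, 0])

        def action(self):
            # Non-negative states choose True; negative states choose False.
            return self.state >= 0

        def reward(self):
            # The F matrix on reward: strengthen the current action by
            # moving outward, stopping at the boundary states -N and N-1.
            if 0 <= self.state < self.N - 1:
                self.state += 1
            elif -self.N < self.state < 0:
                self.state -= 1

        def penalize(self):
            # The F matrix on penalty: weaken the current action by moving
            # toward the other side; crossing the center flips the action.
            self.state += -1 if self.state >= 0 else 1

The automaton changes its action, and hence the truth value of its variable, only when a penalty pushes its state across the center; this is what makes the flipping increasingly conservative as states deepen.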
2.2 Learning Automata GSATRW
In addition to the definition of the LA, we must define
the environment that the LA interacts with. Simply
put, the environment is a SAT problem as defined in
Section 1. Each variable of the SAT problem is as-
signed a dedicated LA, resulting in a team of LA. The
task of each LA is to determine the truth value of its
corresponding variable, with the aim of satisfying all
of the clauses where that variable appears. In other
words, if each automaton reaches its own goal, then
the overall SAT problem at hand has also been solved.
With the above perspective in mind, we now
present the details of the LA-GSATRW algorithm that
we propose. Figure 3 contains the complete pseudo-
code for solving SAT problems, using a team of LA.
As seen from the figure, LA-GSATRW alternates
between selecting unsatisfied and satisfied clauses.
When selecting an unsatisfied clause, an ordinary
GSATRW strategy is used to penalize the LA when
they “disagree” with GSATRW, i.e., when GSATRW
and the LA suggest opposite truth values. Conversely,
when selecting a satisfied clause, we use an “inverse”
GSATRW strategy for rewarding the LA, this time
when they agree with the “inverse” GSATRW. Note
that as a result, the assignment of truth values to vari-
ables is indirect, governed by the states of the LA.
To summarize, at the core of the LA-GSATRW algo-
rithm is a punishment/rewarding scheme that guides
the team of LA towards the optimal assignment, in
the spirit of the underlying GSATRW strategy.
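For concreteness, the following Python sketch is our own minimal reading of this main loop (the full pseudocode appears in Figure 3); it reuses the LearningSATAutomaton class sketched above, and the greedy and inverse-greedy selection helpers are simplified stand-ins, not the authors' code:

    import random

    def la_gsatrw(clauses, n_vars, p=0.5, N=2, max_flips=100000):
        # clauses hold signed 1-based variable indices, e.g. [[1, -2], [2, 3]].
        automata = [LearningSATAutomaton(N) for _ in range(n_vars)]
        assign = [a.action() for a in automata]  # truth values follow the LA

        def sat(clause):
            return any((lit > 0) == assign[abs(lit) - 1] for lit in clause)

        def unsat_if_flipped(i):
            # Number of unsatisfied clauses if variable i were flipped.
            assign[i] = not assign[i]
            count = sum(not sat(c) for c in clauses)
            assign[i] = not assign[i]
            return count

        for _ in range(max_flips):
            unsat = [c for c in clauses if not sat(c)]
            if not unsat:
                return assign  # satisfying assignment found
            # Penalty phase: a walk step or a greedy GSAT step selects a
            # variable; its automaton is penalized, and the truth value
            # changes only if the automaton actually switches action.
            if random.random() <= p:
                i = abs(random.choice(random.choice(unsat))) - 1
            else:
                i = min(range(n_vars), key=unsat_if_flipped)
            automata[i].penalize()
            assign[i] = automata[i].action()
            # Reward phase: a variable agreeing with the "inverse" strategy
            # has its automaton rewarded, deepening its state.
            satisfied_now = [c for c in clauses if sat(c)]
            if satisfied_now and random.random() <= p:
                lit = random.choice(random.choice(satisfied_now))
            else:
                i = max(range(n_vars), key=unsat_if_flipped)
                lit = (i + 1) if assign[i] else -(i + 1)
            i = abs(lit) - 1
            if (lit > 0) == assign[i]:  # the LA helps satisfy the clause
                automata[i].reward()
        return None  # flip budget exhausted without a solution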
2.3 Comments on LA-GSATRW
Like a two-action Tsetlin Automaton, our proposed LA seeks to minimize the expected number of penalties it receives. In other words, it seeks the truth assignment that minimizes the number of unsatisfied clauses among the clauses where its variable appears.
Procedure learning_automata_gsat_random_walk()
Input: A set of clauses C; Walk probability p;
Output: A satisfying truth assignment of the clauses, if found;
Begin
    /* Initialization */
    For i := 1 To n Do
        /* The initial state of each automaton is set to either '-1' or '0' */
        state[i] = random_element({-1, 0});
        /* And the respective literals are assigned corresponding truth values */
        If state[i] == -1 Then x_i = False Else x_i = True;
    /* Main loop */
    While Not stop(C) Do
        If rnd(0, 1) ≤ p Then
            /* Draw unsatisfied clause randomly */
            C_j = random_unsatisfied_clause(C);
            /* Draw clause literal randomly */
            i = random_element(I_j ∪ Ī_j);
        Else
            /* Randomly select one of the literals whose flipping
               minimizes the number of unsatisfied clauses */
            i = random_element(Best_Literal_Candidates(C));
        /* The corresponding automaton is penalized for choosing
           the "wrong" action */
        If i ∈ I_j And state[i] < N-1 Then
            state[i]++;
            /* Flip literal when automaton changes its action */
            If state[i] == 0 Then
                flip(x_i);
        Else If i ∈ Ī_j And state[i] > -N Then
            state[i]--;
            /* Flip literal when automaton changes its action */
            If state[i] == -1 Then
                flip(x_i);
        If rnd(0, 1) ≤ p Then
            /* Draw satisfied clause randomly */
            C_j = random_satisfied_clause(C);
            /* Draw clause literal randomly */
            i = random_element(I_j ∪ Ī_j);
        Else
            /* Randomly select one of the literals whose flipping
               maximizes the number of unsatisfied clauses */
            i = random_element(Worst_Literal_Candidates(C));
        /* Reward the corresponding automaton if it contributes
           to the satisfaction of the clause */
        If i ∈ I_j And state[i] ≥ 0 And state[i] < N-1 Then
            state[i]++;
        Else If i ∈ Ī_j And state[i] < 0 And state[i] > -N Then
            state[i]--;
    EndWhile
End

Figure 3: The Learning Automata GSAT Random Walk algorithm.
Note that because multiple variables, and thereby
multiple LA, may be involved in each clause, we are
dealing with a game of LA (Narendra and Thathachar,
1989). That is, multiple LA interact with the same
environment, and the response of the environment de-
pends on the actions of several LA. In fact, because
there may be conflicting goals among the LA involved
in the LA-GSATRW, the resulting game is competi-
tive. The convergence properties of general competitive games of LA have not yet been successfully analyzed; however, results exist for certain classes of games, such as the Prisoner's Dilemma game (Narendra and Thathachar, 1989).

In our case, the LA involved in LA-GSATRW are non-absorbing, i.e., every state can be reached from every other state with positive probability. This means that the probability of reaching the solution of the SAT problem at hand is equal to 1 in the limit of an infinite number of game rounds. Also note that the solution of the SAT problem corresponds to a Nash equilibrium of the game.
In order to maximize the speed of learning, we initialize each LA randomly to either the state '-1' or '0'. In this initial configuration, the variables will be flipped relatively quickly because only a single state transition is necessary for a flip. Accordingly, the joint state space of the LA is quickly explored, and the algorithm initially mimics its non-learning counterpart. However, as learning proceeds and the LA move towards their boundary states, i.e., states '-N' and 'N-1', the flipping of variables calms down. Accordingly, the search for a solution to the SAT problem at hand becomes increasingly focused.
3 EMPIRICAL RESULTS
We here compare LA-GSATRW with its non-learning counterpart, the GSAT with Random Walk (GSATRW) scheme. A main purpose of this comparison is to study the effect of the introduced stochastic learning. The benchmark problems we used to achieve this contain both randomized and structured problems from various domains, including SAT-encoded Bounded Model Checking Problems, Graph Coloring Problems, Logistics Problems, and Block World Planning Problems. Due to the random nature of the algorithms, when comparing LA-GSATRW with GSATRW, we run the algorithms 100 times, with each run being stopped after $10^7$ flips.
3.1 Run-Length-Distributions (RLD)
As an indicator of the behavior of the algorithm on
a single instance, we choose the median cost when
trying to solve a given instance in 100 trials, and
using an extremely high cutoff parameter setting of $\textit{Maxsteps} = 10^7$ in order to obtain a maximal number of successful tries. To get an idea of the variability of the search cost, we analyzed the cumulative distribu-
the search cost, we analyzed the cumulative distribu-
tion of the number of search flips needed by both LA-
GSATRW and GSATRW for solving single instances.
For practical reasons we restrict our presentation here
to the instances corresponding to small, medium, and
large sizes from the underlying test-set.
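For reference, an empirical RLD of the kind plotted below can be computed from per-trial flip counts as in the following sketch (our own; the input format, one flip count per trial with None for failed trials, is an assumption):

    import numpy as np

    def run_length_distribution(flip_counts, max_flips=10**7):
        # flip_counts: one entry per trial, the number of flips needed to
        # solve the instance, or None if the trial failed within the cutoff.
        n = len(flip_counts)
        solved = sorted(c for c in flip_counts if c is not None and c <= max_flips)
        xs = np.array(solved)
        # Cumulative fraction of all trials solved within xs[k] flips; the
        # final value is the asymptotic success rate (below 1 if trials fail).
        ys = np.arange(1, len(solved) + 1) / n
        return xs, ys

Plotting ys against xs yields curves like those in Figures 4 to 10.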
[Figure 4: Cumulative distributions (fraction solved vs. #flips) of LA GSAT w/Random Walk (N=2) and GSAT w/Random Walk for a 600-variable random problem with 2550 clauses (f600).]
[Figure 5: Cumulative distribution (fraction solved vs. #flips) of LA GSAT w/Random Walk (N=2) and GSAT w/Random Walk for a 1000-variable random problem with 4250 clauses (f1000).]
Figures 4, 5, and 6 show RLDs obtained by apply-
ing LA-GSATRW and GSATRW to individual large
random problems. As can be seen from the three plots, both algorithms reach a success rate of 100% for f600 and f1000. However, on the large problem f2000, GSATRW shows a low asymptotic solution probability of 0.37, compared to 0.45 for LA-GSATRW. Note also that there
[Figure 6: Cumulative distributions (fraction solved vs. #flips) of LA GSAT w/Random Walk (N=2) and GSAT w/Random Walk for a 2000-variable random problem with 8500 clauses (f2000).]
is a substantial fraction of trials that are dramatically hard to solve, which explains the large variability in the length of the different runs of the two algorithms.

Both algorithms exhibit an initial phase below which the probability of finding a solution is 0. Both methods start the search from a randomly chosen assignment which typically violates many clauses. Consequently, both methods need some time to reach the first local optimum, which possibly could be a feasible solution. The RLDs of the two algorithms show no cross-over; the gap is somewhat hard to see for f600, but becomes more pronounced for f1000 and f2000. The median search cost for LA-GSATRW is 3%, 29%, and 17% of that of GSATRW for f600, f1000, and f2000, respectively. The three plots provide evidence for the superiority of LA-GSATRW over GSATRW, as it gives consistently higher success probabilities while requiring fewer search steps.
[Figure 7: Cumulative distributions (fraction solved vs. #flips) of LA GSAT w/Random Walk (N=2) and GSAT w/Random Walk for a 228-variable logistics problem with 6718 clauses (logistics.a).]
Figures 7, 8, 9, and 10 contain similar plots for SAT-encoded logistics problems. However, in this case it is difficult to claim a clear winner among the
[Figure 8: Cumulative distribution (fraction solved vs. #flips) of LA GSAT w/Random Walk (N=2) and GSAT w/Random Walk for a 843-variable logistics problem with 7301 clauses (logistics.b).]
[Figure 9: Cumulative distributions (fraction solved vs. #flips) of LA GSAT w/Random Walk (N=2) and GSAT w/Random Walk for a 1141-variable logistics problem with 10719 clauses (logistics.c).]
[Figure 10: Cumulative distribution (fraction solved vs. #flips) of LA GSAT w/Random Walk (N=2) and GSAT w/Random Walk for a 4713-variable logistics problem with 21991 clauses (logistics.d).]
algorithms. The number of search steps varies between the different trials and is significantly higher for GSATRW than for LA-GSATRW. Note, in particular, that the median search cost for LA-GSATRW is 4%, 29%, 34%, and 51% of that of GSATRW for Logistics-d, Logistics-b, Logistics-c, and Logistics-a, respectively.
3.2 Wilcoxon Rank-Sum Test
Table 1 summarizes Wilcoxon rank-sum test results for a number of SAT-encoded problems.
Table 1: Success rate (SR) and Wilcoxon statistical test.

Problem      LA-GSATRW SR   GSATRW SR   P-value   Null hypothesis
f600             53%            47%       0.19      Accept
f1000            62%            37%       0.00      Reject
f2000            32%            14%       0.00      Reject
logistic-a       74%            26%       0.00      Reject
logistic-b       54%            46%       0.09      Accept
logistic-c       59%            41%       0.02      Reject
logistic-d       54%            46%       0.29      Accept
bw-medium        36%            64%       0.02      Reject
bw-large-a       49%            51%       0.52      Accept
bw-huge          50%            50%       0.91      Accept
bw-large-b       53%            47%       0.82      Accept
bmc-ibm2         39%            61%       0.01      Reject
bmc-ibm3         52%            44%       0.18      Accept
bmc-ibm6         51%            49%       0.98      Accept
The table reveals two pertinent observations. Firstly, the success
rate of LA-GSATRW was better in 10 problems, and the difference in the median search cost was significant in 6 of the problems. On the other hand,
GSATRW gave better results in 2 problems in terms
of success rate, but this performance difference was
significant in only one case.
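A test of this kind is straightforward to reproduce with SciPy's rank-sum implementation; the sketch below uses synthetic stand-in samples, since the raw run logs behind Table 1 are not given here:

    import random
    from scipy.stats import ranksums

    # Stand-in samples; in the study these would be the recorded search
    # costs (number of flips) over 100 trials of each algorithm.
    flips_la = [random.randint(10000, 500000) for _ in range(100)]
    flips_gsatrw = [random.randint(20000, 900000) for _ in range(100)]

    statistic, p_value = ranksums(flips_la, flips_gsatrw)
    # Reject the null hypothesis of equally distributed costs at the 5% level.
    decision = "Reject" if p_value < 0.05 else "Accept"
    print("P-value = %.2f: %s the null hypothesis" % (p_value, decision))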
4 CONCLUSIONS AND FURTHER
WORK
In this work, we have introduced a new approach
based on combining Learning Automata and GSAT
w/Random Walk. The success rate of LA-GSATRW
was better in 10 of the benchmark problems used, and
the difference in the median search cost was signif-
icantly better for 6 of the problems. GSATRW, on
the other hand, gave better results in 2 of the problems
in terms of success rate, while its performance was
significantly better for only one of these problems.
Based on the empirical results, it can be seen
that the Learning Automata mechanism employed in
LA-GSATRW offers an efficient way to escape from
highly attractive areas in the search space, leading to
a higher probability of success as well as reducing the
number of local search steps to find a solution.
As further work, it is of interest to study how
Learning Automata can be used to enhance other
Stochastic Local Search based algorithms. Further-
more, more recent classes of Learning Automata,
such as the Bayesian Learning Automata family
(Granmo, 2009) may offer improved performance in
LA based SAT solvers.
REFERENCES
Cook, S. (1971). The complexity of theorem-proving procedures. In Proceedings of the Third ACM Symposium on Theory of Computing, pages 151–158.
Gale, W., Das, S., and Yu, C. (1990). Improvements to an Algorithm for Equipartitioning. IEEE Transactions on Computers, 39:706–710.
Granmo, O.-C. (2009). Solving Two-Armed Bernoulli Ban-
dit Problems Using a Bayesian Learning Automaton.
To Appear in International Journal of Intelligent Com-
puting and Cybernetics (IJICC).
Granmo, O.-C. and Bouhmala, N. (2007). Solving the Sat-
isfiability Problem Using Finite Learning Automata.
International Journal of Computer Science and Ap-
plications, 4:15–29.
Granmo, O.-C., Oommen, B. J., Myrer, S. A., and Olsen,
M. G. (2007). Learning Automata-based Solutions
to the Nonlinear Fractional Knapsack Problem with
Applications to Optimal Resource Allocation. IEEE
Transactions on Systems, Man, and Cybernetics, Part
B, 37(1):166–175.
Narendra, K. S. and Thathachar, M. A. L. (1989). Learning
Automata: An Introduction. Prentice Hall.
Oommen, B. J. and Croix, E. V. S. (1996). Graph partition-
ing using learning automata. IEEE Transactions on
Computers, 45(2):195–208.
Oommen, B. J. and Hansen, E. R. (1987). List organizing
strategies using stochastic move-to-front and stochas-
tic move-to-rear operations. SIAM Journal on Com-
puting, 16:705–716.
Oommen, B. J. and Ma, D. C. Y. (1988). Deterministic learning automata solutions to the equipartitioning problem. IEEE Transactions on Computers, 37(1):2–13.
Oommen, B. J., Misra, S., and Granmo, O.-C. (2007).
Routing Bandwidth Guaranteed Paths in MPLS Traf-
fic Engineering: A Multiple Race Track Learning Ap-
proach. IEEE Transactions on Computers, 56(7):959–
976.
Selman, B., Kautz, H. A., and Cohen, B. (1994). Noise
Strategies for Improving Local Search. In Proceed-
ings of AAAI’94, pages 337–343. MIT Press.
Thathachar, M. A. L. and Sastry, P. S. (2004). Networks of
Learning Automata: Techniques for Online Stochastic
Optimization. Kluwer Academic Publishers.
Tsetlin, M. L. (1973). Automaton Theory and Modeling of
Biological Systems. Academic Press.