Extracting Frequent Gradual Patterns Based on SAT

Jerry Lonlac (1,a), Imen Ouled Dlala (2,b), Said Jabbour (3,c), Engelbert Mephu Nguifo (4,d), Badran Raddaoui (5,e) and Lakhdar Sais (3,f)

1 Centre de Recherche, IMT Lille Douai, Université Lille, Lens, France
2 Pôle Universitaire Léonard de Vinci, Research Center, Paris La Défense, France
3 CRIL, University of Artois & CNRS, Lens, France
4 Univ. Clermont Auvergne, CNRS, LIMOS, F-63000 Clermont-Ferrand, France
5 SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France

Keywords:

Data Mining, Gradual Patterns, Propositional Satisﬁability.

Abstract:

This paper proposes a constraint-based modeling approach for mining frequent gradual patterns from numerical data. Our declarative approach provides a principled way to take advantage of recent advancements in satisfiability testing and several features of modern SAT solvers to enumerate gradual patterns. Interestingly, our approach can easily be extended with extra requirements, such as temporal constraints used to extract more specific patterns in a broad range of gradual pattern mining applications. An empirical evaluation on two real-world datasets shows the efficiency of our approach.

1 INTRODUCTION

Nowadays, numerical data, also coined quantitative

data, are abundant due to the proliferation of measur-

ing and data collection devices and sensors. These

data can be obtained from a range of practical ar-

eas, including environment & ecology, humanities &

social science, trade & ﬁnance, and life sciences &

astronomy. Typically, machine learning, statistical,

and logical data analysis techniques are employed

for analyzing this large amount of numerical data.
Nevertheless, only a small number of pattern mining techniques, which have typically been studied for categorical
data, are designed to handle numerical data. Among

these approaches, we can cite itemset/association rule

mining (Ramakrishnan and Rakesh, 1996; Aumann

and Lindell, 1999; Salleb-Aouissi et al., 2007), in-

terval patterns using formal concept analysis (Kay-

toue et al., 2011), rank-correlated sets of numerical

attributes mining (Calders et al., 2006), and gradual

pattern mining (Di-Jorio et al., 2009). In particular, gradual itemset mining aims at discovering frequent co-variations between attributes, such as “the higher the age, the higher the salary, and the lower the free time”. Gradual patterns are very expressive as they represent the variability of numerical values, which is often sought in numerous application domains: for instance, biology, where most advances are made by analyzing genome data; medicine, where researchers in psychology focus on correlations between memory and feeling points from the “Diagnostic manual of mental disorders”; and financial markets, for discovering covariation between various economic and financial indicators. More generally, discovering gradual patterns is relevant for all numerical data where one needs to recover the relationship between attributes in terms of variability (e.g., (Ngo et al., 2018; Aryadinata et al., 2014; Fan and Xiao, 2017)).

a https://orcid.org/0000-0003-3278-9969
b https://orcid.org/0000-0001-6928-4599
c https://orcid.org/0000-0002-8389-8332
d https://orcid.org/0000-0001-9119-678X
e https://orcid.org/0000-0003-4712-0811
f https://orcid.org/0000-0003-2879-8627

Numerous proposals have been introduced to

tackle the problem of gradual itemset mining or its

variants, e.g., (Hüllermeier, 2002; Berzal et al., 2007; Masseglia et al., 2008; Di-Jorio et al., 2008; Di-Jorio et al., 2009; Do et al., 2010; Laurent et al., 2010; Oudni et al., 2013; Négrevergne et al., 2014; Do et al., 2015; Lonlac et al., 2018). For most of these methods,

an extracted gradual pattern is provided with a unique

support or sub-sequence of transactions, called extension, supporting it. Such information, when provided, allows us to explain to the user why such an itemset is gradual. Recently, in (Ngo et al., 2018), the authors addressed the problem of mining spatial gradual patterns with an application to the measurement of potentially avoidable hospitalizations. In this context, they have shown that the analysis of the different sequences of objects associated with the gradual patterns could reveal new relevant knowledge to the expert. Let us mention that the sequence of objects verifying a gradual pattern is not unique. To the best of our knowledge, none of the mentioned approaches provides all the possible extensions of a gradual itemset. Providing an exhaustive set of extensions associated with a given gradual itemset could be beneficial and would improve the robustness and the precision of the information characterizing the pattern. Moreover, in (Di-Jorio et al., 2009), the authors presented a first effective algorithm for the discovery of gradual itemsets and gradual rules that can handle databases with hundreds of attributes (contrary to the algorithm introduced in (Berzal et al., 2007), which is limited to only six attributes).

Lonlac, J., Dlala, I., Jabbour, S., Nguifo, E., Raddaoui, B. and Sais, L.
Extracting Frequent Gradual Patterns Based on SAT.
DOI: 10.5220/0012126000003541
In Proceedings of the 12th International Conference on Data Science, Technology and Applications (DATA 2023), pages 136-143
ISBN: 978-989-758-664-4; ISSN: 2184-285X
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

One of the primary issues with gradual pattern mining approaches is the exponential combination

space to be explored, coupled with the challenge of

the huge number of patterns, which can also be of ex-

ponential size. This combinatorial explosion is tack-

led in (Laurent et al., 2009; Ayouni et al., 2010;

Do et al., 2015). In fact, in (Laurent et al., 2009),

the authors proposed a method for extracting gradual patterns from large datasets which takes advantage of a binary representation of the lattice structure. In

addition, other work (e.g., (Ayouni et al., 2010; Do

et al., 2015)) retrieved only closed frequent gradual

patterns in order to reduce the total number of ex-

tracted patterns. In this paper we propose a new ap-

proach for extracting all frequent closed gradual pat-

terns in a numerical dataset. Our approach differs

from all the previous specialized approaches. It fol-

lows the SAT-based framework proposed in (Jabbour

et al., 2013) for mining frequent closed itemsets. This

new framework offers a declarative and ﬂexible repre-

sentation model. In fact, specialized approaches often

require fresh implementations to accommodate novel

constraints, whereas such constraints can be easily

integrated within a Boolean satisﬁability framework.

This enables data mining problems to take advantage of multiple generic and efficient SAT solving techniques. In this paper, we propose to heavily exploit this declarative language (SAT) and its associated efficient and generic solving techniques. For that purpose, we first present a new SAT-based model to find frequent gradual patterns that includes different types of constraints. Then, we provide a Boolean formulation of the closedness constraints in order to search for frequent closed gradual patterns. Different experiments carried out over real datasets are presented to show the feasibility of the new SAT-based approach. The paper is organized as follows: in Section 2, we present the problem of mining gradual itemsets from numerical databases and some efficient algorithms that have been proposed in the literature. We also recall the problem of Boolean satisfiability. In Section 3, we present our SAT encoding of the frequent gradual itemset mining problem and show through an example how it can be applied to find frequent gradual itemsets from a numerical dataset. Section 4 describes the SAT-based enumeration procedure to deal with the problem of enumerating the set of all models of a CNF formula. Finally, Section 5 presents detailed experiments carried out over real datasets, showing the applicability and the interest of our approach.

2 PRELIMINARIES

In this section, we formally describe the problem of

mining frequent gradual itemsets (or patterns) in nu-

merical datasets as well as the propositional satisﬁa-

bility problem.

2.1 Gradual Itemsets Mining Problem

The problem of mining gradual patterns consists in retrieving attribute co-variations in a numerical dataset of the form “The more/less X, ..., the more/less Y”. We

assume herein that we are given a dataset ∆ containing

a set of objects T deﬁning a relation on an attribute

set I with numerical values, i.e., for t ∈ T and i ∈ I ,

i ↓ t denotes the value of the attribute i over object t.

Table 1 gives an example of a numerical dataset built

over the set of attributes I = {age, salary, cars}.

Table 1: An example of a numerical dataset ∆.

tid   age   salary   cars
t1    22    1200     2
t2    28    1850     3
t3    24    1200     4
t4    35    2200     4
t5    38    2000     1
t6    44    3400     1
t7    52    3400     3
t8    41    5000     2

Each attribute will hereafter be considered twice:

once to indicate its increase (≤), and another to in-

dicate its decrease (≥). This leads to new kinds of

items, called gradual items.

Definition 1. Let ∆ be a dataset defined on a numerical attribute set I. A gradual item is defined under the form $i^{o}$, where i is an attribute of I and o ∈ {≥, ≤} represents an ascending or descending order or variation of the attribute values of i.

If we consider the numerical dataset of Table 1, $age^{\geq}$ (respectively $age^{\leq}$) is a gradual item expressing that the values of the attribute age are increasing (respectively decreasing). Now, a gradual itemset (or simply gradual pattern) is a non-empty set of gradual items. Also, such an itemset is called a k-gradual itemset if it contains exactly k gradual items. For example, $g_1 = \{age^{\geq}, salary^{\geq}\}$ is a 2-gradual itemset, meaning that “the higher the age, the higher the salary”. A gradual itemset sets a variation order on several attributes simultaneously. The length of a gradual itemset is equal to the number of gradual items that it contains.

The support (also called frequency) of a grad-

ual itemset amounts to the extent to which a gradual

pattern is present in a numerical database. Several

support definitions have been proposed in the literature (Hüllermeier, 2002; Calders et al., 2006; Berzal et al., 2007; Laurent et al., 2009; Di-Jorio et al., 2009; Kendall and Smith, 1939), showing that gradual itemsets can follow different semantics. In (Hüllermeier, 2002) the computation of the support of a gradual item-

set is based on linear regression. In (Calders et al.,

2006; Berzal et al., 2007; Kendall and Smith, 1939),

the authors considered the proportion of couples of

tuples that veriﬁes the constraints expressed by all the

gradual items of the itemset, while in (Di-Jorio et al.,

2009), the support is deﬁned as the size of the longest

sequence of tuples supporting the gradual itemset. In

this paper, we adopt this last deﬁnition of support for

its relevance and generality. To introduce formally

this variant of support, let us ﬁrst introduce the fol-

lowing additional deﬁnitions:

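Definition 2. Let $g = (i_1^{o_1}, \ldots, i_k^{o_k})$ be a gradual itemset and $s = \langle t_1 \rightarrow t_2 \rightarrow \ldots \rightarrow t_n \rangle$ a sequence of tuples. Then, s is an extension of g if ∀ 1 ≤ p ≤ k and ∀ 1 ≤ j < n, we have:

$$(i_p \downarrow t_j) \; o_p \; (i_p \downarrow t_{j+1}) \quad (1)$$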

It is important to note that there might be several

extensions or sequences of tuples validating g.

Deﬁnition 3. Let g be a gradual itemset in a numer-

ical database ∆. We deﬁne Cover(g, ∆) as the set of

the longest extensions of g in ∆ w.r.t. set inclusion.

Example 1. Let us consider the database ∆ de-

picted in Table 1 and the gradual itemset g

{age

≥

, salary

≥

}. Cover(g

, ∆) = {⟨t

→ t

→

→ t

⟩, ⟨ t

→ t

⟩, ⟨ t

→ t

⟩, ⟨ t

→ t

⟩}.

From the same example, the cover of the gradual item

salary

≥

(resp. cars

≥

) is {⟨t

→ t

→

→ t

⟩} (resp. {⟨ t

→ t

→

→ t

⟩}).

Now, we are ready to give the deﬁnition of sup-

port.

Definition 4. Let ∆ be a numerical database and g be a gradual itemset of ∆. Then,

$$Supp(g, \Delta) = \frac{\max\{|s| : s \in Cover(g, \Delta)\}}{|\Delta|}$$

Example 2. Referring again to the database ∆ of Table 1 and the gradual itemset $g_1$, we have $Supp(g_1, \Delta) = \frac{6}{8}$. So, six among the eight input tuples can be ordered according to $g_1$. Note that the support of a single gradual item is equal to 100%, as it is always possible to order all of the tuples according to the values of a single attribute.
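The support of Definition 4 can be checked on Table 1 with a short script. This is a minimal sketch, not the authors' implementation: the labels `t1`..`t8`, the function names, and the quadratic dynamic program are assumptions made for illustration, and each comparison operator is applied literally between consecutive tuples as in Eq. (1).

```python
from fractions import Fraction

# Rows of Table 1; the labels t1..t8 are assumed identifiers.
TABLE_1 = {
    "t1": {"age": 22, "salary": 1200, "cars": 2},
    "t2": {"age": 28, "salary": 1850, "cars": 3},
    "t3": {"age": 24, "salary": 1200, "cars": 4},
    "t4": {"age": 35, "salary": 2200, "cars": 4},
    "t5": {"age": 38, "salary": 2000, "cars": 1},
    "t6": {"age": 44, "salary": 3400, "cars": 1},
    "t7": {"age": 52, "salary": 3400, "cars": 3},
    "t8": {"age": 41, "salary": 5000, "cars": 2},
}

def respects(itemset, u, v):
    """True if tuple u may directly precede tuple v for every gradual item (Eq. 1)."""
    return all(
        (u[attr] <= v[attr]) if op == "<=" else (u[attr] >= v[attr])
        for attr, op in itemset
    )

def support(itemset, data):
    """Length of a longest extension divided by the number of tuples."""
    # Sorting lexicographically on the (sign-adjusted) values guarantees that
    # every valid sequence appears as a subsequence of the sorted rows.
    rows = sorted(
        data.values(),
        key=lambda r: tuple(r[a] if op == "<=" else -r[a] for a, op in itemset),
    )
    best = []  # best[i] = length of a longest extension ending at rows[i]
    for i, v in enumerate(rows):
        prev = (best[j] for j in range(i) if respects(itemset, rows[j], v))
        best.append(1 + max(prev, default=0))
    return Fraction(max(best), len(rows))

G1 = [("age", ">="), ("salary", ">=")]
G1_COMPLEMENT = [("age", "<="), ("salary", "<=")]
print(support(G1, TABLE_1), support(G1_COMPLEMENT, TABLE_1))
```

Both calls return the same value, so the result does not depend on whether ≥ is read as increasing or decreasing.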

A gradual itemset is said to be frequent if its sup-

port is greater than or equal to a user-deﬁned support

threshold.

Deﬁnition 5. Let ∆ be a numerical database and λ

a minimum support threshold. The problem of min-

ing gradual itemsets is to ﬁnd the set of all frequent

gradual itemsets of ∆ w.r.t. λ, i.e., ﬁnding the set

{g | Supp(g, ∆) ≥ λ}.

Definition 6. Let $g = (i_1^{o_1}, \ldots, i_k^{o_k})$ be a gradual itemset, and f be a function such that f(≥) = ≤ and f(≤) = ≥. Then $f(g) = (i_1^{f(o_1)}, \ldots, i_k^{f(o_k)})$ is the complementary (symmetric) gradual itemset of g.

Interestingly, any gradual itemset admits a complementary gradual one where the items are the same but the variations are all reversed. For instance, the complementary gradual itemset of $g_1$ is $(age^{\leq}, salary^{\leq})$.

Proposition 1 ((Di-Jorio et al., 2009)). Let g be a gradual itemset of a numerical database ∆. We have $Supp(g, \Delta) = Supp(f(g), \Delta)$.

Proposition 1 avoids unnecessary computation, as

generating only a half of the set of the gradual item-

sets is sufficient to automatically deduce the complementary ones. As far as we know, there is no existing algorithm for mining both the gradual itemsets and all their corresponding extensions. For each gradual itemset, all the state-of-the-art algorithms look for an

extension with the maximum size while there are ap-

plication domains where the extensions of the gradual

itemsets bring new knowledge to the user.

2.2 Boolean Satisﬁability Problem

This section introduces the Boolean satisﬁability

problem, or simply SAT. It corresponds to the problem of deciding if a formula of propositional classical logic is consistent or not. It is one of the most studied NP-complete decision problems. In this work, we consider the associated problem of Boolean model enumeration.

We consider the conjunctive normal form (CNF,

for short) representation for the propositional formu-

las. A CNF formula F is a conjunction of clauses,

where a clause is a disjunction of literals. A literal is

a positive (l) or negated (¬l) propositional variable.

The two literals l and ¬l are called complementary. We denote by $\bar{l}$ the complementary literal of l. For a set of literals L, $\bar{L}$ is defined as $\{\bar{l} \mid l \in L\}$. Let us recall that any propositional formula can be translated to CNF using the linear Tseitin encoding (Tseitin, 1968). The set of variables occurring in F is denoted Var(F).

An interpretation ρ of a Boolean formula F is a function which associates a value ρ(x) ∈ {0, 1} (0 corresponds to false and 1 to true) to each variable x ∈ Var(F). A model of a formula is an interpretation ρ that satisfies the formula. The SAT problem consists in deciding if a given formula admits a model or not.
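The definitions above can be illustrated with a few lines of Python. This is a sketch under stated assumptions: clauses are lists of non-zero integers (a negative integer is a negated variable, a common convention), and `find_model` is a naive exhaustive search used only for illustration, not a SAT solver.

```python
from itertools import product

def satisfies(rho, cnf):
    """rho maps each variable (a positive int) to 0 or 1; cnf is a list of clauses."""
    return all(
        any(rho[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
        for clause in cnf
    )

def find_model(cnf):
    """Return some model of cnf as a dict, or None if the formula is unsatisfiable."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for values in product((0, 1), repeat=len(variables)):
        rho = dict(zip(variables, values))
        if satisfies(rho, cnf):
            return rho
    return None

# F = (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
F = [[1, -2], [2, 3], [-1, -3]]
print(find_model(F))
```

The exhaustive search is exponential in the number of variables, which is why practical approaches rely on CDCL solvers instead.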

3 SAT-BASED ENCODING FOR

DISCOVERING FREQUENT

GRADUAL PATTERNS

In this section, we show how the problem of mining all the frequent gradual itemsets in a numerical dataset w.r.t. a minimum support threshold minSupp can be encoded as a propositional formula. In order to formally describe our encoding, we consider a numerical dataset ∆ = T × A where $A = \{a_1, \ldots, a_n\}$ is a set of attributes, $T = \{t_1, \ldots, t_m\}$ a set of transactions, and k the minimum support threshold. To model the frequent gradual itemset mining task into SAT, we associate with each gradual item $a^{\geq}$ (resp. $a^{\leq}$) a Boolean variable $x_a^{\geq}$ (resp. $x_a^{\leq}$), which is true if and only if the gradual item is included in the gradual itemset. Similarly, with each transaction $t_i$, we associate a set of Boolean variables $t_{i1}, \ldots, t_{ik}$ where $t_{ij}$ means that the transaction $t_i$ is set on the j-th position. For constraints modeling, our objective is to arrange a set of k transactions from m that highlights a gradual itemset. In other words, we have k positions that have to be assigned with transactions. Constraint (2) excludes gradual itemsets involving both $a^{\geq}$ and $a^{\leq}$ for any attribute a.

$$\bigwedge_{a \in \{a_1, \ldots, a_n\}} (\neg x_a^{\geq} \vee \neg x_a^{\leq}) \quad (2)$$

This ﬁrst constraint solves the problem encoun-

tered with the specialized algorithm of frequent grad-

ual itemsets mining GLCM (Do et al., 2015) which of-

ten returns the gradual itemsets containing both the

gradual items and their corresponding complementary

gradual items.

The second constraint (3) indicates that each position j ∈ {1, . . . , k} must be associated with exactly one transaction.

$$\bigwedge_{1 \le j \le k} \Big( \sum_{i=1}^{m} t_{ij} = 1 \Big) \quad (3)$$

Constraint (4) is introduced to prevent a transaction from being placed in more than one position among {1, . . . , k}.

$$\bigwedge_{1 \le i \le m} \Big( \sum_{j=1}^{k} t_{ij} \le 1 \Big) \quad (4)$$

Note that the two constraints (3) and (4) encode the well-known pigeon-hole problem.

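Constraint (5) expresses, given a gradual item $a^{\diamond}$ with ⋄ ∈ {≥, ≤}, the set of transactions that can be set in position j + 1 if transaction $t_i$ is put in position j. Here $t_i(a)$ denotes the value of attribute a in transaction $t_i$, and the disjunction ranges over the transactions $t_l$ that are allowed to follow $t_i$.

$$\bigwedge_{a^{\diamond} \in A^{*}} \; \bigwedge_{1 \le i \le m} \; \bigwedge_{1 \le j < k} \Big( x_a^{\diamond} \wedge t_{ij} \rightarrow \bigvee_{t_i(a) \,\diamond\, t_l(a)} t_{l(j+1)} \Big) \quad (5)$$

Note that such a constraint can be expressed differently by considering only the transactions that are not allowed, as stated in Constraint (6). In contrast to (5), constraint (6) adds only ternary clauses. However, their number is higher than that of (5).

$$\bigwedge_{a^{\diamond} \in A^{*}} \; \bigwedge_{1 \le i \le m} \; \bigwedge_{1 \le j < k} \Big( x_a^{\diamond} \wedge t_{ij} \rightarrow \bigwedge_{t_i(a) \,\not\diamond\, t_l(a)} \neg t_{l(j+1)} \Big) \quad (6)$$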

Example 3. Let us consider the transaction database

of Table 1. Assume that k = 5. If the gradual item-

set contains the gradual item car

≥

, and if the support

is as t

is set on position 1, then the corresponding

constraints is as follows:

car

≥

∧t

→ t

∨t
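As an illustration of how Constraints (2) to (5) translate into clauses, the following sketch generates them for the data of Table 1. Several choices here are assumptions rather than the authors' implementation: the function and variable names are hypothetical, the cardinality constraints (3) and (4) use a simple pairwise encoding instead of the linear one, and each operator is applied literally between the values at positions j and j + 1.

```python
from itertools import combinations

# Values of (age, salary, cars) for the eight rows of Table 1.
ROWS = [(22, 1200, 2), (28, 1850, 3), (24, 1200, 4), (35, 2200, 4),
        (38, 2000, 1), (44, 3400, 1), (52, 3400, 3), (41, 5000, 2)]
ATTRS = ("age", "salary", "cars")
CMP = {"<=": lambda a, b: a <= b, ">=": lambda a, b: a >= b}

def encode(rows, attrs, k):
    """Return (x, t, clauses): item variables, position variables, CNF clauses."""
    m, ids = len(rows), {}
    var = lambda key: ids.setdefault(key, len(ids) + 1)
    x = {(a, o): var(("x", a, o)) for a in attrs for o in CMP}
    t = {(i, j): var(("t", i, j)) for i in range(m) for j in range(k)}
    clauses = []
    for a in attrs:                      # (2): never both variations of a
        clauses.append([-x[a, "<="], -x[a, ">="]])
    for j in range(k):                   # (3): exactly one transaction per position
        col = [t[i, j] for i in range(m)]
        clauses.append(col)
        clauses += [[-u, -v] for u, v in combinations(col, 2)]
    for i in range(m):                   # (4): a transaction sits in at most one position
        row = [t[i, j] for j in range(k)]
        clauses += [[-u, -v] for u, v in combinations(row, 2)]
    for ai, a in enumerate(attrs):       # (5): allowed successors of t_i at position j
        for o, ok in CMP.items():
            for i in range(m):
                for j in range(k - 1):
                    succ = [t[l, j + 1] for l in range(m)
                            if ok(rows[i][ai], rows[l][ai])]
                    clauses.append([-x[a, o], -t[i, j]] + succ)
    return x, t, clauses

def satisfied(clauses, true_vars):
    """Check a total assignment given as the set of variables set to true."""
    return all(any((lit > 0) == (abs(lit) in true_vars) for lit in c) for c in clauses)

x, t, clauses = encode(ROWS, ATTRS, k=6)
order = [0, 2, 1, 3, 5, 6]  # rows 1, 3, 2, 4, 6, 7 of Table 1, in that order
candidate = {x["age", "<="], x["salary", "<="]} | {t[i, j] for j, i in enumerate(order)}
print(satisfied(clauses, candidate))
```

The candidate assignment places six rows with non-decreasing age and salary on the six positions, so it satisfies every generated clause; additionally selecting the item on cars violates a clause of type (5).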

Proposition 2. There is a one-to-one mapping between the gradual itemsets and the models of the formula $\Phi_{k,n} = (2) \wedge (3) \wedge (4) \wedge (5) \wedge (6)$.

Proposition 2 links the gradual itemsets to the

models of our encoding.

Finally, in order to eliminate symmetrical gradual itemsets, we add the following constraint:

$$\bigwedge_{a_i \in a_1 \ldots a_n} \Big( \neg x_{a_i}^{\geq} \vee \bigvee_{1 \le j < i} \neg x_{a_j}^{\leq} \Big) \quad (7)$$

In fact, the permutation $\sigma = (a_1^{\geq}, a_1^{\leq}) \ldots (a_n^{\geq}, a_n^{\leq})$ is a symmetry of the proposed encoding. Consequently, one can break such symmetry by adding the Symmetry Breaking Predicates as defined in (Crawford et al., 1996). More precisely, in (Crawford et al., 1996), for a symmetry $\sigma = (x_1, y_1) \ldots (x_n, y_n)$, the authors show that to break this symmetry one can add the following constraint:

$$\bigwedge_{i=1}^{n} \Big( \Big( \bigwedge_{j=1}^{i-1} x_j = y_j \Big) \rightarrow (x_i \le y_i) \Big)$$

Combining this constraint with the one of (2) leads to the simplified Constraint (7).

Note that constraint (7) avoids computing both a gradual pattern and its corresponding symmetric gradual pattern. However, this constraint adds a certain number of variables and clauses to the final Boolean formula. We propose another direction to take this symmetry into account without adding constraint (7), namely by adding two blocking clauses to the CNF formula each time a model is found: one clause to avoid finding the same model and another to avoid finding a model corresponding to the symmetric pattern.

Note that $(\sum_{i=1}^{m} t_{ij} = 1)$ (respectively $(\sum_{j=1}^{k} t_{ij} \le 1)$) represents a linear equality (respectively inequality) commonly called an exact-One (respectively atMostOne) constraint. Such a constraint can be encoded in O(m) (respectively O(k)) clauses using O(m) (respectively O(k)) additional variables as indicated in constraint (8) (Warners, 1998; Silva and Lynce, 2007). A possible encoding of $\sum_{i=1}^{n} x_i = 1$ is as follows, using auxiliary variables $\{p_1, \ldots, p_{n-1}\}$:

$$\Big( \bigvee_{1 \le i \le n} x_i \Big) \wedge (\neg x_1 \vee p_1) \wedge (\neg x_n \vee \neg p_{n-1}) \wedge \bigwedge_{1 < i < n} \Big( (\neg x_i \vee p_i) \wedge (\neg p_{i-1} \vee p_i) \wedge (\neg x_i \vee \neg p_{i-1}) \Big) \quad (8)$$
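The encoding (8) can be checked exhaustively for small n. In the sketch below, the variable numbering (x_i is variable i, p_i is variable n + i) and the helper names are assumptions made for illustration; the check confirms that an assignment of the x variables can be extended to the auxiliary variables exactly when one x_i is true.

```python
from itertools import product

def exactly_one(n):
    """Clauses of encoding (8) for x_1..x_n with auxiliary p_1..p_{n-1} (n >= 2)."""
    assert n >= 2
    x = lambda i: i        # x_i is variable i
    p = lambda i: n + i    # p_i is variable n + i
    clauses = [[x(i) for i in range(1, n + 1)],  # at least one x_i is true
               [-x(1), p(1)],
               [-x(n), -p(n - 1)]]
    for i in range(2, n):
        clauses += [[-x(i), p(i)], [-p(i - 1), p(i)], [-x(i), -p(i - 1)]]
    return clauses

def extendable(clauses, n, xs):
    """True if some values of the auxiliary variables satisfy all clauses."""
    for ps in product([False, True], repeat=n - 1):
        values = list(xs) + list(ps)
        if all(any(values[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

for n in range(2, 7):
    cnf = exactly_one(n)
    for xs in product([False, True], repeat=n):
        assert extendable(cnf, n, xs) == (sum(xs) == 1)
print("encoding (8) verified for n = 2..6")
```

The number of clauses produced is 3n - 3, compared with a quadratic number for the pairwise encoding.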

From a complexity point of view, let us note that our encoding introduces O(k × n × m) clauses. In fact, Constraint (2) is in O(n). Constraints (3) and (4) lead to O(k × m) clauses. Constraint (5) requires O(k × n × m) clauses. Finally, Constraint (7) requires O(n) clauses. So, to summarize, the encoding is in O(n + k × m + k × n × m) = O(k × n × m). As for the introduced variables, let us mention that their number is in O(k × m). In fact, in addition to $x_a^{\geq}$, $x_a^{\leq}$ and $t_{ij}$, new variables must be added to encode the cardinality constraints (3) and (4). This number remains bounded by O(k × m).

As mentioned, encoding gradual itemset mining into propositional satisfiability yields a more flexible approach where new constraints can be added to mine particular patterns. Typically, in many application fields, interesting gradual patterns can be distinguished from irrelevant ones by specifying semantic constraints on the gradual pattern itself. For example, the authors of (Lonlac et al., 2017) designed an algorithm to mine temporal gradual patterns, which are gradual patterns whose longest sequence of transactions respects the temporal order.

These kinds of gradual patterns are particularly in-

teresting in the paleoecological domain where the

experts search from their paleoecological numeri-

cal data the patterns which capture the simultane-

ously frequent co-evolutions between attributes. As

the transactions are encoded in our CNF formula as

Boolean variables, the temporal constraint can be cap-

tured by selecting in the temporal order the proposi-

tional variables t

i j

representing the transaction identi-

ﬁers of the numerical dataset.

4 SAT-BASED ENUMERATION

PROCEDURE

In this section, we describe the SAT-Based enumer-

ation procedure to deal with the problem of enumer-

ating all models of the CNF formula Φ

k,n

. SAT is a

decision problem. When the answer is positive, the

current SAT solvers provide a model satisfying the

formula. In the sequel, we brieﬂy describe the ba-

sic components of modern SAT solvers, also called

CDCL SAT solvers (Moskewicz et al., 2001; Eén and Sörensson, 2003), designed to enumerate all the models of a given CNF formula. To be exhaustive, these

solvers incorporate unit propagation (enhanced by

efﬁcient and lazy data structures), variable activity-

based heuristic, literal polarity phase, clause learning,

restarts and a learnt clauses database reduction policy.

Algorithm 1 depicts the general scheme of a CDCL SAT solver extended for model enumeration. A SAT solver is a tree-based backtrack search procedure; at each node of the search tree, the assigned literals (the decision literal and the propagated ones) are labeled with the same decision level, starting from 1 and increased at each decision (or branching).

Algorithm 1: CDCL Based Enumeration solver.
Input: a CNF formula Φ
Output: All models of Φ
 1  ρ = ∅ ;  /* interpretation */
 2  δ = ∅ ;  /* learnt clauses database */
 3  dl = 0 ;  /* decision level */
 4  while (true) do
 5      Prop ;
 6      γ = unitPropagation(Φ, ρ) ;
 7      if γ ≠ null then
 8          β = conflictAnalysis(Φ, ρ, γ) ;
 9          btl = computeBackjumpLevel(β, ρ) ;
10          if btl == 0 then
11              return UNSAT ;
12          δ = δ ∪ {β} ;
13          if restart() then
14              btl = 0 ;
15          backjump(btl) ;
16          dl = btl ;
17      else
18          if ρ |= Φ then
19              extractPatternFromModel(ρ) ;
20              addBlockedClause(ρ) ;
21              backjumpUntil(0) ;
22              goto Prop ;
23          if (timeToReduce()) then
24              reduceDB(δ) ;
25          l = selectDecisionVariable(Φ) ;
26          dl = dl + 1 ;
27          ρ = ρ ∪ {selectPhase(l)} ;

Each branch of the binary search tree can be seen as a sequence of decision and unit propagated literals. At each node, a decision variable is chosen (line 25) and assigned to the true or false polarity (selectPhase(l), line 27). Then unit propagation is performed (line 6). All the literals (decision and propagated ones) assigned at a given node are labelled with the same level dl. If all literals are assigned without contradiction, then ρ is a model of Φ and the formula is answered to be satisfiable (line 18). As our Boolean formula represents the encoding of the closed frequent gradual itemset mining problem, each time a model is found, a gradual itemset is extracted from ρ (line 19). For model enumeration, the search continues by adding a blocked clause to avoid enumerating the same model again (line 20). The search restarts at level 0 to look for the next models (lines 21-22). The other case is reached when unit propagation leads to a conflict (lines 7-16), where γ is the conflict clause. A new asserting clause β is derived by conflict analysis (line 8), mostly following the First-UIP scheme ('Unique Implication Point' (Zhang et al., 2001)). A backtrack level (btl) is derived from the asserting clause (line 9). If btl is null, then the formula is answered unsatisfiable (lines 10-11); otherwise β is added to the learnt clauses database (line 12) and the algorithm backjumps to the level btl (line 15). Regularly, the CDCL solver performs restarts, by backtracking to level 0 (lines 13-14), using one of the various restart strategies (Huang, ). The restart strategy defines the frequency at which the solver restarts the search. Finally, another component concerns the learnt clauses management policy. To maintain a learnt clauses database of reasonable size, a reduction is performed (lines 23-24) using one of the various strategies proposed in the literature (Audemard and Simon, 2009; Eén and Sörensson, 2003; Lonlac and Mephu Nguifo, 2017; Jabbour et al., 2014).
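The enumeration principle of Algorithm 1 (find a model, add a blocked clause, search again) can be sketched independently of the CDCL machinery. In the code below, `solve` is a naive exhaustive stand-in for a real SAT solver and all names are hypothetical; only the blocking-clause loop corresponds to the procedure described above.

```python
from itertools import product

def solve(clauses, n_vars):
    """Naive stand-in for a CDCL solver: one model as a tuple of booleans, or None."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return bits
    return None

def enumerate_models(clauses, n_vars):
    """Collect all models by repeatedly solving and blocking the last model found."""
    clauses = [list(c) for c in clauses]  # work on a copy of the input formula
    models = []
    while (model := solve(clauses, n_vars)) is not None:
        models.append(model)
        # Blocked clause: at least one variable must differ from this model.
        clauses.append([-(i + 1) if value else (i + 1)
                        for i, value in enumerate(model)])
    return models

# (x1 or x2) and (not x1 or not x3)
MODELS = enumerate_models([[1, 2], [-1, -3]], 3)
print(len(MODELS))
```

Each blocked clause excludes exactly one total assignment, so the loop terminates after as many iterations as the formula has models.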

5 EXPERIMENTS

In this section, we carry out an experimental evaluation of the performance of our proposed approach. We ran experiments both on a real-world paleoecological dataset and on a synthetic dataset. The paleoecological dataset consists of a set of numerical attributes whose values correspond to the quantity of each paleoecological indicator contained in a sediment record taken, by coring operations, in a lake ecosystem. It contains 111 objects corresponding to different dates identified on the considered lacustrine recording, and 117 attributes corresponding to paleoecological indicators. All the experiments were done on Intel Xeon quad-core machines with 32 GB of RAM running at 2.66 GHz.

To solve the obtained formulas, we use the solver MiniSAT 2.2 (Eén and Sörensson, 2003) adapted for model enumeration, since enumerating all the models that satisfy the CNF formula which encodes the frequent gradual pattern mining problem is equivalent to enumerating all frequent gradual patterns.

The main procedure of our approach is given in Algorithm 2. This procedure computes and returns all frequent gradual patterns with respect to the minimum support threshold minSupp. The procedure findAllModel corresponds to Algorithm 1 modified by adding to the CNF formula two blocking clauses instead of one each time a model is found during the resolution process: one blocking clause to avoid finding the same model, and another one to avoid finding a model corresponding to the symmetric pattern of the extracted gradual pattern.

Algorithm 2: SAT Based Gradual Patterns Enumeration.
Input: a numerical database DS, a minimum support minSupp
Output: Set of all frequent gradual patterns
1  F ← SATEncoding(DS, minSupp) ;
2  findAllModel(F) ;
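The two blocking clauses used by findAllModel can be sketched as follows. This is an illustration under explicit assumptions: the blocking clauses are restricted to the item variables (a projection that would enumerate each pattern once and therefore discard alternative extensions), and the representation of variables as integer pairs and the function name are hypothetical.

```python
def blocking_clauses(model, pairs):
    """Build the clause blocking the found pattern and the one blocking its symmetric.

    model: dict mapping each item variable to True/False.
    pairs: list of (ge_var, le_var), one pair of item variables per attribute.
    """
    same, symmetric = [], []
    for ge, le in pairs:
        for v, partner in ((ge, le), (le, ge)):
            # Literal on v that is false under the found model.
            same.append(-v if model[v] else v)
            # In the symmetric pattern, the partner variable takes v's value.
            symmetric.append(-partner if model[v] else partner)
    return same, symmetric

# Two attributes: variables (1, 2) for the first one and (3, 4) for the second.
PAIRS = [(1, 2), (3, 4)]
FOUND = {1: True, 2: False, 3: False, 4: True}
SAME, SYMMETRIC = blocking_clauses(FOUND, PAIRS)
print(SAME, SYMMETRIC)
```

Adding both clauses after each model ensures that neither the extracted pattern nor its complementary pattern, which has the same support by Proposition 1, is reported again.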

Table 2 presents the results obtained on the paleoecological dataset. It gives the size of the CNF formula encoding the gradual pattern mining problem, in terms of number of variables (#vars) and clauses (#clauses), with respect to a minimum support threshold (#minSupp). The last column gives the CPU time, in seconds, needed for the encoding. The first observation that we can draw from Table 2 is that our SAT-based approach generates huge CNF formulas in a short time. For instance, for a minimum support equal to 50%, our SAT encoding generates, in 2.25 seconds, a CNF formula with 18383 variables and 1438734 clauses. It is also worth mentioning that both the number of variables and the number of clauses of the CNF formula grow with the minimum support (see Table 2): from 5% to 90%, the number of variables goes from 2115 to 32991, while the number of clauses goes from 133521 to 2610762.

Table 2: CNF encoding characteristics by varying the minimum support threshold.

#minSupp   #vars   #clauses   #encodingTime (in seconds)
5%         2115    133521     0.22
10%        3775    266706     0.43
20%        7427    559713     0.86
30%        11079   852720     1.31
40%        14731   1145727    1.74
50%        18383   1438734    2.25
60%        22035   1731741    2.69
70%        25687   2024748    3.12
80%        29339   2317755    3.54
90%        32991   2610762    4.03

Table 3 compares the run-times (in seconds), on a synthetic dataset of 10 items and 100 transactions, of our proposed SAT-based approach, which we coined SAT4GIM, with GRITE (Di-Jorio et al., 2009), the efficient specialized algorithm for extracting frequent gradual itemsets from numerical databases, when varying the minimum support threshold. We generate the synthetic dataset using an adapted version of IBM Synthetic Data Generation Code for Associations and Sequential Patterns (www.almaden.ibm.com/software/projects/hdb/resources.shtml). We also compare our SAT-based approach to the one proposed in (Hidouri et al., 2021), called SATGIM. The results from Table 3 show that, for the small minimum support thresholds (0.02 to 0.05), our SAT-based approach is faster than both GRITE and SATGIM. On the other hand, our proposal takes longer than the other approaches to enumerate the complete set of gradual patterns when the minimum support threshold is high (0.15 and above). It is worth mentioning that SAT4GIM makes it possible to know, for each frequent gradual pattern, the position of each transaction belonging to its extension. That is not the case for GRITE and SATGIM.

Table 3: SAT4GIM vs (GRITE, SATGIM) for various minimum support values.

#minSupp   GRITE   SATGIM   SAT4GIM   #Gradual
0.02       3.82    1.51     0.61      59001
0.03       3.71    5        0.91      38923
0.04       3.51    7.4      1.40      14507
0.05       3.45    8.32     1.90      5741
0.1        3.29    7.38     4.71      411
0.15       3.09    6.98     7.2       201
0.2        2.62    6.70     10.35     75
0.3        2.59    2.04     16.37     33
0.4        2.58    0.80     22.04     27
0.5        2.50    0.17     28.19     21

6 CONCLUSION

In this paper, we proposed a SAT encoding to address the problem of mining frequent gradual patterns. This declarative approach offers an additional possibility to benefit from the recent progress in satisfiability testing and to enumerate each gradual pattern with the sequence of objects supporting it. We also performed experiments with real-world and synthetic datasets to show the efficiency of our proposal w.r.t. state-of-the-art algorithms for mining gradual itemsets. Future directions can be pursued to address various challenges. First, we intend to develop a SAT-based encoding to enumerate maximal frequent gradual patterns, which remains an open, challenging and impactful problem in gradual pattern mining. We also plan to perform more experiments on large datasets.

REFERENCES

Aryadinata, Y. S., Lin, Y., Barcellos, C., Laurent, A., and Libourel, T. (2014). Mining epidemiological dengue fever data from Brazil: A gradual pattern based geographical information system. In IPMU, pages 414–423.

Audemard, G. and Simon, L. (2009). Predicting learnt clauses quality in modern SAT solvers. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI’09, pages 399–404.

Aumann, Y. and Lindell, Y. (1999). A statistical theory for quantitative association rules. In SIGKDD, pages 261–270.

Ayouni, S., Laurent, A., Yahia, S. B., and Poncelet, P. (2010). Mining closed gradual patterns. In Artificial Intelligence and Soft Computing, 10th International Conference, ICAISC 2010, Zakopane, Poland, June 13-17, 2010, Part I, pages 267–274.

Berzal, F., Cubero, J. C., Sánchez, D., Miranda, M. A. V., and Serrano, J. (2007). An alternative approach to discover gradual dependencies. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 15(5):559–570.

Calders, T., Goethals, B., and Jaroszewicz, S. (2006). Mining rank-correlated sets of numerical attributes. In KDD, pages 96–105.

Crawford, J., Ginsberg, M. L., Luck, E., and Roy, A. (1996). Symmetry-breaking predicates for search problems. In Principles of Knowledge Representation and Reasoning (KR’96), pages 148–159.

Di-Jorio, L., Laurent, A., and Teisseire, M. (2008). Fast extraction of gradual association rules: a heuristic based method. In CSTST 2008: Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology, Cergy-Pontoise, France, October 28-31, 2008, pages 205–210.

Di-Jorio, L., Laurent, A., and Teisseire, M. (2009). Mining frequent gradual itemsets from large databases. In Advances in Intelligent Data Analysis VIII, 8th International Symposium on Intelligent Data Analysis, IDA 2009, Lyon, France, August 31 - September 2, 2009. Proceedings, pages 297–308.

Do, T. D. T., Laurent, A., and Termier, A. (2010). PGLCM: efficient parallel mining of closed frequent gradual itemsets. In ICDM, pages 138–147.

Do, T. D. T., Termier, A., Laurent, A., Négrevergne, B., Tehrani, B. O., and Amer-Yahia, S. (2015). PGLCM: efficient parallel mining of closed frequent gradual itemsets. Knowl. Inf. Syst., 43(3):497–527.

Eén, N. and Sörensson, N. (2003). An extensible SAT-solver. pages 502–518.

Fan, C. and Xiao, F. (2017). Mining gradual patterns in big building operational data for building energy efficiency enhancement. Energy Procedia, 143:119–124. Leveraging Energy Technologies and Policy Options for Low Carbon Cities.

Hidouri, A., Jabbour, S., Raddaoui, B., and Yaghlane, B. B. (2021). Mining closed high utility itemsets based on propositional satisfiability. Data Knowl. Eng., 136:101927.

Huang, J. The effect of restarts on the efficiency of clause learning. pages 2318–2323.

Hüllermeier, E. (2002). Association rules for expressing gradual dependencies. In Principles of Data Mining and Knowledge Discovery, 6th European Conference, PKDD 2002, Helsinki, Finland, August 19-23, 2002, Proceedings, pages 200–211.

Jabbour, S., Lonlac, J., Sais, L., and Salhi, Y. (2014). Revisiting the learned clauses database reduction strategies. CoRR, abs/1402.1956.

Jabbour, S., Sais, L., and Salhi, Y. (2013). The top-k frequent closed itemset mining using top-k SAT problem. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, pages 403–418.

Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Revisiting numerical pattern mining with formal concept analysis. In IJCAI, pages 1342–1347.

Kendall, M. and Smith, B. (1939). The problem of m rankings. In The annals of mathematical statistics - Volume 10, pages 275–287.

Laurent, A., Lesot, M., and Rifqi, M. (2009). GRAANK: exploiting rank correlations for extracting gradual itemsets. In Flexible Query Answering Systems, 8th International Conference, FQAS 2009, Roskilde, Denmark, October 26-28, 2009. Proceedings, pages 382–393.

Laurent, A., Négrevergne, B., Sicard, N., and Termier, A. (2010). PGP-mc: Towards a multicore parallel approach for mining gradual patterns. In DASFAA, Part I, pages 78–84.

Lonlac, J. and Mephu Nguifo, E. (2017). Towards learned clauses database reduction strategies based on dominance relationship. CoRR, abs/1705.10898.

Lonlac, J., Miras, Y., Beauger, A., Mazenod, V., Peiry, J.-L., and Mephu, E. (2018). An approach for extracting frequent (closed) gradual patterns under temporal constraint. In FUZZ-IEEE, pages 878–885.

Lonlac, J., Miras, Y., Beauger, A., Pailloux, M., Peiry, J.-L., and Nguifo, E. M. (2017). Une approche d’extraction de motifs graduels (fermés) fréquents sous contrainte de la temporalité. Revue des Nouvelles Technologies de l’Information, Extraction et Gestion des Connaissances, RNTI-E-33:213–224.

Masseglia, F., Laurent, A., and Teisseire, M. (2008). Gradual trends in fuzzy sequential patterns. In IPMU, pages 456–463.

Moskewicz, M. W., Madigan, C. F., Zhao, Y., Zhang, L., and Malik, S. (2001). Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Design Automation Conference (DAC’01), pages 530–535.

Négrevergne, B., Termier, A., Rousset, M., and Méhaut, J. (2014). ParaMiner: a generic pattern mining algorithm for multi-core architectures. DMKD, 28(3):593–633.

Ngo, T., Georgescu, V., Laurent, A., Libourel, T., and Mercier, G. (2018). Mining spatial gradual patterns: Application to measurement of potentially avoidable hospitalizations. In SOFSEM, pages 596–608.

Oudni, A., Lesot, M., and Rifqi, M. (2013). Processing contradiction in gradual itemset extraction. In FUZZ-IEEE, pages 1–8.

Ramakrishnan, S. and Rakesh, A. (1996). Mining quantitative association rules in large relational tables. SIGMOD Rec., 25(2):1–12.

Salleb-Aouissi, A., Vrain, C., and Nortet, C. (2007). QuantMiner: A genetic algorithm for mining quantitative association rules. In IJCAI, pages 1035–1040.

Silva, J. P. M. and Lynce, I. (2007). Towards robust CNF encodings of cardinality constraints. In CP, pages 483–497.

Tseitin, G. (1968). On the complexity of derivations in the propositional calculus. In Slesenko, H., editor, Structures in Constructive Mathematics and Mathematical Logic, Part II, pages 115–125.

Warners, J. P. (1998). A linear-time transformation of linear inequalities into conjunctive normal form. Information Processing Letters, 68(2):63–69.

Zhang, L., Madigan, C. F., Moskewicz, M. W., and Malik, S. (2001). Efficient conflict driven learning in Boolean satisfiability solver. In IEEE/ACM CAD’2001, pages 279–285.
