Extracting Frequent Gradual Patterns Based on SAT
Jerry Lonlac^{1,a}, Imen Ouled Dlala^{2,b}, Said Jabbour^{3,c}, Engelbert Mephu Nguifo^{4,d}, Badran Raddaoui^{5,e} and Lakhdar Sais^{3,f}
1 Centre de Recherche, IMT Lille Douai, Université Lille, Lens, France
2 Pôle Universitaire Léonard de Vinci, Research Center, Paris La Défense, France
3 CRIL, University of Artois & CNRS, Lens, France
4 Univ. Clermont Auvergne, CNRS, LIMOS, F-63000 Clermont-Ferrand, France
5 SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France
Keywords: Data Mining, Gradual Patterns, Propositional Satisfiability.
Abstract: This paper proposes a constraint-based modeling approach for mining frequent gradual patterns from numerical data. Our declarative approach provides a principled way to take advantage of recent advancements in satisfiability testing and of several features of modern SAT solvers to enumerate gradual patterns. Interestingly, our approach can easily be extended with extra requirements, such as temporal constraints used to extract more specific patterns in a broad range of gradual pattern mining applications. An empirical evaluation on two real-world datasets shows the efficiency of our approach.
1 INTRODUCTION
Nowadays, numerical data, also called quantitative data, are abundant due to the proliferation of measuring and data collection devices and sensors. These data come from a range of practical areas, including environment & ecology, humanities & social science, trade & finance, and life sciences & astronomy. Typically, machine learning, statistical, and logical data analysis techniques are employed to analyze this large amount of numerical data. Nevertheless, pattern mining techniques have typically been studied for categorical data, and only a small number of them are designed to handle numerical data. Among these approaches, we can cite itemset/association rule mining (Ramakrishnan and Rakesh, 1996; Aumann and Lindell, 1999; Salleb-Aouissi et al., 2007), interval patterns using formal concept analysis (Kaytoue et al., 2011), rank-correlated sets of numerical attributes mining (Calders et al., 2006), and gradual pattern mining (Di-Jorio et al., 2009). In particular, gradual itemset mining aims at discovering frequent co-variations between attributes, such as “the higher the age, the higher the salary, and the lower the free time”. Gradual patterns are very expressive as they represent the variability of numerical values, which is sought in numerous application domains. Examples include biology, where most advances come from analyzing genome data; medicine, where researchers in psychology focus on correlations between memory and feeling points from the “Diagnostic manual of mental disorders”; and financial markets, for discovering covariation between various economic and financial indicators. More generally, discovering gradual patterns is relevant for any numerical data where one needs to recover the relationship between attributes in terms of variability (e.g., (Ngo et al., 2018; Aryadinata et al., 2014; Fan and Xiao, 2017)).

a https://orcid.org/0000-0003-3278-9969
b https://orcid.org/0000-0001-6928-4599
c https://orcid.org/0000-0002-8389-8332
d https://orcid.org/0000-0001-9119-678X
e https://orcid.org/0000-0003-4712-0811
f https://orcid.org/0000-0003-2879-8627
Numerous proposals have been introduced to tackle the problem of gradual itemset mining or its variants, e.g., (Hüllermeier, 2002; Berzal et al., 2007; Masseglia et al., 2008; Di-Jorio et al., 2008; Di-Jorio et al., 2009; Do et al., 2010; Laurent et al., 2010; Oudni et al., 2013; Négrevergne et al., 2014; Do et al., 2015; Lonlac et al., 2018). For most of these methods, an extracted gradual pattern is provided with a unique support or sub-sequence of transactions, called extension, supporting it. Such information, when provided, explains to the user why the itemset is gradual. Recently, in (Ngo et al., 2018), the authors addressed the problem of mining spatial gradual patterns with an application to the measurement of potentially avoidable hospitalizations. In this context, they have shown that the analysis of the different sequences of objects associated with the gradual patterns could reveal new relevant knowledge to the expert. Let us mention that the sequence of objects verifying a gradual pattern is not unique. To the best of our knowledge, none of the mentioned approaches provides all the possible extensions of a gradual itemset. Providing an exhaustive set of extensions associated with a given gradual itemset could be beneficial and would improve the robustness and the precision of the information characterizing the pattern. Moreover, in (Di-Jorio et al., 2009), the authors presented a first effective algorithm for the discovery of gradual itemsets and gradual rules that can handle databases with hundreds of attributes (contrary to the algorithm introduced in (Berzal et al., 2007), which is limited to only six attributes).

Lonlac, J., Dlala, I., Jabbour, S., Nguifo, E., Raddaoui, B. and Sais, L.
Extracting Frequent Gradual Patterns Based on SAT.
DOI: 10.5220/0012126000003541
In Proceedings of the 12th International Conference on Data Science, Technology and Applications (DATA 2023), pages 136-143
ISBN: 978-989-758-664-4; ISSN: 2184-285X
Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
One of the primary issues with gradual pattern mining approaches is the exponential search space to be explored, coupled with the huge number of patterns, which can also be exponential. This combinatorial explosion is tackled in (Laurent et al., 2009; Ayouni et al., 2010; Do et al., 2015). In (Laurent et al., 2009), the authors proposed a method for extracting gradual patterns from large datasets which takes advantage of a binary representation of the lattice structure. In addition, other works (e.g., (Ayouni et al., 2010; Do et al., 2015)) retrieve only closed frequent gradual patterns in order to reduce the total number of extracted patterns.
In this paper, we propose a new approach for extracting all frequent closed gradual patterns in a numerical dataset. Our approach differs from all the previous specialized approaches. It follows the SAT-based framework proposed in (Jabbour et al., 2013) for mining frequent closed itemsets. This framework offers a declarative and flexible representation model. In fact, specialized approaches often require fresh implementations to accommodate novel constraints, whereas such constraints can be easily integrated within a Boolean satisfiability framework. This enables data mining problems to take advantage of multiple generic and efficient SAT solving techniques. We propose in this paper to heavily exploit the declarative language (SAT) and its associated efficient and generic solving techniques. For that purpose, we first present a new SAT-based model to find frequent gradual patterns that includes different types of constraints. Then, we provide a Boolean formulation of the closedness constraints in order to search for frequent closed gradual patterns. Experiments carried out on real datasets are presented to show the feasibility of the new SAT-based approach.
The paper is organized as follows. In Section 2, we present the problem of mining gradual itemsets from numerical databases, together with some efficient algorithms that have been proposed in the literature, and we recall the problem of Boolean satisfiability. In Section 3, we present our SAT encoding of the frequent gradual itemset mining problem and show through an example how it can be applied to find frequent gradual itemsets from a numerical dataset. Section 4 describes the SAT-based enumeration procedure used to enumerate the set of all models of a CNF formula. Finally, Section 5 presents detailed experiments carried out on real datasets, showing the applicability and the interest of our approach.
2 PRELIMINARIES
In this section, we formally describe the problem of
mining frequent gradual itemsets (or patterns) in nu-
merical datasets as well as the propositional satisfia-
bility problem.
2.1 Gradual Itemsets Mining Problem
The problem of mining gradual patterns consists in retrieving attribute co-variations in a numerical dataset of the form “the more/less X, ..., the more/less Y”. We assume herein that we are given a dataset $\mathcal{D}$ containing a set of objects $T$ defining a relation on an attribute set $I$ with numerical values, i.e., for $t \in T$ and $i \in I$, $i[t]$ denotes the value of the attribute $i$ over object $t$. Table 1 gives an example of a numerical dataset built over the set of attributes $I = \{age, salary, cars\}$.
Table 1: An example of a numerical dataset.
tid   age   salary   cars
t1    22    1200     2
t2    28    1850     3
t3    24    1200     4
t4    35    2200     4
t5    38    2000     1
t6    44    3400     1
t7    52    3400     3
t8    41    5000     2
Each attribute will hereafter be considered twice: once to indicate its increase ($\geq$), and once to indicate its decrease ($\leq$). This leads to new kinds of items, called gradual items.
Definition 1. Let $\mathcal{D}$ be a dataset defined on a numerical attribute set $I$. A gradual item is defined under the form $i^{o}$, where $i$ is an attribute of $I$ and $o \in \{\geq, \leq\}$ represents an ascending or descending order or variation of the attribute values of $i$.
If we consider the numerical dataset of Table 1, $age^{\geq}$ (respectively $age^{\leq}$) is a gradual item expressing that the values of the attribute age are increasing (respectively decreasing). Now, a gradual itemset (or simply gradual pattern) is a non-empty set of gradual items. Such an itemset is called a $k$-gradual itemset if it contains exactly $k$ gradual items; $k$ is also called the length of the itemset. For example, $g_1 = \{age^{\geq}, salary^{\geq}\}$ is a 2-gradual itemset, meaning that “the higher the age, the higher the salary”. A gradual itemset sets a variation order on several attributes simultaneously.
The support (also called frequency) of a gradual itemset measures the extent to which a gradual pattern is present in a numerical database. Several support definitions have been proposed in the literature (Hüllermeier, 2002; Calders et al., 2006; Berzal et al., 2007; Laurent et al., 2009; Di-Jorio et al., 2009; Kendall and Smith, 1939), showing that gradual itemsets can follow different semantics. In (Hüllermeier, 2002), the computation of the support of a gradual itemset is based on linear regression. In (Calders et al., 2006; Berzal et al., 2007; Kendall and Smith, 1939), the authors considered the proportion of couples of tuples that verify the constraints expressed by all the gradual items of the itemset, while in (Di-Jorio et al., 2009), the support is defined as the size of the longest sequence of tuples supporting the gradual itemset. In this paper, we adopt this last definition of support for its relevance and generality. To formally introduce this variant of support, let us first introduce the following additional definitions:
Definition 2. Let $g = (i_1^{o_1}, \ldots, i_k^{o_k})$ be a gradual itemset and $s = \langle t_1 t_2 \ldots t_n \rangle$ a sequence of tuples. Then, $s$ is an extension of $g$ if $\forall p$, $1 \leq p \leq k$, and $\forall j$, $1 \leq j < n$, we have:
$$i_p[t_j] \;\, o_p \;\, i_p[t_{j+1}] \qquad (1)$$
It is important to note that there might be several
extensions or sequences of tuples validating g.
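Definition 2 can be checked mechanically on the data of Table 1. The following is a minimal sketch, not the authors' implementation: the names `TABLE_1`, `respects` and `is_extension` are hypothetical, and it assumes the reading under which a gradual item marked as increasing requires values that do not decrease from one tuple of the sequence to the next.

```python
# Table 1, keyed by transaction id.
TABLE_1 = {
    "t1": {"age": 22, "salary": 1200, "cars": 2},
    "t2": {"age": 28, "salary": 1850, "cars": 3},
    "t3": {"age": 24, "salary": 1200, "cars": 4},
    "t4": {"age": 35, "salary": 2200, "cars": 4},
    "t5": {"age": 38, "salary": 2000, "cars": 1},
    "t6": {"age": 44, "salary": 3400, "cars": 1},
    "t7": {"age": 52, "salary": 3400, "cars": 3},
    "t8": {"age": 41, "salary": 5000, "cars": 2},
}


def respects(op, earlier, later):
    # Assumed reading: ">=" means the value does not decrease along the
    # sequence, "<=" means it does not increase.
    return later >= earlier if op == ">=" else later <= earlier


def is_extension(itemset, sequence, data=TABLE_1):
    """True if every consecutive pair of tuples respects every gradual item."""
    return all(
        respects(op, data[a][attr], data[b][attr])
        for a, b in zip(sequence, sequence[1:])
        for attr, op in itemset
    )


g1 = [("age", ">="), ("salary", ">=")]
print(is_extension(g1, ["t1", "t3", "t2", "t4", "t6", "t7"]))  # True
print(is_extension(g1, ["t1", "t2", "t3"]))  # False: age drops from 28 to 24
```

Several different sequences pass this check for the same itemset, which is exactly the non-uniqueness noted above.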
Definition 3. Let $g$ be a gradual itemset in a numerical database $\mathcal{D}$. We define $Cover(g, \mathcal{D})$ as the set of the longest extensions of $g$ in $\mathcal{D}$ w.r.t. set inclusion.
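Example 1. Let us consider the database $\mathcal{D}$ depicted in Table 1 and the gradual itemset $g_1 = \{age^{\geq}, salary^{\geq}\}$. Then $Cover(g_1, \mathcal{D}) = \{\langle t_1 t_3 t_2 t_4 t_6 t_7 \rangle, \langle t_1 t_3 t_2 t_5 t_6 t_7 \rangle, \langle t_1 t_3 t_2 t_4 t_8 \rangle, \langle t_1 t_3 t_2 t_5 t_8 \rangle\}$. From the same example, the cover of the gradual item $age^{\geq}$ is $\{\langle t_1 t_3 t_2 t_4 t_5 t_8 t_6 t_7 \rangle\}$, since this sequence orders all the tuples by increasing age, and $\langle t_5 t_6 t_8 t_1 t_2 t_7 t_3 t_4 \rangle$ belongs to the cover of $cars^{\geq}$ (tuples with equal values can be ordered either way).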
Now, we are ready to give the definition of sup-
port.
Definition 4. Let $\mathcal{D}$ be a numerical database and $g$ be a gradual itemset of $\mathcal{D}$. Then,
$$Supp(g, \mathcal{D}) = \frac{\max\{|s| : s \in Cover(g, \mathcal{D})\}}{|\mathcal{D}|}.$$
Example 2. Referring again to the database $\mathcal{D}$ of Table 1 and the gradual itemset $g_1$, we have $Supp(g_1, \mathcal{D}) = \frac{6}{8}$. So, six among the eight input tuples can be ordered according to $g_1$. Note that the support of a single gradual item is equal to 100%, as it is always possible to order all of the tuples according to the values of a single attribute.
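The support value of Example 2 can be recomputed with a short dynamic program. This is only an illustrative sketch under the same reading of increasing and decreasing items as in Example 1; the function name `support` and the sort-then-chain strategy are assumptions made here and are not how GRITE or the SAT encoding proceed.

```python
TABLE_1 = {
    "t1": {"age": 22, "salary": 1200, "cars": 2},
    "t2": {"age": 28, "salary": 1850, "cars": 3},
    "t3": {"age": 24, "salary": 1200, "cars": 4},
    "t4": {"age": 35, "salary": 2200, "cars": 4},
    "t5": {"age": 38, "salary": 2000, "cars": 1},
    "t6": {"age": 44, "salary": 3400, "cars": 1},
    "t7": {"age": 52, "salary": 3400, "cars": 3},
    "t8": {"age": 41, "salary": 5000, "cars": 2},
}


def support(itemset, data=TABLE_1):
    """Return (length of a longest extension, number of tuples)."""

    def key(tid):
        # Negate decreasing attributes so every item reads as "non-decreasing".
        return tuple(
            data[tid][attr] if op == ">=" else -data[tid][attr]
            for attr, op in itemset
        )

    # Any valid sequence is non-decreasing on every coordinate, hence it is
    # compatible with the lexicographic order used for sorting.
    tids = sorted(data, key=key)
    best = []  # best[j]: longest extension ending at tids[j]
    for j, tid in enumerate(tids):
        kj = key(tid)
        previous = [
            best[i]
            for i in range(j)
            if all(x <= y for x, y in zip(key(tids[i]), kj))
        ]
        best.append(1 + max(previous, default=0))
    return max(best), len(data)


g1 = [("age", ">="), ("salary", ">=")]
print(support(g1))                # (6, 8)
print(support([("age", ">=")]))   # (8, 8)
```

The pair `(6, 8)` corresponds to the ratio 6/8 of Example 2, and a single gradual item reaches 8/8 as stated above.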
A gradual itemset is said to be frequent if its sup-
port is greater than or equal to a user-defined support
threshold.
Definition 5. Let $\mathcal{D}$ be a numerical database and $\lambda$ a minimum support threshold. The problem of mining gradual itemsets is to find the set of all frequent gradual itemsets of $\mathcal{D}$ w.r.t. $\lambda$, i.e., finding the set $\{g \mid Supp(g, \mathcal{D}) \geq \lambda\}$.
Definition 6. Let $g = (i_1^{o_1}, \ldots, i_k^{o_k})$ be a gradual itemset, and $\bar{f}$ be a function such that $\bar{f}(\geq) = \leq$ and $\bar{f}(\leq) = \geq$. Then $\bar{f}(g) = (i_1^{\bar{f}(o_1)}, \ldots, i_k^{\bar{f}(o_k)})$ is the complementary (symmetric) gradual itemset of $g$.
Interestingly, any gradual itemset admits a complementary gradual one where the items are the same but the variations are all reversed. For instance, the complementary gradual itemset of $g_1$ is $(age^{\leq}, salary^{\leq})$.
Proposition 1 ((Di-Jorio et al., 2009)). Let $g$ be a gradual itemset of a numerical database $\mathcal{D}$. We have $Supp(g, \mathcal{D}) = Supp(\bar{f}(g), \mathcal{D})$.
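A short justification of Proposition 1 can be sketched as follows. The argument is a reconstruction given for illustration, not quoted from (Di-Jorio et al., 2009); it writes $i_p[t_j]$ for the value of attribute $i_p$ on tuple $t_j$ and uses $s^{R}$, a name introduced here, for the reversed sequence.

```latex
\begin{align*}
s = \langle t_1 t_2 \ldots t_n \rangle \text{ is an extension of } g
  &\iff \forall p,\ \forall j < n:\quad i_p[t_j] \;\, o_p \;\, i_p[t_{j+1}] \\
  &\iff \forall p,\ \forall j < n:\quad i_p[t_{j+1}] \;\, \bar{f}(o_p) \;\, i_p[t_j] \\
  &\iff s^{R} = \langle t_n t_{n-1} \ldots t_1 \rangle \text{ is an extension of } \bar{f}(g).
\end{align*}
```

Since $|s^{R}| = |s|$, reversal maps the extensions of $g$ one-to-one onto the extensions of $\bar{f}(g)$, so the longest extensions on both sides have the same length and the two supports coincide.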
Proposition 1 avoids unnecessary computation, as generating only half of the set of gradual itemsets is sufficient to automatically deduce the complementary ones. As far as we know, there is no existing algorithm that mines both the gradual itemsets and all their corresponding extensions. For each gradual itemset, all the state-of-the-art algorithms look for a single extension of maximum size, while there are application domains where the extensions of the gradual itemsets bring new knowledge to the user.
2.2 Boolean Satisfiability Problem
This section introduces the Boolean satisfiability problem, or simply SAT. It corresponds to the problem of deciding whether a formula of classical propositional logic is consistent or not. It is one of the most studied NP-complete decision problems. In this work, we consider the associated problem of Boolean model enumeration.
We consider the conjunctive normal form (CNF, for short) representation for the propositional formulas. A CNF formula $F$ is a conjunction of clauses, where a clause is a disjunction of literals. A literal is a positive ($l$) or negated ($\neg l$) propositional variable. The two literals ($l$) and ($\neg l$) are called complementary. We denote by $\bar{l}$ the complementary literal of $l$. For a set of literals $L$, $\bar{L}$ is defined as $\{\bar{l} \mid l \in L\}$. Let us recall that any propositional formula can be translated to CNF using Tseitin's linear encoding (Tseitin, 1968). The set of variables occurring in $F$ is denoted $Var(F)$.
An interpretation $\rho$ of a Boolean formula $F$ is a function which associates a value $\rho(x) \in \{0, 1\}$ (0 corresponds to false and 1 to true) to each variable $x \in Var(F)$. A model of a formula is an interpretation $\rho$ that satisfies the formula. The SAT problem consists in deciding whether a given formula admits a model or not.
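The notions of CNF formula, interpretation and model can be made concrete with a few lines of code. The sketch below is only meant as an illustration: it represents a clause as a list of signed integers (a common convention, assumed here), and the brute-force `models` generator is exponential and has nothing to do with how SAT solvers actually search.

```python
from itertools import product


def satisfies(cnf, rho):
    """rho maps each variable to True/False; a clause needs one true literal."""
    return all(any(rho[abs(l)] == (l > 0) for l in clause) for clause in cnf)


def models(cnf):
    """Enumerate all models by trying every interpretation of Var(F)."""
    variables = sorted({abs(l) for clause in cnf for l in clause})
    for values in product([False, True], repeat=len(variables)):
        rho = dict(zip(variables, values))
        if satisfies(cnf, rho):
            yield rho


# F = (x1 or x2) and (not x1 or x3)
F = [[1, 2], [-1, 3]]
print(len(list(models(F))))  # 4
```

Deciding SAT amounts to asking whether `models(F)` yields at least one interpretation, while model enumeration, the problem considered in this work, asks for all of them.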
3 SAT-BASED ENCODING FOR DISCOVERING FREQUENT GRADUAL PATTERNS
In this section, we show how the problem of mining all the frequent gradual itemsets in a numerical dataset w.r.t. a minimum support threshold minSupp can be encoded as a propositional formula. In order to formally describe our encoding, we consider a numerical dataset $\mathcal{D} = T \times A$ where $A = \{a_1, \ldots, a_n\}$ is a set of attributes, $T = \{t_1, \ldots, t_m\}$ a set of transactions, and $k$ the minimum support threshold, i.e., the number of transactions that a supporting sequence must contain. To model the frequent gradual itemset mining task into SAT, we associate with each gradual item $a^{\geq}$ (resp. $a^{\leq}$) a Boolean variable $x_{a^{\geq}}$ (resp. $x_{a^{\leq}}$), which is true if and only if the gradual item is included in the gradual itemset. Similarly, with each transaction $t_i$, we associate a set of Boolean variables $t_{i1}, \ldots, t_{ik}$ where $t_{ij}$ means that the transaction $t_i$ is placed at the $j$th position. For constraint modeling, our objective is to arrange a set of $k$ transactions out of $m$ that supports a gradual itemset. In other words, we have $k$ positions that have to be assigned transactions. Constraint (2) excludes gradual itemsets involving both $a^{\geq}$ and $a^{\leq}$ for an attribute $a$.
$$\bigwedge_{a \in \{a_1, \ldots, a_n\}} (\neg x_{a^{\geq}} \vee \neg x_{a^{\leq}}) \qquad (2)$$
This first constraint avoids a problem encountered with GLCM (Do et al., 2015), a specialized algorithm for frequent gradual itemset mining, which often returns gradual itemsets containing both gradual items and their corresponding complementary gradual items.
The second constraint (3) states that each position $j \in \{1, \ldots, k\}$ must be associated with exactly one transaction.
$$\bigwedge_{1 \leq j \leq k} \Big( \sum_{i=1}^{m} t_{ij} = 1 \Big) \qquad (3)$$
Constraint (4) prevents a transaction from being placed in more than one position among $\{1, \ldots, k\}$.
$$\bigwedge_{1 \leq i \leq m} \Big( \sum_{j=1}^{k} t_{ij} \leq 1 \Big) \qquad (4)$$
Note that constraints (3) and (4) together encode the well-known pigeonhole problem.
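Constraints (2) to (4) translate directly into clauses. The sketch below is a simplified illustration under stated assumptions: the variable numbering in `build_vars` is arbitrary, the helper names are hypothetical, and the cardinality constraints are expanded with the naive pairwise encoding (one binary clause per pair) rather than with a compact encoding using auxiliary variables.

```python
from itertools import combinations, product


def build_vars(attributes, m, k):
    """Number the variables x_{a>=}, x_{a<=} and t_{ij} from 1 upwards."""
    ids = {}
    for a in attributes:
        for op in (">=", "<="):
            ids[("x", a, op)] = len(ids) + 1
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            ids[("t", i, j)] = len(ids) + 1
    return ids


def constraints_2_3_4(attributes, m, k):
    v = build_vars(attributes, m, k)
    cnf = []
    for a in attributes:  # (2): never both variations of the same attribute
        cnf.append([-v[("x", a, ">=")], -v[("x", a, "<=")]])
    for j in range(1, k + 1):  # (3): exactly one transaction per position
        column = [v[("t", i, j)] for i in range(1, m + 1)]
        cnf.append(column)
        cnf += [[-p, -q] for p, q in combinations(column, 2)]
    for i in range(1, m + 1):  # (4): at most one position per transaction
        row = [v[("t", i, j)] for j in range(1, k + 1)]
        cnf += [[-p, -q] for p, q in combinations(row, 2)]
    return v, cnf


def count_models(cnf, n_vars):
    """Brute-force model count, usable only on tiny instances."""
    return sum(
        all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in cnf)
        for bits in product([False, True], repeat=n_vars)
    )


v, cnf = constraints_2_3_4(["age"], m=3, k=2)
print(len(v), len(cnf), count_models(cnf, len(v)))  # 8 12 18
```

On this toy instance the 18 models are the 3 admissible choices for the pair of variables of `age` times the 6 ways of placing two distinct transactions among three on two positions.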
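Constraint (5) expresses, given a gradual item $a^{\geq}$, the set of transactions that can be placed at position $j + 1$ if transaction $t_i$ is placed at position $j$, where $t_l(a)$ denotes the value of attribute $a$ in transaction $t_l$. The case of $a^{\leq}$ is symmetric.
$$\bigwedge_{a \in A} \bigwedge_{1 \leq i \leq m} \bigwedge_{1 \leq j < k} \Big( x_{a^{\geq}} \wedge t_{ij} \rightarrow \bigvee_{t_l(a) \geq t_i(a)} t_{l(j+1)} \Big) \qquad (5)$$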
Note that such a constraint can be expressed differently by considering only the transactions that are not allowed, as stated in Constraint (6). In contrast to (5), constraint (6) only adds ternary clauses. However, their number is higher than the number of clauses of (5).
$$\bigwedge_{a \in A} \bigwedge_{1 \leq i \leq m} \bigwedge_{1 \leq j < k} \Big( x_{a^{\geq}} \wedge t_{ij} \rightarrow \bigwedge_{t_l(a) < t_i(a)} \neg t_{l(j+1)} \Big) \qquad (6)$$
Example 3. Let us consider the transaction database of Table 1. Assume that $k = 5$. If the gradual itemset contains the gradual item $cars^{\geq}$, and if $t_7$ is placed at position 1, then the corresponding constraint is as follows:
$$x_{cars^{\geq}} \wedge t_{71} \rightarrow t_{22} \vee t_{32} \vee t_{42}$$
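The clause of Example 3 can be regenerated from the cars column of Table 1. The sketch below is illustrative only: the helper names are hypothetical, the clause is rendered as a plain string, and the transaction $t_i$ itself is left out of the successors, which matches the example (it cannot occupy two positions anyway because of Constraint (4)).

```python
# Column "cars" of Table 1, indexed by transaction number.
CARS = {1: 2, 2: 3, 3: 4, 4: 4, 5: 1, 6: 1, 7: 3, 8: 2}


def successors(values, i):
    """Transactions allowed right after t_i for an increasing gradual item."""
    return [l for l in sorted(values) if l != i and values[l] >= values[i]]


def forbidden(values, i):
    """Transactions excluded right after t_i, as used by Constraint (6)."""
    return [l for l in sorted(values) if values[l] < values[i]]


def clause_5(values, item, i, j):
    """Textual rendering of the implication of Constraint (5) for t_i at position j."""
    head = f"x[{item}] & t[{i},{j}]"
    body = " | ".join(f"t[{l},{j + 1}]" for l in successors(values, i))
    return f"{head} -> {body}"


print(clause_5(CARS, "cars>=", 7, 1))
# x[cars>=] & t[7,1] -> t[2,2] | t[3,2] | t[4,2]
print(forbidden(CARS, 7))  # [1, 5, 6, 8]
```

For the same premise, Constraint (6) would instead produce one ternary clause per forbidden transaction, here four clauses, which illustrates why it yields more but shorter clauses than Constraint (5).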
Proposition 2. There is a one-to-one mapping between the gradual itemsets and the models of the formula $\Phi^{\mathcal{D}}_{k,n} = (2) \wedge (3) \wedge (4) \wedge (5) \wedge (6)$.
Proposition 2 links the gradual itemsets to the models of our encoding.
Finally, in order to eliminate symmetrical gradual
itemsets, we add the following constraint:
^
a
i
a
1
...a
n
(¬x
a
i
_
1 j<i
¬x
a
i
) (7)
In fact, the permutation $\sigma = (a_1^{\geq}, a_1^{\leq}) \ldots (a_n^{\geq}, a_n^{\leq})$ is a symmetry of the proposed encoding. Consequently, one can break such symmetry by adding the Symmetry Breaking Predicates as defined in (Crawford et al., 1996). More precisely, in (Crawford et al., 1996), for a symmetry $\sigma = (x_1, y_1) \ldots (x_n, y_n)$, the authors show that to break this symmetry one can add the following constraint:
$$\bigwedge_{i=1}^{n} \Big( \bigwedge_{j=1}^{i-1} (x_j = y_j) \Big) \rightarrow (x_i \leq y_i)$$
Combining this constraint with the one of (2) leads to the simplified Constraint (7).
Note that constraint (7) avoids computing both a gradual pattern and its corresponding symmetric gradual pattern. However, this constraint adds further variables and clauses to the final Boolean formula. We therefore propose an alternative that handles this symmetry without constraint (7): each time a model is found, two blocking clauses are added to the CNF formula, one to avoid finding the same model again and another to avoid finding a model corresponding to the symmetric pattern.
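The two blocking clauses can be sketched as follows. This is an interpretation made for illustration, not a description of the actual implementation: it assumes that a model is a dictionary from variable numbers to truth values, that the symmetric model is obtained by swapping the two variables of each attribute and reversing the positions of the transactions, and that both clauses range over all variables. The names `blocking_clause` and `symmetric_model` are hypothetical.

```python
def blocking_clause(model):
    """Clause falsified by `model` only: the negation of each of its literals."""
    return [-var if value else var for var, value in sorted(model.items())]


def symmetric_model(model, x_pairs, t_var, m, k):
    """Swap x_{a>=} with x_{a<=} and reverse the order of the k positions."""
    sym = dict(model)
    for ge, le in x_pairs:
        sym[ge], sym[le] = model[le], model[ge]
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            sym[t_var[(i, j)]] = model[t_var[(i, k + 1 - j)]]
    return sym


# Toy instance: one attribute (variables 1 and 2), m = 2 transactions, k = 2.
x_pairs = [(1, 2)]
t_var = {(1, 1): 3, (1, 2): 4, (2, 1): 5, (2, 2): 6}
# Model: increasing item selected, t1 at position 1, t2 at position 2.
model = {1: True, 2: False, 3: True, 4: False, 5: False, 6: True}

print(blocking_clause(model))
# [-1, 2, -3, 4, 5, -6]
print(blocking_clause(symmetric_model(model, x_pairs, t_var, m=2, k=2)))
# [1, -2, 3, -4, -5, 6]
```

Adding both clauses after each model removes the found model and its mirror image from the remaining search without introducing any new variable.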
Note that $(\sum_{i=1}^{m} t_{ij} = 1)$ (respectively $(\sum_{j=1}^{k} t_{ij} \leq 1)$) is a linear equality (respectively inequality), commonly called an exactly-one (respectively at-most-one) constraint. Such a constraint can be encoded with $O(m)$ (respectively $O(k)$) clauses using $O(m)$ (respectively $O(k)$) additional variables, as indicated in (8) (Warners, 1998; Silva and Lynce, 2007). A possible encoding of $\sum_{i=1}^{n} x_i = 1$ using auxiliary variables $\{p_1, \ldots, p_{n-1}\}$ is as follows.
$$\Big( \bigvee_{1 \leq i \leq n} x_i \Big) \wedge (\neg x_1 \vee p_1) \wedge (\neg x_n \vee \neg p_{n-1}) \wedge \bigwedge_{1 < i < n} \big( (\neg x_i \vee p_i) \wedge (\neg p_{i-1} \vee p_i) \wedge (\neg x_i \vee \neg p_{i-1}) \big) \qquad (8)$$
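The encoding (8) can be generated and checked exhaustively on a small case. The sketch below assumes $n \geq 2$ and a hypothetical numbering in which the auxiliary variables $p_1, \ldots, p_{n-1}$ are allocated from `next_var`; the brute-force check is only there to confirm that the clauses accept exactly the assignments with a single true $x_i$.

```python
from itertools import product


def exactly_one(xs, next_var):
    """Clauses of (8) for sum(xs) = 1; xs = [x_1, ..., x_n] with n >= 2."""
    n = len(xs)
    p = [next_var + i for i in range(n - 1)]  # p[i - 1] stands for p_i
    cnf = [list(xs), [-xs[0], p[0]], [-xs[-1], -p[-1]]]
    for i in range(2, n):  # 1 < i < n, with 1-based indices as in (8)
        x_i, p_i, p_prev = xs[i - 1], p[i - 1], p[i - 2]
        cnf += [[-x_i, p_i], [-p_prev, p_i], [-x_i, -p_prev]]
    return cnf, p


def projections(cnf, xs, aux):
    """Assignments of xs that can be extended to a model of cnf."""
    variables = xs + aux
    seen = set()
    for bits in product([False, True], repeat=len(variables)):
        rho = dict(zip(variables, bits))
        if all(any(rho[abs(l)] == (l > 0) for l in c) for c in cnf):
            seen.add(tuple(rho[x] for x in xs))
    return seen


xs = [1, 2, 3, 4]
cnf, aux = exactly_one(xs, next_var=5)
one_hot = {tuple(i == j for j in range(4)) for i in range(4)}
print(len(cnf), len(aux))                   # 9 3
print(projections(cnf, xs, aux) == one_hot)  # True
```

The clause count is $3 + 3(n - 2) = 3n - 3$ with $n - 1$ auxiliary variables, in line with the linear bounds stated above, whereas the pairwise expansion would need a quadratic number of binary clauses.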
From a complexity point of view, our encoding introduces $O(k \times n \times m)$ clauses. Indeed, Constraint (2) requires $O(n)$ clauses, Constraints (3) and (4) lead to $O(k \times m)$ clauses, Constraint (5) requires $O(k \times n \times m)$ clauses, and Constraint (7) requires $O(n)$ clauses. To summarize, the encoding is in $O(n + k \times m + k \times n \times m) = O(k \times n \times m)$. As for the variables, in addition to the $2n$ variables $x_{a^{\geq}}$, $x_{a^{\leq}}$ and the $k \times m$ variables $t_{ij}$, new variables must be added to encode the cardinality constraints of Constraints (3) and (4). The number of these additional variables remains bounded by $O(k \times m)$.
As mentioned above, encoding gradual itemset mining into propositional satisfiability yields a more flexible approach where new constraints can be added to mine particular patterns. Typically, in many application fields, interesting gradual patterns can be distinguished from irrelevant ones by specifying semantic constraints on the gradual pattern itself. For example, the authors of (Lonlac et al., 2017) designed an algorithm to mine temporal gradual patterns, which are gradual patterns whose longest sequence of transactions respects the temporal order. These kinds of gradual patterns are particularly interesting in the paleoecological domain, where experts search their paleoecological numerical data for patterns that capture frequent simultaneous co-evolutions between attributes. As the transactions are encoded in our CNF formula as Boolean variables, the temporal constraint can be captured by selecting in the temporal order the propositional variables $t_{ij}$ representing the transaction identifiers of the numerical dataset.
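One possible way to state such a temporal constraint in clausal form is sketched below. It is an assumption made for illustration and not the encoding used in the paper: it supposes that transaction indices follow the temporal order and simply forbids an earlier transaction from being placed right after a later one. The names `temporal_order_clauses` and `t_var` are hypothetical.

```python
def temporal_order_clauses(t_var, m, k):
    """Binary clauses forbidding t_l at position j + 1 after t_i at j when l < i."""
    clauses = []
    for j in range(1, k):
        for i in range(1, m + 1):
            for l in range(1, i):
                clauses.append([-t_var[(i, j)], -t_var[(l, j + 1)]])
    return clauses


# Toy instance with m = 3 transactions and k = 2 positions.
t_var = {(i, j): (i - 1) * 2 + j for i in range(1, 4) for j in range(1, 3)}
print(temporal_order_clauses(t_var, m=3, k=2))
# [[-3, -2], [-5, -2], [-5, -4]]
```

Such clauses would simply be appended to the formula, which illustrates the flexibility argument: the mining procedure itself does not change.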
4 SAT-BASED ENUMERATION PROCEDURE
In this section, we describe the SAT-based enumeration procedure used to enumerate all models of the CNF formula $\Phi^{\mathcal{D}}_{k,n}$. SAT is a decision problem; when the answer is positive, current SAT solvers also provide a model satisfying the formula. In the sequel, we briefly describe the basic components of modern SAT solvers, also called CDCL SAT solvers (Moskewicz et al., 2001; Eén and Sörensson, 2003), which we extend to enumerate all the models of a given CNF formula. These solvers incorporate unit propagation (enhanced by efficient and lazy data structures), a variable activity-based heuristic, literal polarity phase, clause learning, restarts and a learnt clauses database reduction policy.
Algorithm 1 depicts the general scheme of a CDCL SAT solver extended for model enumeration. A SAT solver is a tree-based backtrack search procedure; at each node of the search tree, the assigned literals (the decision literal and the propagated ones) are labeled with the same decision level, starting from 1 and increased at each decision (or branching).

Algorithm 1: CDCL Based Enumeration solver.
Input: a CNF formula Φ
Output: All models of Φ
 1  ρ = ∅ ;                       /* interpretation */
 2  δ = ∅ ;                       /* learnt clauses database */
 3  dl = 0 ;                      /* decision level */
 4  while (true) do
 5    Prop:
 6    γ = unitPropagation(Φ, ρ) ;
 7    if γ ≠ null then
 8      β = conflictAnalysis(Φ, ρ, γ) ;
 9      btl = computeBackjumpLevel(β, ρ) ;
10      if btl == 0 then
11        return UNSAT ;
12      δ = δ ∪ {β} ;
13      if restart() then
14        btl = 0 ;
15      backjump(btl) ;
16      dl = btl ;
17    else
18      if ρ |= Φ then
19        extractPatternFromModel(ρ) ;
20        addBlockedClause(ρ) ;
21        backjumpUntil(0) ;
22        goto Prop ;
23      if (timeToReduce()) then
24        reduceDB(δ) ;
25      l = selectDecisionVariable(Φ) ;
26      dl = dl + 1 ;
27      ρ = ρ ∪ {selectPhase(l)} ;

Each branch of the binary search tree can be seen as a sequence of decision and unit-propagated literals. At each node, a decision variable is chosen (line 25) and assigned the true or false polarity (selectPhase(l), line 27). Then unit propagation is performed (line 6). All the literals (decision and propagated ones) assigned at a given node are labelled with the same level dl. If all literals are assigned without contradiction, then ρ is a model of Φ and the formula is satisfiable (line 18). As our Boolean formula encodes the closed frequent gradual itemset mining problem, each time a model is found, a gradual itemset is extracted from ρ (line 19). For model enumeration, the search continues by adding a blocking clause to avoid enumerating the same model again (line 20). The search then restarts at level 0 to look for the next models (lines 21-22). The other case is reached when unit propagation leads to a conflict (lines 7-16), where γ is the conflict clause. A new asserting clause β is derived by conflict analysis (line 8), mostly following the First-UIP scheme ('Unique Implication Point' (Zhang et al., 2001)). A backtrack level (btl) is derived from the asserting clause (line 9). If btl is 0, then the formula is unsatisfiable (lines 10-11), which in enumeration mode means that all models have been found; otherwise β is added to the learnt clauses database (line 12) and the algorithm backjumps to the level btl (line 15). Regularly, the CDCL solver performs restarts by backtracking to level 0 (lines 13-14) using one of the various restart strategies ((Huang, )), which define how frequently the solver restarts the search. Finally, another component concerns the learnt clauses management policy. To maintain a learnt clauses database of reasonable size, a reduction is performed (lines 23-24) using one of the various strategies proposed in the literature (Audemard and Simon, 2009; Eén and Sörensson, 2003; Lonlac and Mephu Nguifo, 2017; Jabbour et al., 2014).
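The enumeration loop with blocking clauses can be illustrated on a much simpler solver. The sketch below deliberately swaps the CDCL machinery for a plain DPLL search (unit propagation and chronological backtracking, without clause learning, restarts or database reduction); it only shows how adding one blocking clause per model yields all models. All function names are hypothetical.

```python
def unit_propagate(cnf, rho):
    """Extend rho in place by unit propagation; return False on a conflict."""
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            if any(rho.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            free = [l for l in clause if abs(l) not in rho]
            if not free:
                return False  # every literal is false: conflict
            if len(free) == 1:
                rho[abs(free[0])] = free[0] > 0
                changed = True
    return True


def solve(cnf, variables, rho=None):
    """Return one model as a dict, or None if the formula is unsatisfiable."""
    rho = dict(rho or {})
    if not unit_propagate(cnf, rho):
        return None
    free = [v for v in variables if v not in rho]
    if not free:
        return rho
    for value in (True, False):
        model = solve(cnf, variables, {**rho, free[0]: value})
        if model is not None:
            return model
    return None


def enumerate_models(cnf, variables):
    """Find a model, block it, and search again until UNSAT."""
    cnf = [list(c) for c in cnf]
    found = []
    while (model := solve(cnf, variables)) is not None:
        found.append(model)
        cnf.append([-v if model[v] else v for v in variables])  # blocking clause
    return found


F = [[1, 2], [-1, 3]]
print(len(enumerate_models(F, [1, 2, 3])))  # 4
```

In a CDCL solver the blocking clause is added inside the solver and the search resumes from level 0, so learnt clauses are kept between models instead of restarting from scratch as this sketch does.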
5 EXPERIMENTS
In this section, we carry out an experimental evaluation of the performance of our proposed approach. We ran experiments both on a real-world paleoecological dataset and on a synthetic dataset. The paleoecological dataset consists of a set of numerical attributes whose values correspond to the quantity of each paleoecological indicator contained in a sediment record taken, by coring operations, in a lake ecosystem. It contains 111 objects corresponding to different dates identified on the considered lacustrine record, and 117 attributes corresponding to paleoecological indicators. All the experiments were run on Intel Xeon quad-core machines running at 2.66 GHz with 32 GB of RAM.
To solve the obtained formulas, we use the solver MiniSAT 2.2 (Eén and Sörensson, 2003) adapted for model enumeration, since enumerating all the models that satisfy the CNF formula encoding the frequent gradual pattern mining problem is equivalent to enumerating all frequent gradual patterns.
The main procedure of our approach is given in Algorithm 2. This procedure computes and returns all frequent gradual patterns with respect to the minimum support threshold minSupp. The procedure findAllModel corresponds to Algorithm 1 modified to add two blocking clauses to the CNF formula, instead of one, each time a model is found during the solving process: one blocking clause to avoid finding the same model again and another one to avoid finding a model corresponding to the symmetric pattern of the extracted gradual pattern.

Algorithm 2: SAT Based Gradual Patterns Enumeration.
Input: a numerical database DS, a minimum support minSupp
Output: Set of all frequent gradual patterns
1  F ← SATEncoding(DS, minSupp) ;
2  findAllModel(F) ;
Table 2 presents the results obtained on the paleoecological dataset. It reports the size of the CNF formula encoding the gradual pattern mining problem, in terms of number of variables (#vars) and clauses (#clauses), with respect to the minimum support threshold (#minSupp). The last column gives the CPU time, in seconds, needed for the encoding. The first observation that we can draw from Table 2 is that our SAT-based approach generates huge CNF formulas in a short time. For instance, for a minimum support equal to 50%, our SAT encoding generates in 2.25 seconds a CNF formula with 18 383 variables and 1 438 734 clauses. It is also worth mentioning that both the number of variables and the number of clauses of the CNF formula grow with the minimum support: from 2 115 variables and 133 521 clauses at 5% to 32 991 variables and 2 610 762 clauses at 90%.
Table 2: CNF encoding characteristics by varying the minimum support threshold.
#minSupp   #vars     #clauses     #encodingTime (in seconds)
5%         2 115     133 521      0.22
10%        3 775     266 706      0.43
20%        7 427     559 713      0.86
30%        11 079    852 720      1.31
40%        14 731    1 145 727    1.74
50%        18 383    1 438 734    2.25
60%        22 035    1 731 741    2.69
70%        25 687    2 024 748    3.12
80%        29 339    2 317 755    3.54
90%        32 991    2 610 762    4.03
Table 3 compares the run times (in seconds), on a synthetic dataset of 10 items and 100 transactions, of our proposed SAT-based approach, which we call SAT4GIM, with GRITE (Di-Jorio et al., 2009), an efficient specialized algorithm for extracting frequent gradual itemsets from numerical databases, when varying the minimum support threshold. We generated the synthetic dataset using an adapted version of the IBM Synthetic Data Generation Code for Associations and Sequential Patterns^1. We also compare our SAT-based approach to the one proposed in (Hidouri et al., 2021), called SATGIM. The results in Table 3 show that, for small minimum support thresholds (0.02 to 0.05), SAT4GIM is faster than both GRITE and SATGIM. On the other hand, our proposal takes longer than the other approaches to enumerate the complete set of gradual patterns when the minimum support threshold is high (0.15 and above). It is worth mentioning that SAT4GIM makes it possible to know, for each frequent gradual pattern, the position of each transaction belonging to its extension. That is not the case for GRITE and SATGIM.

1 www.almaden.ibm.com/software/projects/hdb/resources.shtml
Table 3: SAT4GIM vs (GRITE, SATGIM) for various minimum support values (run times in seconds).
#minSupp   GRITE   SATGIM   SAT4GIM   #Gradual
0.02       3.82    1.51     0.61      59 001
0.03       3.71    5        0.91      38 923
0.04       3.51    7.4      1.40      14 507
0.05       3.45    8.32     1.90      5 741
0.1        3.29    7.38     4.71      411
0.15       3.09    6.98     7.2       201
0.2        2.62    6.70     10.35     75
0.3        2.59    2.04     16.37     33
0.4        2.58    0.80     22.04     27
0.5        2.50    0.17     28.19     21
6 CONCLUSION
In this paper, we proposed a SAT encoding to address the problem of mining frequent gradual patterns. This declarative approach offers an additional possibility to benefit from the recent progress in satisfiability testing and to enumerate each gradual pattern together with the sequence of objects supporting it. We also performed experiments with real-world and synthetic datasets to show the efficiency of our proposal w.r.t. state-of-the-art algorithms for mining gradual itemsets. Several future directions can be pursued. First, we intend to develop a SAT-based encoding to enumerate maximal frequent gradual patterns, which remains an open, challenging and impactful problem in gradual pattern mining. We also plan to perform more experiments on large datasets.
REFERENCES
Aryadinata, Y. S., Lin, Y., Barcellos, C., Laurent, A., and
Libourel, T. (2014). Mining epidemiological dengue
fever data from brazil: A gradual pattern based geo-
graphical information system. In IMPU, pages 414–
423.
Audemard, G. and Simon, L. (2009). Predicting learnt
clauses quality in modern sat solvers. In Proceedings
of the 21st International Joint Conference on Artificial
Intelligence, IJCAI’09, pages 399–404.
Aumann, Y. and Lindell, Y. (1999). A statistical theory
for quantitative association rules. In SIGKDD, pages
261–270.
Ayouni, S., Laurent, A., Yahia, S. B., and Poncelet, P.
(2010). Mining closed gradual patterns. In Artificial
Intelligence and Soft Computing, 10th International
Conference, ICAISC 2010, Zakopane, Poland, June
13-17, 2010, Part I, pages 267–274.
Berzal, F., Cubero, J. C., Sánchez, D., Miranda, M. A. V.,
and Serrano, J. (2007). An alternative approach to
discover gradual dependencies. International Journal
of Uncertainty, Fuzziness and Knowledge-Based Systems,
15(5):559–570.
DATA 2023 - 12th International Conference on Data Science, Technology and Applications
Calders, T., Goethals, B., and Jaroszewicz, S. (2006). Min-
ing rank-correlated sets of numerical attributes. In
KDD, pages 96–105.
Crawford, J., Ginsberg, M. L., Luck, E., and Roy, A. (1996).
Symmetry-breaking predicates for search problems.
In Principles of Knowledge Representation and Rea-
soning (KR’96), pages 148–159.
Di-Jorio, L., Laurent, A., and Teisseire, M. (2008). Fast ex-
traction of gradual association rules: a heuristic based
method. In CSTST 2008: Proceedings of the 5th In-
ternational Conference on Soft Computing as Trans-
disciplinary Science and Technology, Cergy-Pontoise,
France, October 28-31, 2008, pages 205–210.
Di-Jorio, L., Laurent, A., and Teisseire, M. (2009). Min-
ing frequent gradual itemsets from large databases. In
Advances in Intelligent Data Analysis VIII, 8th Inter-
national Symposium on Intelligent Data Analysis, IDA
2009, Lyon, France, August 31 - September 2, 2009.
Proceedings, pages 297–308.
Do, T. D. T., Laurent, A., and Termier, A. (2010). PGLCM:
efficient parallel mining of closed frequent gradual
itemsets. In ICDM, pages 138–147.
Do, T. D. T., Termier, A., Laurent, A., Négrevergne, B.,
Tehrani, B. O., and Amer-Yahia, S. (2015). PGLCM:
efficient parallel mining of closed frequent gradual
itemsets. Knowl. Inf. Syst., 43(3):497–527.
Eén, N. and Sörensson, N. (2003). An extensible SAT-solver.
pages 502–518.
Fan, C. and Xiao, F. (2017). Mining gradual patterns in
big building operational data for building energy efficiency
enhancement. Energy Procedia, 143:119–124. Leveraging
Energy Technologies and Policy Options for Low Carbon Cities.
Hidouri, A., Jabbour, S., Raddaoui, B., and Yaghlane, B. B.
(2021). Mining closed high utility itemsets based
on propositional satisfiability. Data Knowl. Eng.,
136:101927.
Huang, J. The effect of restarts on the efficiency of clause
learning. pages 2318–2323.
Hüllermeier, E. (2002). Association rules for expressing
gradual dependencies. In Principles of Data Mining
and Knowledge Discovery, 6th European Conference,
PKDD 2002, Helsinki, Finland, August 19-23, 2002,
Proceedings, pages 200–211.
Jabbour, S., Lonlac, J., Sais, L., and Salhi, Y. (2014). Revis-
iting the learned clauses database reduction strategies.
CoRR, abs/1402.1956.
Jabbour, S., Sais, L., and Salhi, Y. (2013). The top-
k frequent closed itemset mining using top-k SAT
problem. In Machine Learning and Knowledge Dis-
covery in Databases - European Conference, ECML
PKDD 2013, Prague, Czech Republic, September 23-
27, pages 403–418.
Kaytoue, M., Kuznetsov, S. O., and Napoli, A. (2011). Re-
visiting numerical pattern mining with formal concept
analysis. In IJCAI, pages 1342–1347.
Kendall, M. and Smith, B. (1939). The problem of m rank-
ings. In The annals of mathematical statistics - Volume
10, pages 275–287.
Laurent, A., Lesot, M., and Rifqi, M. (2009). GRAANK:
exploiting rank correlations for extracting gradual
itemsets. In Flexible Query Answering Systems, 8th
International Conference, FQAS 2009, Roskilde, Den-
mark, October 26-28, 2009. Proceedings, pages 382–
393.
Laurent, A., Négrevergne, B., Sicard, N., and Termier, A.
(2010). PGP-mc: Towards a multicore parallel approach
for mining gradual patterns. In DASFAA, Part
I, pages 78–84.
Lonlac, J. and Mephu Nguifo, E. (2017). Towards learned
clauses database reduction strategies based on domi-
nance relationship. CoRR, abs/1705.10898.
Lonlac, J., Miras, Y., Beauger, A., Mazenod, V., Peiry, J.-
L., and Mephu, E. (2018). An approach for extract-
ing frequent (closed) gradual patterns under temporal
constraint. In FUZZ-IEEE, pages 878–885.
Lonlac, J., Miras, Y., Beauger, A., Pailloux, M., Peiry, J.-L.,
and Nguifo, E. M. (2017). Une approche d’extraction
de motifs graduels (fermés) fréquents sous contrainte
de la temporalité. Revue des Nouvelles Technologies
de l’Information, Extraction et Gestion des Connaissances,
RNTI-E-33:213–224.
Masseglia, F., Laurent, A., and Teisseire, M. (2008). Gradual
trends in fuzzy sequential patterns. In IPMU,
pages 456–463.
Moskewicz, M. W., Madigan, C. F., Zhao, Y., Zhang, L.,
and Malik, S. (2001). Chaff: Engineering an efficient
SAT solver. In Proceedings of the 38th Design Au-
tomation Conference (DAC’01), pages 530–535.
Négrevergne, B., Termier, A., Rousset, M., and Méhaut, J.
(2014). ParaMiner: a generic pattern mining algorithm
for multi-core architectures. DMKD, 28(3):593–633.
Ngo, T., Georgescu, V., Laurent, A., Libourel, T., and
Mercier, G. (2018). Mining spatial gradual patterns:
Application to measurement of potentially avoidable
hospitalizations. In SOFSEM, pages 596–608.
Oudni, A., Lesot, M., and Rifqi, M. (2013). Processing
contradiction in gradual itemset extraction. In FUZZ-
IEEE, pages 1–8.
Ramakrishnan, S. and Rakesh, A. (1996). Mining quanti-
tative association rules in large relational tables. SIG-
MOD Rec., 25(2):1–12.
Salleb-Aouissi, A., Vrain, C., and Nortet, C. (2007). Quant-
miner: A genetic algorithm for mining quantitative as-
sociation rules. In IJCAI, pages 1035–1040.
Silva, J. P. M. and Lynce, I. (2007). Towards robust CNF
encodings of cardinality constraints. In CP, pages 483–497.
Tseitin, G. (1968). On the complexity of derivations in the
propositional calculus. In Slesenko, H., editor, Structures
in Constructive Mathematics and Mathematical
Logic, Part II, pages 115–125.
Warners, J. P. (1998). A linear-time transformation of linear
inequalities into conjunctive normal form. Information
Processing Letters, 68(2):63–69.
Zhang, L., Madigan, C. F., Moskewicz, M. W., and Malik,
S. (2001). Efficient conflict driven learning in Boolean
satisfiability solver. In IEEE/ACM CAD’2001, pages
279–285.