ing FPTrees are also constructed. Despite the performance gain
it delivers, the original version of FPGrowth incurs a substantial
memory and time overhead due to repeated sorting and reconstructions.
The algorithms of the second class focus on a minimal set of FI,
called the cover, from which all the other FI can be generated
(Pasquier et al., 1999). To this end, the notions of closed and
maximal FI were introduced. These approaches use Formal Concept
Analysis (FCA) (Wille, 1982) to extract the set of frequent
concepts, which constitutes a condensed representation of the
entire set of FI.
The algorithms of the third class are concerned with
incrementality, that is, how to generate the set of FI and
maintain it when the dataset is dynamic (Valtchev et al., 2008).
Here, the same philosophies were adopted, either in a purely
algorithmic fashion or using FCA.
Summing up, after more than two decades of active research on the
subject, which has produced countless techniques, including various
efficient algorithms and judicious data structures, each with its
benefits and drawbacks, we believe it is worthwhile to go back and
ask a key question: besides the existing ones (Godin et al., 1995;
Zaki and Ogihara, 1998), are there other formalisms for this basic
problem? In other words, we aim to develop a general unifying model
able to express the work done so far in the main state-of-the-art
approaches. Moreover, we want the proposed formalism to enjoy some
key characteristics: completeness while remaining simple and
intuitive, extensibility, and efficiency. By efficiency we mean an
implementation whose performance is better than, or at least
comparable to, that of the existing techniques.
The elaboration of unifying models is a well-established issue in
DM (Yang and Wu, 2006). We postulate that unification can be
facilitated if we focus on a particular DM task. In this paper, we
address this question for the FI-mining problem. Specifically, we
introduce a new model for enumerating all FI based on formal series,
which meets the above properties. First, it defines a unified
theoretical framework, which reveals the equivalence of the
algorithms, as stipulated in (Goethals and Zaki, 2003) and confirmed
in one of the early comparative studies (Hipp et al., 2000). Second,
it allows their generalization to the mining of more complex
objects. We also prove a natural decomposition scheme, often
required in many aspects of the problem such as incrementality or
parallelization. Moreover, we explain how this problem can be
transposed to that of the realization of a formal series by a
weighted automaton (Salomaa et al., 1978) and, consequently, to that
of word recognition, a widely studied topic with very mature
algorithms. Finally, we propose an efficient algorithm to enumerate
all FI, which runs in place without extra memory.
The remainder of this paper is organized as follows. Section 2
gives some preliminaries on the basic concepts and notions used
throughout this article. In Section 3, we recall the FI-mining
problem and introduce our model. Section 4 is devoted to the
definition, the proofs, the construction of the proposed automaton,
and the analysis of the mining algorithm. In Section 5, we compare
our model with the existing techniques and show how these can be
derived from it. We conclude in Section 6 with some extensions.
2 PRELIMINARIES
A set $M$ with an associative internal binary operation $*$
admitting a unique identity element $e \in M$ forms a monoid, which
we denote $(M, *, e)$. When the operation $*$ is also commutative,
the monoid is said to be commutative. The standard example is the
free monoid $A^*$ of the words over an alphabet $A$, equipped with
the concatenation of words and having the empty word $\varepsilon$
as its identity element.
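As an illustration only, the monoid axioms can be checked by brute force on a finite sample of words. The sketch below is not part of the formalism; the helper names `is_monoid` and `is_commutative` are hypothetical, and the check covers only the sampled elements, so it can refute the axioms but cannot prove them in general.

```python
from itertools import product

def is_monoid(elements, op, identity):
    """Brute-force check of the monoid axioms on a finite sample (assumption:
    a sample-based check, not a proof over the whole set)."""
    # Identity: e * x == x * e == x for every sampled x.
    identity_ok = all(op(identity, x) == x and op(x, identity) == x
                      for x in elements)
    # Associativity: (x * y) * z == x * (y * z) for every sampled triple.
    assoc_ok = all(op(op(x, y), z) == op(x, op(y, z))
                   for x, y, z in product(elements, repeat=3))
    return identity_ok and assoc_ok

def is_commutative(elements, op):
    """True if x * y == y * x for every sampled pair."""
    return all(op(x, y) == op(y, x) for x, y in product(elements, repeat=2))

def concat(u, v):
    """Concatenation of two words represented as Python strings."""
    return u + v

# Words of length <= 2 over A = {a, b}: a finite sample of the free monoid A*.
words = ["".join(p) for n in range(3) for p in product("ab", repeat=n)]

free_monoid_ok = is_monoid(words, concat, "")           # empty word as identity
free_monoid_commutative = is_commutative(words, concat)  # "ab" != "ba"
```

On this sample, concatenation passes the monoid check with the empty word as identity, while the commutativity check fails, as expected for a free monoid over an alphabet with at least two letters.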
A word $u$ is a prefix (respectively, a suffix) of a word $w$ if
there exists a word $v$ such that $w = uv$ (respectively, $w = vu$).
The set of prefixes of a word $u$ is denoted $\mathrm{Pref}(u)$.
This notion extends to a set of words by taking the union of the
prefixes of its elements. A word $u = u_1 \ldots u_k$ is a
subsequence of a word $w = w_1 \ldots w_l$ ($k \le l$) if there
exist words $v_1, \ldots, v_{k+1}$ such that
$w = v_1 u_1 v_2 u_2 \ldots v_k u_k v_{k+1}$. We then write
$u \preccurlyeq w$.
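The notions above translate directly into code. The following minimal sketch assumes that words are Python strings over a single-character alphabet; the function names are hypothetical and chosen for illustration.

```python
def prefixes(u):
    """Pref(u): all prefixes of u, from the empty word up to u itself."""
    return {u[:i] for i in range(len(u) + 1)}

def prefixes_of_set(words):
    """Pref extended to a set of words: union of the prefixes of its elements."""
    result = set()
    for w in words:
        result |= prefixes(w)
    return result

def is_suffix(u, w):
    """True if w = v u for some word v."""
    return w.endswith(u)

def is_subsequence(u, w):
    """True if the letters of u occur in w in the same order, possibly with
    gaps (the words v_1, ..., v_{k+1} of the definition fill those gaps)."""
    remaining = iter(w)
    # `letter in remaining` consumes the iterator up to the first match,
    # so each letter of u is searched for strictly after the previous one.
    return all(letter in remaining for letter in u)
```

For instance, `prefixes("abc")` yields the four words `""`, `"a"`, `"ab"`, and `"abc"`, and `is_subsequence("ace", "abcde")` holds whereas `is_subsequence("aec", "abcde")` does not.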
A semiring is a tuple $(K, +, \times, 0, 1)$ such that
$(K, +, 0)$ is a commutative monoid, $(K, \times, 1)$ is a monoid,
$\times$ distributes over $+$ on both sides, and $0$ is an absorbing
element with respect to $\times$. Examples of semirings are
$(\mathbb{N}, +, \times, 0, 1)$ of the natural numbers,
$(\mathbb{B}, \vee, \wedge, \bot, \top)$ of the Booleans, and the
tropical semiring $(\mathbb{N} \cup \{\infty\}, \min, +, \infty, 0)$.
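To make the three examples concrete, here is a small Python sketch. The `Semiring` class and the `satisfies_axioms` helper are hypothetical names introduced for illustration; the check is partial (neutral elements, absorption, and distributivity only) and runs on a finite sample, so it is a sanity test rather than a proof.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]   # the operation written +
    mul: Callable[[Any, Any], Any]   # the operation written x
    zero: Any                        # neutral for add, absorbing for mul
    one: Any                         # neutral for mul

NATURAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0)

def satisfies_axioms(K, sample):
    """Partial check on a finite sample: neutrality, absorption, distributivity."""
    for x in sample:
        if K.add(K.zero, x) != x or K.add(x, K.zero) != x:
            return False
        if K.mul(K.one, x) != x or K.mul(x, K.one) != x:
            return False
        if K.mul(K.zero, x) != K.zero or K.mul(x, K.zero) != K.zero:
            return False
    for x, y, z in product(sample, repeat=3):
        # Left and right distributivity of mul over add.
        if K.mul(x, K.add(y, z)) != K.add(K.mul(x, y), K.mul(x, z)):
            return False
        if K.mul(K.add(y, z), x) != K.add(K.mul(y, x), K.mul(z, x)):
            return False
    return True
```

Note that in the tropical semiring the roles are shifted: `min` plays the part of addition with neutral element infinity, and ordinary addition plays the part of multiplication with neutral element 0.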
Over the monoid $A^*$, we define a formal series $S$ with
coefficients in a semiring $K$ as a mapping $S : A^* \to K$, which
associates with each word $w$ its coefficient $\langle S, w \rangle$.
The series $S$ itself is written as a sum:
$$S = \sum_{w \in A^*} \langle S, w \rangle \, w \qquad (2.1)$$
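As a small illustration of this definition, a series with finitely many non-zero coefficients can be stored as a dictionary from words to coefficients. This is a sketch under stated assumptions: the finiteness restriction, the choice of the natural numbers as the semiring, and the names `coefficient` and `as_formal_sum` are all introduced here and are not taken from the model itself.

```python
def coefficient(series, word, zero=0):
    """The coefficient <S, w>; words absent from the dictionary map to zero."""
    return series.get(word, zero)

def as_formal_sum(series, zero=0):
    """Render the series in the style of the sum (2.1), skipping zero terms.
    Words are ordered by length, then alphabetically; the empty word is shown as ε."""
    ordered = sorted(series.items(), key=lambda item: (len(item[0]), item[0]))
    terms = [f"{c}·{w if w else 'ε'}" for w, c in ordered if c != zero]
    return " + ".join(terms) if terms else str(zero)

# A hypothetical series over the alphabet {a, b} with coefficients in N.
S = {"": 1, "a": 3, "ab": 2, "b": 0}
rendered = as_formal_sum(S)   # "1·ε + 3·a + 2·ab"
```

Here `coefficient(S, "ba")` is 0 because the word `ba` is not stored, which mirrors the convention that only non-zero terms need to be written in the sum.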
The set $\mathrm{range}(S) = \{ w \in A^* \mid \langle S, w \rangle \neq 0 \}$
of words with non-null coefficients is called the range of the
series $S$ (also called its support, but we prefer range
DATA 2015 - 4th International Conference on Data Management Technologies and Applications