administrators to focus on a subset of the most advantageous indexes and that it avoids the generation of unwanted indexes.
The remainder of this paper is organized as follows: in Section 2 we present existing work related to the bitmap join index selection problem and to constraint-based mining. Section 3 describes the proposed approach for bitmap join index selection. We experimentally study the efficiency of our approach in Section 4. We conclude the paper and present future directions in Section 5.
2 RELATED WORK
2.1 Bitmap Join Index Selection
The index selection problem was first studied in the context of traditional databases (Chaudhuri and Narasayya, 1997), (Agrawal et al., 2000), (Chaudhuri et al., 2004), (Feldman and Reouven, 2003), (Frank et al., 1992), (Valentin et al., 2000). With the advent of data warehouses, indexing has become an important option in physical design, and its importance is well recognized (Golfarelli et al., 2002). The index selection problem has been proven to be NP-hard (Chaudhuri et al., 2004). Thus, most studies in the literature have focused on finding approximate solutions using greedy strategies or heuristic-based approaches.
Existing approaches aim to determine a set of candidate indexes from a given workload of queries, and then to propose a final index configuration providing the best profit under a storage space constraint. However, the indexes considered usually concern a single table. Bitmap join indexes are multi-attribute indexes involving several tables. Selecting a suitable configuration of bitmap join indexes is more complicated than selecting classical single-table indexes, since it requires exploring a large search space. To the best of our knowledge, only a few studies have addressed the problem of selecting bitmap join indexes (Aouiche et al., 2005), (Bellatreche et al., 2007), (Bellatreche and Boukhalfa, 2010), (Ziani and Ouinten, 2011). Due to the large number of candidate indexes, these approaches mainly focus on pruning the search space of potential indexes. They use frequent itemsets (Aouiche et al., 2005), (Bellatreche et al., 2007), (Ziani and Ouinten, 2011) or heuristic strategies (Bellatreche and Boukhalfa, 2010) to perform the pruning. In (Aouiche et al., 2005) and (Bellatreche et al., 2007), the Close algorithm (Pasquier et al., 1999) for mining closed frequent itemsets is used to prune the search space of candidate indexes. Because of the large number of indexes generated as closed frequent itemsets, the authors of (Ziani and Ouinten, 2011) propose an approach based on maximal frequent itemsets to perform the selection.
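The difference in output size between closed and maximal frequent itemsets can be illustrated on a toy workload. In this sketch we assume, following the general idea of these approaches, that each query is a transaction whose items are the indexable dimension attributes it references; the attribute names and the threshold are hypothetical. The itemsets are enumerated by brute force purely for illustration; this is not the Close algorithm.

```python
from itertools import combinations

# Hypothetical workload: each query is reduced to the indexable attributes it uses.
WORKLOAD = [
    frozenset({"gender", "month", "category"}),
    frozenset({"gender", "month"}),
    frozenset({"gender", "month", "category"}),
    frozenset({"gender", "city"}),
    frozenset({"month", "category"}),
]


def frequent_itemsets(transactions, minsup):
    """Brute-force enumeration of itemsets with support >= minsup (tiny inputs only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= minsup:
                result[itemset] = support
    return result


def closed_itemsets(freq):
    """Frequent itemsets having no proper superset with the same support."""
    return {s: n for s, n in freq.items()
            if not any(s < t and n == m for t, m in freq.items())}


def maximal_itemsets(freq):
    """Frequent itemsets having no proper superset that is also frequent."""
    return {s: n for s, n in freq.items() if not any(s < t for t in freq)}


if __name__ == "__main__":
    freq = frequent_itemsets(WORKLOAD, minsup=2)
    print(len(freq), len(closed_itemsets(freq)), len(maximal_itemsets(freq)))
```

On this workload, 7 itemsets are frequent at a minimum support of 2, 5 of them are closed, and only 1 is maximal; since every maximal itemset is closed, maximal itemsets always yield a candidate set no larger than closed ones.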
In (Bellatreche and Boukhalfa, 2010), the authors propose an intuitive algorithm for bitmap join index selection. As an initial configuration, the algorithm selects one index for each query having indexable attributes. When the size of the configuration exceeds the storage capacity S, some of the selected indexes are reduced until the constraint S is satisfied.
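A minimal sketch of this two-phase procedure follows. The storage model and the reduction rule are our own assumptions, not details taken from (Bellatreche and Boukhalfa, 2010): an uncompressed bitmap join index is estimated as one bitmap of |F| bits per distinct value of each indexed attribute, where |F| is the number of fact-table rows, and an index is "reduced" by removing the attribute with the largest cardinality. All table sizes, cardinalities, and function names are hypothetical.

```python
# Hypothetical figures for illustration only.
FACT_ROWS = 1_000_000  # |F|: number of rows in the fact table
CARDINALITY = {"gender": 2, "month": 12, "category": 50, "city": 500}

# Each query is represented by the indexable dimension attributes it references.
WORKLOAD = [
    {"gender", "city"},
    {"month", "category"},
    {"gender", "city"},
    {"category", "city"},
    set(),  # a query without indexable attributes
]


def bji_size_bytes(attrs):
    """Assumed size model: one |F|-bit bitmap per distinct value of each attribute."""
    return sum(CARDINALITY[a] for a in attrs) * FACT_ROWS / 8


def initial_config(workload):
    """One index per query having indexable attributes (identical indexes merged)."""
    unique = dict.fromkeys(frozenset(q) for q in workload if q)
    return [set(i) for i in unique]


def reduce_until_fits(config, storage_limit):
    """Shrink indexes until the total estimated size satisfies storage_limit.

    Hypothetical reduction rule: repeatedly remove the attribute with the largest
    cardinality (it contributes the most bitmaps); drop indexes left empty."""
    config = [set(i) for i in config]
    total = sum(bji_size_bytes(i) for i in config)
    while config and total > storage_limit:
        # max() returns the first maximal pair, i.e. the earliest index holding it.
        idx, attr = max(((k, a) for k, i in enumerate(config) for a in i),
                        key=lambda pair: CARDINALITY[pair[1]])
        config[idx].discard(attr)
        if not config[idx]:
            del config[idx]
        total = sum(bji_size_bytes(i) for i in config)
    return config, total


if __name__ == "__main__":
    config, total = reduce_until_fits(initial_config(WORKLOAD), storage_limit=20_000_000)
    print(config, total)
```

With a 20,000,000-byte limit, removing city from the two indexes that contain it brings the estimated total from 139,250,000 bytes down to 14,250,000 bytes.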
The main weakness of these approaches is the large number of generated indexes, which is very difficult to manage given the system limitations (number of indexes per table and storage space constraint). Indeed, pruning is performed only after the index configuration has been generated.
An alternative is to constrain the input data earlier in the selection process, thereby reducing the output size and directly discovering the indexes that are of interest to the administrator. We believe that a constraint-based approach will help to mine a smaller and more relevant index configuration.
2.2 Constraint-based Pattern Mining
Mining frequent itemsets (FI) in datasets is a demanding task common to several important data mining applications that look for interesting patterns within databases (e.g., association rules, correlations, sequences, episodes, classifiers, clusters). It was originally proposed in (Agrawal and Srikant, 1994), (Agrawal et al., 1993), together with the Apriori algorithm.
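The level-wise principle of Apriori can be sketched as follows: frequent k-itemsets are joined to form (k+1)-candidates, candidates having an infrequent subset are pruned (every subset of a frequent itemset is frequent), and the supports of the remaining candidates are counted. This Python version counts support with a plain scan for brevity and omits the hash-tree and other optimizations of the original algorithm; the transactions in the usage lines are a toy example.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Level-wise Apriori: return {itemset: support} for all itemsets with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # One scan of the data per candidate (no hash tree, for brevity).
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in transactions for i in t}
    counts = {s: support(s) for s in singles}
    level = {s: n for s, n in counts.items() if n >= minsup}
    result = dict(level)
    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): all (k-1)-subsets must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        counts = {c: support(c) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup}
        result.update(level)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diaper", "beer", "eggs"},
        {"milk", "diaper", "beer", "cola"},
        {"bread", "milk", "diaper", "beer"},
        {"bread", "milk", "diaper", "cola"},
    ]
    for itemset, n in sorted(apriori(baskets, 3).items(), key=lambda p: len(p[0])):
        print(sorted(itemset), n)
```

With a minimum support of 3, the only 3-itemset surviving the prune step, {bread, milk, diaper}, has support 2, so the result contains 4 frequent items and 4 frequent pairs.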
A drawback of mining frequent itemsets is that, if there is a large frequent itemset of size $s$, then almost all $2^{s}$ candidate subsets of that itemset might be generated and tested. Furthermore, the number of frequent itemsets grows very quickly as the minimum support threshold decreases.
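Both effects can be observed on a small dataset of our own. In the sketch below, three transactions share the same 10 items, so at a threshold of 3 the 10-itemset is frequent and all of its $2^{10} - 1 = 1023$ non-empty subsets are frequent as well, whereas at a threshold of 4 only two single items remain. The counting is done by brute force and is only meant for tiny inputs.

```python
from itertools import combinations

LONG = frozenset("abcdefghij")  # a 10-item transaction, repeated three times below
TRANSACTIONS = [LONG, LONG, LONG, frozenset({"a", "k"}), frozenset({"b", "l"})]


def count_frequent(transactions, minsup):
    """Count non-empty itemsets with support >= minsup by brute force (tiny inputs only)."""
    items = sorted(set().union(*transactions))
    count = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            if sum(1 for t in transactions if itemset <= t) >= minsup:
                count += 1
    return count


if __name__ == "__main__":
    for minsup in (4, 3):
        print(minsup, count_frequent(TRANSACTIONS, minsup))
    print(2 ** len(LONG) - 1)
```

Lowering the threshold by a single unit here multiplies the output size by more than 500.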
Moreover, the huge size of the output complicates the task of the analyst, who has to extract useful knowledge from a very large number of frequent patterns. To overcome this problem, the paradigm of constraint-based pattern discovery was introduced with the aim of providing a tool for driving the discovery process towards potentially interesting patterns. Using constraints can be of great help in purging many patterns that are irrelevant to the user.
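The benefit of pushing a constraint into the discovery process, rather than filtering the output afterwards, can be sketched for an anti-monotone constraint, i.e. one that, once violated by an itemset, is violated by all of its supersets. As a hypothetical example related to index selection, each item is an attribute weighted by its cardinality, and the constraint bounds the total cardinality of a pattern (a rough proxy for the number of bitmaps an index on those attributes would need). The workload, weights, and budget are invented for illustration.

```python
from itertools import combinations

# Hypothetical attribute cardinalities and workload (one set of attributes per query).
CARDINALITY = {"gender": 2, "month": 12, "category": 50, "city": 500}
WORKLOAD = [
    {"gender", "month", "city"},
    {"gender", "month", "city"},
    {"gender", "category", "city"},
    {"month", "category", "city"},
    {"gender", "month", "category"},
]


def within_budget(itemset, budget=100):
    """Anti-monotone constraint: total cardinality (non-negative weights) <= budget."""
    return sum(CARDINALITY[a] for a in itemset) <= budget


def mine(transactions, minsup, constraint=None):
    """Level-wise mining that optionally pushes an anti-monotone constraint.

    Returns ({itemset: support}, number of candidates whose support was counted)."""
    transactions = [frozenset(t) for t in transactions]
    counted = 0

    def support(itemset):
        nonlocal counted
        counted += 1
        return sum(1 for t in transactions if itemset <= t)

    level = {frozenset([i]) for t in transactions for i in t}
    result, k = {}, 1
    while level:
        if constraint is not None:
            # Pushed constraint: violating candidates are dropped before counting.
            level = {c for c in level if constraint(c)}
        freq = {}
        for c in level:
            n = support(c)
            if n >= minsup:
                freq[c] = n
        result.update(freq)
        k += 1
        joined = {a | b for a in freq for b in freq if len(a | b) == k}
        # Apriori prune: keep candidates whose (k-1)-subsets are all frequent.
        level = {c for c in joined
                 if all(frozenset(s) in freq for s in combinations(c, k - 1))}
    return result, counted


if __name__ == "__main__":
    pushed, n_pushed = mine(WORKLOAD, 2, constraint=within_budget)
    everything, n_all = mine(WORKLOAD, 2)
    post = {s: n for s, n in everything.items() if within_budget(s)}
    print(pushed == post, len(pushed), n_pushed, n_all)
```

Both strategies return the same 6 patterns, but pushing the constraint counts the support of 7 candidates instead of 14, because city is discarded before any support counting and no candidate containing it is ever generated.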
Constraint-based mining has since been widely addressed, with very different approaches. The most commonly used constraints are the minimum or maximum support threshold, including (or being included in) some
specific itemset, aggregated computation (sum, aver-
ICEIS 2012 - 14th International Conference on Enterprise Information Systems