4 PERFORMANCE EVALUATION
In this section, we present the performance of BalancedHider and compare it to that of CyclicHider and BBHider. The test computer was equipped with an Intel Core 2 Duo 3.0 GHz processor and 2 GB of main memory, and ran the Windows XP64 operating system. For the performance evaluation, a synthetic dataset, namely T10.I4.50K, was generated using the IBM synthetic dataset generator. We also used a real-world market basket database, Retail (Brijs et al., 1999), from the FIMI repository. Retail contains 88163 transactions and 16470 different items, with 13 items per transaction on average. For each of the two databases, 20 sensitive frequent itemsets were selected somewhat arbitrarily, each containing either two or three items. The average support of the sensitive itemsets is 249.8 for T10.I4.50K and 977.95 for Retail.
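The dataset statistics quoted above (number of transactions, number of distinct items, average transaction length, and average support of the sensitive itemsets) can be computed as in the following minimal sketch. The toy database and the helper names `dataset_summary` and `average_support` are illustrative assumptions and are not part of the original experiments.

```python
def dataset_summary(db):
    """Return (#transactions, #distinct items, average transaction length)."""
    n_transactions = len(db)
    distinct_items = set().union(*db)
    avg_length = sum(len(t) for t in db) / n_transactions
    return n_transactions, len(distinct_items), avg_length


def average_support(db, itemsets):
    """Mean absolute support (transaction count) over the given itemsets."""
    supports = [sum(1 for t in db if s <= t) for s in itemsets]
    return sum(supports) / len(supports)


# Hypothetical toy database: each transaction is a set of items.
toy_db = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abc"),
]
# Hypothetical sensitive itemsets, each with two items.
toy_sensitive = [frozenset("ab"), frozenset("ac")]

print(dataset_summary(toy_db))
print(average_support(toy_db, toy_sensitive))
```

For a FIMI-style file, each line would first be split into item identifiers to build one set per transaction before calling these helpers.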
We use runtime as the only efficiency metric and five effectiveness metrics (M0 through M4) to measure the distortion, as follows. For all metrics except M2, smaller values are better; for M2, larger values are better.
• Runtime (in seconds): The time the algorithm takes to complete.
• Data Dist. (M0): It equals $\sum_{T \in D} |T| - \sum_{T \in D'} |T|$.
• Information Loss (M1) (Oliveira and Zaïane, 2003): It equals $\frac{\sum_{i \in I} \left( \mathit{sup}_{D}(\{i\}) - \mathit{sup}_{D'}(\{i\}) \right)}{\sum_{i \in I} \mathit{sup}_{D}(\{i\})}$.
• Quality (M2): It equals $\frac{|F(D', \psi)|}{|F(D, \psi) - P_h|}$.
• Freq. Support Dist. (M3) (Abul et al., 2007b): It equals $\frac{1}{|F(D', \psi)|} \sum_{X \in F(D', \psi)} \frac{\mathit{sup}_{D}(X) - \mathit{sup}_{D'}(X)}{\mathit{sup}_{D}(X)}$.
• Freq. Pattern Dist. (M4) (Abul et al., 2007b): It equals $\frac{|F(D, \psi)| - |F(D', \psi)|}{|F(D, \psi)|}$.
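The five effectiveness metrics can be illustrated on a toy database. The following Python sketch is a minimal illustration under stated assumptions: ψ is treated as an absolute support threshold, P_h as the set of sensitive itemsets, the minus sign in M2 as set difference, and F(·, ψ) is computed by brute-force enumeration rather than by the miner used in the experiments. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def support(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t)


def frequent_itemsets(db, psi):
    """Brute-force F(db, psi): all non-empty itemsets with support >= psi.

    Exponential in the number of items; intended for toy inputs only.
    """
    items = sorted(set().union(*db))
    result = set()
    for k in range(1, len(items) + 1):
        level = {frozenset(c) for c in combinations(items, k)
                 if support(db, c) >= psi}
        if not level:
            break  # no frequent k-itemset, so no larger one can be frequent
        result |= level
    return result


def distortion_metrics(d, d_prime, psi, sensitive):
    """Compute M0..M4 for original db d and sanitized db d_prime.

    Assumption: `sensitive` plays the role of P_h as a plain set of itemsets.
    """
    f_d = frequent_itemsets(d, psi)
    f_dp = frequent_itemsets(d_prime, psi)
    items = set().union(*d)
    m0 = sum(len(t) for t in d) - sum(len(t) for t in d_prime)
    m1 = (sum(support(d, [i]) - support(d_prime, [i]) for i in items)
          / sum(support(d, [i]) for i in items))
    m2 = len(f_dp) / len(f_d - sensitive)
    m3 = sum((support(d, x) - support(d_prime, x)) / support(d, x)
             for x in f_dp) / len(f_dp)
    m4 = (len(f_d) - len(f_dp)) / len(f_d)
    return {"M0": m0, "M1": m1, "M2": m2, "M3": m3, "M4": m4}


# Hypothetical original database D.
D = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
     frozenset("bc"), frozenset("abc")]
# Hypothetical sanitized database D': item b removed from the second
# transaction so that the sensitive itemset {a, b} drops below psi = 3.
D_PRIME = [frozenset("abc"), frozenset("a"), frozenset("ac"),
           frozenset("bc"), frozenset("abc")]

print(distortion_metrics(D, D_PRIME, psi=3, sensitive={frozenset("ab")}))
```

On this toy input, M0 = 1, M1 = 1/12, M2 = 1, M3 = 0.05, and M4 = 1/6: one item is removed, only the sensitive itemset is lost, and the support of {b} falls from 4 to 3.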
The results are plotted in Fig. 4. In summary, they show that the effectiveness of BalancedHider lies between that of CyclicHider and BBHider, while its efficiency is close to that of CyclicHider.
5 CONCLUSIONS
In this work, we introduced a new algorithm for the sensitive frequent itemset hiding problem, aimed at finding solutions that balance the efficiency/effectiveness tradeoff. The motivation was built on our analysis that there is a big efficiency gap between simple and sophisticated algorithms, while the effectiveness gap is relatively small. The experimental results on two datasets confirm that the algorithm meets its design criteria.
Our algorithm is practical, which makes it useful in domains such as online database publishing. Our future work will include extending the algorithm to other knowledge formats.
ACKNOWLEDGEMENTS
This work is supported by TUBITAK under grant number 108E016.
REFERENCES
Abul, O., Atzori, M., Bonchi, F., and Giannotti, F. (2007a).
Hiding sensitive trajectory patterns. In 6th Int. Workshop
on Privacy Aspects of Data Mining (PADM’07).
Abul, O., Atzori, M., Bonchi, F., and Giannotti, F. (2007b).
Hiding sequences. In Third ICDE Int. Workshop on
Privacy Data Management (PDM’07).
Abul, O., Gökçe, H., and Şengez, Y. (2009). Frequent itemsets
hiding: A performance evaluation framework. In
ISCIS’09.
Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining
association rules between sets of items in large
databases. In SIGMOD ’93, pages 207–216.
Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M., and
Verykios, V. S. (1999). Disclosure limitation of sensitive
rules. In KDEX’99, pages 45–52.
Brijs, T., Swinnen, G., Vanhoof, K., and Wets, G. (1999).
Using association rules for product assortment decisions:
A case study. In Knowledge Discovery and
Data Mining, pages 254–260.
Lee, G., Chang, C.-Y., and Chen, A. L. P. (2004). Hiding
sensitive patterns in association rules mining. In
COMPSAC’04.
Moustakides, G. V. and Verykios, V. S. (2006). A max-min
approach for hiding frequent itemsets. In ICDM’06.
O’Leary, D. E. (1991). Knowledge discovery as a threat
to database security. In Piatetsky-Shapiro, G. and
Frawley, W. J., editors, Knowledge Discovery in
Databases, pages 507–516. AAAI/MIT Press.
Oliveira, S. R. M. and Zaïane, O. R. (2003). Protecting
sensitive knowledge by data sanitization. In ICDM’03.
Sun, X. and Yu, P. S. (2005). A border-based approach for
hiding sensitive frequent itemsets. In ICDM’05.
Verykios, V. S., Elmagarmid, A. K., Bertino, E., Saygin, Y.,
and Dasseni, E. (2004). Association rule hiding. IEEE
TKDE, 16/4:434–447.
Weng, C.-C., Chen, S.-T., and Chang, Y.-C. (2007). A novel
algorithm for hiding sensitive frequent itemsets. In
ISIS’07.