Skip Search Approach for Mining Probabilistic Frequent Itemsets from Uncertain Data

Takahiko Shintani, Tadashi Ohmori, Hideyuki Fujita

Abstract

As data mining is applied more widely, data uncertainty has to be taken into account. In this paper, we study mining probabilistic frequent itemsets from uncertain data under the Possible World Semantics. Because each tuple in probabilistic data has an existential probability, the support of an itemset is not a single count but a probability mass function (pmf). We propose a skip search approach that avoids evaluating the support pmf of redundant itemsets. The approach starts evaluating support pmfs at the average length of the candidate itemsets. When an evaluated itemset is not probabilistic frequent, all of its supersets are deleted from the candidate itemsets, and one of its subsets is selected as the next candidate to evaluate. When an evaluated itemset is probabilistic frequent, one of its supersets is selected as the next candidate to evaluate. Furthermore, our approach evaluates the support pmf by difference calculation, reusing itemsets that have already been evaluated. Thus, it reduces both the number of candidate itemsets whose support pmf must be evaluated and the cost of each evaluation. Finally, we show the effectiveness of our approach through experiments.
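
To make the notion of a support pmf concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a tuple-uncertainty model in which each transaction exists independently with its existential probability, and it uses the commonly used definition that an itemset is probabilistic frequent when P(support >= minsup) >= tau. The names `support_pmf`, `is_probabilistic_frequent`, `minsup`, and `tau` are illustrative and do not come from the paper. The pmf is built with a standard dynamic program over the matching transactions, not with the difference calculation the abstract mentions.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (items in the tuple, existential probability).
Transaction = Tuple[FrozenSet[str], float]


def support_pmf(db: List[Transaction], itemset: FrozenSet[str]) -> List[float]:
    """Return pmf where pmf[k] = P(support(itemset) == k).

    Assumes tuples exist independently of each other (an assumption of
    this sketch). Only tuples that contain the itemset can contribute.
    """
    pmf = [1.0]  # with no tuples considered, the support is 0 with certainty
    for items, p in db:
        if not itemset <= items:
            continue
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)  # tuple absent: support stays at k
            nxt[k + 1] += q * p      # tuple present: support becomes k + 1
        pmf = nxt
    return pmf


def is_probabilistic_frequent(
    db: List[Transaction], itemset: FrozenSet[str], minsup: int, tau: float
) -> bool:
    """True if P(support >= minsup) >= tau (assumed definition)."""
    pmf = support_pmf(db, itemset)
    return sum(pmf[minsup:]) >= tau


# Toy uncertain database, invented for illustration only.
db = [
    (frozenset({"a", "b"}), 0.5),
    (frozenset({"a"}), 0.5),
    (frozenset({"b"}), 1.0),
]
pmf_a = support_pmf(db, frozenset({"a"}))
```

With this toy database, `is_probabilistic_frequent(db, frozenset({"a"}), minsup=1, tau=0.7)` asks whether itemset {a} appears in at least one tuple with probability at least 0.7. Because the tail probability can only shrink when items are added to an itemset, a non-frequent itemset rules out all of its supersets, which is the pruning property the skip search relies on.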



Paper Citation


in Harvard Style

Shintani T., Ohmori T. and Fujita H. (2016). Skip Search Approach for Mining Probabilistic Frequent Itemsets from Uncertain Data. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 174-180. DOI: 10.5220/0006035401740180


in Bibtex Style

@conference{kdir16,
author={Takahiko Shintani and Tadashi Ohmori and Hideyuki Fujita},
title={Skip Search Approach for Mining Probabilistic Frequent Itemsets from Uncertain Data},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={174-180},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006035401740180},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Skip Search Approach for Mining Probabilistic Frequent Itemsets from Uncertain Data
SN - 978-989-758-203-5
AU - Shintani T.
AU - Ohmori T.
AU - Fujita H.
PY - 2016
SP - 174
EP - 180
DO - 10.5220/0006035401740180