Multiple Behavioral Models: A Divide and Conquer Strategy to Fraud Detection in Financial Data Streams
Roberto Saia, Ludovico Boratto, Salvatore Carta
2015
Abstract
The rapid growth of e-commerce, driven both by the new opportunities offered by the Internet and by the spread of debit and credit cards in online purchases, has sharply increased the number of frauds, causing large economic losses to the businesses involved. Designing effective strategies to face this problem is, however, particularly challenging because of several factors, such as the heterogeneity and non-stationary distribution of the data stream and the imbalanced class distribution. The problem is further complicated by the scarcity of public datasets, due to confidentiality issues, which prevents researchers from verifying new strategies in many data contexts. Unlike the canonical state-of-the-art strategies, which define a single model based on the past transactions of the users, we follow a Divide and Conquer strategy: we define multiple models (user behavioral patterns) and exploit them to evaluate a new transaction in order to detect potential fraud attempts. Some parameters of this process can be tuned to adapt the sensitivity of the models to the operating environment. Because our models are trained only on the past legitimate transactions of a user, rather than on both legitimate and fraudulent ones, we can operate proactively, detecting fraudulent transactions of a kind that has never occurred in the past. This way of proceeding also overcomes the data imbalance problem that afflicts machine learning approaches. We evaluate the proposed approach by comparing it with Random Forests, one of the best-performing state-of-the-art approaches, on a real-world credit card dataset.
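The abstract does not specify how the behavioral models are built or compared, so the following Python fragment is only a rough illustration of the general idea, not the authors' algorithm. It assumes, hypothetically, that each transaction is a tuple of numeric features (here amount and hour of day), that each model is the per-feature value range of one chronological chunk of a user's legitimate history, and that a new transaction is flagged when it fits fewer than `min_matches` models. The names `build_models`, `fits`, `is_suspicious`, `n_models`, `margin`, and `min_matches` are all invented for this sketch.

```python
def build_models(legit_transactions, n_models):
    """Split a user's legitimate history into consecutive chunks and
    summarize each chunk as a list of per-feature (min, max) ranges.
    ASSUMPTION: range-based models are an illustrative stand-in only."""
    if not legit_transactions or n_models < 1:
        raise ValueError("need at least one transaction and one model")
    size = -(-len(legit_transactions) // n_models)  # ceiling division
    models = []
    for start in range(0, len(legit_transactions), size):
        chunk = legit_transactions[start:start + size]
        models.append([(min(col), max(col)) for col in zip(*chunk)])
    return models


def fits(model, transaction, margin):
    """True if every feature lies inside the model's range, widened on
    both sides by `margin` times the width of that range."""
    for value, (low, high) in zip(transaction, model):
        slack = margin * (high - low)
        if not (low - slack <= value <= high + slack):
            return False
    return True


def is_suspicious(models, transaction, margin=0.25, min_matches=1):
    """Flag the transaction when it fits fewer than `min_matches` models.
    `margin` and `min_matches` play the role of sensitivity parameters."""
    matches = sum(fits(model, transaction, margin) for model in models)
    return matches < min_matches


# Hypothetical legitimate history: (amount, hour of day), oldest first.
history = [(12.0, 9), (15.5, 10), (9.9, 11),
           (80.0, 20), (95.0, 21), (70.0, 22)]
models = build_models(history, n_models=2)
print(is_suspicious(models, (14.0, 10)))   # fits the first pattern
print(is_suspicious(models, (900.0, 3)))   # fits neither pattern
```

Only legitimate transactions enter `build_models`, which mirrors the abstract's point that no fraudulent examples are needed; raising `min_matches` or lowering `margin` makes this toy detector stricter.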
Paper Citation
in Harvard Style
Saia R., Boratto L. and Carta S. (2015). Multiple Behavioral Models: A Divide and Conquer Strategy to Fraud Detection in Financial Data Streams. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015), ISBN 978-989-758-158-8, pages 496-503. DOI: 10.5220/0005637104960503
in Bibtex Style
@conference{kdir15,
author={Roberto Saia and Ludovico Boratto and Salvatore Carta},
title={Multiple Behavioral Models: A Divide and Conquer Strategy to Fraud Detection in Financial Data Streams},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={496-503},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005637104960503},
isbn={978-989-758-158-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Multiple Behavioral Models: A Divide and Conquer Strategy to Fraud Detection in Financial Data Streams
SN - 978-989-758-158-8
AU - Saia R.
AU - Boratto L.
AU - Carta S.
PY - 2015
SP - 496
EP - 503
DO - 10.5220/0005637104960503