An Accurate Tax Fraud Classifier with Feature Selection based on Complex Network Node Centrality Measure

Tales Matos, José Antonio F. de Macedo, José Maria Monteiro, Francesco Lettich

2017

Abstract

Fiscal evasion is a very serious issue in many developing countries. In this context, tax fraud detection is a challenging problem, since fraudsters frequently change their behavior to circumvent existing laws and devise new kinds of fraud. Detecting such changes is difficult, because traditional classifiers fail to select features that exhibit frequent changes. In this paper we make two contributions toward tackling the tax fraud detection problem effectively. First, we introduce a novel feature selection algorithm, based on complex network techniques, that is able to capture determinant fraud indicators; over time, such indicators turn out to be more stable than new fraud indicators. Second, we propose a classifier that leverages this algorithm to accurately detect tax fraud. To validate our contributions, we provide an experimental evaluation on real-world datasets obtained from the State Treasury Office of Ceará (SEFAZ-CE), Brazil, showing that our method outperforms state-of-the-art approaches from the literature in terms of F1 score.
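The abstract does not say how the feature network is built or which node centrality measure is used, so the following is only a hypothetical illustration of the general idea, not the authors' algorithm. It assumes each candidate fraud indicator is a node, links two indicators whenever they are both active for the same taxpayer, and ranks indicators by normalized degree centrality, keeping the top k as the selected features. The indicator names, the co-occurrence rule, and the choice of degree centrality are all assumptions.

```python
from itertools import combinations
from typing import Iterable


def rank_features_by_degree(
    records: Iterable[set[str]], top_k: int
) -> list[tuple[str, float]]:
    """Hypothetical sketch: select features by degree centrality in a
    co-occurrence graph (NOT the paper's exact method).

    Each record is the set of indicator names active for one taxpayer.
    Two indicators are linked if they co-occur in at least one record.
    """
    neighbors: dict[str, set[str]] = {}
    for active in records:
        # Register every active indicator as a node, even if it has no links.
        for feature in active:
            neighbors.setdefault(feature, set())
        # Add an undirected edge for every co-occurring pair.
        for a, b in combinations(sorted(active), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Normalized degree centrality: degree / (number of nodes - 1).
    denom = max(len(neighbors) - 1, 1)
    centrality = {f: len(nb) / denom for f, nb in neighbors.items()}

    # Highest centrality first; ties broken alphabetically for determinism.
    ranked = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Usage with made-up indicator names (assumption, for illustration only).
records = [
    {"late_filing", "shell_supplier", "cash_sales_gap"},
    {"shell_supplier", "credit_mismatch"},
    {"late_filing", "shell_supplier"},
    {"isolated_flag"},
]
print(rank_features_by_degree(records, top_k=3))
# -> [('shell_supplier', 0.75), ('cash_sales_gap', 0.5), ('late_filing', 0.5)]
```

Degree centrality is used here only because it is the simplest node centrality measure; another measure, such as betweenness or eigenvector centrality, could be substituted without changing the overall structure of the selection step.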


Paper Citation


in Harvard Style

Matos T., Macedo J., Monteiro J. and Lettich F. (2017). An Accurate Tax Fraud Classifier with Feature Selection based on Complex Network Node Centrality Measure. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-247-9, pages 145-151. DOI: 10.5220/0006335501450151


in BibTeX Style

@conference{iceis17,
author={Tales Matos and José Antonio F. de Macedo and José Maria Monteiro and Francesco Lettich},
title={An Accurate Tax Fraud Classifier with Feature Selection based on Complex Network Node Centrality Measure},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2017},
pages={145-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006335501450151},
isbn={978-989-758-247-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - An Accurate Tax Fraud Classifier with Feature Selection based on Complex Network Node Centrality Measure
SN - 978-989-758-247-9
AU - Matos T.
AU - Macedo J.
AU - Monteiro J.
AU - Lettich F.
PY - 2017
SP - 145
EP - 151
DO - 10.5220/0006335501450151