Evaluating Open Source Data Mining Tools for Business

Pedro Almeida, Le Gruenwald, Jorge Bernardino


Businesses struggle to stay ahead of the competition in a globalized economy with more and stronger competitors. Managers constantly look for advantages that generate benefits at low cost. One way to gain such an advantage is to use customer data, such as demographics, purchase history, behavior and preferences, to make better business decisions. Data Mining addresses the challenge of extracting value from data and putting that value to use in virtually any area of our lives, including business. In this paper, we discuss the relevance of Data Mining for business and analyze three popular Open Source Data Mining Tools (KNIME, Orange and RapidMiner), which we consider a good starting point for enterprises that want to begin exploring the power of Data Mining and its benefits.



Paper Citation

in Harvard Style

Almeida P., Gruenwald L. and Bernardino J. (2016). Evaluating Open Source Data Mining Tools for Business. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 87-94. DOI: 10.5220/0005939900870094

in Bibtex Style

author={Pedro Almeida and Le Gruenwald and Jorge Bernardino},
title={Evaluating Open Source Data Mining Tools for Business},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},

in EndNote Style

JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Evaluating Open Source Data Mining Tools for Business
SN - 978-989-758-193-9
AU - Almeida P.
AU - Gruenwald L.
AU - Bernardino J.
PY - 2016
SP - 87
EP - 94
DO - 10.5220/0005939900870094