For the other categories, RapidMiner and Knime
have almost the same scores; the biggest differences are
seen in the categories “Documentation” and
“Community and Adoption”. Although Knime has
increased its community of users, RapidMiner is still
at the top of the most used data mining tools and,
because of this, it has extensive documentation on the
internet and gives access to the help of a large
community.
Weka is the tool with the lowest results overall: it
does not outscore either of the other two tools in any
category, which makes it the weakest of the three data
mining tools evaluated.
After each category has been evaluated, the last
step in this methodology is to calculate the final score.
For each tool, the score of each category is multiplied
by the weight assigned to that category, and the
products are summed.
RapidMiner = 5 × 0.25 + 4.3 × 0.20 + 4.2 × 0.15 + 5 × 0.12 + 4.8 × 0.12 + 4.2 × 0.1 + 4.5 × 0.06 = 4.606
Knime = 4 × 0.25 + 4.4 × 0.20 + 4.1 × 0.15 + 3.5 × 0.12 + 4.1 × 0.12 + 4.1 × 0.1 + 4.3 × 0.06 = 4.075
Weka = 4 × 0.25 + 2 × 0.20 + 2 × 0.15 + 1.5 × 0.12 + 2 × 0.12 + 1.5 × 0.1 + 3.5 × 0.06 = 2.48
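The weighted sums above can be checked with a short script. This is only a minimal sketch: the weights and scores are copied from the calculations above in the order they appear, while the names `WEIGHTS`, `SCORES` and `final_score` are introduced here purely for illustration and are not part of the OSSpal methodology.

```python
# Category weights, in the order they appear in the weighted sums above.
WEIGHTS = [0.25, 0.20, 0.15, 0.12, 0.12, 0.10, 0.06]

# Per-category scores for each tool, in the same order as WEIGHTS.
SCORES = {
    "RapidMiner": [5, 4.3, 4.2, 5, 4.8, 4.2, 4.5],
    "Knime": [4, 4.4, 4.1, 3.5, 4.1, 4.1, 4.3],
    "Weka": [4, 2, 2, 1.5, 2, 1.5, 3.5],
}


def final_score(scores, weights=WEIGHTS):
    """Return the weighted sum of the category scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


if __name__ == "__main__":
    # Print the tools from highest to lowest final score.
    ranked = sorted(SCORES, key=lambda tool: final_score(SCORES[tool]), reverse=True)
    for tool in ranked:
        print(f"{tool}: {final_score(SCORES[tool]):.3f}")
```

Because the weights add up to 1, each final score stays on the same 1 to 5 scale as the category scores.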
Table 4: OSSpal final score.
Score    RapidMiner    Knime    Weka
TOTAL    4.606         4.075    2.48
As shown in Table 4, RapidMiner is the tool that
obtained the best final score with the application of
the OSSpal methodology, with 4.606 on a scale
from 1 to 5. Knime follows with 4.075, and
Weka has the lowest score, 2.48.
5 CONCLUSIONS AND FUTURE WORK
The rise of the Internet has meant that there are more
and more open source tools that have the same quality
and functionality as commercial tools. Therefore,
companies need to be aware of how they can lower
their costs by adopting open source tools that fit
their specific needs.
In this paper, we analysed three of the most used
open source data mining tools. For this evaluation,
the information needed was collected from technical
documentation, through hands-on use of the tools, and
from the websites of the respective tools.
The application of the OSSpal methodology
allowed us to obtain a more precise assessment by
assigning a numeric value to each category for each
tool, thus allowing comparisons between tools.
After applying the OSSpal methodology, we
conclude that RapidMiner is the tool that obtained the
best final score, which is consistent with the number of
users that this tool has. Knime occupies second place,
with a high score close to that of RapidMiner, which
could help explain the large increase in Knime users
compared to other tools over the last years. Weka has
the lowest score, which is consistent with the decrease
in its number of users reported in the KDnuggets Full
Results and 3-year data mining tools trends: 11.2% in
2015, 10.9% in 2016 and 9.8% in 2017.
As future work, we intend to apply a greater number
of measures for each category and see whether the
same tool still obtains the best score. We also plan to
extend this study by including a larger number of
open source data mining tools and see whether the
results are similar.