tain results worth noting:
• Talend reached a higher maximum CPU load than
Pentaho in most scenarios.
• In transformations from all data stores (except
Neo4j) to MySQL, Pentaho proved to be faster
than Talend.
• There was a significant difference in the results of
the transformation from all stores to Cassandra.
Talend was much faster in performing these
transformations, sometimes even 10,000 times faster
than Pentaho. Conversely, for the transformations
from Cassandra to all other data stores, Pentaho was
much faster than Talend.
Regarding the other transformation technologies:
although the manual method is a multi-step process
(export, data validation, load), it was much faster
than Pentaho or Talend. While its counterparts were
extremely slow in some cases and transformed only
580,000 records in others, the manual method was able
to complete the jobs with 2 million records. A major
drawback was that in some cases Neo4j became
unresponsive.
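The scripts used for the manual method are not given here. As an illustration only, the three steps (export, data validation, load) can be sketched in Python as below. In-memory SQLite databases stand in for the source and target data stores, and the table name, the columns and the validation rules are hypothetical assumptions, not taken from this study.

```python
import csv
import io
import sqlite3

# Hypothetical schema assumed for this sketch only: (id, name, rating).
COLUMNS = ("id", "name", "rating")


def export_rows(src: sqlite3.Connection, table: str) -> str:
    """Step 1 (export): dump every row of `table` to CSV text with a header row."""
    # `table` is assumed to be a trusted constant, not user input.
    cur = src.execute(f"SELECT {', '.join(COLUMNS)} FROM {table} ORDER BY id")
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow([d[0] for d in cur.description])
    writer.writerows(cur)  # NULL values are written as empty fields
    return buf.getvalue()


def validate_rows(csv_text: str):
    """Step 2 (data validation): split CSV rows into typed valid records and rejects."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        try:
            record = (int(row["id"]), row["name"].strip(), float(row["rating"]))
        except (ValueError, TypeError, AttributeError):
            rejected.append(row)  # unparsable or missing field
            continue
        # Example rules (assumed): non-empty name and a rating within 0..5.
        if record[1] and 0.0 <= record[2] <= 5.0:
            valid.append(record)
        else:
            rejected.append(row)
    return valid, rejected


def load_rows(dst: sqlite3.Connection, table: str, records) -> int:
    """Step 3 (load): insert validated records into the target in one transaction."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    with dst:  # commits on success, rolls back on error
        dst.executemany(
            f"INSERT INTO {table} ({', '.join(COLUMNS)}) VALUES ({placeholders})",
            records,
        )
    return len(records)


def migrate(src: sqlite3.Connection, dst: sqlite3.Connection, table: str):
    """Run export -> validate -> load; return (number of rows loaded, rejected rows)."""
    csv_text = export_rows(src, table)
    valid, rejected = validate_rows(csv_text)
    return load_rows(dst, table, valid), rejected
```

Keeping the exported data in an intermediate form (here CSV text) decouples the steps, so a failed validation or load can be rerun without repeating the export.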
7 CONCLUSION AND FUTURE WORK
The main aim of our work was to compare these
technologies in detail using well-defined characteristics
and datasets. Although there are other ways of
transforming data, such as commercial tools, the crux
of this study was to compare open source tools, which
are freely available to every user.
Where tools such as Pentaho and Talend fell short,
alternatives such as manual methods were defined.
All 74 transformation methodologies in this work
were implemented and evaluated individually.
Further, as a reference, all challenges faced during
the course of this work have been documented. We
envision that this work can serve as a guideline for
choosing suitable transformation technologies for
organizations looking to transform data, migrate to
other data stores, exchange data with other
organizations, and the like. Although the number of
technologies compared here is necessarily limited,
more data stores can always be added to this
comparative study. Every organisation has its own
needs and therefore adopts different database
solutions. Adding more databases would widen the
study and benefit future users.
DATA 2017 - 6th International Conference on Data Science, Technology and Applications