anomalies in Apache Cassandra. Proceedings of the
VLDB Endowment, 8(7), 810-813.
H2O.ai. AirlinesWithWeatherDemo. 2016. Available at:
https://github.com/h2oai/sparkling-water/tree/master/examples/
[Accessed on 20 February 2016].
Khan, Z., Anjum, A., Soomro, K., and Tahir, M. A. 2015.
Towards cloud based big data analytics for smart future
cities. Journal of Cloud Computing, 4(1), 1.
Kreps, J., Narkhede, N. and Rao, J. 2011, June. Kafka: A
distributed messaging system for log processing. In
Proceedings of the NetDB (pp. 1-7).
Lapi, E., Tcholtchev, N., Bassbouss, L., Marienfeld, F. and
Schieferdecker, I. 2012, July. Identification and
utilization of components for a linked open data
platform. In Computer Software and Applications
Conference Workshops (COMPSACW), 2012 IEEE
36th Annual (pp. 112-115). IEEE.
Liu, Z., Li, H. and Miao, G. 2010, August. MapReduce-
based backpropagation neural network over large scale
mobile data. In 2010 Sixth International Conference on
Natural Computation (Vol. 4, pp. 1726-1730). IEEE.
Marienfeld, F., Schieferdecker, I., Lapi, E., and Tcholtchev,
N. 2013, August. Metadata aggregation at GovData.de:
an experience report. In Proceedings of the 9th
International Symposium on Open Collaboration (p.
21). ACM.
Matheus, R. and Manuella, M. 2014. Case study: open
government data in Rio de Janeiro City. Open Research
Network.
Mercader, A. et al. 2012. ckanext-harvest - remote
harvesting extension. Available at:
https://github.com/ckan/ckanext-harvest [Accessed on 20 February 2016].
Momjian, B. 2001. PostgreSQL: introduction and concepts
(Vol. 192). New York: Addison-Wesley.
Red Hat, Inc. Using Hadoop with CephFS. 2014. Available
at: http://docs.ceph.com/docs/jewel/cephfs/hadoop
[Accessed on 20 February 2016].
Rosado, T. and Bernardino, J., 2014, July. An overview of
openstack architecture. In Proceedings of the 18th
International Database Engineering & Applications
Symposium (pp. 366-367). ACM.
Shvachko, K., Kuang, H., Radia, S. and Chansler, R. 2010,
May. The hadoop distributed file system. In 2010 IEEE
26th symposium on mass storage systems and
technologies (MSST) (pp. 1-10). IEEE.
Steinberger, R., Ebrahim, M., Poulis, A., Carrasco-Benitez,
M., Schlüter, P., Przybyszewski, M. and Gilbro, S. 2014.
An overview of the European Union’s highly
multilingual parallel corpora. Language Resources and
Evaluation, 48(4), 679-707.
Thaha, A.F., Singh, M., Amin, A.H., Ahmad, N.M. and
Kannan, S., 2014, December. Hadoop in openstack:
Data-location-aware cluster provisioning. In
Information and Communication Technologies
(WICT), 2014 Fourth World Congress on (pp. 296-
301). IEEE.
The Apache Software Foundation. WebHDFS REST API.
2013. Available at:
http://hadoop.apache.org/docs/r1.0.4/webhdfs.html [Accessed on 20 February 2016].
The Apache Software Foundation. Apache Spark:
Lightning-fast cluster computing. 2016. Available at:
http://spark.apache.org/ [Accessed on 20 February
2016].
The Apache Software Foundation. Hadoop Project
Webpage. 2017a. Available at:
http://hadoop.apache.org/ [Accessed on 20 February
2016].
The Apache Software Foundation. Apache Cassandra.
2017b. Available at: http://cassandra.apache.org/
[Accessed on 20 February 2016].
Tierney, B., Kissel, E., Swany, M. and Pouyoul, E. 2012.
Efficient data transfer protocols for big data. In E-
Science (e-Science), 2012 IEEE 8th International
Conference on (pp. 1-9). IEEE.
Weil, S. A., Brandt, S. A., Miller, E. L., Long, D. D. and
Maltzahn, C. 2006, November. Ceph: A scalable, high-
performance distributed file system. In Proceedings of
the 7th symposium on Operating systems design and
implementation (pp. 307-320). USENIX Association.
Winn, J. 2013. Research Data Management using CKAN:
A Datastore, Data Repository and Data Catalogue.
IASSIST Conference.
Wuebker, J., Ney, H. and Zens, R. 2012. Fast and scalable
decoding with language model look-ahead for phrase-
based statistical machine translation. In Proceedings of
the 50th Annual Meeting of the Association for
Computational Linguistics: Short Papers-Volume 2.