that concerns our work, SpatialHadoop has the
infrastructure to process geographic databases, and
many tools have been developed for operations, joins,
and indexing in geodatabases.
Cloud computing provides the infrastructure to
process big geospatial data that demands high
performance, but this infrastructure can be expensive
on public cloud providers. A cost-efficient method is
therefore necessary to avoid wasting computational
resources and increasing financial costs.
The method proposed in this paper, and
demonstrated by the case study presented, achieves
the goal of supporting a SpatialHadoop environment
on public cloud providers while avoiding the waste
of computational resources. The formula to define the
number of data nodes was validated in the case study,
and about 263% of the cost was saved.
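The sizing formula itself is not reproduced in this section. As a rough illustration only, the sketch below shows how a storage-driven estimate of the number of data nodes and the resulting monthly cost could be computed. It is a generic capacity rule, not the formula validated in the case study, and every name and default value (replication factor, temporary-space overhead, usable disk per node, hourly price, hours per month) is a hypothetical assumption.

```python
import math


def estimate_data_nodes(dataset_gb, replication=3, temp_overhead=0.25,
                        usable_gb_per_node=500.0):
    """Smallest node count whose combined usable disk holds the data.

    All defaults are illustrative assumptions, not values from the paper.
    """
    if dataset_gb < 0 or usable_gb_per_node <= 0:
        raise ValueError("sizes must be non-negative and node capacity positive")
    # Raw data is multiplied by the replication factor, plus headroom
    # for intermediate (temporary) job output.
    required_gb = dataset_gb * replication * (1.0 + temp_overhead)
    # A cluster needs at least one data node, even for an empty dataset.
    return max(1, math.ceil(required_gb / usable_gb_per_node))


def monthly_cost(node_count, hourly_price, hours=730):
    """Cost of keeping node_count instances running for one month."""
    return node_count * hourly_price * hours


if __name__ == "__main__":
    nodes = estimate_data_nodes(dataset_gb=1000)
    print("data nodes:", nodes)
    print("monthly cost:", monthly_cost(nodes, hourly_price=0.10))
```

Comparing the cost of the estimated node count with that of a fixed, over-provisioned cluster gives a simple measure of the resources that would otherwise be wasted.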
As future work, we suggest performance
optimizations that can be obtained by using task nodes
(for job processing only) together with data nodes. In
this way, it is possible to scale SpatialHadoop
applications based on user-defined threads, mainly in
the indexing task, which demands powerful computing.
Other applications can also be tested, such as
SpatialSpark and ISP-MC.
Cost Optimization on Public Cloud Provider for Big Geospatial Data