8 CONCLUSION
This work presented Lambda architecture
implementations on different public cloud vendors
and compared them to support decision makers who
need to select specific vendors’ SaaS offerings in the
context of BDA. Based on the results obtained, we
recommend the most suitable SaaS for each layer
depending on the criteria selected.
In terms of performance, AWS obtained the best
metrics in the batch and speed layers. In the batch
layer, AWS showed the best reading, processing, and
writing times, whereas Google Cloud appears to be
negatively affected as data size increases. In the
serving layer, Azure showed consistent and efficient
behavior compared with its competitors.
Regarding time-to-market, AWS required
more man-hours, especially in the speed and serving
layers. Azure had the fastest development in the
serving layer, but its batch layer implementation
required more effort because it involved the
development and integration of the Data Lake Store,
Stream Analytics, Data Factory, and Data Lake
Analytics services. Google Cloud development was
the fastest overall, which could be due to the unified
programming model for batch and speed processing
offered by Google Dataflow.
In terms of the cost of services, Azure was the
most expensive provider in the serving layer, where
it consumed more credits due to the Cosmos DB
service. In contrast, Google Cloud presented the
lowest price in all layers and offers the widest free
tier for initial training.
In summary, when performance is a strong
concern, AWS is the best choice for the batch and
speed layers despite its high cost, and Azure should
be selected for the serving layer to obtain the best
response times. If time-to-market guides the SaaS
selection, Google Cloud is recommended, although
performance could be affected. Finally, if service
pricing is an important constraint, Google Cloud
again offers the best choice by a factor of 1/4.
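
The recommendations in the summary paragraph can be read as a lookup
from a selection criterion and a Lambda layer to a vendor. The sketch
below is only an illustration of that reading, not part of the
evaluated implementations; the names RECOMMENDATIONS and recommend are
hypothetical, and the table encodes the summary paragraph only (for
example, it does not capture that Azure had the fastest development in
the serving layer).

```python
# Hypothetical lookup table that encodes the summary recommendations.
# Criterion and layer names are illustrative labels, not terms from any API.
LAYERS = ("batch", "speed", "serving")

RECOMMENDATIONS = {
    # Best measured performance per layer.
    "performance": {"batch": "AWS", "speed": "AWS", "serving": "Azure"},
    # Fastest overall development according to the summary.
    "time_to_market": {layer: "Google Cloud" for layer in LAYERS},
    # Lowest price in all layers.
    "cost": {layer: "Google Cloud" for layer in LAYERS},
}


def recommend(criterion: str, layer: str) -> str:
    """Return the vendor suggested in the summary for a criterion and layer."""
    try:
        return RECOMMENDATIONS[criterion][layer]
    except KeyError:
        raise ValueError(
            f"unknown criterion or layer: {criterion!r}, {layer!r}"
        ) from None
```

For instance, recommend("performance", "serving") yields Azure, while
any layer under "cost" yields Google Cloud.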
ACKNOWLEDGMENTS
This research was carried out by the Center of
Excellence and Appropriation in Big Data and Data
Analytics (CAOBA), supported by the Ministry of
Information Technologies and Telecommunications
of the Republic of Colombia (MinTIC) through the
Colombian Administrative Department of Science,
Technology and Innovation (COLCIENCIAS) under
contract no. FP44842-anex46-2015. Special thanks
are due to CAOBA’s members: Miguel Rodriguez,
Felipe Gonzalez-Casabianca, Miguel Barrera, and
Camilo Ortiz.