Authors:
Eliana Fernandes 1; Ana Carolina Salgado 2 and Jorge Bernardino 3
Affiliations:
1 Polytechnic of Coimbra – ISEC, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal
2 Centre for Informatics, Universidade Federal de Pernambuco, Recife, Brazil
3 Polytechnic of Coimbra – ISEC, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal; Centre for Informatics and Systems of the University of Coimbra (CISUC), Portugal
Keyword(s):
Streaming, Real-time Analytics, Big Data, Fault-Tolerance.
Abstract:
In recent years, the volume of data has grown exponentially as technology has evolved. Data now flows quickly and continuously, so it must be processed in real time, and several big data streaming platforms have emerged to process these large volumes. Companies often struggle to choose the platform that best suits their needs, in part because information about the platforms is scattered and sometimes incomplete. This work aims to help companies and organizations choose a big data streaming platform to analyze and process their data flows. We describe the most popular platforms, including Apache Flink, Apache Kafka, Apache Samza, Apache Spark and Apache Storm. To deepen the understanding of these platforms, we also discuss their architectures, advantages and limitations. Finally, we compare the platforms, using as attributes the characteristics that companies most often need.