Authors:
Thi Thu Trang Ngo (1); David Sarramia (2); Myoung-Ah Kang (1) and François Pinet (3)
Affiliations:
(1) Université Clermont Auvergne, ISIMA, LIMOS-UMR CNRS 6158, Aubière, France
(2) Université Clermont Auvergne, CNRS/IN2P3, LPC, Clermont-Ferrand, France
(3) Université Clermont Auvergne, INRAE, UR TSCF, Clermont-Ferrand, France
Keyword(s):
ELK Stack, Elasticsearch, Spatial Data Warehouse, Georeferenced Sensor Data, ETL, Streaming Data, NoSQL, Data Lake, Data Integration.
Abstract:
In the context of the French CAP 2025 I-Site project, an environmental data lake called CEBA is built for the Auvergne region. Its goal is to integrate data from heterogeneous sensors, to provide end users with tools to query and analyse georeferenced environmental data, and to make the data open. The sensors collect different environmental measures depending on their location (air and soil temperature, water quality, etc.). The measures are used by different research laboratories to analyse the environment. The main component for shipping and storing data is the ELK stack: data are collected from sensors through Beats and streamed by Logstash to Elasticsearch, and scientists can query the data through Kibana. In this paper, we propose a data warehouse frontend to CEBA based on the ELK stack. We also propose an additional component for the ELK stack that performs streaming ETL: it integrates and aggregates streaming data from different sensors and sources according to the user configuration, in order to give end users more analytical capabilities on the data. We show the architecture of this system, present the functionalities of the data lake through examples, and finally present an example Kibana dashboard of the data.
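
The aggregation step of a streaming ETL, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the component proposed by the authors: the field names (`sensor_id`, `measure`, `timestamp`, `value`), the one-hour window, and the sample readings are all hypothetical, and the sketch works on an in-memory list rather than on a Logstash or Elasticsearch stream.

```python
from collections import defaultdict
from datetime import datetime, timezone


def window_start(ts: datetime, window_seconds: int) -> datetime:
    """Return the start of the fixed-size (tumbling) window containing ts."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


def aggregate(readings, window_seconds=3600):
    """Average the readings per (sensor_id, measure, window start)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for r in readings:
        key = (r["sensor_id"], r["measure"],
               window_start(r["timestamp"], window_seconds))
        sums[key][0] += r["value"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up readings from a single hypothetical sensor (illustration only).
utc = timezone.utc
readings = [
    {"sensor_id": "s1", "measure": "air_temperature",
     "timestamp": datetime(2021, 1, 1, 10, 5, tzinfo=utc), "value": 4.0},
    {"sensor_id": "s1", "measure": "air_temperature",
     "timestamp": datetime(2021, 1, 1, 10, 35, tzinfo=utc), "value": 6.0},
    {"sensor_id": "s1", "measure": "air_temperature",
     "timestamp": datetime(2021, 1, 1, 11, 10, tzinfo=utc), "value": 8.0},
]
result = aggregate(readings)
```

Here `result` maps each (sensor, measure, hour) key to the mean value in that hour. In a setup like the one described, such aggregated documents would presumably be indexed into Elasticsearch and explored through Kibana, with the window size and grouping keys taken from the user configuration.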