Finally, the communication between the Dash-
board and the Data Warehouse is handled by the se-
lected MVC framework. The Dashboard is accessed
through a web browser and forwards all user actions
to the controller, which communicates with the Data
Warehouse and updates the view accordingly.
Data Staging Technologies. Django (Forcier et al.,
2008) has been selected as the underlying framework
to enable the functionalities and communications presented
above. Therefore, the Data Staging is being
implemented in the Python programming language. The
data extraction part, which is the consumer part of the
RabbitMQ architecture, uses the Pika library (see footnote 7), a
pure-Python implementation of the AMQP 0-9-1 protocol
and the solution recommended by the RabbitMQ
team (Boschi and Santomaggio, 2013).
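The consumer side described above could look roughly like the following sketch. It is an illustration only, not the platform's actual code: the queue name, the host and the policy of rejecting malformed messages without requeueing are assumptions. Pika is imported lazily so that the helper functions can be exercised without a broker.

```python
import json


def decode_message(body: bytes):
    """Return the message as a dict, or None if it is not a JSON object."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


def on_message(channel, method, properties, body):
    """Pika callback: acknowledge well-formed messages, reject the rest."""
    payload = decode_message(body)
    if payload is None:
        # Assumed policy: malformed input is rejected and not requeued.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        return
    # Validation and storage would be invoked here (not shown).
    channel.basic_ack(delivery_tag=method.delivery_tag)


def consume(host="localhost", queue="ict_tool_messages"):
    """Block and consume from the queue; host and queue name are placeholders."""
    import pika  # third-party dependency, imported only when consuming

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_consume(queue=queue, on_message_callback=on_message)
    channel.start_consuming()
```

Calling `consume()` starts the blocking loop; acknowledging only after processing means that a message is redelivered if the consumer crashes before finishing it.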
To support the data validation part, the Django
framework has been enhanced with the Django REST
framework (Holovaty and Kaplan-Moss, 2009),
which offers the ability to define custom data serializ-
ers. Data validation is based on JSON schema and on
serializers’ validation functionality. Upon validation,
data are deserialized and stored directly in the Data
Warehouse using the Django mechanisms.
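The two validation stages can be illustrated with a small standard-library sketch. It is a stand-in, not the Django REST framework API: a structural check plays the role of the JSON schema, and a deserialization function plays the role of the serializer's validation. The message fields (`tool_id`, `timestamp`, `latitude`, `longitude`) are hypothetical, since the actual schema is not given here.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical message layout; the real schema is not given in the text.
REQUIRED_FIELDS = {
    "tool_id": str,
    "timestamp": str,
    "latitude": (int, float),
    "longitude": (int, float),
}


@dataclass
class Measurement:
    tool_id: str
    timestamp: datetime
    latitude: float
    longitude: float


def check_structure(payload: dict) -> list:
    """Schema-style check: each required field is present with the right type."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"{name}: missing")
        elif isinstance(payload[name], bool) or not isinstance(payload[name], expected):
            errors.append(f"{name}: wrong type")
    return errors


def deserialize(payload: dict) -> Measurement:
    """Serializer-style check: validate values, then build the object to store."""
    errors = check_structure(payload)
    if errors:
        raise ValueError("; ".join(errors))
    if not -90 <= payload["latitude"] <= 90:
        raise ValueError("latitude: out of range")
    if not -180 <= payload["longitude"] <= 180:
        raise ValueError("longitude: out of range")
    return Measurement(
        tool_id=payload["tool_id"],
        timestamp=datetime.fromisoformat(payload["timestamp"]),
        latitude=float(payload["latitude"]),
        longitude=float(payload["longitude"]),
    )
```

Only a payload that passes both stages yields an object that would be handed to the storage layer; anything else raises `ValueError` with the offending fields.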
Data Warehouse Technologies. The Data Ware-
house is deployed as a PostgreSQL (Momjian, 2001)
database. The structured, clearly defined and unmodifiable
data model led to the decision to use a relational
database for the data storage.
PostgreSQL offers full support for all the functionalities
of the Structured Query Language (SQL),
geometric data types, including a point type made of two
floating-point numbers that is needed to store the GPS
coordinates, and a wide range of index types that can be
used to support the computation of the KPIs.
PostgreSQL supports GiST indexes, which are not
a single kind of index but rather an infrastructure
within which many different indexing strategies can be
implemented. Accordingly, the operators with which a
GiST index can be used vary depending on the
indexing strategy (the operator class). This is
important because it allows spatial queries over several
two-dimensional geometric data types. In addition,
PostgreSQL offers GIN indexes, which are appropriate for data
values that contain multiple component values, such as
arrays. This gives the flexibility to index multiple-choice
answers, which hold several values at once. Multicolumn
indexes and indexes on expressions are also used to support
the computation of the KPIs when needed.
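As a rough illustration of these index types, the sketch below creates a GiST index on a point column, a GIN index on an array column and a multicolumn index through any DB-API cursor. The table and column names, the (longitude, latitude) ordering of the point and the `%s` parameter style (as used by psycopg2) are assumptions, not details taken from the platform.

```python
# Hypothetical table: one row per recorded sample.
SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS sample (
        id          BIGSERIAL PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        recorded_at TIMESTAMPTZ NOT NULL,
        location    POINT NOT NULL,
        answers     TEXT[] NOT NULL DEFAULT '{}'
    )
    """,
    # GiST index: supports spatial operators on the point column.
    "CREATE INDEX IF NOT EXISTS sample_location_gist ON sample USING gist (location)",
    # GIN index: supports containment queries on the array column.
    "CREATE INDEX IF NOT EXISTS sample_answers_gin ON sample USING gin (answers)",
    # Multicolumn B-tree index for per-user, time-ordered KPI queries.
    "CREATE INDEX IF NOT EXISTS sample_user_time ON sample (user_id, recorded_at)",
]

# Points inside a bounding box; location is assumed to be (longitude, latitude).
IN_BOX = "SELECT id FROM sample WHERE location <@ box(point(%s, %s), point(%s, %s))"
# Rows whose multiple-choice answers include a given option.
HAS_ANSWER = "SELECT id FROM sample WHERE answers @> ARRAY[%s]::text[]"


def create_schema(cursor):
    """Run the DDL statements on a DB-API cursor."""
    for statement in SCHEMA_STATEMENTS:
        cursor.execute(statement)


def samples_in_box(cursor, lon_min, lat_min, lon_max, lat_max):
    """Return the ids of samples located inside the bounding box."""
    cursor.execute(IN_BOX, (lon_min, lat_min, lon_max, lat_max))
    return [row[0] for row in cursor.fetchall()]
```

The `<@` and `@>` operators are the kind of operator that the respective GiST and GIN operator classes can serve from the index instead of a sequential scan.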
Dashboard Technologies. The Dashboard is a web
application that accesses the data through the
above-mentioned RESTful API and is built with common web
technologies, i.e. HTML, CSS and JavaScript. For
KPI visualisation, JavaScript libraries
such as Chart.js (Downie, 2015) and Plotly (Sievert
et al., 2017) are used.
7 https://pika.readthedocs.io/en/stable/
The ICT Platform is a distributed system; consequently,
its individual components are deployed on
separate machines, which communicate with each other
over the network.
On the one hand, the data adapters are implemented
in the corresponding ICT Tool in order to
enable message transmission to the middleware.
Copies of the message publisher are delivered
to all interested partners, along with the appropriate
credentials to access the exchange. To activate
the API and send the JSON messages, the Data
Adapter part needs to be created by the partner that is
responsible for the relevant ICT Tool.
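A message publisher of this kind might be sketched as follows. The envelope fields, the exchange name and the routing key are hypothetical, and the credentials stand for those delivered to each partner; Pika is again imported lazily so that message construction can be tried without a broker.

```python
import json
from datetime import datetime, timezone


def build_message(tool_id: str, payload: dict) -> bytes:
    """Wrap tool data in a JSON envelope (field names are illustrative)."""
    envelope = {
        "tool_id": tool_id,
        "sent_at": datetime.now(timezone.utc).isoformat(),
        "data": payload,
    }
    return json.dumps(envelope).encode("utf-8")


def publish(body: bytes, host, username, password, exchange, routing_key):
    """Send one message to the exchange using the partner's credentials."""
    import pika  # third-party dependency, imported only when publishing

    credentials = pika.PlainCredentials(username, password)
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=host, credentials=credentials)
    )
    try:
        channel = connection.channel()
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=body,
            properties=pika.BasicProperties(
                content_type="application/json",
                delivery_mode=2,  # ask the broker to persist the message
            ),
        )
    finally:
        connection.close()
```

A data adapter would then call, for example, `publish(build_message("my-tool", {"speed": 12}), ...)` with the host, credentials and exchange it was given.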
On the other hand, the components of the ICT
Platform, namely the middleware, the Data Staging,
the Data Warehouse and the Dashboard, are deployed
on private cloud infrastructure. In order to assure the
high availability of the provided services, a cluster of
servers is considered for each ICT Platform compo-
nent.
Load balancers can be used to reduce latency
and to ensure a fault-tolerant configuration. An
NGINX web server (see footnote 8) is used to provide a load
balancer and reverse proxy. It must be noted, however,
that the load on the majority of the ICT Platform
components is expected to be small, so this solution will
only be implemented if performance issues
are identified during the demonstrations.
Multiple instances are deployed for components that are
expected to have an increased workload. The
instances are designed in a way that allows them to
perform their work in parallel and independently of
each other. This mechanism is applied in the Data
Staging so that data manipulation (consuming,
processing and validating the data coming from the
middleware) and storage are performed faster and
the fault tolerance of the system is increased.
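The idea of independent, parallel instances can be illustrated with an in-process stand-in, in which threads take the place of separate Data Staging instances and a shared queue takes the place of the middleware queue. This is only a sketch of the pattern; in an actual deployment each instance would be a separate process consuming from the same RabbitMQ queue, and the function and parameter names here are hypothetical.

```python
import queue
import threading


def run_workers(messages, handler, worker_count=3):
    """Process messages with several independent workers; return all results."""
    tasks = queue.Queue()
    for message in messages:
        tasks.put(message)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                message = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left for this worker
            outcome = handler(message)
            with lock:  # protect the shared result list
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each message is taken by exactly one worker, so adding workers increases throughput without duplicating work, and the remaining workers keep draining the queue if one of them stops.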
In the Data Warehouse, which incorporates a database,
it must be ensured that the stored data is always
available and cannot become unusable, corrupted or lost.
For this purpose, the underlying databases must be
replicated on other machines. To deal with data
synchronisation, a synchronous solution is being
implemented: a data-modifying transaction is not
considered committed until all servers have committed
it. This guarantees consistency after a failure.
Cybersecurity and Privacy Aspects. Cybersecurity
and privacy mechanisms have been implemented in
the platform presented in this paper in order to as-
8 https://www.nginx.com/
An ICT Platform for the Understanding of the User Behaviours towards EL-Vs