deduce what programs will probably be efficient for
sites of similar nature. Hence, clustering sites based
on their characteristics and consumption will enhance
their evaluation and the recommendation system.
Therefore, the question addressed in this paper is as follows:
How to cluster a large number of heterogeneous sites
based on their energy consumption profiles to recom-
mend the most relevant energy optimisation solution
possible?
In this article, we consider that the energy
consumption profile encompasses all the physical
characteristics of a site as well as the external
factors and the consumption data (time series,
categorical data and numerical data). The consumption
data itself is treated as a time series.
Our goal is to study a group of sites and to optimize
their consumption using recommendations made for
similar sites. This is akin to portfolio analysis,
a domain in which a large group of buildings, often
located in the same geographical area or owned or
managed by the same entity, is analyzed for the purpose
of managing or optimizing the group as a whole
(Miller, 2016).
The key contribution of this paper is to provide a
clustering method adapted to portfolio analysis based
on a pretopological framework.
The paper is structured as follows: Section 2
introduces clustering methods and some relevant
examples on energy systems. Section 3 presents
pretopology theory and its application as a clustering
method. Section 4 illustrates the presented method on
a pedagogical example. We conclude in Section 5.
2 LITERATURE REVIEW
Formally, clustering refers to a set of unsupervised
machine learning methods which group unlabeled
items into clusters. In this section, we present
clustering methods and their application to energy
systems. Iglesias and Kastner (Iglesias and Kastner,
2013) present a deeper analysis of clustering in
energy systems. For an exhaustive list of clustering
algorithms, we refer the reader to the survey by Xu
and Tian (Xu and Tian, 2015).
Clustering algorithms fall into four classes, each
with its pros and cons: centroid-based clustering,
density-based clustering, hierarchical clustering and
distribution-based clustering. Let us present each
class and its application to portfolio analysis in
energy systems.
Centroid-based Clustering: In such methods, a
cluster is a set of items such that each item is
nearer to the center of its own cluster than to the
center of any other cluster. The center of a cluster
is either a centroid, the average of all the points in
the cluster, or a medoid, the most representative
point of the cluster. The best-known centroid-based
algorithm is K-means, along with its extensions.
K-means is a powerful tool for clustering, but it
requires the number of clusters to be fixed in
advance.
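The assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name kmeans, the choice of the first k points as initial centroids and the site values are all assumptions made for the example. Note that k has to be supplied by the caller.

```python
from math import dist

def kmeans(points, k, iterations=100):
    """Plain Lloyd-style k-means; k must be chosen by the caller."""
    # Assumption: the first k points serve as initial centroids
    # (deterministic, but the result depends on this choice).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each item joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical sites: (mean daily load in kWh, floor area in 100 m2).
sites = [(10.0, 1.0), (50.0, 5.2), (11.0, 1.2),
         (52.0, 5.0), (9.5, 0.9), (51.0, 4.8)]
labels, centroids = kmeans(sites, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

With k=2 the small and large sites separate cleanly; a different k or a different initialisation would give a different partition.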
Moreover, centroid-based algorithms are sensitive
to initial conditions. They cope poorly with clusters
that vary in size and density, and they assign
outliers (isolated items) to the nearest cluster.
Lastly, centroid-based algorithms scale poorly with
the number of items and dimensions. In those cases,
they are combined with principal component analysis
or spectral analysis to be more effective.
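To illustrate what happens when an outlier is absorbed into a cluster, the following sketch compares the two notions of center defined earlier. The helper names centroid and medoid and the data are hypothetical, and the medoid is taken here to be the item minimising the total distance to the other items, which is one common reading of "most representative point".

```python
from math import dist

def centroid(cluster):
    """Coordinate-wise mean of the items; need not be an actual item."""
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def medoid(cluster):
    """The item whose total distance to all items of the cluster is smallest."""
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))

# Hypothetical cluster: four similar sites and one isolated site.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (20.0, 20.0)]
print(centroid(cluster))  # (5.2, 5.2): pulled towards the outlier
print(medoid(cluster))    # (2.0, 2.0): stays on an actual, typical item
```

The centroid ends up far from every real site, whereas the medoid remains one of the four similar sites.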
Regarding portfolio analysis in energy systems,
Gao and Malkawi (Gao and Malkawi, 2014) benchmark a
multidimensional energy use dataset using a k-means
algorithm. Fleischhacker et al. (Fleischhacker et al.,
2019) design a spatial aggregation method, combined
with k-means, based on city blocks’ characteristics to
reduce emissions due to energy use.
Density-based Clustering: In density-based clus-
tering, a cluster is a set of items spread in the
data space over a contiguous region of high den-
sity of items. Items located in low-density regions
are typically considered noise or outliers (Kriegel
et al., 2011). The best-known method in this class
is Density-Based Spatial Clustering of Applications
with Noise (DBSCAN), along with its extensions.
The formation of clusters is sensitive to two
parameters: the density and the reachability. Hence,
different parameter values yield different clusters.
The main advantages of density-based clustering are
that it does not require the number of clusters to be
specified a priori and that it identifies noisy data
while clustering. However, it fails on neck-type
datasets and does not work well on high-dimensional
data.
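The dependence on the two parameters can be illustrated with a minimal DBSCAN-style sketch. We assume here that "density" corresponds to a minimum number of neighbours (min_pts) and "reachability" to a neighbourhood radius (eps), as in the usual DBSCAN parameterisation. The function name dbscan and the data are hypothetical.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels

# Two dense groups and one isolated item (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(dbscan(points, eps=8.0, min_pts=3))  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With a small radius the isolated item is flagged as noise and two clusters emerge without their number being specified. With a large radius the same item bridges the two groups and everything merges into one cluster.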
Regarding portfolio analysis in energy systems,
Li et al. (Li et al., 2020) present a density-based
method for building portfolios whose parameters are
tuned by particle swarm optimization. Their method
uses the resulting clusters to forecast next-day
electricity usage. Marquant et al. (Marquant et al.,
2018) use a density- and load-based algorithm to
facilitate large-scale modelling and optimisation of
urban energy systems.
Hierarchical Clustering: Hierarchical clustering is
usually a procedure to transform a proximity matrix
Application of Pretopological Hierarchical Clustering for Buildings Portfolio