Application of Pretopological Hierarchical Clustering for Buildings Portfolio

Loup-Noé Lévy (1,2), Jérémie Bosom (2,3), Guillaume Guerard, Soufian Ben Amor, Marc Bui and Hai Tran

LI-PARAD Laboratory EA 7432, Versailles University, 55 Avenue de Paris, 78035 Versailles, France

Energisme, 88 Avenue du Général Leclerc, 92100 Boulogne-Billancourt, France

EPHE, PSL Research University, 4-14 Rue Ferrus, 75014 Paris, France

De Vinci Research Center, Pôle Universitaire Léonard de Vinci, 12 Avenue Léonard de Vinci, 92400 Courbevoie, France

{f author.s author}@devinci.fr

Keywords:

Artiﬁcial Intelligence, Data Analysis, Clustering Algorithms, Pretopology.

Abstract:

This paper addresses the comparison of heterogeneous energy consumption profiles for energy optimization. Case-by-case, in-depth auditing of thousands of buildings would require a massive amount of time and money as well as a significant number of qualified people. An automated method must therefore be developed to establish a relevant and effective recommendation system. Comparing sites to extract similar profiles is the task of a family of machine learning methods called clustering. To address this problem, pretopology is used to model the sites' consumption profiles, and a multi-criteria hierarchical clustering algorithm that uses the properties of pretopological spaces has been developed in a Python library. The pretopological hierarchical clustering algorithm identifies clusters and provides a hierarchy between complex items. Tested on benchmarks of generated time series (from the literature and from a French energy company), the algorithm identifies the clusters using Pearson's correlation with an Adjusted Rand Index of 1, and it returns relevant results on consumption data from real energy systems.

1 INTRODUCTION

The Paris Agreement, signed in 2015, commits governments from all over the world to keeping global warming below a 2 °C increase compared to pre-industrial temperatures. In the year of COP21, the worldwide buildings sector was responsible for 30% of global final energy consumption and nearly 28% of total direct and indirect CO2 emissions. Yet the energy demand from buildings and building construction continues to rise, driven by improved access to energy in developing countries, greater ownership and use of energy-consuming devices, and rapid growth in global buildings floor area, at nearly 3% per year (see http://www.eia.gov/).

There are various ways to decrease buildings' energy consumption (Guerard et al., 2017): social programs, incentive programs, new energies, energy efficiency, dynamic pricing, and demand-response programs. Most of the time, buildings with the same consumption profile are sensitive to similar programs.

However, the systems we study are not always buildings. They can be a building floor or simply a place inside a building. Consequently, it is more accurate to speak of sites.

Sites are highly heterogeneous, both in their intrinsic properties and in their geographic situation (Miller, 2016). In addition, the scales of analysis vary both in time (consumption time series are analysed on a 24h profile as well as on a yearly profile) and in space (the studied system can range from one room to a group of buildings across a country). Because of that, there is no universal performance scale on which to compare one site to another.

Unfortunately, doing case-by-case in depth audit-

ing of thousands of buildings would require a massive

amount of time and money as well as a signiﬁcant

number of qualiﬁed people.

A comparison between similar sites can help to understand the performance of a new site. Comparing different sites to categorize them by proximity is called clustering. By investigating the works that were effective on a certain site, one can deduce which programs will probably be efficient for sites of a similar nature. Hence, clustering sites based on their characteristics and consumption will enhance their evaluation and the recommendation system.

Lévy, L., Bosom, J., Guerard, G., Ben Amor, S., Bui, M. and Tran, H.

Application of Pretopological Hierarchical Clustering for Buildings Portfolio.

DOI: 10.5220/0010485802280235

In Proceedings of the 10th International Conference on Smart Cities and Green ICT Systems (SMARTGREENS 2021), pages 228-235

ISBN: 978-989-758-512-8

The question addressed in this paper is therefore the following: how can a large number of heterogeneous sites be clustered, based on their energy consumption profiles, in order to recommend the most relevant energy optimisation solution possible?

In this article, we consider that the energy consumption profile encompasses all the physical characteristics of a site as well as the external factors and the consumption data; it therefore mixes time series, categorical data and numerical data. The consumption data itself is treated as a time series.

Our goal is to study a group of sites in order to optimize their consumption through recommendations that were made on similar sites. This can be likened to portfolio analysis, a domain in which a large group of buildings, often located in the same geographical area or owned or managed by the same entity, are analyzed for the purpose of managing or optimizing the group as a whole (Miller, 2016).

The key contribution of this paper is to provide a

clustering method adapted to portfolio analysis based

on a pretopological framework.

The paper is structured as follows: Section 2 introduces clustering methods and some relevant examples on energy systems. Section 3 presents pretopology theory and its application as a clustering method. Section 4 shows a pedagogical example of the presented method. We conclude in Section 5.

2 LITERATURE REVIEW

Formally, clustering refers to a set of unsupervised

machine learning methods which group unlabeled

items in clusters. In this section, we present cluster-

ing methods and their application on energy systems.

Iglesias and Kastner (Iglesias and Kastner, 2013) present a deeper analysis of clustering in energy systems. For an exhaustive list of clustering algorithms, we refer the reader to the survey of Xu and Tian (Xu and Tian, 2015).

There are four classes of clustering algorithms, each with its pros and cons: centroid-based clustering, density-based clustering, hierarchical clustering, and distribution-based clustering. Let us present each class and its application to portfolio analysis in energy systems.

Centroid-based Clustering: In such methods, a cluster is a set of items such that each item in the cluster is nearer to the center of that cluster than to the center of any other cluster. The center of a cluster is either a centroid, the average of all the points in the cluster, or a medoid, the most representative point of the cluster. The best-known centroid-based algorithm is the K-means algorithm and its extensions. K-means is a powerful tool for clustering, but it requires the number of clusters to be determined in advance.

Centroid-based algorithms are also sensitive to initial conditions. They cope poorly with clusters that vary in size and density, and they attach outliers (isolated items) to the nearest cluster. Lastly, centroid-based algorithms do not scale well with the number of items and dimensions. In those cases, they are combined with principal component analysis or spectral analysis to be more effective.

Regarding portfolio analysis in energy systems, Gao et al. (Gao and Malkawi, 2014) benchmark a multidimensional energy use dataset using a k-means algorithm. Fleischhacker et al. (Fleischhacker et al., 2019) design a spatial aggregation method, combined with k-means and based on city blocks' characteristics, to reduce costs and emissions due to energy use.

Density-based Clustering: In density-based clus-

tering, a cluster is a set of items spread in the

data space over a contiguous region of high den-

sity of items. Items located in low-density regions

are typically considered noise or outliers (Kriegel

et al., 2011). The most known methods in this class

are Density-Based Spatial Clustering of Applications

with Noise (DBSCAN) and its extensions.

The formation of clusters is sensitive to two parameters: the density and the reachability. Hence, the resulting clusters differ depending on those parameters. The main advantages of density-based clustering are that it does not require the number of clusters to be specified a priori and that it identifies noisy data while clustering. However, it fails on neck-type datasets and does not work well on high-dimensional data.

Regarding portfolio analysis in energy systems, Li et al. (Li et al., 2020) present a density-based method for building portfolios whose parameters are tuned by particle swarm optimization. Their method forecasts next-day electricity usage thanks to the clustering. Marquant et al. (Marquant et al., 2018) use a density- and load-based algorithm to facilitate large-scale modelling and optimisation of urban energy systems.

Hierarchical Clustering: Hierarchical clustering is

usually a procedure to transform a proximity matrix


into a sequence of hierarchically structured partitions.

There are two methods of hierarchical clustering: ascending (or agglomerative) and descending (or divisive). The ascending methods begin with disjoint classes, placing each item in an individual class. Based on the proximity matrix, the procedure searches at each step for the two closest classes and merges them, which yields a new partition. The process is repeated to construct a sequence of nested partitions in which the number of classes decreases as the sequence progresses, until a unique class contains all the items. The descending methods perform the inverse process.
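The ascending procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the library developed in this paper: the function name `agglomerative`, the dictionary-based proximity matrix and the choice of single linkage (minimum pairwise distance) as aggregation criterion are all assumptions made for the example.

```python
from itertools import combinations

def agglomerative(dist, n_items):
    """Return the sequence of nested partitions built by ascending clustering.

    `dist` maps a frozenset {i, j} of item indices to their dissimilarity.
    Single linkage is an assumed aggregation criterion.
    """
    partition = [frozenset([i]) for i in range(n_items)]  # one class per item
    history = [list(partition)]

    def linkage(a, b):
        # Distance between two classes: smallest distance between their items.
        return min(dist[frozenset((i, j))] for i in a for j in b)

    while len(partition) > 1:
        # Find the two closest classes and merge them into a new partition.
        a, b = min(combinations(partition, 2), key=lambda p: linkage(*p))
        partition = [c for c in partition if c not in (a, b)] + [a | b]
        history.append(list(partition))
    return history

# Toy proximity matrix built from five points on a line.
points = [0.0, 0.1, 1.0, 1.2, 5.0]
dist = {frozenset((i, j)): abs(points[i] - points[j])
        for i, j in combinations(range(len(points)), 2)}
history = agglomerative(dist, len(points))
```

Each entry of `history` is one partition of the nested sequence, from five singleton classes down to a single class containing all the items.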

The primary problem with those algorithms is to define the criterion for grouping (or aggregating) two classes, i.e. a distance measure. Sites are defined as complex systems (Ahat et al., 2013; Bosom et al., 2018; Guérard et al., 2015). Because they are described by numerical and categorical data as well as time series, calculating a single distance between two items is challenging and does not allow each characteristic of the site to be used in a relevant way. Another drawback is the difficulty of identifying an accurate number of clusters, especially in a large dataset.

About the portfolio analysis in energy systems,

Wang et al. (Wang et al., 2020) analyse the spatial

disparity of ﬁnal energy consumption in China thanks

to hierarchical clustering and spatial autocorrelation.

Li et al. (Li et al., 2019) implement an agglomera-

tive hierarchical clustering-based strategy to identify

typical daily electricity usage proﬁles.

Distribution-based Clustering: The application to large spatial databases raises the following requirements for clustering algorithms: no input parameters, or the strict minimum, and clusters of arbitrary shape. Distribution-based clustering produces clusters by assuming that concisely defined mathematical models underlie the items, a relatively plausible assumption for some item distributions.

Most of the time, the mathematical models are based on Gaussian, multinomial or multivariate normal distributions. The clusters are considered fuzzy, which means that an item may belong to several clusters, each with a given percentage. The best-known algorithm is Expectation-Maximization (EM) clustering with Gaussian mixture models (GMM). The GMM algorithm provides two parameters to describe the shape of the clusters: the mean and the standard deviation. The chief drawback of those algorithms is that they cannot handle categorical dimensions.

Regarding portfolio analysis in energy systems, Lu et al. (Lu et al., 2019) use GMM clustering to identify heating load patterns. Habib et al. (Habib et al., 2015) use EM clustering to detect outliers in buildings' energy data.

Conclusion about Clustering Methods: None of

the methods described above can answer the speci-

ﬁcities of the studied system, either because they re-

quire the deﬁnition of a distance between the items,

or because they cannot return the hierarchical cluster-

ing necessary to apprehend the different scales of a

complex system.

Relevance of Pretopology-based Clustering: A pretopological space is defined by a relation between any set of items and a bigger set of items. It is therefore well suited to the creation of a hierarchical structure. It is based on the concept of abstract space: in such a space the nature of the items is not relevant; what matters is the relations and properties linking the items to one another. This allows us to manipulate heterogeneous and complex items such as our sites. Because of that, pretopology can be considered as a mathematical tool for modeling the concept of proximity for complex systems (Auray et al., 2009). Pretopology is, therefore, the approach chosen to build our hierarchical clustering.

3 PRETOPOLOGY

In this section we explain the key concepts and deﬁni-

tions of pretopology, such as pretopological space and

pseudo-closure. Then, we provide the main algorithm

for the pretopological hierarchical clustering.

3.1 Pretopological Space

Let us start with some deﬁnitions.

Definition 1. A pseudoclosure function $a : \wp(U) \to \wp(U)$ on a set of items $U$ is a function such that:

• $a(\emptyset) = \emptyset$

• $\forall A \subseteq U : A \subseteq a(A)$

where $\wp(U)$ is the power set of $U$.

Deﬁnition 2. A tuple (U, a(.)), where U is a set of

items and a(.) is a pseudoclosure function on U, con-

stitutes a pretopological space.

Definition 3. In a pretopological space, a set K ⊆ U is closed if a(K) = K. The closure of a set can be found by repeatedly applying the pseudoclosure operator to the set and its subsequent images until it stops expanding.

Definition 4. In a pretopological space, the closure of a part A of U is the smallest closed set containing A. It is denoted F(A) (see Figure 2).
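The definitions above translate directly into code. The following is a minimal sketch under stated assumptions: the helper names `make_pseudoclosure` and `closure`, and the toy neighbourhood relation `neighbours`, are hypothetical and only serve to illustrate a pseudoclosure that satisfies both conditions of Definition 1, together with the iterative search for a closed set.

```python
def make_pseudoclosure(neighbours):
    """Build a(.) from a hypothetical neighbourhood relation.

    a(A) is A together with the neighbours of its items, so that
    a(empty set) is empty and A is always contained in a(A).
    """
    def a(A):
        A = frozenset(A)
        return A | frozenset(y for x in A for y in neighbours.get(x, ()))
    return a

def closure(a, A):
    """Apply the pseudoclosure repeatedly until the set stops expanding."""
    current = frozenset(A)
    while True:
        expanded = a(current)
        if expanded == current:  # a(K) = K: the set is closed
            return current
        current = expanded

# Toy relation: 1 -> 2 -> 3, and 4 <-> 5.
neighbours = {1: [2], 2: [3], 3: [], 4: [5], 5: [4]}
a = make_pseudoclosure(neighbours)
```

With this relation, `a({1})` adds only item 2, while `closure(a, {1})` keeps expanding until it reaches the closed set containing items 1, 2 and 3.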

SMARTGREENS 2021 - 10th International Conference on Smart Cities and Green ICT Systems


Figure 1: Example of a pseudoclosure function (Laborde,

2019).

Figure 2: Closure of set A (Laborde, 2019).

A pretopological space is deﬁned by establishing

a relation between any set of items and a bigger set.

Each step of a pseudoclosure is interesting in the con-

struction of a hierarchy. An example of pseudoclosure

function is shown in Figure 1.

Now let us present our framework formalizing a pretopological space, adapted from the work of Julio Laborde (Laborde, 2019). In this framework each pretopological space is characterized by a tuple $(G, \Theta, DNF(.))$, where:

• $G = \{G_1(V, E_1), G_2(V, E_2), \dots, G_n(V, E_n)\}$ is a set of $n$ weighted directed graphs.

• $\Theta = \{\theta_1, \theta_2, \dots, \theta_n\}$ is a set of $n$ thresholds, each associated to one graph.

• $DNF(.) : (\wp(U), U) \to \{True, False\}$, where $\wp(U)$ is the power set of $U$, is a boolean function expressed as a positive disjunctive normal form in terms of the $n$ boolean functions $V_1(A, x), \dots, V_n(A, x)$, each associated to a graph, and whose truth value depends on the set $A$ and the item $x$.

We determine if an item $x \in U$ belongs to the pseudoclosure of a set $A$ in the following way:

• $\forall V_i(A, x),\; V_i(A, x) = True \iff \sum_{e_{xy} \in G_i,\, y \in A} w(e_{xy}) \geq \theta_i$, where $e_{xy}$ is the edge going from $x$ to $y$, and $w(e)$ is the weight of the edge $e$.

• The item $x \in U$ belongs to the pseudoclosure of $A$ $\iff$ $DNF(.)$ evaluates to True.

Figure 3: Example of a pseudoclosure under the framework (Laborde, 2019).

Simply put, this checks in every graph whether the sum of the weights of the edges going from the item x to the items inside A reaches the threshold associated to the graph. When this happens, the boolean variable associated to that graph takes the value True; otherwise it takes the value False. If DNF(.) evaluates to True with those values for the boolean functions $V_i(A, x)$, then the item belongs to the pseudoclosure. An example of this is illustrated in Figure 3.
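As an illustration of this rule, the sketch below evaluates the pseudoclosure of a set from a list of graphs, a list of thresholds and a positive DNF. It is a hedged example rather than the actual library code: the function name `pseudoclosure`, the encoding of a graph as a dictionary of edge weights, and the encoding of the DNF as a list of conjunctions of graph indices are assumptions made for readability.

```python
def pseudoclosure(A, items, graphs, thresholds, dnf):
    """Return a(A) under the (graphs, thresholds, DNF) framework.

    graphs[i][(x, y)] is the weight of the edge from x to y in graph i.
    dnf is a list of conjunctions; each conjunction is a list of graph indices.
    """
    A = frozenset(A)
    if not A:
        return frozenset()  # a(empty set) is the empty set
    result = set(A)  # A is always contained in a(A)
    for x in items:
        if x in A:
            continue
        # V_i(A, x): do the edges from x into A weigh at least theta_i?
        V = [sum(g.get((x, y), 0.0) for y in A) >= theta
             for g, theta in zip(graphs, thresholds)]
        # Positive DNF: true if every V_i of at least one conjunction is true.
        if any(all(V[i] for i in conj) for conj in dnf):
            result.add(x)
    return frozenset(result)

# Two hypothetical criteria (e.g. position and size), one graph each.
items = ["a", "b", "c", "d"]
g_position = {("b", "a"): 0.9, ("c", "a"): 0.2, ("d", "a"): 0.8}
g_size = {("b", "a"): 0.7, ("c", "a"): 0.9, ("d", "a"): 0.1}
graphs, thresholds = [g_position, g_size], [0.5, 0.5]

both = pseudoclosure({"a"}, items, graphs, thresholds, dnf=[[0, 1]])     # V_1 AND V_2
either = pseudoclosure({"a"}, items, graphs, thresholds, dnf=[[0], [1]])  # V_1 OR V_2
```

Requiring both criteria only admits item `b`, whereas accepting either criterion admits every item, which shows how the DNF controls the multi-criteria behaviour.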

3.2 Algorithms

This section describes the algorithms used for the

construction of a closure and to build a hierarchical

clustering of sites.

The clustering procedure is structured in three

phases:

1. Calculation of a family of elementary sets called

seeds.

2. Construction of the subsets by applying pseudo-

closure iteratively.


3. Establishing a structural relation among all the

subsets using quasihierarchy.

Calculation of a Family of Seeds: The purpose of this procedure is to generate small sets from which the elementary closure subsets will be calculated. Calculating those seeds from each single item requires a large amount of computation. This can be avoided by starting with sets of 2, 3 or 4 items.

A seed of multiple items is calculated by proximity. The distance measure depends on the attributes (numeric, binary, nominal, ordinal or mixed-type attributes).

Construction of the Subset: This algorithm applies the pseudoclosure to the seeds, which produces bigger sets. The pseudoclosure is applied iteratively until a closure is reached. Since the pseudoclosure was first applied to seeds, the closures obtained are called closure subsets. By keeping all the intermediate pseudoclosures between the seed and the closure subset, the algorithm retains a range of sets defining a hierarchy.
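The first two phases can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: items are points on a line, a seed pairs each item with its nearest neighbour, and the pseudoclosure adds every item within a fixed radius of the current set. The names `nearest_neighbour_seeds`, `closure_chain` and `RADIUS` are hypothetical.

```python
def nearest_neighbour_seeds(points):
    """Pair each item with its nearest neighbour (an assumed way to form 2-item seeds)."""
    seeds = set()
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: abs(points[k] - p))
        seeds.add(frozenset((i, j)))
    return seeds

def closure_chain(a, seed):
    """Return [seed, a(seed), a(a(seed)), ..., closure], keeping every intermediate set."""
    chain = [frozenset(seed)]
    while True:
        nxt = frozenset(a(chain[-1]))
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)

points = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
RADIUS = 1.0  # hypothetical proximity threshold

def a(A):
    # Hypothetical pseudoclosure: add every item within RADIUS of a member of A.
    return frozenset(A) | {k for k, q in enumerate(points)
                           if any(abs(q - points[i]) <= RADIUS for i in A)}

seeds = nearest_neighbour_seeds(points)
chain = closure_chain(a, frozenset({0, 1}))
```

Starting from the seed made of items 0 and 1, the chain grows by one item per step until it covers the first four points; those intermediate sets are what the hierarchy is later built from.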

Construction of the Hierarchy from Subsets: Our objective is now to determine a hierarchy between the subsets, called a quasihierarchy. The algorithm is built following these rules:

• Two subsets are connected only if their intersec-

tion is not empty.

• The more of a set A is contained in a set B, the

stronger the relation from A to B.

• The bigger the set B is compared to A, the lesser

the part of A that should be contained in B to

have a strong relation going from A to B. In other

words, a very big set will attract smaller ones even

if their intersection is not very large.

• Two sets that have a mutually strong relation are considered equivalent, unless one is contained in the other, in which case the bigger of the two is a parent of the other in the quasihierarchy.

The algorithm takes as input a set of subsets and a

threshold, and returns a quasihierarchy by (see Fig.

4):

• Quantifying the relation between each pair of sets

determined with non-empty intersection.

• Creating a link in the quasihierarchy when the

value of the relation is above the threshold.

• Sets having links going in both directions are considered equivalent and one is selected randomly.

• The resulting closures with the respective links

determine the quasihierarchy.

Figure 4: Construction of the quasihierarchy (Laborde,

2019).
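The steps above can be sketched as follows. The text does not give the formula that quantifies the relation between two sets, so the `relation` function below is purely an assumption chosen to respect the stated rules: it is zero for disjoint sets, it grows with the share of A contained in B, and a bigger B needs a smaller share. For reproducibility the sketch also keeps the first of two equivalent sets instead of selecting one randomly. All names are hypothetical.

```python
def relation(A, B):
    """Illustrative strength of the relation from A to B, in [0, 1].

    frac is the share of A contained in B; the exponent lets a bigger B
    reach a given strength with a smaller share. Assumed formula.
    """
    frac = len(A & B) / len(A)
    return 1.0 - (1.0 - frac) ** (len(B) / len(A))

def quasihierarchy(subsets, threshold):
    """Return (kept_sets, links); a link (A, B) reads 'B is a parent of A'."""
    sets = list(dict.fromkeys(frozenset(s) for s in subsets))  # drop duplicates
    # Quantify the relation for each ordered pair with non-empty intersection.
    strong = {(A, B) for A in sets for B in sets
              if A != B and A & B and relation(A, B) >= threshold}
    dropped = set()
    for i, A in enumerate(sets):
        for B in sets[i + 1:]:
            mutual = (A, B) in strong and (B, A) in strong
            if mutual and not (A < B or B < A) and A not in dropped:
                dropped.add(B)  # equivalent sets: keep the first one
    kept = [S for S in sets if S not in dropped]
    links = set()
    for A, B in strong:
        if A in dropped or B in dropped:
            continue
        if (B, A) in strong and not A < B:
            continue  # mutual and nested: only the smaller -> bigger link survives
        links.add((A, B))
    return kept, links

s1, s2, s3, s4 = (frozenset(s) for s in
                  ({1, 2}, {1, 2, 3}, {1, 2, 3, 4, 5, 6}, {7, 8}))
kept, links = quasihierarchy([s1, s2, s3, s4], threshold=0.6)
```

In this toy example the three nested sets are linked from the smallest to the biggest, while the disjoint set `{7, 8}` stays isolated.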

3.3 Model Validation and Visualization

of Results

Validation Tool: To evaluate the pretopological hierarchical clustering, we also provide a set of tools to validate the model and to show the results.

The validation program creates a dataset of points from the following parameters:

- the number of groups of dense items;

- the number of items of each group;

- the spatial dispersion of each group;

- the position of each group.

To evaluate multi-criteria clustering, the size of an

item is added as a second parameter. Groups with

different item size can be produced with the following

parameters:

- the number of groups;

- the number of items of each group;

- the range of sizes of each group.

This program helps to evaluate our method in differ-

ent kinds of situations and make corrections or adjust-

ments easily.
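A generator of this kind can be sketched with the standard library alone. The function name `make_groups`, the Gaussian dispersion around each group position and the uniform draw of sizes are assumptions made for this example; the actual validation tool may differ.

```python
import random

def make_groups(specs, seed=0):
    """Generate labelled 2-D items.

    Each spec is (n_items, (centre_x, centre_y), dispersion, (size_min, size_max)).
    """
    rng = random.Random(seed)  # fixed seed for reproducible test sets
    items = []
    for label, (n_items, (cx, cy), dispersion, (smin, smax)) in enumerate(specs):
        for _ in range(n_items):
            items.append({
                "x": rng.gauss(cx, dispersion),   # position scattered around the centre
                "y": rng.gauss(cy, dispersion),
                "size": rng.uniform(smin, smax),  # second clustering criterion
                "label": label,                   # ground truth for evaluation
            })
    return items

dataset = make_groups([
    (30, (0.0, 0.0), 0.5, (1.0, 2.0)),   # small items on the left
    (30, (0.0, 0.0), 0.5, (8.0, 9.0)),   # big items at the same position
    (20, (5.0, 5.0), 0.3, (1.0, 2.0)),   # small items elsewhere
])
```

The `label` field plays the role of ground truth, so that the clusters returned by the algorithm can be scored afterwards.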

Visualization Tool: To observe the results of the classification, the program colors each of the biggest sets determined by our algorithm in a unique color. The validation tool is tested with two groups of items with both big and small sizes and a 2-dimensional position. Items are shown in Figure 5. In this example, four clusters have been determined: blue, green, orange and red. The black dot at the leftmost side of Figure 5 is an item identified by the algorithm as an outlier. For example, red and orange items are close to one another yet separated into two clusters because of their different sizes, whereas orange and green points are similar in size yet divided into two sets because of their different positions.

The program also displays the hierarchical classi-

ﬁcation composed of the seeds, the intermediate sets

and the ﬁnal clusters. The hierarchical classiﬁcation

is displayed as a tree in which each set is identiﬁed by

a number and is represented as a node.


Figure 5: The four clusters determined by our algorithm using both size and position as parameters, on a 2D disks dataset.

Figure 6: A tree representing the pseudohierarchy relation

between each intermediate set from the seed to the cluster.

For instance, the hierarchy presented in Figure 6 shows the relations between the sets determined by our algorithm applied to the dataset displayed in Figure 5. This tree presents only the sets of more than two items. We can recognize the four clusters that were colored in Figure 5; they are labelled 20, 21, 22 and 23. Figure 7 displays the set 14, which is a child of the set 21 (colored in green) in the hierarchical clustering. This hierarchy determines large groups of relatively similar items and provides more details about smaller groups of very similar items.

Figure 7: The subset 14 in red representing a subgroup of

the green clusters (subset 22) in ﬁgure 5.

4 EXPERIMENTS AND RESULTS

4.1 Benchmark Dataset

Because the main data we have on sites are power consumption time series, the clustering of a set of time series had to be tested, visualized and evaluated. This section presents this test set and the results of our algorithm. The created test set, composed of six clusters, is presented in Figure 8. Each cluster is composed of 30 time series of 60 points.

The similarity measure used to quantify the relation between two items is Pearson's correlation coefficient. The Pearson correlation coefficient measures the linear relationship between each pair of items, which in this case are time series.
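For reference, the Pearson coefficient between two series of equal length can be computed as in the sketch below. The function name `pearson` is hypothetical, and using the coefficient directly as an edge weight between two items is an assumption about how such a measure could feed the graphs of Section 3.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rising = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]     # same shape, different magnitude
falling = [4.0, 3.0, 2.0, 1.0]

r_same = pearson(rising, scaled)
r_opposite = pearson(rising, falling)
```

Because the coefficient ignores scale and offset, two sites with the same consumption shape but different magnitudes obtain a correlation close to 1, while opposite shapes obtain a value close to -1.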

Our program colored the time series according to

the clusters it had determined (see ﬁgure 8).

Figure 8: The clusters identiﬁed by our algorithm.

4.2 Results Analysis on Benchmark

Dataset

The program identified exactly the same clusters as the ground truth given by the benchmark. To evaluate the validity of the clusters determined by the algorithm, our metric is the Adjusted Rand Score, also called Adjusted Rand Index (ARI). Since the clusters were perfectly identified, the ARI of our clustering is 1. Figure 9 shows the confusion matrix between the clusters found by our method and the ground truth given by the benchmark.
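As a reminder of how the metric behaves, the following sketch computes the ARI from two label lists using the standard pair-counting formula. It is a self-contained illustration, with the hypothetical name `adjusted_rand_index`, and not the evaluation code used for the experiments.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 1, 1, 2, 2]
renamed = [1, 1, 0, 0, 2, 2]   # same partition, different cluster names
ari_perfect = adjusted_rand_index(truth, renamed)
ari_bad = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index does not depend on how clusters are named: a partition identical to the ground truth scores 1 even when the labels are permuted, whereas a partition that agrees less than chance scores below 0.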

Further experiments will be conducted in a future

contribution.

Figure 9: Confusion matrix of the clusterization.

4.3 Real Dataset

This dataset is built from Enedis (the power grid operator in France) consumption time series of 400 sites over a year. It is resampled with time steps of half an hour, a day, a week and a month. The proximity between the Enedis delivery points is evaluated on each resampled time series, each of which corresponds to one characteristic of a site. Once the Enedis dataset is built, the algorithm described in Section 3 is applied to the time series.
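The resampling step can be illustrated with a small sketch. The paper does not state how values are aggregated, so averaging consecutive blocks is an assumption, as are the function name `resample_mean` and the synthetic series; monthly resampling is omitted because months have unequal lengths and would need calendar timestamps.

```python
def resample_mean(series, factor):
    """Average consecutive blocks of `factor` samples; a trailing partial block is dropped."""
    n_blocks = len(series) // factor
    return [sum(series[i * factor:(i + 1) * factor]) / factor
            for i in range(n_blocks)]

# Two weeks of synthetic half-hourly data with the same pattern every day.
half_hourly = [float(i % 48) for i in range(48 * 14)]

daily = resample_mean(half_hourly, 48)        # 48 half-hours per day
weekly = resample_mean(half_hourly, 48 * 7)   # 336 half-hours per week

# One series per time step: each can feed its own proximity graph.
characteristics = {"half_hour": half_hourly, "day": daily, "week": weekly}
```

Each resampled series then acts as one characteristic of the site, so that the multi-criteria clustering can compare sites at several time scales at once.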

4.4 Result Analysis on Real Dataset

Figure 10 displays the clustering of 50 Enedis time series that are representative of the clusters as a whole. Three clusters were identified: in the red cluster there is a single peak per day that lasts for half the day; in the green cluster there are two peaks a day, one in the morning and one in the evening; and in the blue cluster the consumption is constant during the day.

Figure 10: Clustering of the Enedis time series.

The algorithm has identified relevant clusters in the sense that each item shares one trait with the items of its cluster that it does not share with items of a different cluster.

5 CONCLUSION

Important energy savings can be made by acquiring better insight into building consumption profiles. To determine what savings can be made on a building, an important element is to compare its energy consumption with that of other buildings. However, energy systems (buildings and sites) are heterogeneous and complex; they are described by numerical and categorical data as well as consumption time series, and are therefore hard to compare to one another. Hence the need for an adapted clustering method. Studying the state-of-the-art clustering methods led us to create a new hierarchical algorithm based on pretopology. Indeed, pretopology theory provides tools to determine relations of proximity between heterogeneous sets. These algorithms were developed in a Python library alongside visualization and evaluation tools. Results on generated test datasets demonstrated the efficiency and the relevance of this library.

ACKNOWLEDGEMENTS

This paper is the result of research conducted at the

energy data management company Energisme. We

thank Energisme for the resources that have been

made available to us and Julio Laborde for his assis-

tance with the conception of our pretopological hier-

archical algorithm library.

REFERENCES

Ahat, M., Amor, S. B., Bui, M., Bui, A., Guérard, G., and Petermann, C. (2013). Smart Grid and Optimization. American Journal of Operations Research, 03(01):196–206.

Auray, J.-P., Bonnevay, S., Bui, M., Duru, G., and Lamure, M. (2009). Prétopologie et applications : un état de l'art. Studia Informatica Universalis (Hermann), 7:27–44.

Bosom, J., Scius-Bertrand, A., Tran, H., and Bui, M.

(2018). Multi-agent architecture of a mibes for smart

energy management. Innovations for Community Ser-

vices. I4CS 2018, 863.


Fleischhacker, A., Lettner, G., Schwabeneder, D., and Auer,

H. (2019). Portfolio optimization of energy communi-

ties to meet reductions in costs and emissions. Energy,

173:1092 – 1105.

Gao, X. and Malkawi, A. (2014). A new methodology for

building energy performance benchmarking: An ap-

proach based on intelligent clustering algorithm. En-

ergy and Buildings, 84:607 – 616.

Guerard, G., Pichon, B., and Nehai, Z. (2017). Demand-

response: Let the devices take our decisions. In

SMARTGREENS, pages 119–126.

Guérard, G., Ben Amor, S., and Bui, A. (2015). A context-free smart grid model using pretopologic structure. In 2015 International Conference on Smart Cities and Green ICT Systems (SMARTGREENS), pages 1–7.

Habib, U., Zucker, G., Blochle, M., Judex, F., and Haase, J.

(2015). Outliers detection method using clustering in

buildings data. In IECON 2015-41st Annual Confer-

ence of the IEEE Industrial Electronics Society, pages

000694–000700. IEEE.

Iglesias, F. and Kastner, W. (2013). Analysis of similarity

measures in times series clustering for the discovery

of building energy patterns. Energies, 6(2):579–597.

Kriegel, H.-P., Kröger, P., Sander, J., and Zimek, A. (2011). Density-based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3):231–240.

Laborde, J. (2019). Pretopology, a mathematical tool for

structuring complex systems: methods, algorithms

and applications. PhD thesis, EPHE.

Li, K., Ma, Z., Robinson, D., Lin, W., and Li, Z. (2020).

A data-driven strategy to forecast next-day electricity

usage and peak electricity demand of a building port-

folio using cluster analysis, cubist regression models

and particle swarm optimization. Journal of Cleaner

Production, 273:123115.

Li, K., Yang, R. J., Robinson, D., Ma, J., and Ma, Z. (2019).

An agglomerative hierarchical clustering-based strat-

egy using shared nearest neighbours and multiple dis-

similarity measures to identify typical daily electricity

usage proﬁles of university library buildings. Energy,

174:735 – 748.

Lu, Y., Tian, Z., Peng, P., Niu, J., Li, W., and Zhang,

H. (2019). Gmm clustering for heating load patterns

in-depth identiﬁcation and prediction model accuracy

improvement of district heating system. Energy and

Buildings, 190:49 – 60.

Marquant, J. F., Bollinger, L. A., Evins, R., and Carmeliet,

J. (2018). A new combined clustering method to anal-

yse the potential of district heating networks at large-

scale. Energy, 156:73 – 83.

Miller, C. (2016). Screening Meter Data: Characterization

of Temporal Energy Data from Large Groups of Non-

Residential Buildings. PhD thesis, ETH Zurich.

Wang, S., Liu, H., Pu, H., and Yang, H. (2020). Spatial dis-

parity and hierarchical cluster analysis of ﬁnal energy

consumption in china. Energy, 197:117195.

Xu, D. and Tian, Y. (2015). A comprehensive survey

of clustering algorithms. Annals of Data Science,

2(2):165–193.
