existing AutoML frameworks tend to focus on super-
vised learning tasks that require labeled data as in-
put (Oliveira, 2019). One of the challenges is that the
identification of the most similar clusters can be sub-
jective and it usually requires multiple approaches to
automate this process (Poulakis, 2020). The difficulty
in clustering is finding results that align with a
practitioner’s needs because, in many complex data
sets, there are several plausible clusters, and practitioners
may have different priorities and preferences.
An unsupervised clustering algorithm has no way to
intrinsically infer which clusters embody desired pri-
orities and preferences (Bae et al., 2020).
Additionally, in data with a temporal component,
such as EV charging events, assessing the structural
consistency of discovered clusters over different
temporal granularities is often a lengthy manual
undertaking. Metrics such as inter-cluster separation,
intra-cluster homogeneity, density, and uniformity of
cluster sizes can be computed to assess structural
consistency. However, the question of how to select
a particular clustering result that is more meaningful
than another, based on user priorities and preferences,
still depends on the practitioner’s capacity to
distinguish similar clusters. To address this challenge, this
research work explores whether, given a clustering
result of interest, the process of objectively
highlighting and recommending similar clustering
results can be automated. Such automation would support
practitioners in evaluating how clustering patterns persist
over multiple temporal granularities and in finding
meaningful clusters according to their preferences
and priorities. The overall motivation of this
work is to assist the practitioner in navigating multiple
clustering results for different temporal partitions of
the same data. Providing the practitioner with an initial
ranked list of clustering results and a mechanism
to identify clustering similarities can also help with
downstream analytical tasks such as improving
regression or classification model performance.
Therefore, we propose a clustering process that
uses internal cluster validity indices to enable the
identification of similar clustering results across various
temporal slices of data. Of primary concern in
this work is the comparison of clustering results from
a priori selected temporal granularities (e.g., weekly,
monthly, and seasonal) and how to support practitioners
in identifying similar results using a reference
result of interest. A case study using real-world charging
event data from EV station operators in Atlantic
Canada is used to evaluate the proposed clustering
process in identifying similar clusters of charging
stations according to their usage patterns (e.g., high vs. low
usage).
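The temporal slicing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the event tuples, the `slice_key` and `usage_by_slice` helpers, the two usage features (event count and total energy), and the month-to-season mapping are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical charging events: (station_id, start_time, energy_kwh).
EVENTS = [
    ("S1", datetime(2020, 1, 6, 8, 30), 12.0),
    ("S1", datetime(2020, 1, 7, 17, 0), 8.0),
    ("S2", datetime(2020, 1, 15, 9, 0), 20.0),
    ("S1", datetime(2020, 7, 3, 12, 0), 5.0),
    ("S2", datetime(2020, 7, 4, 18, 45), 30.0),
]

# Assumed meteorological seasons (northern hemisphere), keyed by month.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def slice_key(ts, granularity):
    """Map a timestamp to the temporal slice it belongs to."""
    if granularity == "weekly":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # (ISO year, ISO week number)
    if granularity == "monthly":
        return (ts.year, ts.month)
    if granularity == "seasonal":
        # Simplification: December is grouped with the same calendar year.
        return (ts.year, SEASON_BY_MONTH[ts.month])
    raise ValueError(f"unknown granularity: {granularity}")


def usage_by_slice(events, granularity):
    """Return {slice_key: {station_id: (n_events, total_kwh)}}."""
    slices = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for station, ts, kwh in events:
        stats = slices[slice_key(ts, granularity)][station]
        stats[0] += 1      # number of charging events in the slice
        stats[1] += kwh    # total energy delivered in the slice
    return {k: {s: tuple(v) for s, v in d.items()} for k, d in slices.items()}


if __name__ == "__main__":
    for granularity in ("weekly", "monthly", "seasonal"):
        print(granularity, usage_by_slice(EVENTS, granularity))
```

Each slice then yields one feature table per granularity, which a clustering algorithm could take as input to group stations by usage.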
The scientific contributions of this paper are as fol-
lows.
• Our work is unique in proposing a combination of
eight internal cluster validity indices to characterize
clusters at different granularities (e.g., weekly,
monthly, or seasonal). Previous research work
has usually applied these indices in isolation from
each other.
• These internal validity indices are then used to
compute a proximity measure (i.e., Euclidean distance)
that helps practitioners identify similar
clusters. To the best of our knowledge, this clustering
procedure has never been used as an objective
measure to reduce the cognitive load of practitioners
in understanding clustering results.
• The use of real-world data from EV charging sta-
tions advances the understanding of charging be-
havior. To the best of our knowledge, no previous
work has implemented an end-to-end automated
clustering process that facilitates the comparison
of clustering results by practitioners with differ-
ent priorities and preferences.
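The idea in the second contribution, comparing clustering results through the Euclidean distance between their vectors of internal validity indices, can be sketched as follows. This is a hedged illustration rather than the paper's procedure: the `rank_by_similarity` helper, the result names, the index values, the use of three indices instead of eight, and the z-score standardization step are all assumptions.

```python
import numpy as np


def rank_by_similarity(index_matrix, names, reference):
    """Rank clustering results by Euclidean distance to a reference result.

    index_matrix: (n_results, n_indices) values, one row of internal
                  validity index values per clustering result.
    names:        list of result labels, aligned with the rows.
    reference:    label of the clustering result of interest.
    """
    X = np.asarray(index_matrix, dtype=float)
    # Assumption: z-score each index so that no index dominates the
    # distance merely because of its scale.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns carry no information
    Z = (X - X.mean(axis=0)) / std
    ref = Z[names.index(reference)]
    dist = np.linalg.norm(Z - ref, axis=1)
    order = np.argsort(dist, kind="stable")
    return [(names[i], float(dist[i])) for i in order if names[i] != reference]


# Hypothetical index values (silhouette, Davies-Bouldin, Calinski-Harabasz)
# for four clustering results at different temporal granularities.
NAMES = ["week_01", "week_02", "month_01", "season_winter"]
INDICES = [
    [0.62, 0.55, 410.0],
    [0.60, 0.58, 395.0],
    [0.35, 1.20, 150.0],
    [0.40, 1.05, 180.0],
]

if __name__ == "__main__":
    for name, d in rank_by_similarity(INDICES, NAMES, reference="week_01"):
        print(f"{name}: {d:.3f}")
```

The ranked list gives the practitioner a starting point: results near the top have a validity profile close to the reference result, and those further down differ more.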
The rest of the paper is organized as follows. In
Section 2, previous research work is described. Sec-
tion 3 describes the proposed clustering process un-
derpinning our work. Section 4 provides a detailed
description of the real-world EV charging event data
and the end-to-end automated implementation of our
proposed clustering process. In Section 5, we discuss
the results. Finally, Section 6 concludes and indicates
future research work.
2 RELATED WORK
In clustering, a practitioner must take various steps,
such as selecting an appropriate algorithm
and its hyperparameters, choosing an adequate
proximity measure, and deciding how to validate the
modeling results. Fig. 1 outlines a typical cluster analysis
process.
Additionally, the temporal granularity of an algorithm’s
input data can lead to different clusters over
time. A common problem in clustering is how to objectively
and quantitatively evaluate the results. Cluster
validation is an important task in the clustering
process because it aims to compare clustering results
and to answer the question of the optimal number of clusters. Many
internal validity indices have been proposed to as-
sess the level of “success” that a clustering algorithm
achieves in finding the natural clusters in data without
any class label information (Rendón et al., 2011; Liu
et al., 2010).
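As an illustration of how an internal validity index scores a partition without class labels, the sketch below computes the mean silhouette coefficient, one widely used internal index, for two candidate partitions of the same toy data. The data points, the labelings, and the `mean_silhouette` helper are assumptions made for this example. The sketch also assumes at least two clusters and no duplicate points.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Mean silhouette coefficient of a partition (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (D[i, i] is 0, so dividing by n_same - 1 excludes the point itself).
        a = D[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Toy data: two well-separated groups of three points each.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
GOOD_LABELS = [0, 0, 0, 1, 1, 1]  # follows the natural grouping
BAD_LABELS = [0, 1, 0, 1, 0, 1]   # mixes the two groups

if __name__ == "__main__":
    print("good partition:", round(mean_silhouette(POINTS, GOOD_LABELS), 3))
    print("bad partition: ", round(mean_silhouette(POINTS, BAD_LABELS), 3))
```

The partition that follows the natural grouping scores close to 1, while the mixed partition scores much lower, which is the sense in which such indices measure the "success" of a clustering result.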
SMARTGREENS 2021 - 10th International Conference on Smart Cities and Green ICT Systems