mation is required to determine a realistic similarity value. In the context of this work, the presence and movement of people are largely dictated by the learning activities that take place across the College. For example, the regular presence of students and teaching staff in lecture rooms is determined by the modules taught in those rooms. Similar to
other academic institutions, these learning activities, such as lectures and lab sessions, are governed by the timetable, which specifies the location and time allocation of each learning activity across the academic year. At Birkbeck College, this allocation usually differs from one academic term to the next, with the exception of a selection of core modules that run for more than one term. Nonetheless, within a term many people are likely
to be present at the same location at the same time at least once a week. This observation is confirmed by the regularity of the temporal patterns shown in Figure 3. Based on this finding, we compute the similarity over 11-week periods, each of which corresponds to one of the academic terms contained in the data set described in Section 5.2.
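The per-term windowing described above can be sketched as follows. This is a minimal sketch, assuming presence records arrive as (user, timestamp) pairs and that the start date of each term is known; the function names and the example term dates are hypothetical, and the similarity measure itself is not reproduced here.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional, Tuple

TERM_WEEKS = 11  # each academic term is treated as an 11-week window

def term_window(ts: datetime, term_starts: List[date]) -> Optional[Tuple[int, int]]:
    """Return (term_index, week_index) for a timestamp, or None if it falls outside every term."""
    for term_idx, start in enumerate(term_starts):
        offset_days = (ts.date() - start).days
        if 0 <= offset_days < TERM_WEEKS * 7:
            return term_idx, offset_days // 7
    return None

def split_by_term(records: Iterable[Tuple[str, datetime]],
                  term_starts: List[date]) -> Dict[int, List[Tuple[str, int]]]:
    """Group (user, timestamp) records into per-term lists of (user, week_index)."""
    by_term: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for user, ts in records:
        window = term_window(ts, term_starts)
        if window is not None:  # records outside any term are dropped
            term_idx, week = window
            by_term[term_idx].append((user, week))
    return dict(by_term)

# Hypothetical term start dates and presence records, for illustration only.
terms = [date(2013, 9, 30), date(2014, 1, 6)]
records = [
    ("u1", datetime(2013, 10, 1, 10)),   # term 0, week 0
    ("u2", datetime(2013, 10, 14, 9)),   # term 0, week 2
    ("u2", datetime(2013, 12, 20, 10)),  # between terms: dropped
    ("u1", datetime(2014, 1, 7, 14)),    # term 1, week 0
]
print(split_by_term(records, terms))
```

Similarity would then be computed separately within each term's list, so that timetable changes between terms do not blur the weekly patterns.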
4.4 Detecting Regular Learning Activities
To detect the occurrence of a class, our proposed method relies on the intuition that the visitors to a target location where the regular sessions of a module are delivered naturally form a social group, which most likely meets on a regular basis over the weeks that the module covers. The experiments analysed in Section 6 were designed to discover such groups through a two-stage process that addresses the following challenges.
4.4.1 Noise Reduction
With the kind of WLAN data used in this work, there is no guarantee that every individual who visited a particular location was there to attend the learning activity taking place at that location. To detect a regular class at a target location, we therefore discard the data of any individual whose total number of visits to the target location is below a minimum attendance threshold.
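The noise-reduction step can be sketched as a simple filter. This assumes each record is a hypothetical (user, location, week) triple with at most one record per user per session, and treats the minimum attendance threshold as a raw count of visits; the function name and room codes are illustrative.

```python
from collections import Counter
from typing import List, Tuple

Visit = Tuple[str, str, int]  # (user, location, week): hypothetical record layout

def remove_noise(visits: List[Visit], target: str, min_visits: int) -> List[Visit]:
    """Keep only the target-location visits of users who reached min_visits there."""
    counts = Counter(user for user, loc, _ in visits if loc == target)
    kept_users = {user for user, n in counts.items() if n >= min_visits}
    return [v for v in visits if v[1] == target and v[0] in kept_users]

visits = [
    ("a", "G01", 1), ("a", "G01", 2), ("a", "G01", 3),
    ("b", "G01", 1),                      # a single visit: treated as noise
    ("c", "B34", 1), ("c", "G01", 2), ("c", "G01", 3),
]
print(remove_noise(visits, target="G01", min_visits=2))
```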
Another concept closely related to the level of attendance is the minimum class size, which is the smallest percentage of the students registered for a class that must be present for a learning session to take place. Note that both the minimum attendance and the minimum class size vary between the different schools and departments within the College.
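Because the minimum class size is expressed as a percentage of registered students, checking it reduces to a ceiling division. A minimal sketch, assuming an integer percentage; the function names are hypothetical and the thresholds in the example are not the College's actual values.

```python
def minimum_class_size(registered: int, min_percent: int) -> int:
    """Smallest head-count that is at least min_percent % of the registered students."""
    if registered <= 0 or not 0 <= min_percent <= 100:
        raise ValueError("registered must be positive and min_percent in [0, 100]")
    # Integer ceiling of registered * min_percent / 100, avoiding float rounding.
    return -((-registered * min_percent) // 100)

def session_is_held(present: int, registered: int, min_percent: int) -> bool:
    """True if enough registered students were present for the session to count."""
    return present >= minimum_class_size(registered, min_percent)

# With 45 registered students and a 30% minimum, 13.5 rounds up to 14 students.
print(minimum_class_size(45, 30), session_is_held(13, 45, 30), session_is_held(14, 45, 30))
```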
4.4.2 Coherence of Attendance
Even after noise removal, we cannot guarantee that the remaining individuals visited a particular location to attend the learning activity taking place there. It is therefore essential to verify that the individuals who attended the potential class are coherent in attending the individual sessions of that class over the 11-week academic term. A coherent cluster is defined as a group of individual users with similar attendance. For example, if two or more individuals consistently attended the same sessions of a class, they are members of a coherent cluster.
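One way to make "similar attendance" concrete is to encode each individual's attendance over the 11 weekly sessions of a term as a binary vector and compare vectors with a normalised Hamming distance. This is an illustrative stand-in, not necessarily the similarity measure used in this work; the helper names are hypothetical and weeks are indexed from 0.

```python
from typing import Iterable, List

TERM_WEEKS = 11  # sessions per term, one per week

def attendance_vector(weeks_attended: Iterable[int], n_weeks: int = TERM_WEEKS) -> List[int]:
    """Binary vector: position w is 1 if the individual attended the week-w session."""
    attended = set(weeks_attended)
    return [1 if w in attended else 0 for w in range(n_weeks)]

def attendance_distance(a: List[int], b: List[int]) -> float:
    """Normalised Hamming distance: fraction of sessions where attendance differs."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same sessions")
    return sum(x != y for x, y in zip(a, b)) / len(a)

full = attendance_vector(range(11))   # attended every week
most = attendance_vector(range(10))   # missed the final week
absent = attendance_vector([])        # never attended
print(attendance_distance(full, most), attendance_distance(full, absent))
```

Under this encoding, individuals in a coherent cluster are those whose vectors lie close to one another.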
To verify coherence of attendance, we apply our proposed clustering method to determine whether the individuals whose attendance satisfies the minimum requirement form a single cohesive cluster with respect to their attendance of individual sessions across the weeks of the academic term.
4.4.3 Discovering Coherent Clusters
The clustering approach we propose is based on the DBSCAN algorithm (density-based spatial clustering of applications with noise; Ester et al., 1996), which scales well to large amounts of data (Kriegel et al., 2011). The original DBSCAN takes two parameters, namely epsilon (a distance threshold) and minPts (a minimum number of points, used as a density threshold). Given a set of data points to cluster, DBSCAN relies on these two parameters to identify density-connected points in the data.
It uses the concepts of direct and density connectivity to group points into the transitive hull of density-connected points, which yields density-based clusters of arbitrary shape. In DBSCAN, two points are said to be directly connected if they lie within distance epsilon of each other, and a point is said to be a core point if at least minPts points (including itself) lie within distance epsilon of it. Furthermore, two points are said to be density-connected if they are connected to core points that are themselves density-connected to one another (Kriegel et al., 2011).
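The DBSCAN procedure described above can be sketched in plain Python. This follows the original formulation (Ester et al., 1996) with a caller-supplied distance function, an epsilon-neighbourhood that includes the point itself, and a core-point test of at least minPts neighbours; it is the standard algorithm, not Social-DBSCAN, and it makes O(n^2) distance calls, which may be acceptable for the small groups of visitors at one location. The function and constant names are hypothetical.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1
UNVISITED = -2

def dbscan(points: Sequence[T], eps: float, min_pts: int,
           dist: Callable[[T, T], float]) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [UNVISITED] * n

    def region_query(i: int) -> List[int]:
        # Epsilon-neighbourhood of point i, including i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = region_query(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: density-reachable, not core
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned
            labels[j] = cluster
            neighbours = region_query(j)
            if len(neighbours) >= min_pts:
                queue.extend(neighbours)  # j is also core: keep expanding
    return labels

# Integer "points" with |a - b| as the distance, for illustration only.
pts = [1, 2, 3, 10, 11, 12, 20]
print(dbscan(pts, eps=1, min_pts=3, dist=lambda a, b: abs(a - b)))
```

Applied to the attendance vectors of the individuals who pass the minimum attendance filter, the coherence check of Section 4.4.2 would then amount to asking whether every point receives the same non-noise cluster id.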
In our proposed social variant of DBSCAN, which we refer to as Social-DBSCAN, we use information from the semantic context of human presence to supply the DBSCAN algorithm with the distance and density threshold values that it uses to discover the social clusters present in the