exploitability of datasets, particularly when they involve
a user input channel for ML training purposes. While
reactive security mechanisms, such as intrusion detec-
tion systems (IDS), can help detect such attacks, they
have received little attention in the context of the IoT.
Indeed, the majority of the available IDSs tend to be
designed for conventional ICT infrastructure or wire-
less sensor networks, but not for the IoT (Anthi et al.,
2018).
In this paper, we propose a data-centric anomaly-
based IDS based on ML to detect anomalies associ-
ated with integrity attacks targeting interactive online
learning scenarios in the IoT. We focus on training-
only attacks that affect the user feedback channel.
Specifically, we focus on a representative type of
poisoning integrity attack, known as a label-flipping
attack (Biggio et al., 2012). A label-flipping attack is
a type of adversarial attack that exploits classification
algorithms by corrupting the labels of a portion of
their training data. The main goal of this attack type
is to fool target systems into misclassifying benign
inputs as malicious ones, or vice versa. Unlike most
previous research, which has focused on
network traffic, we focus on application layer data.
Applying anomaly detection on the application layer
can detect intrusions that may be missed if only lower
layers of network traffic are analyzed (Meyer-Berg
et al., 2020). As an example, a manipulated ther-
mostat may show no irregularities on lower layers,
e.g., on the network layer, whereas actual temper-
ature readings, e.g., as captured in an activity log,
might indicate an anomaly. An attack on smart
thermostats was demonstrated by security researchers at
DefCon 24 (footnote 3), who uploaded proof-of-concept
ransomware to a smart thermostat, allowing them to
manipulate the temperature until the homeowner paid
a ransom. Evidence of such an attack could potentially
have been captured at the application layer in the form
of anomalous temperature readings. We interpret an
anomaly as an intrusion (Khraisat and Alazab, 2021),
which represents any significant deviation between an
observed behavior and the learned ML model.
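To make the label-flipping attack described above concrete, the following is a minimal sketch, assuming binary labels arriving over a user feedback channel. The function name `flip_labels`, the `flip_rate` parameter, and the example labels are hypothetical illustrations and are not taken from the paper's implementation.

```python
import random

def flip_labels(labels, flip_rate, seed=0):
    """Return a poisoned copy of binary labels with a fraction flipped (0 <-> 1).

    Also returns the set of indices that were flipped, which could serve
    as ground truth when evaluating a detector.
    """
    rng = random.Random(seed)
    n_flip = int(len(labels) * flip_rate)
    flipped_idx = set(rng.sample(range(len(labels)), n_flip))
    poisoned = [1 - y if i in flipped_idx else y for i, y in enumerate(labels)]
    return poisoned, flipped_idx

# Hypothetical user feedback: 1 = "room occupied", 0 = "room empty".
clean = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
poisoned, idx = flip_labels(clean, flip_rate=0.3)

# Indices where the poisoned labels differ from the clean ones.
changed = [i for i, (a, b) in enumerate(zip(clean, poisoned)) if a != b]
print(sorted(idx), changed)
```

The features are left untouched in this sketch; only the labels supplied to training are altered, which is what distinguishes label flipping from attacks that perturb the inputs themselves.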
Our proposed data-centric anomaly-based detection
system is demonstrated in a case study consisting
of a smart campus setup that involves a smart
camera, a climate sensmitter, smart lighting, a
smartphone, and a user feedback channel over which users
can provide feedback to the training process. To create
this setup, we leverage a concept known as the
Dynamic Intelligent Virtual Sensor (DIVS) (Tegen
et al., 2019). The DIVS essentially extends the notion
of a virtual sensor, which is typically used in devices
with a fixed set of sensors, to a dynamic setting with
heterogeneous sensors. Through the application of
supervised ML algorithms trained on application layer
data, we demonstrate that anomalies targeting the user
feedback channel of interactive ML setups can be
accurately detected at 98% using the Random Forest
classifier.
3 https://defcon.org/html/defcon-24/dc-24-news.html
[Accessed on 19-September-2022].
2 RELATED WORK
Approaches to building anomaly detectors for IDSs
can be broadly categorized as design-centric and data-
centric (MR et al., 2021).
Design-centric approaches make use of physical
relationships, captured as invariants, among a sys-
tem’s components (MR et al., 2021). This means that,
if an invariant exists for a system, it can be used as a
basis for detecting anomalies in the system’s behavior.
However, design-centric approaches tend to be based
on the assumption that the system itself is a closed en-
vironment, such as a private home, in which all com-
ponents are known. In data-centric approaches, such
relationships among system components are learned
and modelled through the application of ML and
computational intelligence techniques, namely,
supervised, unsupervised, and hybrid (semi-supervised)
algorithms (Alsoufi et al., 2021; MR et al., 2021;
Albulayhi et al., 2021). This also means that they can
better cater to open or semi-open environments such
as a building or campus.
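The contrast between the two families can be sketched as follows. This is an illustrative example only: the heater invariant, the three-standard-deviation range, and the temperature values are assumptions chosen for the sketch, not rules or data from the cited works.

```python
from statistics import mean, stdev

def violates_invariant(heater_on, temp_before, temp_after):
    """Design-centric check: a hand-written rule (hypothetical invariant)
    stating that the temperature must not drop while the heater is on."""
    return heater_on and temp_after < temp_before

def fit_normal_range(readings, k=3.0):
    """Data-centric check: learn a normal range (mean +/- k * std)
    from historical readings instead of encoding it by hand."""
    mu, sigma = mean(readings), stdev(readings)
    return mu - k * sigma, mu + k * sigma

def is_anomalous(reading, bounds):
    """Flag a reading that falls outside the learned range."""
    low, high = bounds
    return not (low <= reading <= high)

# Hypothetical application-layer temperature log (degrees Celsius).
history = [20.5, 21.0, 21.2, 20.8, 21.1, 20.9, 21.3, 20.7]
bounds = fit_normal_range(history)

print(violates_invariant(True, 21.0, 19.5))  # rule-based verdict
print(is_anomalous(21.0, bounds))            # typical reading
print(is_anomalous(35.0, bounds))            # manipulated thermostat reading
```

The invariant requires knowing in advance that a heater exists and how it relates to the temperature sensor, whereas the learned range only needs logged readings, which is why the latter adapts more easily to environments whose components are not fully known.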
Given their ability to automatically learn the dy-
namics and strategies deployed in a system and the
dynamic and heterogeneous nature of an IoT system,
we focus on data-centric approaches for developing
our anomaly detector. Another advantage of
data-centric approaches is that they can better detect
new attacks, such as zero-day attacks, and require
less human intervention. Moreover, we focus on the
supervised learning approach to anomaly detection
(Lin et al., 2015). Supervised learning involves
collecting input variables together with a labeled
output variable, and using an algorithm to learn the
mapping from input to output that characterizes normal
user behaviour (Khraisat and Alazab, 2021).
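As a minimal illustration of learning a mapping from labeled inputs to an output, the sketch below uses a nearest-centroid classifier as a simple stand-in. It is not the Random Forest classifier used in this paper, and the feature layout, labels, and values are hypothetical.

```python
from statistics import mean

def fit_centroids(X, y):
    """Learn one centroid (per-feature mean) for each label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical features: [temperature in Celsius, light level in 0..1].
X = [[21.0, 0.8], [20.5, 0.9], [21.5, 0.7],
     [35.0, 0.1], [34.0, 0.2], [36.0, 0.0]]
# Output variable: labels assumed to be provided during data collection.
y = ["normal", "normal", "normal",
     "anomalous", "anomalous", "anomalous"]

model = fit_centroids(X, y)
print(predict(model, [20.8, 0.85]))
print(predict(model, [33.5, 0.15]))
```

Any supervised classifier could take the place of `fit_centroids` and `predict` here; the point of the sketch is only that both the inputs and the labeled outputs must be available at training time.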
Several works in IoT domains are closely related to
ours. The following are some of the most recent
notable works on IDSs that have used a data-centric
approach to anomaly detection in the IoT context.
Liu et al. (2018) proposed a light
probe routing mechanism for detecting On-Off at-
tacks caused by malicious network nodes in an indus-
trial IoT site. An On-Off attack in this context means a
malicious network node could target the IoT network
A Data-Centric Anomaly-Based Detection System for Interactive Machine Learning Setups