Backdoor Attacks During Retraining of Machine Learning Models: A Mitigation Approach

Matthew Yudin, Achyut Reddy, Sridhar Venkatesan, Rauf Izmailov

2024

Abstract

Machine learning (ML) models are increasingly being adopted to develop Intrusion Detection Systems (IDS). Such models are usually trained on large, diversified datasets and, as a result, perform well on previously unseen samples, provided those samples generally fall within the distribution of the training data. However, as operating environments and the threat landscape change over time (e.g., installation of new applications, discovery of new malware), the underlying distributions of the modeled behavior also change, degrading the performance of ML-based IDS over time. Such a shift in distribution is referred to as concept drift. To account for concept drift, models are periodically retrained on newly collected data. Data curated for retraining may also contain adversarial samples, i.e., samples that an attacker has modified in order to evade the ML-based IDS. Such adversarial samples, when included in retraining, would poison the model and subsequently degrade its performance. Concept drift and adversarial samples are both out-of-distribution samples that a trained model cannot easily differentiate. Thus, intelligent monitoring of the model inputs is necessary to distinguish between these two classes of out-of-distribution samples. In this paper, we consider a worst-case setting for the defender in which the original ML-based IDS is poisoned through an out-of-band mechanism. We propose an approach that perturbs an input sample at different magnitudes of noise and observes the change in the poisoned model's outputs to determine whether the input sample is adversarial. We evaluate this approach in two settings, a network IDS and an Android malware detection system, and compare it with existing techniques that detect either concept drift or adversarial samples. Preliminary results show that the proposed approach provides strong signals to differentiate between adversarial and concept drift samples. Furthermore, we show that techniques that detect only concept drift or only adversarial samples are insufficient to detect the other class of out-of-distribution samples.
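
The abstract's core idea is to probe each incoming sample with noise of increasing magnitude and watch how strongly the model's output reacts. The following is a minimal sketch of that intuition, not the authors' implementation: it assumes a scikit-learn-style classifier exposing `predict_proba`, and the noise scales, trial counts, and decision threshold are illustrative assumptions.

```python
# Sketch (assumed interface, not the paper's code): probe a sample with
# Gaussian noise at several magnitudes and measure how much the model's
# predicted class probabilities shift. The working hypothesis, following the
# abstract, is that adversarial samples sit near attacker-crafted decision
# boundaries, so their outputs move more under small perturbations than
# outputs for benign concept-drift samples.
import numpy as np


def perturbation_sensitivity(model, x, noise_scales=(0.01, 0.05, 0.1, 0.2),
                             n_trials=50, seed=None):
    """Mean L1 shift in predicted probabilities under additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(1, -1)
    base = model.predict_proba(x)[0]                  # unperturbed prediction
    shifts = []
    for scale in noise_scales:
        noise = rng.normal(0.0, scale, size=(n_trials, x.shape[1]))
        probs = model.predict_proba(x + noise)        # predictions on noisy copies
        shifts.append(np.abs(probs - base).sum(axis=1).mean())
    return np.array(shifts)


def flag_adversarial(model, x, threshold=0.3):
    """Flag a sample whose output is unusually sensitive to small noise.
    The threshold is a placeholder; in practice it would be calibrated on
    held-out benign and drifted data."""
    return perturbation_sensitivity(model, x).max() > threshold
```

In this sketch, samples whose predictions remain stable across all noise magnitudes would be treated as candidate concept-drift data suitable for retraining, while highly sensitive samples would be withheld as suspected adversarial inputs.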



Paper Citation


in Harvard Style

Yudin M., Reddy A., Venkatesan S. and Izmailov R. (2024). Backdoor Attacks During Retraining of Machine Learning Models: A Mitigation Approach. In Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT; ISBN 978-989-758-709-2, SciTePress, pages 140-150. DOI: 10.5220/0012761600003767


in Bibtex Style

@conference{secrypt24,
author={Matthew Yudin and Achyut Reddy and Sridhar Venkatesan and Rauf Izmailov},
title={Backdoor Attacks During Retraining of Machine Learning Models: A Mitigation Approach},
booktitle={Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT},
year={2024},
pages={140--150},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012761600003767},
isbn={978-989-758-709-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT
TI - Backdoor Attacks During Retraining of Machine Learning Models: A Mitigation Approach
SN - 978-989-758-709-2
AU - Yudin M.
AU - Reddy A.
AU - Venkatesan S.
AU - Izmailov R.
PY - 2024
SP - 140
EP - 150
DO - 10.5220/0012761600003767
PB - SciTePress