CLIP: Assisted Video Anomaly Detection

Meng Dong

2024

Abstract

Video anomaly detection is a core application of intelligent surveillance; it has been studied extensively but remains challenging. In general domains, the variety of anomaly types calls for dedicated detectors, whereas in specific domains users may need to define normal and abnormal situations through descriptions such as "pedestrian no entry" or "people fighting". Moreover, the anomalies that appear in unseen videos are usually absent from the training datasets. Conventional techniques based on computer vision or machine learning are typically data-intensive or limited to specific domains. Aiming at a generalized framework for intelligent monitoring, we introduce generative anomaly descriptions that complement the visual branch and make it possible to adapt to specific application domains. In particular, we adopt contrastive language-image pre-training (CLIP) with generative anomaly descriptions as our general anomaly detector. Unlike state-of-the-art methods, this work uses category-level anomaly descriptions rather than simple category names as language prompts. A temporal module is built on top of CLIP to capture the temporal correlations of anomaly events. Besides this frame-level anomaly detection, we support the detection of object-centric anomalies for some specific domains. Extensive experimental results show that the proposed framework offers state-of-the-art performance on the UCF-Crime and ShanghaiTech datasets.
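The idea of scoring frames against language descriptions can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that frame and description embeddings have already been produced by CLIP's image and text encoders (toy vectors stand in for them here), the function names `frame_anomaly_scores` and `smooth` are hypothetical, the temperature value is arbitrary, and the moving average is only a simple stand-in for the learned temporal module mentioned in the abstract.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def frame_anomaly_scores(frame_emb, text_emb, is_anomaly, temperature=0.01):
    """Per-frame anomaly score from image/text embedding similarity.

    frame_emb:  (num_frames, dim) frame embeddings (assumed precomputed).
    text_emb:   (num_prompts, dim) description embeddings (assumed precomputed).
    is_anomaly: (num_prompts,) boolean mask, True for anomaly descriptions.
    """
    f = l2_normalize(frame_emb)
    t = l2_normalize(text_emb)
    logits = f @ t.T / temperature               # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax over descriptions
    # Probability mass assigned to any anomaly description.
    return probs[:, is_anomaly].sum(axis=1)


def smooth(scores, window=3):
    """Moving average over time (odd window); a stand-in for a temporal module."""
    kernel = np.ones(window) / window
    padded = np.pad(scores, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


# Toy data: one normal description and two anomaly descriptions.
text_emb = np.array([
    [1.0, 0.0, 0.0, 0.0],   # hypothetical "people walking normally"
    [0.0, 1.0, 0.0, 0.0],   # hypothetical "people fighting"
    [0.0, 0.0, 1.0, 0.0],   # hypothetical "pedestrian in a no-entry zone"
])
is_anomaly = np.array([False, True, True])

frame_emb = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the normal description
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0],   # close to the first anomaly description
    [0.0, 0.2, 0.9, 0.0],   # close to the second anomaly description
])

scores = frame_anomaly_scores(frame_emb, text_emb, is_anomaly)
smoothed = smooth(scores, window=3)
print(np.round(scores, 3))
print(np.round(smoothed, 3))
```

In this toy setup, the frames aligned with an anomaly description receive scores near 1 and the others near 0; smoothing spreads the transition across neighbouring frames, which is the kind of temporal context a learned module would model more expressively.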

Paper Citation


in Harvard Style

Dong M. (2024). CLIP: Assisted Video Anomaly Detection. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-684-2, SciTePress, pages 522-533. DOI: 10.5220/0012356300003654


in Bibtex Style

@conference{icpram24,
author={Meng Dong},
title={CLIP: Assisted Video Anomaly Detection},
booktitle={Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2024},
pages={522-533},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012356300003654},
isbn={978-989-758-684-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - CLIP: Assisted Video Anomaly Detection
SN - 978-989-758-684-2
AU - Dong M.
PY - 2024
SP - 522
EP - 533
DO - 10.5220/0012356300003654
PB - SciTePress