# Solving a Hard Instance of Suspicious Behaviour Detection with Sparse Binary Vectors Clustering

### Eric Filiol, Abhilash Hota

#### Abstract

In this article we study the problem of detecting a very small subset of suspicious and malicious behaviours, represented by sparse binary vectors, within a significantly larger population of individuals. The main difficulty is that, when the vectors are sparse, malicious behaviours are hard to distinguish from normal ones. Although the vectors appear strongly unbalanced, this property cannot be exploited, because the objects to classify (behaviours) do not exhibit sufficiently strong frequency discrepancies. Direct detection is therefore not possible: a preliminary phase is needed in which the vectors (each representing a normal or a malicious behaviour) are partitioned in order to select a reduced subset that, with high probability, concentrates most of the vectors corresponding to malicious behaviours. We worked on a set of anonymized real data from terrorism-related cases.

