(a) Confusion matrix (46 clusters) (b) Confusion matrix (45 clusters)
Figure 5: Comparison of the confusion matrices obtained
for the two cluster configurations.
4.2.2 Evaluation with 45 Clusters
Following the procedure detailed in Section 4.2.1, we
ran the same evaluation pipeline with 45 clusters in
order to assess the impact of the cluster count on the
prediction accuracy. We assumed that the results
obtained with 46 clusters, although good, could be
improved. We therefore conducted a trial with fewer
clusters and evaluated the same metrics. This second
configuration, i.e., 45 clusters, improved the accuracy
in predicting unknown attacks compared to the first
setting. However, it negatively impacted the evaluation
of K-Means: the F1 score dropped to 0.89, the Recall
score to 0.94, the Precision score to 0.85, and the
total accuracy score to 0.89.
4.3 Discussion
Figure 4(a) shows the real attack and normal clusters
together. It is difficult to distinguish one cluster
from another since they overlap. This is mainly due to
the similar behaviour of normal and attack events. In
our experiments, two configurations were evaluated, one
using 45 clusters and another using 46 clusters. The
latter showed better results on the validation dataset.
Since the criterion selected to identify a cluster as
normal or malicious is the class of the majority of its
events, and since normal and attack events behave
similarly, a higher number of clusters splits the
events more finely, which can improve the classification
accuracy of the clusters.
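
The majority criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy cluster assignments are hypothetical, and the cluster ids are assumed to come from an already fitted clustering model.

```python
from collections import Counter, defaultdict


def label_clusters_by_majority(cluster_ids, true_labels):
    """Map each cluster id to the most frequent class among its events."""
    votes = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        votes[cid][label] += 1
    # most_common(1) returns [(label, count)] for the majority class.
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


def predict_from_clusters(cluster_ids, cluster_to_label):
    """Predict the class of each event from the label of its cluster."""
    return [cluster_to_label[cid] for cid in cluster_ids]


# Hypothetical toy data: 9 events assigned to 3 clusters.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2]
true_labels = ["normal", "normal", "attack",
               "attack", "attack", "attack", "normal",
               "normal", "normal"]

mapping = label_clusters_by_majority(cluster_ids, true_labels)
predictions = predict_from_clusters(cluster_ids, mapping)
accuracy = sum(p == t for p, t in zip(predictions, true_labels)) / len(true_labels)
```

In this toy example, the minority events inside clusters 0 and 1 are the ones misclassified; splitting such mixed clusters further is what a larger cluster count can achieve.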
The choice of the number of clusters is a trade-off
between the accuracy in detecting normal/attack events
and the performance of the model in classifying
known/unknown ones. Thus, depending on the requirements
of each setting, i.e., classifying known/unknown attacks
or detecting attacks, our results show that it is
preferable to use the model with 45 clusters for
classification and the model with 46 clusters for
detection. Moreover, in both cases, the accuracy in
classifying normal/attack events remains high (at
least 89%).
The precision and recall scores for the model using 45
clusters are fairly high. The precision metric indicates
that 85% of the events classified by the model as
attacks are actual attacks, meaning that the model
raises relatively few false alarms. The recall metric
indicates that 94% of the real attacks are correctly
predicted. The F1 score (the harmonic mean of precision
and recall) was equal to 89%. This metric shows that the
general detection rate of the combined models is good.
The confusion matrix is helpful to analyse the
known/unknown predictions of the models. The model using
45 clusters was the best model, since only 10% of the
known events were misclassified as unknown, and only 14%
of the unknown events were misclassified as known.
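
As a quick consistency check on the reported scores, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only; the helper names are hypothetical, and the only inputs taken from the text are the precision (0.85) and recall (0.94) reported for 45 clusters.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts.

    tp: attacks predicted as attacks
    fp: normal events predicted as attacks
    fn: attacks predicted as normal
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


# Precision and recall reported for the 45-cluster configuration.
f1_45 = f1_from(0.85, 0.94)
```

Rounded to two decimals, `f1_45` agrees with the 89% F1 score stated for this configuration.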
The model using 46 clusters obtained the same results as
the model using 45 clusters when predicting the known
events. This shows that the difficulty lies in detecting
the unknown attacks. Using 46 clusters, the model
misclassified 38% of the unknown events, which is 24
percentage points more than the model using 45 clusters.
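
The comparison between the two configurations can be reproduced from one row of each known/unknown confusion matrix. The counts below are hypothetical (100 unknown events per configuration) and are chosen only to reproduce the reported rates: 86% of unknown events predicted as unknown with 45 clusters, and 38% misclassified with 46 clusters.

```python
def misclassification_rate(correct, wrong):
    """Fraction of events of one true class assigned to the other class."""
    return wrong / (correct + wrong)


# Hypothetical counts for the "unknown" row of each confusion matrix.
unknown_45 = {"as_unknown": 86, "as_known": 14}
unknown_46 = {"as_unknown": 62, "as_known": 38}

rate_45 = misclassification_rate(unknown_45["as_unknown"], unknown_45["as_known"])
rate_46 = misclassification_rate(unknown_46["as_unknown"], unknown_46["as_known"])

# Difference expressed in percentage points, not as a relative change.
gap_points = (rate_46 - rate_45) * 100
```

The gap is an absolute difference between two rates, which is why it is best read as percentage points.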
5 CONCLUSION
This paper presented a hybrid approach to tackle the
problem of implementing intrusion detection systems
using machine learning models. The CICIDS 2017
dataset has been chosen, since it contains new relevant
attacks and realistic normal traffic, with a reasonable
size. Because the normal and attack data points were
unbalanced, an undersampling technique was used to
balance the data.
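
The text does not state which undersampling variant was applied. As a minimal sketch, assuming plain random undersampling of every class down to the size of the smallest one, the balancing step could look like this (the function name, seed, and toy data are hypothetical):

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop events so that every class has as many events
    as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # Sample without replacement within each class.
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)

    return [s for s, _ in balanced], [y for _, y in balanced]


# Hypothetical toy data: 8 normal events and 2 attack events.
samples = list(range(10))
labels = ["normal"] * 8 + ["attack"] * 2
X_bal, y_bal = random_undersample(samples, labels)
```

After this step both classes contain two events; the discarded normal events are simply not used for training.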
The highest performance for the K-Means clustering was
obtained with 46 clusters: the F1 score was 0.91, the
Recall score 0.95, the Precision score 0.88, and the
Accuracy score 0.91. The highest performance for the
Variational Bayesian Gaussian Mixture model was obtained
with 45 clusters, with 90% of the known attacks
predicted as known and 86% of the unknown attacks
predicted as unknown.
Future work will concentrate on evaluating other
ML models and integrating the proposed solution
into a SIEM system in a dynamic setting. This will
demonstrate the versatility of the proposed method-
ology in ever-evolving environments. In addition,
we will investigate the use of fully homomorphic
encryption, as discussed by Sgaglione et al. (2019) and
Boudguiga et al. (2020), to make intrusion detection
more privacy-preserving. However, using homomorphic
encryption will require adapting the models used and
may result in a loss of accuracy.
Efficient Hybrid Model for Intrusion Detection Systems