
dataset, which makes the statistics tree expand to
many nodes. To overcome this limitation, we implemented
in C some of the core data structures that the model
uses to compute the statistics tree, using Cython to
build them as a Python extension. With this solution
in place, we lowered memory consumption by up to 65%
when using a statistics tree with ∼40M nodes.
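As a rough illustration of why moving node storage out of
per-node Python objects saves memory, the sketch below keeps
the counts and parent links of a count tree in flat typed
arrays. It is a pure-Python analogue and not the Cython
extension described above: the class name CompactCountTree,
its fields, and the example attribute values are assumptions
made for illustration only.

```python
from array import array


class CompactCountTree:
    """Count tree stored in flat typed arrays instead of one object per node.

    Hypothetical sketch: the layout is an assumption, not the paper's C code.
    """

    def __init__(self):
        self.counts = array("q", [0])    # counts[i]: observations passing through node i
        self.parents = array("q", [-1])  # parents[i]: index of the parent, -1 for the root
        self.children = {}               # (parent index, symbol) -> child index

    def add_path(self, symbols):
        """Increment the count of every node on the path, creating nodes as needed."""
        node = 0
        self.counts[0] += 1
        for symbol in symbols:
            key = (node, symbol)
            child = self.children.get(key)
            if child is None:
                child = len(self.counts)
                self.counts.append(0)
                self.parents.append(node)
                self.children[key] = child
            self.counts[child] += 1
            node = child

    def count(self, symbols):
        """Return how many observed paths start with the given symbols."""
        node = 0
        for symbol in symbols:
            node = self.children.get((node, symbol))
            if node is None:
                return 0
        return self.counts[node]


tree = CompactCountTree()
tree.add_path(["alice", "vault-a", "read"])
tree.add_path(["alice", "vault-a", "write"])
tree.add_path(["bob", "vault-b", "read"])
```

Each node here costs two 8-byte integers plus one dictionary
entry. The dictionary is still a Python object, so a C
implementation would likely replace it as well.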
5 CONCLUSIONS AND FUTURE WORK
Our work introduced a hybrid modeling method for
numerical data, based on a mixture of categorical
and numeric inputs. As shown in the evaluation section,
we achieved a 98% reduction in False Positives (FPs)
with a significantly smaller number of parameters
required for modeling and faster training times. The
non-uniform normalization scheme, which the tree makes
possible, enables us to train models efficiently on
datasets with skewed numerical outputs and provides a
natural back-off method for the statistical estimation.
Our future research plans will focus on (a) how the
consumption order of the attributes can be computed
efficiently, (b) investigating more use cases, and
(c) further refining our results and reducing the
number of False Positives.
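The back-off behaviour mentioned above can be sketched as
follows: estimate the probability of a value from the most
specific attribute context that has enough observations, and
fall back to shorter contexts otherwise. This is a minimal
sketch under stated assumptions, not the model from the
paper: the class BackoffCounter, the min_count threshold,
and the binned values "low" and "high" are all hypothetical.

```python
from collections import defaultdict


class BackoffCounter:
    """Prefix counts with back-off to shorter contexts (illustrative only)."""

    def __init__(self, min_count=5):
        self.min_count = min_count              # assumed support threshold
        self.context_counts = defaultdict(int)  # context prefix -> observations
        self.value_counts = defaultdict(int)    # (context prefix, value) -> observations

    def observe(self, context, value):
        """Record one observation under every prefix of its context."""
        context = tuple(context)
        for depth in range(len(context) + 1):
            prefix = context[:depth]
            self.context_counts[prefix] += 1
            self.value_counts[(prefix, value)] += 1

    def estimate(self, context, value):
        """Return (probability, depth) using the longest well-supported prefix."""
        context = tuple(context)
        for depth in range(len(context), -1, -1):
            prefix = context[:depth]
            total = self.context_counts.get(prefix, 0)
            if total >= self.min_count or depth == 0:
                if total == 0:
                    return 0.0, 0  # nothing observed at all
                return self.value_counts.get((prefix, value), 0) / total, depth


model = BackoffCounter(min_count=3)
for _ in range(4):
    model.observe(("alice", "vault-a"), "low")
model.observe(("alice", "vault-b"), "high")

# The rare context ("alice", "vault-b") has one observation, so the
# estimate backs off to the shorter context ("alice",).
probability, depth_used = model.estimate(("alice", "vault-b"), "high")
```

Returning the depth alongside the probability makes it
visible when an estimate came from a less specific context.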
Any SMP can expect frequent requests to fetch
credentials, especially in enterprise-grade, rapidly
changing environments. Each request generates dozens
of access logs, which can introduce new access
patterns. To accommodate this continuous shift in
patterns, the model would have to be retrained often
to counter data drift. As part of our future work, we
plan to extend the system so that it automatically
detects when the model has drifted and decides when
it is time to retrain.
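One possible way to decide when to retrain is to compare the
distribution of recent access events with the distribution
seen at training time. The sketch below uses the
Jensen-Shannon divergence for that comparison. This is an
assumption on our part rather than a method evaluated in
this work, and the function names, the event labels, and the
threshold value of 0.1 are hypothetical.

```python
import math
from collections import Counter


def js_divergence(reference, recent):
    """Jensen-Shannon divergence (base 2, between 0 and 1) of two event samples."""
    ref_counts, new_counts = Counter(reference), Counter(recent)
    ref_total, new_total = sum(ref_counts.values()), sum(new_counts.values())
    if ref_total == 0 or new_total == 0:
        raise ValueError("both windows must contain at least one event")
    divergence = 0.0
    for key in set(ref_counts) | set(new_counts):
        p = ref_counts[key] / ref_total  # share of this event in the reference window
        q = new_counts[key] / new_total  # share of this event in the recent window
        m = (p + q) / 2
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def needs_retraining(reference, recent, threshold=0.1):
    """Flag drift when the divergence exceeds an assumed, tunable threshold."""
    return js_divergence(reference, recent) > threshold


training_window = ["read"] * 8 + ["write"] * 2
recent_window = ["delete"] * 9 + ["read"]
drifted = needs_retraining(training_window, recent_window)
```

In practice the threshold would need to be calibrated on
historical logs, since a shift in patterns is not
necessarily harmful to the model.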
The current work also focused on attack patterns
observed while inspecting adversary emulations of
real-world scenarios. In future research, we will
investigate how to improve the model’s robustness
against new and evolving attack patterns, and how to
consider more viewpoints when deciding whether an
event is anomalous.
Finally, although our hybrid modeling method
successfully reduced the number of False Positives and
identified anomalous events in SMP access patterns,
we had access to only a single SMP on which to develop
and test our approach. To validate that the proposed
solution remains effective on access logs from other
SMPs, we will conduct a new experiment using data
gathered from multiple sources and compare the
results.
IoTBDS 2024 - 9th International Conference on Internet of Things, Big Data and Security