dataset, which makes the statistics tree expand to
many nodes. To overcome this limitation, we
implemented in C the core data structures that the
model uses to compute the statistics tree, using
Cython to build a Python extension. With this
solution in place, we lowered memory consumption by
up to 65% when using a statistics tree with ∼40M
nodes.
Our work introduced a hybrid modeling method for
numerical data, based on a mixture of categorical
and numeric inputs. As shown in the evaluation
section, we achieved a 98% reduction in FPs while
requiring significantly fewer model parameters and
shorter training times. The non-uniform normalization
scheme, made possible by the tree, enables us to
efficiently train models on datasets with skewed
numerical outputs and provides a natural back-off
method for the statistical estimation.
Our future research plans will focus on (a)
efficiently computing the consumption order of the
attributes, (b) investigating more use cases, and (c)
further refining our results and reducing the number
of False Positives.
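The back-off behavior mentioned above can be
illustrated with a minimal sketch. It is not the
implementation used in this work: it assumes that each
tree node keeps running statistics of a numeric value
and that an estimate is taken from the deepest node on
the attribute path with enough observations. The names
StatsNode, insert, backoff_stats and min_support, as
well as the sample attribute values, are hypothetical.

```python
import math


class StatsNode:
    """One node of a toy statistics tree (illustrative, hypothetical names)."""

    def __init__(self):
        self.children = {}
        self.n = 0           # observations that passed through this node
        self.total = 0.0     # sum of the numeric values
        self.total_sq = 0.0  # sum of the squared numeric values

    def update(self, value):
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def mean_std(self):
        if self.n == 0:
            return 0.0, 0.0
        mean = self.total / self.n
        # Clamp at zero to guard against tiny negative rounding errors.
        var = max(self.total_sq / self.n - mean * mean, 0.0)
        return mean, math.sqrt(var)


def insert(root, attributes, value):
    """Record `value` at the root and at every node along the attribute path."""
    node = root
    node.update(value)
    for attr in attributes:
        node = node.children.setdefault(attr, StatsNode())
        node.update(value)


def backoff_stats(root, attributes, min_support=3):
    """Return (depth, mean, std) from the deepest node with enough support.

    If a node on the path is missing or has fewer than `min_support`
    observations, the statistics of its closest supported ancestor are used.
    The root is always accepted as the last resort.
    """
    node, best, best_depth = root, root, 0
    for depth, attr in enumerate(attributes, start=1):
        node = node.children.get(attr)
        if node is None or node.n < min_support:
            break
        best, best_depth = node, depth
    mean, std = best.mean_std()
    return best_depth, mean, std


# Usage with made-up attribute paths (user, resource) and numeric values.
root = StatsNode()
for v in (10.0, 12.0, 14.0):
    insert(root, ["alice", "vault-a"], v)
insert(root, ["alice", "vault-b"], 100.0)

depth, mean, std = backoff_stats(root, ["alice", "vault-b"])
print(depth, mean, std)  # statistics come from the "alice" node, not "vault-b"
```

In this sketch a rarely seen path falls back to the
statistics of a better-populated ancestor, so the
normalization applied to a value can differ from one
branch of the tree to another.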
For any SMP, frequent requests to fetch credentials
are expected, especially in enterprise-grade, rapidly
changing environments. Each request generates dozens
of access logs, which can introduce new access
patterns. To accommodate this continuous shift in
patterns, the model would have to be retrained often
to counter data drift. As part of our future work, we
plan to improve the model so that it automatically
detects when it has drifted and decides when it is
time to retrain.
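How such a retraining trigger might look is sketched
below. Drift detection is left to future work here, so
this is only one possible heuristic and not a proposed
design: it assumes that drift can be approximated by
the share of recent events whose access pattern was
never seen during training. The class DriftMonitor,
its parameters window and threshold, and the example
patterns are all hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Flags drift when too many recent events show unseen access patterns."""

    def __init__(self, known_patterns, window=1000, threshold=0.2):
        self.known = set(known_patterns)    # patterns seen at training time
        self.recent = deque(maxlen=window)  # 1 = unseen pattern, 0 = known
        self.threshold = threshold          # tolerated share of unseen patterns

    def observe(self, pattern):
        """Record whether the latest event matches a known pattern."""
        self.recent.append(0 if pattern in self.known else 1)

    def unseen_rate(self):
        """Share of events in the current window with an unseen pattern."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def should_retrain(self):
        # Decide only once the window is full, so that a handful of
        # early events cannot trigger a retraining on their own.
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and self.unseen_rate() > self.threshold


# Usage with made-up (user, resource) patterns.
monitor = DriftMonitor({("alice", "db-prod")}, window=4, threshold=0.5)
for pattern in [("alice", "db-prod"), ("bob", "db-prod"),
                ("bob", "ci-runner"), ("carol", "db-prod")]:
    monitor.observe(pattern)
print(monitor.unseen_rate(), monitor.should_retrain())
```

A fixed window and threshold keep the sketch simple.
In practice both would need to be tuned to the log
volume of the monitored SMP.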
In addition, the current work focused on attack
patterns observed while inspecting adversary
emulations of real-world scenarios. In future
research, we will investigate how to improve the
model's robustness against new and evolving attack
patterns, and how to consider more viewpoints when
deciding whether an event is anomalous.
Finally, although we have shown that our hybrid
modeling method succeeds in reducing the number of
False Positives and in identifying anomalous events in
SMP access patterns, we only had access to a single
SMP to work with and test our approach against. To
validate that the proposed solution is also effective
on access logs from other SMPs, we will conduct a new
experiment using data gathered from multiple sources
and compare the results.
