WORKLOAD HIDDEN MARKOV MODEL FOR ANOMALY DETECTION

Juan Manuel García
Instituto Tecnológico de Morelia
Morelia, México

Tomás Navarrete
Instituto Tecnológico de Morelia
Morelia, México

Carlos Orozco
FIRA - Banco de México
Morelia, México
Keywords: Intrusion detection, anomaly detection, time series analysis, Markov processes.
Abstract: We present an approach to anomaly detection based on the construction of a Hidden Markov Model (HMM) trained on processor workload data. From processor load measurements, an HMM is constructed as a model of the system's normal behavior. Any observed sequence of processor load measurements that is unlikely to have been generated by the HMM is then considered an anomaly. We test our approach by taking real processor load data from a mail server to construct an HMM, and then evaluate it under several experimental conditions, including simulated DoS attacks. We show evidence suggesting that this method can successfully detect attacks or misuse that directly affect processor performance.
1 INTRODUCTION
Since the beginning of intrusion detection research (Denning, 1987), two complementary approaches to detecting a possible intruder have been established:

1. Anomaly detection, where the strategy is to treat with suspicion any activity considered unusual for the subject (users, processes, etc.) and carry out further investigation. This approach is particularly effective against novel (i.e., previously unknown) attacks. Its main drawback is a high rate of false positives, because any legitimate but new activity can raise an alert.

2. Signature detection, where the strategy is to look for some specific activity (signature) of previously known attacks. Signature-based detection systems detect previously known attacks in a timely and efficient way. The main issue of this approach is that, in order to detect an intrusion, the attack must have been previously identified and its signature recorded.
To carry out anomaly detection, it is necessary to establish what the normal state of a system is. Several approaches have been used to define system normality. The state of a computer system can be defined in terms of several measurable variables, such as processor load, memory usage, number of processes, etc. As a matter of fact, system administrators usually observe some of these variables to detect whether something is going wrong. These variables can then be used to define system normality through statistical analysis, so that we obtain an adaptive intrusion detection system.
In this work, we use CPU load measurements from a server to construct a Hidden Markov Model that reflects the expected variations of server workload. We aim to find an effective and efficient way to detect attacks (such as DoS) that directly degrade server performance.
2 RELATED WORK
Anomaly detection relies on models of what is considered the 'normal' behavior of users, systems and applications, and interprets deviations from this behavior as evidence of malicious activity (Denning, 1987; Ko et al., 1997; Gosh et al., 1998; Lane and Brodley, 1999). Several techniques to express the normal state of a system quantitatively have been proposed, including analysis of data streams of network traffic (Lee et al., 1999), sequence analysis of operating system calls (Forrest et al., 1996), and data mining of system audit data (Lee and Stolfo, 2000).
Some recent work on anomaly detection applies
principles and concepts borrowed from the biological sciences (Coull et al., 2003; Burgess, 1998; Forrest et al., 1996; Forrest et al., 1997). In particular, some approaches are inspired by immunology, as in (Forrest et al., 1996), where system calls are analyzed to infer anomalous behavior. In (Lane and Brodley, 1999), a technique that applies Instance Based Learning (IBL) to temporal sequences of events in order to characterize the normal behavior of users, systems and applications was presented. In (Michael and Ghosh, 2002), a finite-state machine was constructed from audit data to monitor statistical deviations from normal program behavior. In (Yin et al., 2004), a Markov chain model of system calls was applied to anomaly detection.
Especially relevant to our work are the techniques presented by (Wright et al., 2004) for building Hidden Markov Model profiles of network traffic, using only information that remains invariant after encryption, such as packet size and arrival time.

Hidden Markov Models (HMMs) are a tool for modeling time series data, and are used in computational molecular biology, data compression, speech recognition, computer vision, and other areas of artificial intelligence and pattern recognition. For a general introduction to Hidden Markov Models and their applications, see (Ghahramani, 2002; Jordan et al., 1999).
3 MEASUREMENTS
First of all, the processor workload of a mail server was recorded over several months. Processor load was measured every 120 seconds through SNMP queries (MacFaden et al., 2003; Presuhn, 2002). This interval was chosen because it was observed that measurements taken at shorter intervals were almost invariant.
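For illustration, a measurement loop of this kind could be implemented as sketched below. This is only a sketch: it assumes the net-snmp command-line tool snmpget is available, that the server exposes the standard HOST-RESOURCES-MIB hrProcessorLoad object, and that the host name and community string shown are placeholders rather than values from our setup.

    import subprocess
    import time

    HOST = "mailserver.example.org"   # placeholder host name
    COMMUNITY = "public"              # placeholder SNMP community string
    # hrProcessorLoad (HOST-RESOURCES-MIB): average CPU load in percent;
    # the final .1 is the index of the first processor.
    OID = "1.3.6.1.2.1.25.3.3.1.2.1"

    def query_cpu_load():
        """Return the CPU load percentage from one SNMP GET, or None on failure."""
        result = subprocess.run(
            ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", HOST, OID],
            capture_output=True, text=True)
        return int(result.stdout.strip()) if result.returncode == 0 else None

    if __name__ == "__main__":
        while True:                   # one sample every 120 seconds, as above
            load = query_cpu_load()
            if load is not None:
                print(f"{time.time():.0f} {load}", flush=True)
            time.sleep(120)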
As an example, the measurements taken in January 2005 are shown in figure 1. Each curve in the figure corresponds to a day of the month, and each point represents a 15-minute average of processor load.
Figure 1: Processor load measured on January 2005.
A strong correlation was observed between measurements taken on the same day of the week, with the least activity on weekends and the most activity on Monday mornings. Monthly averages were calculated for the seven days of the week. Figure 2 shows the Monday averages, where peaks of activity can be observed. We found that those peaks are closely related to the activity of the organization. For example, the 9:00 a.m. peak shown in fig. 2 corresponds to the beginning of the weekly activities, when most users download the emails they received over the weekend. We can also see a drop in activity near three o'clock, when most users take their lunch, and a peak of activity near checkout time at eight o'clock. On weekends, when nobody is working, the measured activity remains almost flat.
Figure 2: Average processor load on Mondays.
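The per-weekday averaging behind figures 1 and 2 can be reproduced with a short script. The sketch below assumes the samples were stored as (unix_timestamp, load) pairs, for instance by the polling loop shown earlier, and uses the pandas library purely for the grouping.

    import pandas as pd

    # Samples as "unix_timestamp load" lines, e.g. from the polling loop above.
    df = pd.read_csv("cpu_load.log", sep=" ", names=["ts", "load"])
    df["ts"] = pd.to_datetime(df["ts"], unit="s")

    # Average within 15-minute buckets, then across equal weekdays of the month.
    df["weekday"] = df["ts"].dt.dayofweek            # 0 = Monday
    df["bucket"] = df["ts"].dt.hour * 4 + df["ts"].dt.minute // 15
    profiles = df.groupby(["weekday", "bucket"])["load"].mean().unstack()

    # profiles.loc[0] is now a 96-point Monday curve like the one in figure 2.
    print(profiles.loc[0])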
4 HMM CONSTRUCTION
Let us denote the observation at time t by the variable Y_t. First, according to the HMM, we assume that the observation at time t was generated by some process whose state S_t is hidden from the observer. A second assumption is that the state S_t depends only on the previous state S_{t-1}, and that the output Y_t depends only on the state S_t. A third assumption is that the hidden state variable is discrete: S_t can take on K values, denoted by the integers {1, . . . , K}. Then, in order to define an HMM, it is necessary to specify a probability distribution over the initial state, P(S_1), the K × K state transition matrix defining P(S_t | S_{t-1}), and an output model Π defining P(Y_t | S_t).
In practice, we construct an HMM for each day of the week, and we use the monthly average sequences of each day of the week as input to the learning algorithm. In the following discussion, every numerical example refers to the Monday HMM.

In our case, the sequence of observations Y_t takes integer values ranging from 0 to 100, representing the percentage of processor load. To determine K, we applied the gradient to the monthly averaged values
to obtain critical points that suggest state transitions; taking the number of critical points, we obtain K = 6 for Mondays. Each state transition is thus related to a major change in the average processor load, and we use this fact to construct the initial probabilities of our model.
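A minimal sketch of this step is given below, assuming avg is the 96-point averaged Monday curve (e.g. profiles.loc[0] from the earlier sketch); sign changes of a smoothed gradient are taken as critical points, and the smoothing window is our assumption, not a parameter reported here.

    import numpy as np

    def critical_points(avg, window=5):
        """Indices where the smoothed gradient of the averaged curve changes sign."""
        kernel = np.ones(window) / window
        grad = np.gradient(np.convolve(avg, kernel, mode="same"))
        return np.flatnonzero(np.diff(np.sign(grad)) != 0) + 1

    # The count of critical points suggests the number of hidden states K;
    # for the Monday curve discussed here, the paper obtains K = 6.
    # avg = profiles.loc[0].to_numpy()
    # print(len(critical_points(avg)))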
We take a Gaussian observation model

    P(Y_t | S_t) = \frac{1}{\sigma_S \sqrt{2\pi}} \, e^{-(Y_t - \mu_S)^2 / (2\sigma_S^2)}    (1)

where σ_S and µ_S depend on the state S_t.
We associate S_1 with the "at rest" state of the system; since our observations begin at midnight, S_1 is always the initial state, so we take the start-state probability P(S_1) to be 1 when S_1 = 0 and 0 elsewhere.
To estimate the initial transition matrix, we estimate the probabilities using a kind of Bayesian rule, as follows:

    P(S_i | S_j) \approx \frac{f(\mu(S_i) | \mu(S_j))}{f(\mu(S_j))}    (2)

where f(µ(S_i) | µ(S_j)) is the observed frequency of transitions from the average associated with S_j to the average associated with S_i, and f(µ(S_j)) is the measured frequency of the average associated with state S_j.
Using this method we estimate an initial state transition matrix as

    0.97  0.03  0     0     0     0
    0     0.88  0.12  0     0     0
    0     0     0.94  0.06  0     0
    0     0     0     0.80  0.20  0
    0     0     0     0     0.92  0.08
    0.13  0     0     0     0     0.87
With this initial state transition matrix, the Baum-Welch algorithm (Ghahramani, 2002) was applied to the processor load measurement sequences, obtaining the learned state transition matrix:

    0.977  0.023  0      0      0      0
    0      0.933  0.067  0      0      0
    0      0      0.939  0.061  0      0
    0      0      0      0.890  0.110  0
    0      0      0      0      0.916  0.084
    0.199  0      0      0      0      0.801

As can be observed, the final transition matrix is close to our initial matrix, so we can surmise that our outlined construction method yields an almost correct HMM from the outset.
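This construction-plus-training step could be reproduced along the following lines. The sketch assumes the third-party hmmlearn library and a diagonal-covariance Gaussian emission model; the observation data is a random placeholder, so only the structure, not the numbers, reflects the procedure described above.

    import numpy as np
    from hmmlearn import hmm

    K = 6  # number of hidden states, from the critical-point count

    # Initial parameters estimated as in equation (2).
    startprob = np.array([1.0, 0, 0, 0, 0, 0])       # always starts "at rest"
    transmat = np.array([
        [0.97, 0.03, 0,    0,    0,    0   ],
        [0,    0.88, 0.12, 0,    0,    0   ],
        [0,    0,    0.94, 0.06, 0,    0   ],
        [0,    0,    0,    0.80, 0.20, 0   ],
        [0,    0,    0,    0,    0.92, 0.08],
        [0.13, 0,    0,    0,    0,    0.87],
    ])

    # init_params="mc" keeps our startprob/transmat and only initializes the
    # Gaussian means and covariances before the Baum-Welch iterations.
    model = hmm.GaussianHMM(n_components=K, covariance_type="diag",
                            init_params="mc", n_iter=50)
    model.startprob_ = startprob
    model.transmat_ = transmat

    # One monthly-averaged Monday sequence per row, 96 samples each (placeholder).
    monday_curves = np.random.rand(4, 96) * 100
    X = monday_curves.reshape(-1, 1)                 # hmmlearn expects (n_samples, 1)
    model.fit(X, lengths=[96] * len(monday_curves))  # Baum-Welch re-estimation
    print(model.transmat_.round(3))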
Finally, for the output probability distributions, we fit normal distributions to the measured observations, obtaining for each state the parameters shown in table 1.

Table 1: Output normal distribution parameters.

    State    µ_S      σ_S
    0        11.41    2.47
    1        32.86    9.50
    2        25.33    4.11
    3        16.35    1.82
    4        24.00    5.13
    5        21.64    6.35
5 EXPERIMENTAL RESULTS
Given an observation sequence X_1, X_2, . . . , X_T, we use the Forward-Backward algorithm to estimate the probability P(X_{1:T}) that such a sequence was generated by our HMM. Since the probability P(Y_{1:T}) for a typical sequence Y_1, . . . , Y_T of normal behavior is on the order of 10^{-104}, we use the following metric

    |\log P(X_{1:T}) - \log P(Y_{1:T})|    (3)

to discriminate between normal and abnormal observations.
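A log-space forward pass is enough to compute log P(X_{1:T}) for this metric. The sketch below is a from-scratch implementation assuming Gaussian emissions with the parameters of table 1; the reference term log P(Y_{1:T}) comes from scoring a known-normal sequence.

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import norm

    def forward_loglik(x, log_start, log_trans, means, stds):
        """Log-space forward algorithm: returns log P(x_1, ..., x_T) under the HMM."""
        # Emission log-probabilities for every (time, state) pair.
        log_emit = norm.logpdf(np.asarray(x)[:, None], loc=means, scale=stds)
        # log_start may contain -inf for impossible initial states; logsumexp copes.
        alpha = log_start + log_emit[0]              # log alpha_1
        for t in range(1, len(x)):
            # log alpha_t(j) = logsumexp_i(alpha_{t-1}(i) + log A_ij) + log b_j(x_t)
            alpha = logsumexp(alpha[:, None] + log_trans, axis=0) + log_emit[t]
        return logsumexp(alpha)

    def anomaly_score(x, y_ref, **hmm_params):
        """Metric (3): |log P(X_1:T) - log P(Y_1:T)| against a normal reference."""
        return abs(forward_loglik(x, **hmm_params)
                   - forward_loglik(y_ref, **hmm_params))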
To test our model, we applied several simulated observation sequences to see whether they would be detected as anomalies:

1. Noise. We tested our detector with a sequence X_1, X_2, . . . where each X_i is a random number between 0 and 100 following a binomial distribution, with different means. With the exception of the weekend models, this sequence was always rejected.

2. Catastrophe (Burgess et al., 2002). Into a valid observation sequence we introduced sudden discontinuous changes of varying intensity. The anomaly was detected if the magnitude measured by (3) was greater than 2.6.

3. DoS attack. In a similar way, we introduced into a valid observation sequence some measurements indicating a processor overload (see the sketch after this list). Sequences of this kind were always detected as anomalies, except in the following case.

4. Mimicry attacks (Wagner and Soto, 2002). To simulate the effect of a mimicry attack, we introduced data indicating processor overload into a valid sequence, but coinciding with the normal peaks of activity. As in the catastrophe case, these simulated attacks were detected only if (3) was greater than 2.6.
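As a usage example, the DoS-style injection of item 3 (and, with different placement, the mimicry case of item 4) can be simulated as follows, reusing anomaly_score from the sketch above. The known-normal sequence y_ref is a random placeholder here; the HMM parameters are taken from table 1 and the learned transition matrix.

    import numpy as np

    # Parameters from table 1 and the learned transition matrix above.
    means = np.array([11.41, 32.86, 25.33, 16.35, 24.00, 21.64])
    stds = np.array([2.47, 9.50, 4.11, 1.82, 5.13, 6.35])
    trans = np.array([
        [0.977, 0.023, 0,     0,     0,     0    ],
        [0,     0.933, 0.067, 0,     0,     0    ],
        [0,     0,     0.939, 0.061, 0,     0    ],
        [0,     0,     0,     0.890, 0.110, 0    ],
        [0,     0,     0,     0,     0.916, 0.084],
        [0.199, 0,     0,     0,     0,     0.801],
    ])
    with np.errstate(divide="ignore"):               # zeros become -inf, as intended
        log_trans = np.log(trans)
        log_start = np.log(np.array([1.0, 0, 0, 0, 0, 0]))

    # Placeholder known-normal sequence; in practice, a real Monday measurement.
    y_ref = np.clip(np.random.normal(20, 5, size=96), 0, 100)

    # Simulated DoS: sustained overload injected into an otherwise valid sequence.
    x = y_ref.copy()
    x[40:48] = 95.0                                  # two hours near saturation

    score = anomaly_score(x, y_ref, log_start=log_start,
                          log_trans=log_trans, means=means, stds=stds)
    print("anomaly" if score > 2.6 else "normal")    # threshold 2.6 from above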
At the time of writing, we are testing our model in a real production environment in order to obtain false-positive rates. A first finding is that normal administrative tasks (such as a patch installation) can raise false alerts. We are also experimenting with an HMM initialized by the K-Means algorithm but, so far, without obtaining better results than those reported here.
6 CONCLUSIONS AND FUTURE WORK
In this paper we present an application of Hidden Markov Models of processor load behavior to anomaly detection. We show experimental evidence suggesting that this approach can successfully detect attacks or misuse that directly affect processor performance. As stated in the introduction, system normality can be defined in terms of several variables such as processor load, memory usage, etc., so our method could be made more effective by taking into account not only processor load but also other parameters such as network traffic (Wright et al., 2004).

We found that, in our case, processor load is closely related to the activity cycles of our organization. A more realistic model of the normal behavior of a system must take into consideration natural and social cycles of activity.

Finally, we agree with (Axelsson, 2000) in the conclusion that intrusion detection is a problem far from being solved.
ACKNOWLEDGEMENTS
The authors wish to thank FIRA - Banco de México for providing us with experimental data and logistical support.

We also wish to thank our anonymous referees for their useful and insightful comments.
REFERENCES
Axelsson, S. (2000). The base-rate fallacy and the difficulty of intrusion detection. ACM Trans. Inf. Syst. Secur., 3(3):186–205.

Burgess, M. (1998). Computer immunology. In LISA '98: Proceedings of the 12th Conference on Systems Administration, pages 283–298, Berkeley, CA, USA. USENIX Association.

Burgess, M., Haugerud, H., Straumsnes, S., and Reitan, T. (2002). Measuring system normality. ACM Trans. Comput. Syst., 20(2):125–160.

Coull, S., Branch, J., Szymanski, B., and Breimer, E. (2003). Intrusion detection: A bioinformatics approach. In ACSAC '03: Proceedings of the 19th Annual Computer Security Applications Conference, page 24, Washington, DC, USA. IEEE Computer Society.

Denning, D. E. (1987). An intrusion-detection model. IEEE Trans. Softw. Eng., 13(2):222–232.

Forrest, S., Hofmeyr, S. A., and Somayaji, A. (1997). Computer immunology. Commun. ACM, 40(10):88–96.

Forrest, S., Hofmeyr, S. A., Somayaji, A., and Longstaff, T. A. (1996). A sense of self for Unix processes. In SP '96: Proceedings of the 1996 IEEE Symposium on Security and Privacy, page 120, Washington, DC, USA. IEEE Computer Society.

Ghahramani, Z. (2002). An introduction to hidden Markov models and Bayesian networks. Hidden Markov models: applications in computer vision, pages 9–42.

Gosh, A. K., Wanken, J., and Charron, F. (1998). Detecting anomalous and unknown intrusions against programs. In ACSAC '98: Proceedings of the 14th Annual Computer Security Applications Conference, page 259, Washington, DC, USA. IEEE Computer Society.

Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999). An introduction to variational methods for graphical models. Mach. Learn., 37(2):183–233.

Ko, C., Ruschitzka, M., and Levitt, K. (1997). Execution monitoring of security-critical programs in distributed systems: a specification-based approach. In SP '97: Proceedings of the 1997 IEEE Symposium on Security and Privacy, page 175, Washington, DC, USA. IEEE Computer Society.

Lane, T. and Brodley, C. E. (1999). Temporal sequence learning and data reduction for anomaly detection. ACM Trans. Inf. Syst. Secur., 2(3):295–331.

Lee, W. and Stolfo, S. J. (2000). A framework for constructing features and models for intrusion detection systems. ACM Trans. Inf. Syst. Secur., 3(4):227–261.

Lee, W., Stolfo, S. J., and Mok, K. W. (1999). Mining in a data-flow environment: experience in network intrusion detection. In KDD '99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 114–124, New York, NY, USA. ACM Press.

MacFaden, M., Partain, D., Saperia, J., and Tackabury, W. (2003). Configuring Networks and Devices with Simple Network Management Protocol (SNMP), RFC 3512. RFC Editor, United States.

Michael, C. C. and Ghosh, A. (2002). Simple, state-based approaches to program-based anomaly detection. ACM Trans. Inf. Syst. Secur., 5(3):203–237.

Presuhn, R. (2002). Management Information Base (MIB) for the Simple Network Management Protocol (SNMP), RFC 3418. RFC Editor, United States.

Wagner, D. and Soto, P. (2002). Mimicry attacks on host-based intrusion detection systems. In CCS '02: Proceedings of the 9th ACM Conference on Computer and Communications Security, pages 255–264, New York, NY, USA. ACM Press.

Wright, C., Monrose, F., and Masson, G. M. (2004). HMM profiles for network traffic classification. In VizSEC/DMSEC '04: Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for Computer Security, pages 9–15, New York, NY, USA. ACM Press.

Yin, Q., Zhang, R., and Li, X. (2004). A new intrusion detection method based on linear prediction. In InfoSecu '04: Proceedings of the 3rd International Conference on Information Security, pages 160–165, New York, NY, USA. ACM Press.