manner similar to CRFs and MEMMs. The sequence we use here is the sequence of feature vectors. Let $x$ with $x(n) \in \mathbb{R}^N$ be a feature vector sequence and $s$ a corresponding state sequence, with $s(n) \in \{\zeta_0, \zeta_1, \zeta_2, \ldots, \zeta_K\}$. We use the convention that the states $\zeta_1, \zeta_2, \ldots, \zeta_K$ describe the normal case, and $\zeta_0$ the event, which in our case is the occurrence of the contrast agent. The training data contains no events: $s(n) \neq \zeta_0$. Using several states for the normal case, we obtain a highly descriptive model that covers the various cases of what is defined to be normal.
For our purpose, we now need to link the training data to the states of the normal case. This is an unsupervised classification problem, as we need to assign each feature vector from the training data one label from the set $\{\zeta_1, \zeta_2, \ldots, \zeta_K\}$. Let $y_{\min}$ be the minimum vessel feature in the training data and $y_{\max}$ the maximum one. We define
$$
a_q := \frac{K-q+1}{K}\, y_{\min} + \left(1 - \frac{K-q+1}{K}\right) y_{\max}
$$
for $q = 1, 2, \ldots, K+1$, and we label the training data by $s(n) = \zeta_q$ if $a_q \le \frac{1}{T} \sum_{m=0}^{T-1} y(n+m) < a_{q+1}$.
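A minimal sketch of this labeling step, assuming the vessel features $y(n)$ are available as a one-dimensional NumPy array and $T$ is the averaging window length; the function and variable names are illustrative and not taken from the original implementation:

```python
import numpy as np

def label_training_data(y, K, T):
    """Assign each position n a state index q in {1, ..., K} by thresholding
    the running mean of the vessel feature y over a window of length T."""
    y_min, y_max = y.min(), y.max()
    # Thresholds a_1, ..., a_{K+1}, interpolating linearly between y_min and y_max.
    q = np.arange(1, K + 2)
    a = (K - q + 1) / K * y_min + (1 - (K - q + 1) / K) * y_max
    labels = np.empty(len(y) - T + 1, dtype=int)
    for n in range(len(labels)):
        mean_y = y[n:n + T].mean()
        # s(n) = zeta_q  iff  a_q <= mean_y < a_{q+1}
        labels[n] = np.searchsorted(a, mean_y, side="right")
    return np.clip(labels, 1, K), a
```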
The idea of the edLLM is to label a feature-vector sequence of a certain length at once. As each feature vector is computed from a batch of vessel features, we thus use a larger context for an improved decision. We determine the probability of a state sequence, given a sequence of feature vectors, similarly to CRFs (Gupta and Sarawagi, 2005). In contrast to CRFs, we work with sequences of feature vectors, which we define with the help of a sliding window. Thus we can use this algorithm online. For the training, however, we use the whole training data as a single sequence of feature vectors. The training is conducted in the same way as for CRFs and MEMMs, except for the penalty term described in Section 2.2.2.
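The online use can be summarized by the following schematic loop; `infer_states` stands for a hypothetical routine that returns the most probable state sequence of the model defined below, and the step size of one feature vector per window shift is an assumption:

```python
def online_detection(feature_stream, M, s0_from_training, infer_states):
    """Slide a window of M feature vectors over the incoming stream. The initial
    label of the first window comes from the training labels; afterwards the
    first inferred label of window i serves as s_{i+1}(0) for the next window."""
    window, s_prev = [], s0_from_training
    for n, x_n in enumerate(feature_stream):
        window.append(x_n)
        if len(window) < M:
            continue
        states = infer_states(window, s_prev)   # most probable [s_i(1), ..., s_i(M)]
        if 0 in states:                         # state 0 encodes zeta_0: contrast agent detected
            yield n, states
        s_prev = states[0]                      # becomes s_{i+1}(0) after the one-step shift
        window.pop(0)                           # advance the window by one feature vector
```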
2.2.1(a) Log-linear model. Let $M \in \mathbb{N}$ be the length of a feature vector sequence $x_i$ and $t_i \in \mathbb{N}_0$ be the starting index of this sequence, $i = 0, 1, \ldots$, where $i = 0$ denotes the training data. For a shorter notation, let $x_i := [x(t_i + m)]_{m=1}^{M}$. We do the same with the states: $s_i(m) := s(t_i + m)$ and $s_i := [s_i(m)]_{m=1}^{M}$. The probability of $s_i$, given $x_i$ and $s_i(0)$, is defined by (Gupta and Sarawagi, 2005)
$$
p(s_i \mid x_i, s_i(0)) := \frac{\exp\!\left( \sum_{m=1}^{M} \lambda^\top \Phi\big(s_i(m-1), s_i(m), x_i, m\big) \right)}{Z(x_i)}. \quad (2)
$$
$Z(x_i)$ is a normalization value such that $p(s_i \mid x_i, s_i(0))$ is a probability. $\lambda$ is a weighting vector that identifies features that are important for a successful description of the normal case. The function $\Phi$ establishes the relationship between the feature vectors and the states. It is defined by
$$
\Phi(s_1, s_2, x_i, n) :=
\begin{bmatrix}
\left[ x_i(n) \cdot [[s_2 = \zeta_k]] \right]_{k=1}^{K} \\[4pt]
\left[ \left[ [[s_1 = \zeta_j]] \cdot [[s_2 = \zeta_k]] \right]_{j=1}^{K} \right]_{k=1}^{K} \\[4pt]
x_i(n) \cdot [[s_2 = \zeta_0]]
\end{bmatrix}, \quad (3)
$$
with $x_i(n) \in \mathbb{R}^N$ and, accordingly, $\Phi(s_1, s_2, x_i, n) \in \mathbb{R}^{(K+1)N + K^2}$. $[[P]] = 1$ if the predicate $P$ is true and $0$ otherwise (Gupta and Sarawagi, 2005). For each sequence $i$, we need an initial label $s_i(0) = s(t_i)$. For the training data (that is, $i = 0$), we have no information about $s_0(0) = s(0)$, so we use an arbitrary symbol such that $s(0) \notin \{\zeta_0, \zeta_1, \ldots, \zeta_K\}$ (Gupta and Sarawagi, 2005). For $i = 1$, the initial label is $s_1(0) = s(t_1)$; this is one of the labels assigned during training. For $i > 1$, $s_i(0) = s(t_i)$ is determined by the inference at this time step.
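The feature map and the unnormalized score of Equation (2) can be sketched as follows; the integer state encoding ($0$ for $\zeta_0$, $1, \ldots, K$ for the normal states, any other value for the arbitrary start symbol), the function names, and the ordering of the transition block (previous-state index varying slowest, chosen to match the worked example below) are assumptions:

```python
import numpy as np

def phi(s1, s2, x_i, n, K):
    """Feature map Phi(s1, s2, x_i, n) of Equation (3). States are integers,
    with 0 standing for the event state zeta_0 and 1..K for the normal states;
    any other value represents the arbitrary start symbol s(0)."""
    x = np.asarray(x_i[n - 1], dtype=float)   # x_i(n); the paper indexes from 1
    N = x.size
    out = np.zeros((K + 1) * N + K * K)       # N per normal state + K^2 transitions + N for zeta_0
    if 1 <= s2 <= K:
        out[(s2 - 1) * N:s2 * N] = x                      # x_i(n) * [[s2 = zeta_k]]
        if 1 <= s1 <= K:
            out[K * N + (s1 - 1) * K + (s2 - 1)] = 1.0    # [[s1 = zeta_j]] * [[s2 = zeta_k]]
    elif s2 == 0:
        out[K * N + K * K:] = x                           # x_i(n) * [[s2 = zeta_0]]
    return out

def unnormalized_prob(states, x_i, s0, lam, K):
    """exp of the sum in Equation (2) for one candidate state sequence
    (states[m-1] = s_i(m)); dividing by Z(x_i), the sum of this quantity over
    all (K+1)^M sequences, yields p(s_i | x_i, s_i(0))."""
    prev, score = s0, 0.0
    for m, s in enumerate(states, start=1):
        score += lam @ phi(prev, s, x_i, m, K)
        prev = s
    return float(np.exp(score))
```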
2.2.1(b) Example. Assume we have only 2D feature vectors ($N = 2$) and a sequence of length $M = 2$ for some $i$ with $i > 1$, with $x_i(1) = [a, b]^\top$ and $x_i(2) = [c, d]^\top$. Further, assume $K = 2$, that is, $s_i(n) \in \{\zeta_0, \zeta_1, \zeta_2\}$, $n = 1, 2$. Note that with these parameters, $(K+1)^M = 9$ sequences are possible. We assume that the first label is $\zeta_1$, and then we build our example for all possible length-2 sequences.
Using the definitions above, $\Phi(\zeta_1, \zeta_1, x_i, 1) = [a, b, 0, 0, 0, 0, 0, 0, 0, 0]^\top$, and $\Phi(\zeta_1, \zeta_1, x_i, 2) = [c, d, 0, 0, 1, 0, 0, 0, 0, 0]^\top$. The fifth entry of $\Phi(\zeta_1, \zeta_1, x_i, 2)$ indicates the “transition” from state $\zeta_1$ to $\zeta_1$ (the system stays in state 1). The first two entries are $x_i(1)$. If we use the state sequence $[\zeta_1, \zeta_2]^\top$, then $\Phi(\zeta_1, \zeta_2, x_i, 2) = [0, 0, c, d, 0, 1, 0, 0, 0, 0]^\top$, so the third and fourth entries of $\Phi(\zeta_1, \zeta_2, x_i, 2)$ are $x_i(2)$. The sixth entry of $\Phi(\zeta_1, \zeta_2, x_i, 2)$ is 1, due to the transition from $\zeta_1$ to $\zeta_2$. If we assume an event at lag two, that is $s(2) = \zeta_0$, then $\Phi(\zeta_1, \zeta_0, x_i, 2) = [0, 0, 0, 0, 0, 0, 0, 0, c, d]^\top$.
We further assume that the weighting vector is $\lambda = [\lambda(j)]_{j=1}^{10} = [1, -1, -1, 1, 1, 0.5, 0.5, 1, 1, 1]^\top$. Then, according to Equation (2), we compute the unnormalized probability as
$$
\tilde{p}(s \mid x_i, s(0)) = \exp\!\left( \sum_{m=1}^{2} \lambda^\top \Phi\big(s(m-1), s(m), x_i, m\big) \right). \quad (4)
$$
$s(0)$ is some arbitrary symbol such that $s(0) \notin \{\zeta_0, \zeta_1, \zeta_2\}$. Hence, $\tilde{p}([\zeta_1, \zeta_1]^\top \mid x_i, s(0)) = \exp\!\big(\lambda^\top \Phi(s(0), \zeta_1, x_i, 1) + \lambda^\top \Phi(\zeta_1, \zeta_1, x_i, 2)\big) = \exp(a - b + c - d + 1) =