and the $q$ negative eigenvalues that have the highest absolute value. Each direction is scaled by the magnitude of the corresponding eigenvalue.
• Positive Pseudo-Euclidean Space (PPES): This $p$-dimensional space is defined as PES, but only the $p$ largest positive eigenvalues are kept.
• Negative Pseudo-Euclidean Space (NPES): This $q$-dimensional space is defined as PES, but only the $q$ largest negative eigenvalues (in magnitude) are kept; no positive eigenvalues are used.
• Corrected Euclidean Space (CES): In CES, a constant is added to all the eigenvalues (positive and negative) to ensure that they all become positive. This constant is given by $2|a|$, where $a$ is the negative eigenvalue with the largest absolute value (a construction sketch for all four spaces follows this list).
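The following is a minimal numerical sketch of these four constructions, under two assumptions not spelled out in the excerpt above: the pseudo-Gram matrix is obtained by double centring of the squared dissimilarities (classical-scaling style), and each kept direction is scaled by the square root of the eigenvalue magnitude. The function name and its arguments are ours.

```python
import numpy as np

def spectral_spaces(D, p, q):
    """Sketch of PES / PPES / NPES / CES from a symmetric N x N dissimilarity
    matrix D (assumptions: double centring, sqrt(|eigenvalue|) scaling)."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    G = -0.5 * J @ (D ** 2) @ J                   # pseudo-Gram matrix (assumed construction)
    vals, vecs = np.linalg.eigh(G)                # eigenvalues in ascending order

    pos = np.argsort(vals)[::-1][:p]              # p largest eigenvalues (assumed positive)
    neg = np.argsort(vals)[:q]                    # q most negative eigenvalues
    ppes = vecs[:, pos] * np.sqrt(vals[pos])      # PPES: positive part only
    npes = vecs[:, neg] * np.sqrt(-vals[neg])     # NPES: negative part only
    pes = np.hstack([ppes, npes])                 # PES: both parts, (p+q)-dimensional

    a = vals.min()                                # negative eigenvalue of largest magnitude
    ces = vecs * np.sqrt(vals + 2.0 * np.abs(a))  # CES: shifting by 2|a| makes all eigenvalues positive
    return pes, ppes, npes, ces
```

Whether CES keeps all N shifted directions, as in this sketch, or only the $p+q$ selected ones is not determined by the excerpt.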
2.2 Dissimilarity Spaces (DS)
We consider four more spaces constructed in the following way: we compute the pairwise Euclidean distances between data points of one of the spaces defined above. These distances are the new feature representation of $x_i$. Note that the dimension of this feature space is equal to the number of points.
Since our classifier suffers from the curse of dimensionality, we must reduce the number of features; there are several techniques for that (Hastie et al., 2009). We chose k-means to find a number of prototypes $k < N$, where $k$ is selected as a certain percentage of $N/2$ and the algorithm is initialized in a deterministic way, as described in (Su and Dy, 2007). After the $k$ prototypes are found, the distances from each point $x_i$ to each of these prototypes are used as its new feature representation. This defines four new spaces, named Dissimilarity Pseudo-Euclidean Space (DPES), Dissimilarity Positive Pseudo-Euclidean Space (DPPES), Dissimilarity Negative Pseudo-Euclidean Space (DNPES) and Dissimilarity Corrected Euclidean Space (DCES).
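A sketch of this prototype-based construction is given below, with an ordinary (randomly seeded) k-means standing in for the deterministic initialisation of (Su and Dy, 2007), and a hypothetical frac argument standing in for the "percentage of N/2":

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def dissimilarity_space(X, frac=0.2, random_state=0):
    """Re-represent each point by its Euclidean distances to k prototypes
    found by k-means, with k a fraction of N/2 (frac is our stand-in for
    the percentage used in the paper)."""
    N = len(X)
    k = max(1, int(frac * N / 2))                         # number of prototypes, k < N
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
    return cdist(X, km.cluster_centers_)                  # N x k dissimilarity representation

# Applied to the PES, PPES, NPES and CES embeddings, this yields the
# DPES, DPPES, DNPES and DCES representations, respectively.
```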
3 THE MAP-DID ALGORITHM
In this section, dissimilarities between patterns in the eight previously defined spaces are computed as Euclidean distances.
3.1 Dissimilarity Increments
Distribution (DID)
Let $X$ be a set of patterns, and $(x_i, x_j, x_k)$ a triplet of nearest neighbors belonging to $X$, where $x_j$ is the nearest neighbor of $x_i$ and $x_k$ is the nearest neighbor of $x_j$, different from $x_i$. The dissimilarity increment (DI) (Fred and Leitão, 2003) between these patterns is defined as $d_{inc}(x_i, x_j, x_k) = |d(x_i, x_j) - d(x_j, x_k)|$.
This measure contains information different from a distance: the latter is a pairwise measure, while the former is a measure over a triplet of points, and thus a measure of higher-order dissimilarity of the data.
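A sketch of how the DIs of a dataset can be computed under the Euclidean distances used in this section (the helper name is ours, and ties between neighbors are not handled specially):

```python
import numpy as np
from scipy.spatial.distance import cdist

def dissimilarity_increments(X):
    """For each point x_i: find its nearest neighbor x_j, then the nearest
    neighbor x_k of x_j other than x_i, and return |d(x_i,x_j) - d(x_j,x_k)|."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)          # exclude self-distances
    incs = []
    for i in range(len(X)):
        j = int(np.argmin(D[i]))         # nearest neighbor of x_i
        row = D[j].copy()
        row[i] = np.inf                  # x_k must be different from x_i
        k = int(np.argmin(row))          # nearest neighbor of x_j, excluding x_i
        incs.append(abs(D[i, j] - D[j, k]))
    return np.array(incs)
```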
In (Aidos and Fred, 2011) the distribution of DIs (DID) was derived under the hypothesis that the data follow a Gaussian distribution, and it was written as a function of the mean value of the DIs, $\lambda$. Therefore, the DID of a class is given by
\[
p_{d_{inc}}(w;\lambda) = \frac{\pi\beta^2}{4\lambda^2}\, w \exp\!\left(-\frac{\pi\beta^2}{4\lambda^2}\, w^2\right)
+ \frac{\pi^2\beta^3}{8\sqrt{2}\,\lambda^3} \left(\frac{4\lambda^2}{\pi\beta^2} - w^2\right)
\exp\!\left(-\frac{\pi\beta^2}{8\lambda^2}\, w^2\right)
\operatorname{erfc}\!\left(\frac{\sqrt{\pi}\,\beta}{2\sqrt{2}\,\lambda}\, w\right), \quad (1)
\]
where $\operatorname{erfc}(\cdot)$ is the complementary error function, and $\beta = 2 - \sqrt{2}$.
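Equation (1) is straightforward to evaluate numerically; the sketch below (the name did_pdf is ours) uses scipy's complementary error function:

```python
import numpy as np
from scipy.special import erfc

BETA = 2.0 - np.sqrt(2.0)

def did_pdf(w, lam, beta=BETA):
    """Density of Eq. (1) for increment values w >= 0 and mean increment lam."""
    w = np.asarray(w, dtype=float)
    a = np.pi * beta ** 2 / (4.0 * lam ** 2)
    term1 = a * w * np.exp(-a * w ** 2)
    term2 = (np.pi ** 2 * beta ** 3 / (8.0 * np.sqrt(2.0) * lam ** 3)
             * (4.0 * lam ** 2 / (np.pi * beta ** 2) - w ** 2)
             * np.exp(-a * w ** 2 / 2.0)
             * erfc(np.sqrt(np.pi) * beta / (2.0 * np.sqrt(2.0) * lam) * w))
    return term1 + term2
```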
3.2 MAP-DID
Consider that $\{x_i, c_i, inc_i\}_{i=1}^{N}$ is our dataset, where $x_i$ is a feature vector in $\mathbb{R}^d$, $c_i$ is the class label and $inc_i$ is the set of increments yielded by all the triplets of points containing $x_i$. We assume that a class $c_i$ has a single statistical model for the increments, with an associated parameter $\lambda_i$. The DID described above can be seen as a high-order statistic of the data, since it carries information about a third-order dissimilarity of the data.
For example, we generate a 2-dimensional Gaussian dataset with 1000 points, with zero mean and identity covariance matrix (figure 1, left). We also generate a 2-dimensional dataset with 1000 points, where 996 points are at the center and four off-center points lie at coordinates $(\pm a, 0)$ and $(0, \pm a)$, with $a$ chosen so that the covariance is also the identity matrix (figure 1, right). We compute the DIs for each dataset and look at their histograms (figure 1).
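A sketch of how these two datasets can be generated follows; choosing $a = \sqrt{N/2}$ ($\approx 22.4$ here) makes the population covariance of the second dataset the identity, and the DIs of each dataset could then be obtained with the dissimilarity_increments helper sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Dataset 1: standard 2-D Gaussian (zero mean, identity covariance).
gauss = rng.standard_normal((N, 2))

# Dataset 2: 996 points at the center plus four points at (+-a, 0) and (0, +-a);
# a = sqrt(N / 2) makes the (population) covariance the identity as well.
a = np.sqrt(N / 2)
spiked = np.vstack([np.zeros((N - 4, 2)),
                    [[a, 0.0], [-a, 0.0], [0.0, a], [0.0, -a]]])

for name, X in [("Gaussian", gauss), ("996 central + 4 off-center", spiked)]:
    print(name, "mean =", X.mean(axis=0).round(2),
          "cov =", np.cov(X.T).round(2).tolist())
    # incs = dissimilarity_increments(X)   # helper sketched at the start of Sec. 3.1
```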
Although the two datasets have the same mean and covariance matrix, their DI distributions are very different from each other. The DIs can therefore be seen as a measure of higher-order statistics: the two distributions under consideration have exactly the same first- and second-order moments, but their DIDs are vastly different.
We therefore design a maximum a posteriori (MAP) classifier that combines a Gaussian Mixture Model (GMM) with the information given by the increments, assuming that $x_i$ and $inc_i$ are conditionally independent given $c_j$. We use the prior $p(c_j) = |c_j|/N$, with $|c_j|$ the number of points of class $j$, and the likelihood $p(x_i, inc_i \mid c_j) = p(x_i \mid c_j)\, p(inc_i \mid c_j)$.
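A minimal sketch of this decision rule is given below, under assumptions that go beyond the excerpt: $p(inc_i \mid c_j)$ factorises over the individual increments of $x_i$, each class's $\lambda$ is estimated as the mean of its increments, and $p(x_i \mid c_j)$ is a per-class scikit-learn GMM. The class name MapDid, the did_log_pdf helper and the incs argument (one array of increments per point) are ours.

```python
import numpy as np
from scipy.special import erfc
from sklearn.mixture import GaussianMixture

BETA = 2.0 - np.sqrt(2.0)

def did_log_pdf(w, lam):
    """Log of the DID density in Eq. (1), clipped away from zero for stability."""
    a = np.pi * BETA ** 2 / (4.0 * lam ** 2)
    p = (a * w * np.exp(-a * w ** 2)
         + np.pi ** 2 * BETA ** 3 / (8.0 * np.sqrt(2.0) * lam ** 3)
         * (4.0 * lam ** 2 / (np.pi * BETA ** 2) - w ** 2)
         * np.exp(-a * w ** 2 / 2.0)
         * erfc(np.sqrt(np.pi) * BETA / (2.0 * np.sqrt(2.0) * lam) * w))
    return np.log(np.maximum(p, 1e-300))

class MapDid:
    """Sketch of a MAP rule combining a per-class GMM over the features with a
    per-class DID over the increments (assumptions noted in the text above)."""

    def fit(self, X, incs, y, n_components=2):
        self.classes_ = np.unique(y)
        self.models_ = {}
        for c in self.classes_:
            mask = (y == c)
            gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X[mask])
            lam = np.concatenate([incs[i] for i in np.flatnonzero(mask)]).mean()
            self.models_[c] = (gmm, lam, np.log(mask.mean()))   # log prior |c_j| / N
        return self

    def predict(self, X, incs):
        scores = np.empty((len(X), len(self.classes_)))
        for k, c in enumerate(self.classes_):
            gmm, lam, log_prior = self.models_[c]
            inc_ll = np.array([did_log_pdf(np.asarray(w, float), lam).sum() for w in incs])
            scores[:, k] = log_prior + gmm.score_samples(X) + inc_ll
        return self.classes_[np.argmax(scores, axis=1)]
```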