KAOG algorithm, and propose a modified graph-
based classifier which is more noise resilient than
the KAOG.
The rest of the paper is organized as follows. In
the next section, we present an overview of
graph-based classifiers and relational data, followed
by an explanation of the KAOG algorithm. In Section 3,
we introduce the proposed algorithm. Section 4
presents the experimental results, and the last
section concludes the paper.
2 GRAPH CLASSIFICATION AND
RELATIONAL DATA
The problem of graph classification was first studied
by (Gonzalez 2002) as a greedy method for finding a
sub-graph. In (Deshpande 2005), the authors presented
a graph classification algorithm that uses frequent
sub-graph discovery algorithms to find all
topological substructures in the dataset. By using
highly efficient frequent sub-graph discovery
algorithms, they reduced the computational
complexity of the proposed algorithm, which allowed
them to select the most discriminative sub-graph
candidates and improve the accuracy of the
classifier. Chatterjee and Raghavan (Chatterjee 2012)
proposed a data transformation algorithm to improve
the accuracy of two classifiers (LD and SVM). First,
they employed similarity graph neighbourhoods (SGN)
in the training feature subspace and mapped the input
dataset by determining displacements for each entity;
they then trained a classifier on the transformed
data. On the other hand, some applications use the
graph structure directly as a classifier (Bertini Jr.
et al 2011).
In this paper, we combine the idea of relational
data and graph classification to improve the
accuracy of the KAOG classifier in the presence of
noise. Since we compare the proposed method with
the KAOG algorithm, the next sub-section explains
its main concept and functionality. The core of the
KAOG algorithm is the K-associated graph (KAG),
which is built for a given input parameter K. The KAG
is explained in the following section.
2.1 Constructing the K-associated
Graph (KAG)
Algorithm 1 illustrates the K-associated graph
construction phase, in which a graph is built based
on a fixed value of K.
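Algorithm 1. Constructing the K-associated graph
from a data set (Bertini Jr. et al 2011)
Input: A constant K and a data set
$X = \{(x_1, c_1), \ldots, (x_i, c_i), \ldots, (x_N, c_N)\}$
Symbols: $\Delta_{v_i,K}$ is the label-dependent
K-neighborhood set of vertex $v_i$;
findComponents( ) is a function that returns the
components of a given graph;
purity( ) is a function that calculates the purity
measure;
1: $E \leftarrow \emptyset$
2: $G^{(K)} \leftarrow \emptyset$
3: for all $v_i \in V$ do
4:   $\Delta_{v_i,K} \leftarrow \{v_j \mid v_j \in \Lambda_{v_i,K} \text{ and } c_j = c_i\}$
5:   $E \leftarrow E \cup \{e_{ij} \mid v_j \in \Delta_{v_i,K}\}$
6: end for
7: $C \leftarrow$ findComponents$(V, E)$
8: for all $C_\alpha \in C$ do
9:   $\Phi_\alpha \leftarrow$ purity$(C_\alpha)$
10:  $G^{(K)} \leftarrow G^{(K)} \cup \{(C_\alpha(V', E'); \Phi_\alpha)\}$
11: end for
12: Output: The K-associated graph
$G^{(K)} = \{C_1, \ldots, C_\alpha, \ldots, C_R\}$,
where component $C_\alpha = (V', E'; \Phi_\alpha)$
and $\Phi_\alpha$ represents the purity of $C_\alpha$.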
The input of the KAG training phase is a
constant value K and the training dataset X, in which
$x_i$ shows the $i$-th sample of the training set and
$c_i$ shows the corresponding label.
Basically, the KAG construction consists of three
main parts. In the first part, for each vertex $v_i$,
the K nearest neighbours of $v_i$ are calculated
based on the input K ($\Lambda_{v_i,K}$). Next, the
samples in $\Lambda_{v_i,K}$ that have the same label
as $v_i$ are selected as $\Delta_{v_i,K}$. Based on
$\Delta_{v_i,K}$, edges are built, each starting at
$v_i$ and ending at a member of $\Delta_{v_i,K}$.
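The neighbourhood selection and edge construction described above can be sketched as follows. This is a minimal illustration rather than the reference implementation: the function name `label_dependent_edges` is hypothetical, and Euclidean distance is assumed because the text does not state which metric is used to find the nearest neighbours.

```python
import math


def label_dependent_edges(points, labels, k):
    """Return directed edges (i, j) such that j is one of the k nearest
    neighbours of i and carries the same class label as i.

    Assumption: Euclidean distance; the source does not name the metric.
    """
    n = len(points)
    edges = []
    for i in range(n):
        # K-neighbourhood of vertex i: the k closest other samples.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        neighbours = others[:k]
        # Label-dependent K-neighbourhood: keep same-label neighbours only.
        same_label = [j for j in neighbours if labels[j] == labels[i]]
        # One directed edge from i to each selected neighbour.
        edges.extend((i, j) for j in same_label)
    return edges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "b", "b"]
edges = label_dependent_edges(points, labels, k=2)
```

With K = 2, each sample in this toy set has one same-class and one other-class neighbour, so only the same-class edge is kept for each vertex.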
In the next step, the findComponents(V, E)
function finds the components (sub-graphs) built in
the previous step. Each component consists of samples
from the same class, and a class may have several
components that are not connected to each other. In
this function, V is the set of vertices of the graph
and E is the set of edges generated in the previous
step.
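One possible way to realise the component search is a union-find pass over the edge list, sketched below. The name `find_components` and its signature are illustrative, and the sketch assumes that edge direction is ignored when grouping vertices into components, which the text does not state explicitly.

```python
def find_components(num_vertices, edges):
    """Group vertices 0..num_vertices-1 into connected components.

    Assumption: edges are treated as undirected for connectivity.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two components

    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


components = find_components(5, [(0, 1), (1, 0), (2, 3), (3, 2)])
```

Because every edge joins two samples of the same class, each component returned this way contains samples of a single class, and a vertex with no same-class neighbour forms a component on its own.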
In the last step, based on Equations (1) and (2), the
purity measure of each component is calculated.
The purity measure indicates how well the members of
a component are connected to each other.
1
(1)