complexity than exact graph matching as it takes dis-
tortion and noise into account during the matching
process. Indeed, exact algorithms for error-tolerant
graph matching are computationally expensive (Vento,
2014; M. Neuhaus and Bunke, 2006). Consequently, many
works have sought to solve the error-tolerant graph
matching problem approximately. Such methods are
often called heuristics or approximate methods.
Approximate methods for the error-tolerant graph
matching problem have been based on genetic
algorithms (A.D.J. Cross and Hancock, 1997),
probabilistic relaxation (W. Christmas and Petrou, 1995),
the EM algorithm (Andrew D. J. Cross, 1998; Finch,
1998) and neural networks (Kuner and Ueberreiter,
1988). These techniques are expected to run in
polynomial time. However, they cannot guarantee the
quality of their solutions and are likely to output
suboptimal solutions.
Graph Edit Distance, referred to as GED, is an
error-tolerant technique that has been widely studied
and widely applied to PR (Tsai and Fu, 1979). Its
flexibility comes from its generality, as it is
applicable to unconstrained attributed graphs. GED is a
generalization of the graph isomorphism problem where
the goal is to minimize the cost of graph transformation.
In GED, graph g1 is transformed into graph g2 by
means of a series of edit operations. The allowed edit
operations are deletion, insertion and substitution of
nodes and their corresponding edges. GED is
computationally expensive: it is said to be an
NP-complete problem whose complexity is exponential in
the number of nodes of the involved graphs. This limits
GED algorithms to relatively small graphs. To overcome
this problem, three main directions have been adopted
in the literature.
First, optimal methods based on admissible heuristics
to prune the search space (e.g., Riesen et al., 2007).
Second, sub-optimal methods that simplify the problem
(e.g., M. Neuhaus and Bunke, 2006). Third, sub-optimal
methods based on approximate optimization algorithms
(e.g., Riesen, 2009; Andreas Fischer, 2013). However,
sub-optimal methods do not guarantee to find the best
matching, and the error rate gets higher as the involved
graphs get larger. Accordingly, in this thesis a focus
is given to optimal methods. To prevent the
combinatorial explosion, many works have focused on
efficiently pruning the search space. The computation
of admissible lower bounds has been deeply studied to
reduce memory and CPU complexity (Bunke, 1983; Zeng
et al., 2009).
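To make the definition concrete, the following minimal
Python sketch computes an exact GED for very small graphs
by depth-first branch and bound. It is an illustration
under simplifying assumptions, not the algorithm proposed
in this paper: all edit operations have unit cost, edges
are undirected and unlabeled, and the lower bound is simply
the node cost accumulated so far (admissible because all
costs are non-negative). The names edit_cost and exact_ged
and the graph encoding (a dict of node labels plus a set of
frozenset edges) are hypothetical choices made for the
example.

```python
def edit_cost(nodes1, edges1, nodes2, edges2, mapping):
    """Unit cost of the edit path induced by a complete node mapping.

    mapping sends every node of g1 to a distinct node of g2, or to None
    (deletion). Assumes simple undirected graphs without self-loops.
    """
    cost = 0
    for u, v in mapping.items():
        if v is None:
            cost += 1                       # node deletion
        elif nodes1[u] != nodes2[v]:
            cost += 1                       # node substitution (label change)
    used = {v for v in mapping.values() if v is not None}
    cost += len(nodes2) - len(used)         # node insertions

    preserved = set()
    for edge in edges1:
        a, b = tuple(edge)
        image = frozenset((mapping[a], mapping[b]))
        if mapping[a] is not None and mapping[b] is not None and image in edges2:
            preserved.add(image)            # edge kept at zero cost
        else:
            cost += 1                       # edge deletion
    cost += len(edges2) - len(preserved)    # edge insertions
    return cost


def exact_ged(nodes1, edges1, nodes2, edges2):
    """Depth-first branch and bound over all node mappings (exponential)."""
    order = list(nodes1)
    best_cost, best_mapping = float("inf"), None

    def extend(i, mapping, used, partial):
        nonlocal best_cost, best_mapping
        if partial >= best_cost:            # admissible bound: prune branch
            return
        if i == len(order):
            total = edit_cost(nodes1, edges1, nodes2, edges2, mapping)
            if total < best_cost:
                best_cost, best_mapping = total, dict(mapping)
            return
        u = order[i]
        for v in list(nodes2) + [None]:
            if v is not None and v in used:
                continue
            step = 1 if (v is None or nodes1[u] != nodes2[v]) else 0
            mapping[u] = v
            if v is not None:
                used.add(v)
            extend(i + 1, mapping, used, partial + step)
            if v is not None:
                used.discard(v)
            del mapping[u]

    extend(0, {}, set(), 0)
    return best_cost, best_mapping


# Example: a labeled path A-B-C versus a single edge A-B.
g1_nodes = {1: "A", 2: "B", 3: "C"}
g1_edges = {frozenset((1, 2)), frozenset((2, 3))}
g2_nodes = {"x": "A", "y": "B"}
g2_edges = {frozenset(("x", "y"))}
print(exact_ged(g1_nodes, g1_edges, g2_nodes, g2_edges))
```

On this example the distance is 2: node C is deleted
together with its incident edge. A tighter admissible
lower bound would prune more of the search tree, which is
the direction taken by the optimal methods cited above.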
Most of the current techniques are optimized for
centralized graph processing. A distributed approach
providing horizontal scalability is required in order to
handle the analysis workload. Thus, besides providing
lower and upper bounds for the problem, we have
adopted the idea of decomposing GED into smaller
sub-problems via a divide-and-conquer strategy. The
sub-problems are then solved in a distributed manner.
The rest of this paper is organized as follows.
Section 2 reviews the related works. Section 3 presents
the notations and definitions used in the paper and
positions our approach in the literature. Section 4
describes our proposed sequential algorithm for solving
GED. Section 5 presents the chosen parallel computing
model and the proposed distributed GED. Section 6
describes the databases and the experimental protocol
used to evaluate the performance of the proposed
approaches. Section 7 reports the results achieved so
far. Finally, conclusions are drawn and future
perspectives are discussed in Section 8.
2 RELATED WORKS
The distributed and parallel graph matching methods
presented in the literature can be divided into two
categories: Data-Parallelism and Search-Parallelism. In
Data-Parallelism, the graphs g1 and g2 can be
partitioned into sub-graphs. These small sub-graphs can
be matched independently, in a sequential or in a
parallel manner. The results of all the sub-problems
are then reassembled to produce a global answer to the
main graph matching problem (Qiu and Hancock, 2006;
Patwary et al., 2010; Kollias, 2012). In
Search-Parallelism, matching g1 and g2 is considered
as a single problem. However, the search space of g1
and g2 is partitioned and then processed in a
completely parallel manner (Allen and Yasuda, 1997;
Wan, 1998; Plantenga, 2013).
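The idea of Search-Parallelism can be sketched as follows.
This is a minimal Python illustration under stated
assumptions, not the method of any of the cited works: the
two graphs are unlabeled and have the same number of nodes,
the distance is the number of edges that fail to correspond
under a bijection, and the search space is partitioned by
the image of the first node of g1. The function names and
the use of a thread pool are hypothetical choices; in
standard CPython, threads typically do not speed up
CPU-bound search, so a real system would distribute the
branches over processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def mismatched_edges(edges1, edges2, mapping):
    """Number of edges present in exactly one graph after mapping g1 onto g2."""
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in map(tuple, edges1)}
    return len(mapped ^ edges2)


def search_branch(nodes1, edges1, nodes2, edges2, first_image):
    """Explore only the bijections that send nodes1[0] to first_image."""
    rest = [v for v in nodes2 if v != first_image]
    best = (float("inf"), None)
    for perm in permutations(rest):
        mapping = dict(zip(nodes1, (first_image, *perm)))
        cost = mismatched_edges(edges1, edges2, mapping)
        if cost < best[0]:
            best = (cost, mapping)
    return best


def parallel_match(nodes1, edges1, nodes2, edges2, workers=4):
    """One sub-search per possible image of the first node; keep the minimum."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda v: search_branch(nodes1, edges1, nodes2, edges2, v),
            nodes2,
        )
        return min(results, key=lambda r: r[0])


# Example: a three-node path matched against a triangle.
path_nodes = [0, 1, 2]
path_edges = {frozenset((0, 1)), frozenset((1, 2))}
tri_nodes = ["a", "b", "c"]
tri_edges = {frozenset(("a", "b")), frozenset(("b", "c")), frozenset(("a", "c"))}
print(parallel_match(path_nodes, path_edges, tri_nodes, tri_edges))
```

Each branch is independent, so no communication is needed
until the final minimum is taken. In Data-Parallelism, by
contrast, the graphs themselves would be split before any
search starts.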
We focus on two distributed works that belong to the
Search-Parallelism category and are dedicated to
solving graph matching problems:
2.1 MasPar-SIMD
A parallel inexact graph matching algorithm is
proposed in [22]. This algorithm, referred to as
MasPar-SIMD, is a depth-first branch-and-bound
algorithm for determining a minimum-distance
correspondence between two unlabeled graphs. The
heuristic used to prune the search space, called
forward checking, examines the possible sources of edge
mismatches and thus keeps track of the constraints
(edges) that are not satisfied. The degree of mismatch of a
ICPRAM2015-DoctoralConsortium