isomorphic to the query graph $q$. One approach to subgraph searching is to exhaustively perform subgraph matching against every graph $g_i$ in the graph database $G$, verifying subgraph isomorphism between $q$ and each $g_i \in G$ (as defined below). Such a naive method is, however, computationally expensive.
Recent heuristic methods for the subgraph isomorphism (or subgraph matching) problem have shown significant performance improvements. Most existing algorithms for subgraph isomorphism are based on backtracking, where the query vertices are matched to vertices in the data graph incrementally (Sun et al., 2022a; Sun et al., 2022b; Kim et al., 2021; Min et al., 2021; Han et al., 2019). The general framework for subgraph isomorphism consists of two tasks: filtering and matching. In the filtering phase, the set of data-graph vertices that can be mapped to each query vertex is reduced, producing candidate vertex sets. In the matching phase, backtracking recursively extends a partial mapping with vertices from the candidate sets and checks whether it yields a subgraph isomorphism of the query graph. Nevertheless, subgraph isomorphism approaches are generally evaluated on small query graphs and only a single data graph.
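The two tasks described above can be illustrated with a minimal Python sketch. It is not any of the cited algorithms: the label-and-degree filter, the fewest-candidates-first matching order, and all names (`make_graph`, `filter_candidates`, `find_embeddings`) are assumptions chosen for illustration.

```python
def make_graph(labels, edges):
    """Build (labels, adjacency) for an undirected vertex-labelled graph."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def filter_candidates(query, data):
    """Filtering: for each query vertex u, keep the data vertices with the
    same label and a degree at least as large as that of u."""
    q_labels, q_adj = query
    d_labels, d_adj = data
    return {
        u: {v for v in d_labels
            if d_labels[v] == q_labels[u] and len(d_adj[v]) >= len(q_adj[u])}
        for u in q_labels
    }


def find_embeddings(query, data):
    """Matching: extend a partial mapping one query vertex at a time and
    backtrack whenever no candidate is consistent with it."""
    q_labels, q_adj = query
    _, d_adj = data
    cands = filter_candidates(query, data)
    # Assumed heuristic: match query vertices with fewest candidates first.
    order = sorted(q_labels, key=lambda u: len(cands[u]))
    results = []

    def extend(pos, mapping):
        if pos == len(order):
            results.append(dict(mapping))
            return
        u = order[pos]
        used = set(mapping.values())
        for v in cands[u]:
            if v in used:  # keep the mapping injective
                continue
            # Every already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in d_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                extend(pos + 1, mapping)
                del mapping[u]

    extend(0, {})
    return results
```

Calling `find_embeddings(query, data)` on two graphs built with `make_graph` returns a list of dictionaries, each mapping query vertices to data vertices. The filter only prunes candidates; correctness rests on the edge check inside `extend`.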
On the other hand, some approaches answer subgraph queries using the two-phase filter-then-verify (FTV) framework to build subgraph indices (Licheri et al., 2021; Luaces et al., 2021). The first phase, index construction, enumerates graph patterns (such as paths, trees, or cycles) either by mining frequent patterns or by exhaustive enumeration. This is followed by the query processing phase, which filters out the data graphs in the graph database that do not contain the query graph, thereby generating a pruned set of candidate graphs. Finally, in the verification step, subgraph matching is performed on the candidate graphs. The pruning power of the index thus restricts the subgraph isomorphism verifications to a smaller number of candidate graphs. However, work on subgraph indexing mainly targets settings with a large number of small data graphs. Further, algorithms that use a mining-based approach to generate frequent patterns and then index them produce a stable indexing structure but require longer construction time, while methods that exhaustively enumerate small patterns require more memory to store all the permutations of patterns in the data graphs. Our approach is developed to provide a compact suffix tree representation for subgraph indexing.
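A minimal Python sketch of the filter-then-verify idea follows, assuming exhaustive enumeration of label paths as the indexed pattern. It is not the suffix-tree index of this paper nor any cited system; the function names, the `max_len` parameter, and the brute-force verifier are all illustrative assumptions.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Build (labels, adjacency) for an undirected vertex-labelled graph."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def path_features(graph, max_len=3):
    """Label sequences of all simple paths with at most max_len vertices."""
    labels, adj = graph
    feats = set()

    def walk(path):
        feats.add(tuple(labels[v] for v in path))
        if len(path) < max_len:
            for w in adj[path[-1]]:
                if w not in path:
                    walk(path + [w])

    for v in labels:
        walk([v])
    return feats


def build_index(database, max_len=3):
    """Index construction: one feature set per data graph."""
    return [path_features(g, max_len) for g in database]


def contains(query, data):
    """Verification by brute force; only practical for very small graphs."""
    q_labels, q_adj = query
    d_labels, d_adj = data
    q_vertices = list(q_labels)
    for image in permutations(d_labels, len(q_vertices)):
        m = dict(zip(q_vertices, image))
        if (all(q_labels[u] == d_labels[m[u]] for u in q_vertices)
                and all(m[w] in d_adj[m[u]]
                        for u in q_vertices for w in q_adj[u])):
            return True
    return False


def search(query, database, index, max_len=3):
    """Filter with the index, then verify only the surviving candidates."""
    q_feats = path_features(query, max_len)
    candidates = [i for i, feats in enumerate(index) if q_feats <= feats]
    answers = [i for i in candidates if contains(query, database[i])]
    return candidates, answers
```

The filter is safe because an embedding maps every simple path of the query to a simple path of the data graph with the same label sequence, so a data graph missing any query feature cannot contain the query. The converse does not hold, which is why candidates still need verification.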
3 PRELIMINARIES AND PROBLEM DEFINITION
In this paper, we consider undirected, connected, vertex-labelled graphs in a transaction graph database. The transaction graph database is a set of a large number of small graphs (or connected components), called data graphs, $G = \{g_1, g_2, g_3, \ldots, g_n\}$. Each graph $g_i$ is defined as a triplet $g = \{V_g, E_g, L_g\}$, where $V_g$ is the set of vertices, $E_g$ is the set of edges between vertices in the graph, and $L_g$ is the set of labels associated with the vertices in the graph. More precisely, $L_g$ is a mapping function that maps each vertex to a label in $\sigma$, where $\sigma$ is the list of distinct labels. Subgraphs correspond to a subset of nodes of the original graphs. A graph $h$ is said to be an induced subgraph of a data graph $g$ if the vertex set of $h$ is a subset of that of $g$, $V_h \subseteq V_g$, and its edge set consists of all the edges in $E_g$ that have both endpoints in $V_h$. The label of each vertex in $h$ is the same as the label of the corresponding vertex in $g$.
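The induced-subgraph definition can be made concrete with a short Python sketch. The representation (a label dictionary plus an edge list) and the function name `induced_subgraph` are assumptions made for illustration, not part of the paper.

```python
def induced_subgraph(labels, edges, vertex_subset):
    """Return (labels, edges) of the subgraph induced by vertex_subset.

    labels: dict mapping each vertex to its label (the role of L_g)
    edges:  iterable of undirected edges (u, v)   (the role of E_g)
    """
    keep = set(vertex_subset)
    missing = keep - set(labels)
    if missing:
        raise ValueError(f"vertices not in the graph: {sorted(missing)}")
    # Vertices keep the labels they have in the original graph.
    sub_labels = {v: labels[v] for v in keep}
    # Keep exactly the edges whose two endpoints both lie in the subset.
    sub_edges = {frozenset((u, v)) for u, v in edges
                 if u in keep and v in keep}
    return sub_labels, sub_edges
```

Edges are returned as `frozenset` pairs so that `(u, v)` and `(v, u)` denote the same undirected edge.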
Definition 1 (Subgraph isomorphism (Kim et al., 2021)). Given a query graph $q$ and a data graph $g$, an embedding of $q$ in $g$ is a mapping $M : V_q \to V_g$ such that: (1) $M$ is injective (i.e., $M(u) \neq M(u')$ for every $u \neq u' \in V_q$), (2) $L_q(u) = L_g(M(u))$ for every $u \in V_q$, and (3) $(M(u), M(u')) \in E_g$ for every $(u, u') \in E_q$. $q$ is said to be subgraph isomorphic to $g$, denoted by $q \subseteq g$, if there exists an embedding of $q$ in $g$.
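The three conditions of Definition 1 translate directly into a checker for a given mapping. The sketch below is illustrative only; the function name `is_embedding` and the dictionary and edge-list representation are assumptions.

```python
def is_embedding(mapping, q_labels, q_edges, g_labels, g_edges):
    """Check whether mapping (query vertex -> data vertex) is an embedding."""
    # The mapping must be defined on every query vertex and land in V_g.
    if set(mapping) != set(q_labels):
        return False
    if not set(mapping.values()) <= set(g_labels):
        return False
    g_edge_set = {frozenset(e) for e in g_edges}  # undirected edges
    # (1) Injective: no two query vertices share an image.
    injective = len(set(mapping.values())) == len(mapping)
    # (2) Labels are preserved.
    labels_ok = all(q_labels[u] == g_labels[mapping[u]] for u in q_labels)
    # (3) Every query edge is mapped onto a data edge.
    edges_ok = all(frozenset((mapping[u], mapping[v])) in g_edge_set
                   for u, v in q_edges)
    return injective and labels_ok and edges_ok
```

Condition (3) only requires query edges to be present in the data graph; extra edges among the image vertices are allowed, so the embedding need not be induced.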
Our proposed indexing structure aims to speed
up the search for subgraphs while maintaining small-
sized indexes.
Definition 2 (Subgraph searching). For a given transaction graph database with $n$ data graphs $G = \{g_1, g_2, g_3, \ldots, g_n\}$ and a query graph $q$, the subgraph search is to find all the graphs $g_i$ in the database that contain $q$.
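In set notation, the subgraph search of Definition 2 can be restated as follows. The symbol $A_q$ for the answer set is introduced here for illustration and does not appear in the source; $q \subseteq g_i$ is the containment relation of Definition 1.

```latex
A_q = \{\, g_i \in G \mid q \subseteq g_i \,\}, \qquad 1 \le i \le n
```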
4 OUR COMPRESSED TRIE
We now derive our construction of a suffix tree that supports the two above-mentioned inner node structures.
4.1 Compressed Suffix Tree Construction
In this section, we discuss the representation and the process of building the compressed suffix tree that indexes the paths in the graphs in order to obtain the candidate graphs. Inspired by the idea of text compression, our method implements a radix-tree-like index structure
DATA 2022 - 11th International Conference on Data Science, Technology and Applications