timization problem, MSS has many real-life applica-
tions such as wireless networks and DNA sequencing
(Joseph et al., 1992; Butenko and Pardalos, 2003).
There are several existing approaches to the MSS problem, including the sequential algorithm, the random-priority parallel algorithm, and the random-permutation parallel algorithm (Blelloch et al., 2012), as well as maximum satisfiability solvers (Li and Quan, 2010). However, the MSS problem is an NP-hard optimization problem, and the above-mentioned methods are all greedy or heuristic in nature. As such, it is unlikely that there exists a highly efficient algorithm for finding a maximum independent set of a graph.
Recently, DQN-based reinforcement learning has been used to approximate optimal solutions to combinatorial optimization problems (Khalil et al., 2017; Bello et al., 2016). This idea is motivated by the fact that real-world optimization problems maintain a similar combinatorial structure and differ only in the data. This inherent similarity among problem instances also appears to exist in the MSS problem: it is common for two different MSS instances to have similar combinatorial structures, especially when they arise in the same domain. This motivates us to train a model for the MSS problem over a number of randomly generated graphs that may resemble unseen real-world graphs or networks.
In this paper, we consider using graph embedding and reinforcement learning to approximate the optimal solution to MSS problems. Notice that although the framework of combining graph embedding and reinforcement learning has already been used to approximate the minimum vertex cover, a complement of the MSS, of a graph, it is not trivial to approximate the optimal solution to the MSS problem, since the complement of an estimated minimum vertex cover is not always an MSS of the graph. Hence, establishing a new framework for the MSS problem is necessary.
2 DEEP REINFORCEMENT
LEARNING FRAMEWORKS
2.1 Problem Description
Given a graph G = (V, E), find a subset of nodes S ⊆ V such that no two nodes in S are adjacent, and |S| is maximized.
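To make the objective concrete, here is a brute-force sketch (exponential time, for toy graphs only; the adjacency-dict representation and helper names are ours, not the paper's):

```python
from itertools import combinations

def is_stable(adj, nodes):
    """True iff no two nodes in the candidate set share an edge."""
    return all(v not in adj[u] for u, v in combinations(nodes, 2))

def max_stable_set(adj):
    """Exhaustive search for a maximum stable set (exponential time)."""
    vertices = list(adj)
    for r in range(len(vertices), 0, -1):  # try larger sets first
        for cand in combinations(vertices, r):
            if is_stable(adj, cand):
                return set(cand)
    return set()

# 5-cycle: any maximum stable set has size 2
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
print(max_stable_set(adj))  # → {0, 2}
```

The exhaustive search illustrates why heuristics are needed: the number of candidate subsets grows exponentially with |V|.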
Let S = (v_1, ..., v_{|S|}) denote a partial solution, where v_i ∈ V are the nodes of S, and let S̄ = V \ S denote the set of candidate nodes to be added. We also use S to describe the current state of G. Let x represent a tag of G under the current partial solution S, with each dimension x_v = 1 if the node v ∈ S and 0 otherwise.
erwise. We consider using a maintenance procedure
h(S) which maps an ordered list S to a combinatorial
structure satisfying the specific constraints of a prob-
lem. This maintenance procedure is a standard pro-
cedure in previous research (Khalil et al., 2017). In
our problem setting, the helper functionh(·) is unnec-
essary, because our target is to find a stable set with
the largest size, the quality of a partial solution S can
be simply defined as |S|.
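The tag x and the solution quality can be sketched in a few lines (the function names are illustrative, not from the paper):

```python
def tag_vector(vertices, S):
    """Binary tag x of G: x_v = 1 iff node v is in the partial solution S."""
    return [1 if v in S else 0 for v in vertices]

def quality(S):
    """With no helper h(.) needed, the quality of a partial solution is |S|."""
    return len(S)

vertices = [0, 1, 2, 3, 4]
S = [0, 2]                      # a partial solution S = (v_1, v_2)
print(tag_vector(vertices, S))  # → [1, 0, 1, 0, 0]
print(quality(S))               # → 2
```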
In our framework, we rely on a greedy algorithm,
a popular approach in designing approximation al-
gorithms, that constructs a solution by sequentially
adding nodes to a partial solution. The policy for choosing which node to add at each iteration is determined by an evaluation function Q(h(S), v) that measures the quality of adding a node to the current partial solution. The algorithm then extends the partial solution S as:
S := (S, v*),  where  v* = argmax_{v ∈ S̄} Q(h(S), v).
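The greedy construction loop can be sketched as follows; the degree-based Q here is only a stand-in for the learned evaluation function, and candidates adjacent to a chosen node are dropped so the partial solution stays stable:

```python
def greedy_construct(adj, Q):
    """Grow a partial solution by repeatedly adding the candidate v*
    maximizing Q(S, v); neighbors of chosen nodes leave the candidate set."""
    S = []
    candidates = set(adj)  # S-bar: the nodes that can still be added
    while candidates:
        v_star = max(candidates, key=lambda v: Q(S, v))
        S.append(v_star)
        candidates -= {v_star} | adj[v_star]  # keep S a stable set
    return S

# stand-in Q: prefer low-degree nodes (our placeholder, not the learned Q-hat)
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
S = greedy_construct(adj, lambda S, v: -len(adj[v]))
print(S)
```

On the 5-cycle every greedy run returns a stable set of size 2, which here is also optimal; in general the greedy result depends entirely on the quality of Q.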
2.2 Representation
Graph Embedding. Because we are optimizing
over a graph G = (V,E), we expect that the evalu-
ation function Q should take into account the infor-
mation of G, its current state S and the node v to be
added. The difficulty of expressing Q(h(S), v) motivates us to design a powerful deep learning approximator Q̂(h(S), v; Θ) with parameters Θ to estimate the Q function learned from a collection of problem instances. The problem instances are obtained by generating a set of graphs {G_i}_{i=1}^m from a distribution D. In particular, we choose to use structure2vec as the graph embedding network due to its effectiveness in representing structured data (Dai et al., 2016).
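As a preview of how such an embedding network operates, here is a minimal numpy sketch of one synchronous structure2vec-style update; the parameters theta1, theta2 and the ReLU nonlinearity are illustrative assumptions, not the exact parameterization of Dai et al. (2016):

```python
import numpy as np

def s2v_update(adj, x, mu, theta1, theta2):
    """One synchronous update: each node's new embedding combines its own
    tag x_v with the sum of its neighbors' current embeddings (simplified)."""
    new_mu = {}
    for v in adj:
        neighbor_sum = sum((mu[u] for u in adj[v]), np.zeros_like(mu[v]))
        new_mu[v] = np.maximum(0.0, theta1 * x[v] + theta2 @ neighbor_sum)  # ReLU
    return new_mu

p = 4                                          # embedding dimension
rng = np.random.default_rng(0)
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
x = {v: 1.0 if v == 0 else 0.0 for v in adj}   # pretend node 0 is in S
mu = {v: np.zeros(p) for v in adj}             # initial embeddings mu_v^(0)
theta1 = rng.standard_normal(p)                # hypothetical parameters
theta2 = rng.standard_normal((p, p))
for _ in range(3):                             # a few recursion steps
    mu = s2v_update(adj, x, mu, theta1, theta2)
print({v: mu[v].round(3) for v in adj})
```

After a few iterations, information about node 0's tag has propagated to its neighbors' embeddings, which is the long-range aggregation effect described below.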
We follow (Khalil et al., 2017) to implement graph embedding. Let µ_v, a p-dimensional vector, represent the embedding of node v. Given a graph framework, a network structure is recursively defined by structure2vec. Specifically, it begins with an initial embedding µ_v^(0) at each node, and then, for all v ∈ V, updates the embeddings synchronously at each iteration, so that the next µ_v is calculated by a generic nonlinear function with parameters related to the graph. Node-specific tags x_v are then aggregated recursively according to G's graph topology. After a few steps of recursion, the network produces a new embedding µ_v for each node, taking into account both graph characteristics and long-range interactions. This pro-
ICAART 2021 - 13th International Conference on Agents and Artificial Intelligence