The remainder of this paper is organized as
follows. Section 2 reviews related work and Section
3 gives a detailed account of the proposed deep
neural network and the features which have been
used in this paper. Section 4 details the geographical
corpus we constructed. The experimental results are
introduced in Section 5. We finally conclude our
work in Section 6.
2 RELATED WORK
Recently, deep learning has become an attractive
area for multiple applications. It has made major
breakthroughs in the field of computer vision and
speech recognition, and natural language
understanding is another area in which deep learning
is poised to make a large impact over the next few
years (LeCun et al., 2015). Among the various deep
learning strategies, convolutional neural networks
have been successfully applied to many NLP tasks,
such as POS tagging, sentiment analysis, and
semantic role labelling.
Deep learning has also been applied to relation
extraction. Socher et al. (2012) tackle
relation classification using a recursive neural
network that assigns a matrix-vector representation
to every node in a parse tree. The representation for
longer phrases is computed bottom-up by
recursively combining the words according to the
syntactic structure of the parse tree. Mikolov et al.
(2013) found that the vector-space word
representations are surprisingly good at capturing
syntactic and semantic regularities in language, and
that each relationship is characterized by a relation-
specific vector offset. For example, the male/female
relationship is automatically learned, and with the
induced vector representations, “King – Man +
Woman” results in a vector very close to “Queen”.
This regularity provides effective theoretical support
for extracting instance and parallel relations. Zeng et al.
(2014) proposed an approach to predict the
relationship between two marked nouns where
lexical and sentence level features are learned
through a CNN. The experimental results
demonstrate that the position features are critical for
relation classification, while the extracted lexical
and sentence level features are effective for relation
classification. Zhang et al. (2015) argued that a
key issue not well addressed by CNN-based methods
is their limited ability to learn temporal features,
especially long-distance dependencies between
nominal pairs; they therefore proposed a simple
framework based on recurrent neural networks and
compared it with a CNN-based model. Experiments
on two different datasets
strongly indicate that the RNN-based model can
deliver better performance on relation classification,
and it is particularly capable of learning long-
distance relation patterns. Xu et al. (2015) noted
that although the method of Zeng et al. has proved
effective, it often suffers from irrelevant
subsequences or clauses, especially when subjects
and objects are far apart; such irrelevant
information hurts extraction performance when it is
incorporated into the model. Therefore,
they proposed to learn a more robust relation
representation from a convolutional neural network
model that works on the simple dependency path
between subjects and objects, which naturally
characterizes the relationship between two nominals
and avoids negative effects from other irrelevant
chunks or clauses. Experimental results indicate that
performance improves to 85.4%.
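The vector-offset regularity reported by Mikolov et al. can be illustrated with a minimal sketch. The 4-dimensional embeddings below are hand-picked toy values chosen so that the gender offset lines up, not learned vectors; real word2vec embeddings are trained on large corpora:

```python
import numpy as np

# Toy embeddings (hypothetical values, not trained word2vec vectors).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.8, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.8, 0.0]),
    "city":  np.array([0.0, 0.0, 0.1, 0.9]),
}

def nearest(vec, exclude):
    """Return the vocabulary word whose embedding has the highest
    cosine similarity to `vec`, ignoring the words in `exclude`."""
    best, best_sim = None, -2.0
    for word, v in emb.items():
        if word in exclude:
            continue
        sim = vec @ v / (np.linalg.norm(vec) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# "King - Man + Woman" lands near "Queen".
target = emb["king"] - emb["man"] + emb["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

With trained embeddings the analogy holds only approximately, so the nearest neighbour is taken after excluding the three query words, as in the original evaluation protocol.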
3 THE PROPOSED DEEP
NEURAL NETWORK
The proposed deep neural network architecture is
shown in Figure 1. The deep neural network is
divided into three layers. In the first layer, each
character (or each class keyword) is mapped into the
character embedding (or class embedding) space.
The second layer extracts three kinds of features:
(1) character embeddings are selected by a sliding
window, and character features are extracted by a
neural network; (2) the character embeddings of a
sentence are taken as the input of a convolutional
layer, and sentence features are extracted after
pooling; (3) class-keyword embeddings are
concatenated and fed into a neural network to
extract class features. In the last layer, we
concatenate the three features head-to-tail in
sequence as the input of a fully connected layer,
which predicts the label of each character.
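The three-branch feature extraction described above can be sketched as follows. All dimensions, the random weights, and the single-sentence setup are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; this excerpt does not fix them.
d_char, d_cls = 8, 8          # character / class-keyword embedding size
win, n_filters = 3, 6         # sliding-window width, convolution filters
n_keywords, n_labels = 4, 5   # class keywords per sentence, output labels

sent = rng.normal(size=(10, d_char))          # character embeddings of a sentence
keywords = rng.normal(size=(n_keywords, d_cls))

# (1) Window features: concatenate the embeddings inside a sliding
#     window around the current character, then apply a small layer.
i = 5
window = sent[i - win // 2 : i + win // 2 + 1].reshape(-1)  # (win * d_char,)
W1 = rng.normal(size=(win * d_char, 16))
f_char = np.tanh(window @ W1)

# (2) Sentence features: 1-D convolution over all windows, then max pooling.
conv_W = rng.normal(size=(win * d_char, n_filters))
windows = np.stack([sent[j : j + win].reshape(-1)
                    for j in range(len(sent) - win + 1)])
f_sent = np.tanh(windows @ conv_W).max(axis=0)              # (n_filters,)

# (3) Class features: concatenate class-keyword embeddings, one dense layer.
W3 = rng.normal(size=(n_keywords * d_cls, 16))
f_cls = np.tanh(keywords.reshape(-1) @ W3)

# Last layer: concatenate the three features head-to-tail, then a fully
# connected softmax layer predicts the label of the current character.
feat = np.concatenate([f_char, f_sent, f_cls])
W_out = rng.normal(size=(feat.size, n_labels))
scores = feat @ W_out
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(probs.shape)  # (5,)
```

In a real implementation the weight matrices would be trained jointly by backpropagation; the sketch only shows how the three feature vectors are produced and concatenated before the final prediction.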
3.1 Character Embedding
Traditionally, a word is represented by a one-hot
vector whose size equals the vocabulary size: the
element at the word index is “1” while all other
elements are “0”, which leads to the curse of
dimensionality and to sparseness. Now, a word is
instead represented by a low-dimensional real-valued
vector that can be learned through a language model.
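As a minimal illustration (with a toy vocabulary and random values rather than embeddings trained by a language model), a dense embedding lookup is equivalent to multiplying the one-hot vector by an embedding matrix:

```python
import numpy as np

vocab = ["Beijing", "river", "mountain", "province"]  # hypothetical vocabulary
V, d = len(vocab), 3   # vocabulary size vs. (much smaller) embedding size

# One-hot representation: one sparse V-dimensional vector per word.
one_hot = np.eye(V)[vocab.index("river")]
print(one_hot)                      # [0. 1. 0. 0.]

# Dense embedding: each word is a row of a learned V x d matrix
# (random here; in practice trained with a language model).
E = np.random.default_rng(1).normal(size=(V, d))
dense = one_hot @ E                 # equivalent to the row lookup E[1]
print(np.allclose(dense, E[1]))     # True
```

In practice the lookup is done by indexing rather than by matrix multiplication, but the equivalence shows why the embedding matrix can be trained as an ordinary network layer.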
Character embedding is a finer-grained
representation than word embedding. At present,
character embedding is mainly applied to Chinese
word segmentation via deep neural networks. The Chinese
Chinese Geographical Knowledge Entity Relation Extraction via Deep Neural Networks