parameters (Iván, 2009a; 2009b). In 2013, Atish Roy
introduced an unsupervised classification method
based on a statistical model, generative topographic
mapping (GTM), and applied it to 3D seismic facies
analysis. This method compensates for SOM's lack
of a proper convergence criterion and of rules for
parameter selection (Atish Roy, 2013).
In sum, there are two types of methods for seismic
signal classification: unsupervised classification
and supervised classification.
1) Unsupervised classification, such as SOM, needs
good initialization conditions.
2) Supervised classification, such as SVM, needs
well-labeled training samples and uses a lot of
memory.
The techniques mentioned above have been applied
to seismic data analysis, but they still have many
defects. The main defect is that the algorithms are
complicated and time-consuming and require very
large memory; they also need very good
initialization conditions. These defects limit the
practical application of these methods.
In image pattern recognition systems, feature
extraction based on high-level semantics uses
different types of features for semantic clustering.
Each semantic cluster contains various low-level
characteristics, such as color and shape, and
together these clusters form a top-down hierarchical
clustering structure of image semantics. This greatly
reduces the complexity of the algorithm and saves
system resources. Inspired by this, we propose
high-level feature extraction for seismic waveform
classification. First, we extract seismic amplitude
characteristics; then we use the bag of words model
to reduce the data dimension. In detail, we treat each
seismic data volume as a document and its
characteristics as words, and we reduce its dimension
by extracting its topics, thereby extracting the
features of the seismic image. Experimental results
show that applying the bag of words model to
seismic pattern recognition yields good results.
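As a minimal sketch of the document analogy, assume (hypothetically) that each amplitude sample is quantized into one of a few equal-width amplitude levels and that each level acts as a "word"; a volume, here simply a list of traces, then becomes a word-count vector. The binning scheme, the helpers `trace_to_words` and `volume_to_document`, and the `ampN` labels are illustrative assumptions; the paper does not specify how its amplitude characters are discretized.

```python
from collections import Counter


def trace_to_words(trace, n_levels=4, amp_max=1.0):
    """Quantize each amplitude sample into one of n_levels discrete 'words'.

    Hypothetical scheme: amplitudes in [-amp_max, amp_max] are split into
    n_levels equal-width bins, and the bin index becomes the word label.
    """
    width = 2.0 * amp_max / n_levels
    words = []
    for a in trace:
        # clip to the assumed amplitude range, then find the bin index
        a = max(-amp_max, min(amp_max, a))
        idx = min(int((a + amp_max) / width), n_levels - 1)
        words.append(f"amp{idx}")
    return words


def volume_to_document(traces, **kw):
    """Treat a whole volume (a list of traces) as one document: a word count."""
    doc = Counter()
    for trace in traces:
        doc.update(trace_to_words(trace, **kw))
    return doc


print(volume_to_document([[-1.0, -0.2, 0.3, 1.0], [0.9, -0.9]]))
```

Word order within each trace is discarded by the `Counter`, which is exactly the bag of words assumption.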
2 PRINCIPLE OF HIGH-LEVEL SEMANTIC EXTRACTION
For a specific goal, an image contains not only
low-level visual knowledge such as color, shape and
texture, but also semantic knowledge tied to human
visual perception. In seismic image processing, this
semantic knowledge is what we call the class model,
and how to extract it is an important issue. In
current studies, the extraction of image semantic
features generally borrows the model structure of
text semantic analysis. First, regarding the
granularity of semantic expression, the bag of words
model (Li and Perona, 2005) is a common method.
This algorithm first defines the semantics of
different image tiles and describes them as visual
words; it then uses these visual words to express the
different objects in an image, thereby realizing
semantic learning. Second, regarding the extraction
of semantics, the typical models are probabilistic
latent semantic analysis (PLSA) and latent Dirichlet
allocation (LDA) (Blei et al., 2003). Research based
on these models has been successfully applied to
automatic image annotation and retrieval. Taken
together, semantic extraction is mainly based on
machine learning, data mining and relevance
feedback.
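One common way to build visual words is to cluster tile descriptors and use the cluster centres as a codebook, so each tile is labeled by its nearest centre. The sketch below assumes that scheme, with plain k-means and tiles given as small feature vectors (for example, flattened amplitude patches). `kmeans` and `bag_of_visual_words` are hypothetical helpers; this sketch is not claimed to match the exact procedure of Li and Perona (2005).

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means; the k centroids serve as the visual-word codebook."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bag_of_visual_words(tiles, codebook):
    """Map each tile to its nearest codeword and count the occurrences."""
    return Counter(
        min(range(len(codebook)), key=lambda c: sq_dist(t, codebook[c]))
        for t in tiles
    )


tiles = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
codebook = kmeans(tiles, 2)
print(bag_of_visual_words(tiles, codebook))
```

The resulting histogram over codewords is the image's "document" representation, which a topic model such as PLSA or LDA then takes as input.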
2.1 Topic Model on BOW
Bag of words originated in text processing. A text
is treated as a set, or multiset, of words, ignoring
its word order, grammar and syntax. Each word is
assumed to be independent of the others, so the word
at any position is chosen without being influenced
by the preceding sentence.
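The order-independence assumption can be checked directly: two texts containing the same words in different orders map to the same bag. A minimal sketch, assuming simple lowercase whitespace tokenization (`bag_of_words` is a hypothetical helper):

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as an unordered multiset of lowercase tokens."""
    return Counter(text.lower().split())


a = bag_of_words("the fault cuts the reflector")
b = bag_of_words("reflector the cuts fault the")
print(a == b)     # True: word order is discarded
print(a["the"])   # 2: only word frequencies are kept
```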
For 3D seismic waveform data, we can regard a data
volume as consisting of several class models, each of
which consists of several waveform characters. That
is, we assume that each waveform character in a 3D
seismic data volume is produced by selecting a class
model with a certain probability and then selecting
the character from that class model with a certain
probability. Thus, when a 3D seismic data volume is
generated, the probability of each waveform
character in it is
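$$p(w \mid d) = \sum_{z} p(w \mid z)\, p(z \mid d) \qquad (1)$$
where w denotes a waveform character, z a class
model and d a 3D seismic data volume.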
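A small numerical check of Eq. (1), with two hypothetical class models and three waveform characters; the probabilities are made up for illustration only. The mixture over class models is a weighted sum of the class-model character distributions.

```python
# p(w|z): rows are class models z, columns are waveform characters w
# (hypothetical numbers, each row sums to 1)
p_w_given_z = [
    [0.7, 0.2, 0.1],  # class model z0
    [0.1, 0.3, 0.6],  # class model z1
]
# p(z|d): weights of the two class models in one volume d (sums to 1)
p_z_given_d = [0.25, 0.75]


def p_w_given_d(p_w_given_z, p_z_given_d):
    """Eq. (1): p(w|d) = sum over z of p(w|z) * p(z|d)."""
    n_words = len(p_w_given_z[0])
    return [
        sum(p_w_given_z[z][w] * p_z_given_d[z] for z in range(len(p_z_given_d)))
        for w in range(n_words)
    ]


print(p_w_given_d(p_w_given_z, p_z_given_d))
```

The result is approximately [0.25, 0.275, 0.475], which again sums to 1, as a distribution over characters must.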
Thus, given a set of 3D seismic data volumes, by
training on the volume-character data we can learn
each character's probability in every class model
and each class model's probability in every 3D
seismic data volume.
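The learning step described above can be sketched with the expectation-maximization (EM) algorithm of PLSA, which fits p(w|z) and p(z|d) of Eq. (1) directly from a volume-by-character count matrix. This is the PLSA estimator, named here as a swapped-in illustration and not the LDA model adopted below; `plsa` and its parameters are hypothetical, and every volume is assumed to contain at least one character.

```python
import random


def normalize(v):
    """Scale a non-negative vector so that it sums to 1."""
    s = sum(v)
    return [x / s for x in v]


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit p(w|z) and p(z|d) of Eq. (1) by EM (the PLSA algorithm).

    counts[d][w] is how often character w occurs in volume d.
    Assumes every volume has at least one nonzero count.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior p(z|d,w) is proportional to p(w|z) p(z|d)
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                # accumulate expected counts for the M-step
                for z in range(n_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into probabilities
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d
```

With a single class model, EM reduces to the pooled character frequencies; for example, `plsa([[1, 3], [2, 2]], 1)` gives p(w|z) = [0.375, 0.625].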
In our implementation, we adopt latent Dirichlet
allocation (LDA) as the generative model of the 3D
seismic data volume.
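The inference method used for LDA is not stated here. One standard choice is collapsed Gibbs sampling, and the sketch below assumes it, with symmetric Dirichlet priors `alpha` and `beta` chosen for illustration and each volume already converted into a list of character ids. `lda_gibbs` is a hypothetical helper, not the authors' implementation.

```python
import random


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents (volumes), each a list of word ids in [0, n_words).
    Returns (theta, phi): per-document class-model mixtures p(z|d) and
    per-class-model character distributions p(w|z) from the final sample.
    """
    rng = random.Random(seed)
    n_dz = [[0] * n_topics for _ in docs]             # class-model counts per volume
    n_zw = [[0] * n_words for _ in range(n_topics)]   # character counts per class model
    n_z = [0] * n_topics                              # total characters per class model
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)  # random initial class model
            zs.append(z)
            n_dz[d][z] += 1
            n_zw[z][w] += 1
            n_z[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # remove the current token from the counts
                n_dz[d][z] -= 1
                n_zw[z][w] -= 1
                n_z[z] -= 1
                # full conditional p(z | all other assignments), up to a constant
                weights = [
                    (n_dz[d][k] + alpha) * (n_zw[k][w] + beta) / (n_z[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                n_dz[d][z] += 1
                n_zw[z][w] += 1
                n_z[z] += 1
    theta = [
        [(n_dz[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_zw[k][w] + beta) / (n_z[k] + n_words * beta) for w in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi
```

The returned `theta` rows play the role of the reduced, low-dimensional feature of each volume: K class-model weights instead of a full character histogram.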
The topic model can be described by a graphical
model, as shown in Figure 1.
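The graphical model of LDA corresponds to a generative process: for each volume, draw a class-model mixture θ from a Dirichlet(α) distribution; then, for each character, draw a class model z from θ and a character w from that class model's distribution φ_z. A minimal sketch of that process follows, with `generate_volume` and `sample_dirichlet` as hypothetical helpers. The Dirichlet draw is obtained by normalizing independent Gamma samples, and φ is taken as given rather than itself drawn from a prior.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(a, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]


def generate_volume(phi, alpha, n_chars, rng):
    """LDA generative process for one 'document' (seismic data volume).

    phi[z] is the character distribution p(w|z) of class model z.
    Returns the sampled mixture theta = p(z|d) and the character ids.
    """
    n_topics = len(phi)
    theta = sample_dirichlet(rng, [alpha] * n_topics)  # p(z|d) for this volume
    chars = []
    for _ in range(n_chars):
        z = rng.choices(range(n_topics), weights=theta)[0]       # pick a class model
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]   # pick a character from it
        chars.append(w)
    return theta, chars


rng = random.Random(0)
theta, chars = generate_volume([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], 1.0, 20, rng)
print(theta, chars)
```

Fitting LDA inverts this process: given only the characters, it recovers plausible θ and φ.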
LDA was first proposed by David M. Blei et al. in
2003 (Blei et al., 2003). It has since been widely
applied in text mining, including text topic
identification, text classification and text similarity
computation. It is a topic model, and the
GISTAM 2015 - 1st International Conference on Geographical Information Systems Theory, Applications and Management