from presumed unlabeled seismic data, our proposed
system is generic and scalable, and relies entirely on
objective closeness metrics in feature space, thereby
removing the dependency on a more constraining regional
scheme. For each of our distribution and signal type
feature vectors, both cluster analysis and classification
results affirm seismic pattern relations that cross
continent boundaries, suggesting similarity of impartial
macroseismic effects.
The data we acquired comprise hundreds of thousands
of earthquake events, recorded over a period of four
decades, that affected a few thousand sites around the
world. However, only a few hundred places, each bearing
at least several hundred shake occurrences, are
statistically meaningful and pertinent to our
probabilistic system approach. Growing the seismic
training set is imperative to our work and directly
affects classification robustness. Yet using geographical
locations that endured fewer than one hundred seismic
events is a suboptimal choice for our system, giving
rise to highly sparse feature vectors. Alternatively, we
contend that by coalescing locations of a small event
count into a macro seismic site, based on geo-spatial
proximity considerations, our training collection size
is likely to increase further and, in turn, give us a
more stable classification process.
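The coalescing idea above can be illustrated with a minimal sketch. It is not the system's implementation: the proximity criterion is not specified in the text, so the sketch assumes a simple latitude/longitude grid, and the function name `coalesce_sites`, the cell size `cell_deg`, and the sample sites are all hypothetical. Only the threshold of one hundred events is taken from the discussion above.

```python
import math
from collections import defaultdict

def coalesce_sites(sites, min_events=100, cell_deg=2.0):
    """Pool sparse sites that fall in the same grid cell into one macro site.

    sites: iterable of (site_id, latitude, longitude, event_count).
    Returns a dict mapping a macro-site key to its member ids and event total.
    The grid is an assumed stand-in for a real proximity measure.
    """
    macro = defaultdict(lambda: {"members": [], "events": 0})
    for site_id, lat, lon, count in sites:
        if count >= min_events:
            # Dense sites already support a usable feature vector.
            key = ("site", site_id)
        else:
            # Sparse sites sharing a grid cell are merged.
            key = ("cell", math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        macro[key]["members"].append(site_id)
        macro[key]["events"] += count
    return dict(macro)

# Hypothetical sites, for illustration only.
sites = [
    ("A", 35.1, 139.2, 450),   # dense, kept as is
    ("B", 36.4, 140.9, 40),    # sparse
    ("C", 37.0, 141.5, 70),    # sparse, same 2-degree cell as B
    ("D", -33.4, -70.6, 15),   # sparse and isolated
]
merged = coalesce_sites(sites)
for key, info in merged.items():
    print(key, info["members"], info["events"])
```

In this toy input, sites B and C merge into a macro site that crosses the one-hundred-event threshold, while the isolated site D stays sparse even after merging. A fixed grid also distorts distances at high latitudes and splits neighbors that straddle a cell border, so a distance-based grouping may be preferable in practice.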
A direct progression of our work is to assume
no prior knowledge of the number of seismic
clusters to generate, and to discover both the model
fit and the selection dimension directly from the
incomplete seismic training set, using a combination of
Akaike and Bayesian information criteria. We look
forward to further incorporating the three-dimensional
geometrical data provided in a GeoJSON object, and
possibly detecting seismic similarity along either a
longitude or a latitude extent. Lastly, the flexibility
of our software allows us to pursue a higher-level,
inter-cluster network study to better understand the
second-order set of seismic relations.
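To make the information-criterion step concrete, the following sketch scores candidate cluster counts with AIC = 2p - 2 ln L and BIC = p ln n - 2 ln L, where p is the number of free parameters, n the number of samples, and L the maximized likelihood. How the two criteria would be combined is not stated in the text, so the sketch simply reports the best count under each. The parameter count assumes a full-covariance Gaussian mixture, and the log-likelihood values are placeholders rather than results from the seismic data.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_samples) - 2 * log_likelihood

def gmm_param_count(k, dim):
    """Free parameters of a k-component, full-covariance Gaussian mixture:
    (k - 1) weights, k * dim means, k * dim * (dim + 1) / 2 covariance terms."""
    return (k - 1) + k * dim + k * dim * (dim + 1) // 2

def select_cluster_count(fits, n_samples, dim):
    """fits maps a cluster count k to its maximized log-likelihood."""
    scores = {}
    for k, log_lik in fits.items():
        p = gmm_param_count(k, dim)
        scores[k] = (aic(log_lik, p), bic(log_lik, p, n_samples))
    best_aic = min(scores, key=lambda k: scores[k][0])
    best_bic = min(scores, key=lambda k: scores[k][1])
    return best_aic, best_bic, scores

# Placeholder log-likelihoods for k = 1..4, illustrative only.
fits = {1: -2100.0, 2: -1950.0, 3: -1900.0, 4: -1890.0}
best_aic, best_bic, scores = select_cluster_count(fits, n_samples=500, dim=2)
print("AIC picks k =", best_aic, "and BIC picks k =", best_bic)
```

With these placeholder values AIC favors four clusters and BIC favors three, which reflects the heavier complexity penalty of BIC once ln n exceeds 2. Any rule for reconciling such a disagreement would be a design choice beyond what the text specifies.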
ACKNOWLEDGEMENTS
We would like to thank the anonymous reviewers for
their insightful and helpful feedback.
REFERENCES
Akaike, H. (1973). Information theory and an extension of
the maximum likelihood principle. In International
Symposium on Information Theory, pages 267–281,
Budapest, Hungary.
Baeza-Yates, R. and Ribeiro-Neto, B., editors (1999). Mod-
ern Information Retrieval. ACM Press Series/Addison
Wesley, Essex, UK.
Baum, L. E. (1972). An inequality and associated maxi-
mization technique in statistical estimation for proba-
bilistic functions of Markov processes. In Symposium
on Inequalities, pages 1–8, Los Angeles, CA.
Baum, L. E. and Petrie, T. (1966). Statistical inference for
probabilistic functions of finite state Markov chains.
Annals of Mathematical Statistics, 37(6):1554–1563.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and
Stein, C. (1990). Introduction to Algorithms. MIT
Press/McGraw-Hill Book Company, Cambridge, MA.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977).
Maximum likelihood from incomplete data via the
EM algorithm. Journal of the Royal Statistical Society,
Series B, 39(1):1–38.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Unsu-
pervised learning and clustering. In Pattern Classifi-
cation, pages 517–601. Wiley, New York, NY.
Flinn-Engdahl (2000). Flinn-Engdahl seismic and
geographic regionalization scheme.
http://earthquake.usgs.gov/learn/topics/flinn_engdahl.php.
Fraley, C. and Raftery, A. E. (2002). Model-based
clustering, discriminant analysis and density estima-
tion. Journal of the American Statistical Association,
97(458):611–631.
Fraley, C. and Raftery, A. E. (2007). Bayesian regulariza-
tion for normal mixture estimation and model-based
clustering. Journal of Classification, 24(2):155–181.
GeoJSON (2007). GeoJSON format for encoding geographic
data structures. http://geojson.org/.
Hough, S. E. (2014). Earthquake intensity distribution:
A new view. Bulletin of Earthquake Engineering,
12(1):135–155.
Johnson, S. C. (1967). Hierarchical clustering schemes.
Psychometrika, 32(3):241–254.
Kaufman, L. and Rousseeuw, P. J., editors (1990). Finding
Groups in Data: An Introduction to Cluster Analysis.
Wiley, New York, NY.
Langfelder, P., Zhang, B., and Horvath, S. (2007). Defining
clusters from a hierarchical cluster tree: the dynamic
tree cut library for R. Bioinformatics, 24(5):719–720.
Manning, C. D., Raghavan, P., and Schütze, H. (2008).
Introduction to Information Retrieval. Cambridge
University Press, Cambridge, United Kingdom.
Manning, C. D. and Schütze, H., editors (2000).
Foundations of Statistical Natural Language Processing.
MIT Press, Cambridge, MA.
McLachlan, G. J. and Basford, K. E. (1988). Mixture
Models: Inference and Applications to Clustering. Marcel
Dekker, New York, NY.
McLachlan, G. J. and Peel, D. (2000). Finite Mixture
Models. John Wiley and Sons, New York, NY.
Ngatchou-Wandji, J. and Bulla, J. (2013). On choosing a
mixture model for clustering. Journal of Data Sci-
ence, 11(1):157–179.
R (1997). R project for statistical computing.
http://www.r-project.org/.