# Inferring Geo-spatial Neutral Similarity from Earthquake Data using Mixture and State Clustering Models

### Avi Bleiweiss

#### Abstract

Traditionally, earthquake events are identified by prescribed and well-formed geographical region boundaries. However, fixed regional schemes tend to overlook seismic patterns typified by cross-boundary relations that are deemed essential to seismological research. Instead, we investigate a statistically driven system that clusters earthquake-bound places by similarity in seismic feature space and is impartial to geo-spatial proximity constraints. To facilitate our study, we acquired hundreds of thousands of recordings of earthquake episodes spanning an extended period of forty years, and split them into groups by their corresponding geographical places. From each collection of place-affiliated event data, we extracted objective seismic features expressed both in a compact term-frequency-of-scales format and as a discrete signal representation that captures magnitude samples at regular time intervals. The distribution-typed and temporal-typed feature vectors are then applied to our mixture model and Markov chain frameworks, respectively, to cluster shake-affected locations. We performed extensive cluster analysis and classification experiments, and report robust results that support the intuition of geo-spatial neutral similarity.
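
The two feature representations named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the integer magnitude bins, the choice of the largest magnitude per interval as the sample, and the event data are all assumptions made for illustration.

```python
from collections import Counter


def scale_term_frequency(magnitudes, num_scales=10):
    """Relative frequency of each integer magnitude scale, 0..num_scales-1.

    Assumption: a "scale" is the integer part of the magnitude.
    """
    counts = Counter(min(int(m), num_scales - 1) for m in magnitudes if m >= 0)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_scales
    return [counts.get(s, 0) / total for s in range(num_scales)]


def magnitude_signal(events, start, end, step):
    """Largest magnitude in each regular time interval; 0.0 for quiet intervals.

    Assumption: one sample per interval, taken as the interval maximum.
    """
    num_steps = int((end - start + step - 1) // step)
    signal = [0.0] * num_steps
    for t, m in events:
        if start <= t < end:
            i = int((t - start) // step)
            signal[i] = max(signal[i], m)
    return signal


# Hypothetical events for one place: (time in days since start, magnitude).
events = [(0, 2.5), (3, 4.1), (4, 4.7), (12, 2.9)]

tf = scale_term_frequency([m for _, m in events])
sig = magnitude_signal(events, start=0, end=15, step=5)
```

Under these assumptions, `tf` would be the distribution-typed vector for a mixture model, and `sig` the temporal sequence for a Markov chain model.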

#### Paper Citation

#### in Harvard Style

Bleiweiss A. (2015). **Inferring Geo-spatial Neutral Similarity from Earthquake Data using Mixture and State Clustering Models**. In *Proceedings of the 1st International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM*, ISBN 978-989-758-099-4, pages 5-16. DOI: 10.5220/0005347500050016

#### in Bibtex Style

@conference{gistam15,

author={Avi Bleiweiss},

title={Inferring Geo-spatial Neutral Similarity from Earthquake Data using Mixture and State Clustering Models},

booktitle={Proceedings of the 1st International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM},

year={2015},

pages={5-16},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005347500050016},

isbn={978-989-758-099-4},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Geographical Information Systems Theory, Applications and Management - Volume 1: GISTAM

TI - Inferring Geo-spatial Neutral Similarity from Earthquake Data using Mixture and State Clustering Models

SN - 978-989-758-099-4

AU - Bleiweiss A.

PY - 2015

SP - 5

EP - 16

DO - 10.5220/0005347500050016