TIME SERIES SEGMENTATION AS A DISCOVERY TOOL - A Case Study of the US and Japanese Financial Markets

Jian Cheng Wong, Gladys Hui Ting Lee, Yiting Zhang, Woei Shyr Yim, Robert Paulo Fornia, Danny Yuan Xu, Jun Liang Kok, Siew Ann Cheong


In this paper we explain how the dynamics of a complex system can be understood in terms of the low-dimensional manifolds (phases) it settles onto, each described by slowly varying effective variables. We then explain how these phases can be discovered by grouping the large number of microscopic time series or time series segments, based on their statistical similarities, into a small number of time series classes, each representing a distinct phase. We describe a specific recursive scheme for time series segmentation based on the Jensen-Shannon divergence, and check its performance against artificial time series data. We then apply the method to high-frequency time series data of various US and Japanese financial market indices, where we find that the time series segments can be very naturally grouped into four to six classes, corresponding roughly to economic growth, economic crisis, market correction, and market crash. From a single time series, we can estimate the lifetimes of these macroeconomic phases, and also identify potential triggers for each phase transition. From a cross section of time series, we can further estimate the transition times, and also arrive at an unbiased and detailed picture of how financial markets react to internal or external stimuli.
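The recursive segmentation idea can be illustrated with a minimal sketch. The abstract does not specify the segment model, the stopping rule, or any parameter values, so everything below is an assumption made for illustration: each segment is modelled as i.i.d. Gaussian, the cut statistic `delta` is the log-likelihood gain from fitting two Gaussians instead of one (an entropy difference analogous to the Jensen-Shannon divergence, since the maximized Gaussian log-likelihood is minus the segment length times the fitted entropy), and recursion stops at a hypothetical fixed `threshold` and `min_len`. The names `gaussian_loglik`, `best_cut` and `segment` are likewise hypothetical.

```python
import math
import random


def gaussian_loglik(xs):
    """Maximized Gaussian log-likelihood of xs (mean and variance fitted by ML)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    var = max(var, 1e-12)  # guard against a zero-variance segment
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def best_cut(xs, min_len):
    """Return (t, delta) maximizing the two-segment log-likelihood gain.

    delta(t) = logL(xs[:t]) + logL(xs[t:]) - logL(xs)
    """
    whole = gaussian_loglik(xs)
    best_t, best_delta = None, 0.0
    for t in range(min_len, len(xs) - min_len + 1):
        delta = gaussian_loglik(xs[:t]) + gaussian_loglik(xs[t:]) - whole
        if delta > best_delta:
            best_t, best_delta = t, delta
    return best_t, best_delta


def segment(xs, threshold=10.0, min_len=20, offset=0):
    """Recursively split xs and return the cut positions in increasing order.

    threshold and min_len are assumed stopping parameters, not values
    taken from the paper.
    """
    if len(xs) < 2 * min_len:
        return []
    t, delta = best_cut(xs, min_len)
    if t is None or delta < threshold:
        return []
    left = segment(xs[:t], threshold, min_len, offset)
    right = segment(xs[t:], threshold, min_len, offset + t)
    return left + [offset + t] + right


# Artificial series: low, high, then low volatility, 200 points each.
random.seed(0)
series = ([random.gauss(0, 1) for _ in range(200)]
          + [random.gauss(0, 5) for _ in range(200)]
          + [random.gauss(0, 1) for _ in range(200)])
cuts = segment(series)
print(cuts)
```

On this artificial series the recovered cuts should fall close to the true change points at 200 and 400. The exhaustive search in `best_cut` costs quadratic time per recursion level, which is acceptable for a sketch but would need running sums for long high-frequency series.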



Paper Citation

in Harvard Style

Wong J. C., Lee G. H. T., Zhang Y., Yim W. S., Fornia R. P., Xu D. Y., Kok J. L. and Cheong S. A. (2011). TIME SERIES SEGMENTATION AS A DISCOVERY TOOL - A Case Study of the US and Japanese Financial Markets. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011), ISBN 978-989-8425-79-9, pages 52-63. DOI: 10.5220/0003653700520063
