One-Step or Two-Step Optimization and the Overfitting Phenomenon - A Case Study on Time Series Classification

Muhammad Marwan Muhammad Fuad

2014

Abstract

Optimization has developed rapidly over the last few decades. Bio-inspired optimization algorithms are metaheuristics inspired by nature. They have been applied to problems in engineering, economics, and other domains, as well as in branches of information technology such as networking and software engineering. Time series data mining has its share of these applications too. In previous work we showed how bio-inspired algorithms such as genetic algorithms and differential evolution can be used to find the locations of the breakpoints used in the symbolic aggregate approximation (SAX) representation of time series, and in another work we showed how particle swarm optimization, a well-known bio-inspired algorithm, can be used to assign weights to the different segments in that representation. In this paper we present, in two different approaches, a new meta-optimization process that produces optimal breakpoint locations together with optimal segment weights. Our experiments on a time series classification task show an interesting example of how overfitting, a frequently encountered problem in data mining in which the model fits the training set too closely, can interfere with the optimization process and hide the superior performance of an optimization algorithm.
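To make the two kinds of parameters mentioned in the abstract concrete, the following is a minimal Python sketch of a symbolic aggregate approximation (SAX) with configurable breakpoints and per-segment weights. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact weighted distance or the fitness function, so the function names (`paa`, `sax_symbols`, `weighted_mindist`) and the way the weights enter the distance are hypothetical.

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-length segment."""
    series = np.asarray(series, dtype=float)
    # Assumption: len(series) is divisible by n_segments, to keep the sketch short.
    return series.reshape(n_segments, -1).mean(axis=1)

def sax_symbols(series, n_segments, breakpoints):
    """Map each segment mean to a symbol index using sorted breakpoints."""
    return np.searchsorted(breakpoints, paa(series, n_segments))

def weighted_mindist(sym_a, sym_b, breakpoints, weights, series_length):
    """MINDIST-style distance between two symbol sequences.

    The per-segment weights are a hypothetical way of weighting segments;
    the paper's actual weighting scheme may differ.
    """
    bp = np.asarray(breakpoints, dtype=float)
    w = np.asarray(weights, dtype=float)
    lo = np.minimum(sym_a, sym_b)
    hi = np.maximum(sym_a, sym_b)
    # Equal or adjacent symbols contribute zero; otherwise the gap between
    # the breakpoints that separate the two symbol regions is used.
    upper = bp[np.maximum(hi - 1, 0)]
    lower = bp[np.minimum(lo, len(bp) - 1)]
    gap = np.where(hi - lo > 1, upper - lower, 0.0)
    n_segments = len(sym_a)
    return np.sqrt(series_length / n_segments) * np.sqrt(np.sum(w * gap ** 2))

# Example: two short series, alphabet size 4, two segments.
breakpoints = [-0.67, 0.0, 0.67]   # candidate breakpoint locations
weights = [1.0, 1.0]               # candidate segment weights
a = [-2, -2, -2, -2, 2, 2, 2, 2]
b = [2, 2, 2, 2, -2, -2, -2, -2]
sa = sax_symbols(a, 2, breakpoints)
sb = sax_symbols(b, 2, breakpoints)
d = weighted_mindist(sa, sb, breakpoints, weights, len(a))
```

In such a setup, an optimizer would search over `breakpoints` and `weights`, scoring each candidate by classification error on the training set. Optimizing both at once or one after the other corresponds loosely to the one-step and two-step approaches named in the title.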


Paper Citation


in Harvard Style

Muhammad Fuad M. (2014). One-Step or Two-Step Optimization and the Overfitting Phenomenon - A Case Study on Time Series Classification. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 645-650. DOI: 10.5220/0004916706450650


in Bibtex Style

@conference{icaart14,
author={Muhammad Marwan Muhammad Fuad},
title={One-Step or Two-Step Optimization and the Overfitting Phenomenon - A Case Study on Time Series Classification},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={645-650},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004916706450650},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - One-Step or Two-Step Optimization and the Overfitting Phenomenon - A Case Study on Time Series Classification
SN - 978-989-758-015-4
AU - Muhammad Fuad M.
PY - 2014
SP - 645
EP - 650
DO - 10.5220/0004916706450650