Authors: Eric Paquet¹; Herna L. Viktor²; Hongyu Guo³
Affiliations: ¹ National Research Council of Canada and University of Ottawa, Canada; ² University of Ottawa, Canada; ³ National Research Council of Canada, Canada
Keyword(s): Data pre-processing, Aggregation, Gaussian distribution, Lévy distribution.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Foundations of Knowledge Discovery in Databases; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Pre-Processing and Post-Processing for Data Mining; Structured Data Analysis and Statistical Methods; Symbolic Systems
Abstract:
Consider a scenario in which one aims to learn models from data characterized by very large fluctuations that are attributable neither to noise nor to outliers. This may be the case, for instance, when examining supermarket ketchup sales, predicting earthquakes, or conducting financial data analysis. In such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we argue that, in many cases, incorrectly assuming Gaussian statistics leads to misleading or outright incorrect data mining results. We illustrate this argument on synthetic data and present results on stock market data.
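To make the abstract's central point concrete, the following minimal Python sketch (illustrative only, not code from the paper) contrasts the tail behaviour of a Gaussian with that of a heavy-tailed Lévy alpha-stable distribution via scipy.stats.levy_stable. The stability index alpha = 1.5, the sample size, and the thresholds are arbitrary assumptions chosen for illustration.

```python
# Illustrative sketch (not the authors' implementation): how often do large
# fluctuations occur under a Gaussian versus a Levy alpha-stable law?
# Requires numpy and scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n = 100_000

# Gaussian samples: tails decay like exp(-x^2 / 2), so fluctuations beyond a
# few standard deviations are exponentially rare.
gauss = rng.standard_normal(n)

# Levy alpha-stable samples with alpha < 2: tails decay like a power law
# |x|^-(1 + alpha), so extreme fluctuations remain comparatively frequent.
# alpha = 1.5 is an arbitrary illustrative choice, not a value from the paper.
levy = stats.levy_stable.rvs(alpha=1.5, beta=0.0, size=n, random_state=rng)

# Empirical tail probabilities at a few thresholds.
for k in (3, 5, 10):
    p_gauss = np.mean(np.abs(gauss) > k)
    p_levy = np.mean(np.abs(levy) > k)
    print(f"P(|X| > {k}):  Gaussian ~ {p_gauss:.2e}   Levy-stable ~ {p_levy:.2e}")
```

Under the Gaussian, the empirical tail probability collapses rapidly with the threshold, whereas the power-law tails of the stable law keep large deviations common. This is the mechanism behind the abstract's claim: preprocessing or modelling steps that presume the standard central limit theorem can badly misrepresent data of the latter kind.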