substantial level of noise and frequent fluctuations in
crucial characteristics within the stock market render
stock prediction difficult and unreliable. Random
forests can perform feature-importance analysis,
which quantifies the significance of each input
feature. Utilizing Random Forest (RF) for feature
extraction can enhance the precision of stock price
forecasts (Ma, Han and Fu, 2019).
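As a rough illustration of how a random forest quantifies feature significance, the following sketch ranks input features by their impurity-based importance using scikit-learn. The helper name rank_features, the feature names, and the synthetic data are assumptions made for illustration only; they are not the dataset or the exact procedure used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_features(X, y, names, n_trees=200, seed=0):
    """Fit a random forest and return (name, importance) pairs, most important first."""
    forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    # feature_importances_ holds impurity-based scores that sum to 1.
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic stand-in for market data (an assumption, not real prices):
# the target depends strongly on "lag_1", weakly on "volume", not on "noise".
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

ranking = rank_features(X, y, ["lag_1", "volume", "noise"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Features with low scores are candidates for removal before the downstream model is trained, which is one way such an analysis can feed into a forecasting pipeline.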
This work presents a novel strategy that integrates
statistical and machine learning methods to overcome
the shortcomings of the previously mentioned
models, aiming to enhance the precision of predictive
analysis. A fusion model is created by training an
LSTM model using the outputs of ARIMA, GARCH,
and Random Forest models as features. ARIMA, a
conventional statistical model, is ideal for predicting
short-term outcomes, whereas machine learning
models like LSTM are better suited for analyzing
extensive datasets with nonlinear patterns. Therefore,
the objective of this study is to investigate whether
the fusion of the two types of models outperforms
the separate models on their respective predictions.
To weigh the merits and demerits of these methods
and the feasibility of a hybrid approach to
predicting stock prices, this study evaluates each
model individually and compares the individual
prediction results with those of the fusion model in
order to identify the most accurate stock prediction
model.
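The idea of feeding component-model outputs to an LSTM as features can be sketched as follows. This is a minimal illustration under stated assumptions: the function name build_fusion_windows, the window length, and the placeholder series are hypothetical, and the three input arrays stand in for ARIMA forecasts, GARCH volatility estimates, and Random Forest predictions that would be produced by separately fitted models.

```python
import numpy as np


def build_fusion_windows(arima_pred, garch_vol, rf_pred, target, window=5):
    """Stack component-model outputs as features and cut sliding windows.

    Returns X with shape (samples, window, 3) and y with shape (samples,),
    the layout that sequence models such as an LSTM usually expect.
    """
    features = np.column_stack([arima_pred, garch_vol, rf_pred])
    target = np.asarray(target, dtype=float)
    if len(features) != len(target):
        raise ValueError("all series must have the same length")
    n_samples = len(target) - window
    if n_samples <= 0:
        raise ValueError("series is shorter than the window")
    # Window i covers steps i .. i+window-1 and is paired with the target
    # at step i+window.
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    y = target[window:]
    return X, y


# Placeholder series standing in for real component-model outputs.
n = 20
arima_pred = np.linspace(100.0, 119.0, n)
garch_vol = np.full(n, 0.02)
rf_pred = arima_pred + 0.5
close = arima_pred + 1.0

X, y = build_fusion_windows(arima_pred, garch_vol, rf_pred, close, window=5)
print(X.shape, y.shape)  # (15, 5, 3) (15,)
```

The resulting arrays could then be passed to any sequence model. In practice the component forecasts should be generated out of sample so that the fusion model does not see information from the period it is asked to predict.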
The major contributions of this paper are as
follows:
1. Development of a novel fusion model
combining ARIMA, GARCH, LSTM, and Random
Forest to enhance stock price prediction accuracy.
2. An extensive evaluation and comparative
analysis of traditional statistical models, machine
learning models, and the proposed fusion model
based on historical stock market data.
3. Demonstration of the effectiveness of feature
engineering and integrated learning phases in
improving prediction performance.
4. Provision of a more reliable and precise tool
for stock market analysts and investors to make
informed decisions in volatile financial markets.
The manuscript is structured as follows: Section 2
provides a review of related work in the fields of
statistical modeling and machine learning methods
for stock price prediction. Section 3 details the
methodology, including the construction and
development of the ARIMA, GARCH, LSTM, and
Random Forest models, as well as the fusion model.
Section 4 discusses the experimental procedure, data
pre-processing, and evaluation metrics used in this
study. Section 5 presents the results and comparisons
of the models. Finally, Section 6 concludes the paper
and suggests directions for future research.
2. RELATED WORK
Statistical Modeling in Stock Price Prediction
Model Study
ARIMA is widely regarded as one of the most
effective forecasting methods in the domain of stock
forecasting. Because its predictions are derived from
past values of the series and past error terms, ARIMA
forecasting does not require presupposing any
underlying structural model or associated equations.
However, complex nonlinear real-world problems may
introduce some bias into the ARIMA model because it
is a linear model. Nevertheless, linear models are
generally observed to outperform complex structural
models in short-term forecasting (Ma, 2020). Wang et
al. introduced a method for forecasting the price of
garlic (Wang et al., 2022). This method utilized a
combination of GARCH family models and LSTM.
By constructing a GARCH family model, they
captured the volatility characteristics of the garlic
price series, including volatility clustering. The
LSTM network was employed to examine the
complex nonlinear interactions between sequences of
garlic prices and their intrinsic volatility, aiming to
forecast subsequent garlic price trends. Because the
constituent models are independent of one another,
the fusion model proves to be effective. The
study by Wang et al. and the stock price
prediction model proposed in this study share some
similarities (Wang et al., 2022); this provides the
inspiration for the concept of model fusion in this
article.
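To make the volatility behavior captured by GARCH-type models concrete, the sketch below computes the conditional variance recursion of a GARCH(1,1) model. The choice of GARCH(1,1), the function name garch11_variance, and the parameter values are assumptions for illustration; the cited work uses GARCH family models whose exact specification is not reproduced here.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    if not returns:
        return []
    # Start from the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Illustrative parameters (not estimated from data): one large return at t=2.
returns = [0.0, 0.0, 0.05, 0.0, 0.0]
sigma2 = garch11_variance(returns, omega=1e-5, alpha=0.1, beta=0.8)
for t, v in enumerate(sigma2):
    print(t, f"{v:.6e}")
```

After the shock at t=2, the variance at t=3 jumps and the variance at t=4 remains elevated, decaying only gradually through the beta term. This persistence is what produces clusters of high-volatility periods, and a series like sigma2 is the kind of volatility feature that can be passed to a downstream model.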
Research into machine learning methods for
forecasting stock prices
LSTM is a variant of recurrent neural
networks that is well suited to processing and
predicting important events in time series separated
by long intervals and delays. Jin et al. noted
the advantages of the LSTM model in analyzing
temporal relationships in time series and adapted the
LSTM model with an attention mechanism to
predict the closing price with greater precision (Jin et
al., 2020). Park et al. introduced a new stock
prediction framework called LSTM-Forest, which
combines LSTM and Random Forest to address the