to estimate the model. Further, the AR model and its
variants have been used extensively in economics and
finance (Ye 2017, Santosa 2022, Tash 2011). For
example, Ye proposed an ARIMA-SVR stock prediction
model based on wavelet analysis, which improved
forecasting accuracy but did not overcome the
influence of singularities in the time series (Ye
2017). An ARIMA model was employed to predict the
prices of 45 stocks with different characteristics,
and the stock sequences suited to the model were
identified by classification (Santosa 2022). Tash and
Modarres applied the AR/GARCH model to Tehran
stocks, and the prediction results showed that their
method could improve prediction accuracy (Tash
2011).
However, methods based on the autoregressive model
still have two deficiencies that may bias the
prediction.
(1) The mean square error criterion used in the
autoregressive model leads to prediction errors.
Specifically, if some data points of two random
variables are far apart at the same coordinate, the
error grows with the square of the difference, which
creates a huge gap between the two random
variables.
(2) The autoregressive model uses the same order to
predict different fluctuations, which leads to
prediction errors. Specifically, because stock price
curves are complex, using a single regression model
to predict the price curves of all stocks results in
low prediction accuracy.
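Deficiency (1) can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions, not the model of this paper: the AR(1) form without intercept, the synthetic series, the kernel width sigma = 0.5, and the helper names fit_ar1_least_squares and fit_ar1_correntropy are all hypothetical. The second fit weights each residual with a Gaussian kernel and solves a standard fixed-point reweighting, which anticipates the maximum correntropy idea introduced in the next paragraph.

```python
import math


def fit_ar1_least_squares(series):
    """Least-squares estimate of a in x[t] = a * x[t-1] (no intercept)."""
    prev, nxt = series[:-1], series[1:]
    num = sum(p * n for p, n in zip(prev, nxt))
    den = sum(p * p for p in prev)
    return num / den


def fit_ar1_correntropy(series, sigma=0.5, iterations=20):
    """Kernel-weighted estimate of a in x[t] = a * x[t-1].

    Each pass recomputes the weights w[t] = exp(-e[t]^2 / (2 sigma^2)) from
    the current residuals e[t] and then solves a weighted least-squares step,
    so points with large residuals contribute almost nothing.
    """
    prev, nxt = series[:-1], series[1:]
    a = fit_ar1_least_squares(series)  # starting point for the iteration
    for _ in range(iterations):
        weights = [
            math.exp(-((n - a * p) ** 2) / (2.0 * sigma ** 2))
            for p, n in zip(prev, nxt)
        ]
        num = sum(w * p * n for w, p, n in zip(weights, prev, nxt))
        den = sum(w * p * p for w, p in zip(weights, prev))
        a = num / den
    return a


# Synthetic series (assumption): exact AR(1) decay with a = 0.8 plus one spike.
true_a = 0.8
series = [10.0 * true_a ** t for t in range(30)]
series[15] += 3.0  # a single singular point

a_ls = fit_ar1_least_squares(series)
a_mcc = fit_ar1_correntropy(series)
print(f"least squares: {a_ls:.4f}, kernel-weighted: {a_mcc:.4f}, true: {true_a}")
```

Under these assumptions, the single spike pulls the least-squares coefficient down to about 0.78, while the kernel-weighted coefficient stays close to 0.8, because the squared residuals at the spike dominate the first fit but receive near-zero weight in the second. The result depends on the chosen sigma and on the starting point, since the kernel-weighted objective is not convex.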
In order to accurately predict the stock price trend,
this paper proposes a stock price trend prediction
method based on a maximum correntropy
autoregressive model. Specifically, the stock price
curve is first segmented, and correntropy is used as
the similarity measure to cluster the price curve
segments. Then, for each cluster, a regression model
is constructed using the maximum correntropy
criterion as the constraint function, and this model
is used to predict the change trend of the stock price
curve. In summary, this paper makes the following
four contributions.
(1) Based on the maximum correntropy criterion, a
new regression prediction model is constructed.
Traditional autoregressive models are sensitive to
singularities because they use the minimum mean
square error criterion. In this paper, the maximum
correntropy criterion is used as the constraint
function, and the Gaussian kernel of correntropy
bounds the otherwise unlimited growth of the error,
which effectively weakens the influence of
singularities on the curve similarity measurement.
(2) Based on the clustering strategy, a well-targeted
regression prediction model is constructed for each
type of price curve. The prediction accuracy of a
regression model is strongly affected by the model
order. Because stock price curves are complex, using
one regression model to predict the price curves of
all stocks leads to low prediction accuracy. With the
clustering strategy, price curves with similar trends
are grouped together, and a regression model is
constructed for each group, which can effectively improve
the accuracy of prediction.
(3) Based on correntropy, a new similarity measure
for price curves is proposed. Existing clustering
methods are generally based on Euclidean distance,
so their results are particularly sensitive to
singularities in the stock price curve. Here,
correntropy is used to measure the similarity of any
two curves. Essentially, the two curves are treated as
random variables and their similarity is measured
from the difference between their probability
distributions, which better overcomes the influence
of singularities.
(4) Based on open set recognition, the singularity
problem in the clustering process is addressed. In
this paper, an open set recognition strategy is
adopted to add boundary constraints to the clustering
results, which makes the results more accurate and
better handles the classification of singular points
during clustering.
To visually illustrate the advantages of open set
recognition, this paper clusters data containing
singularities. In Figure 1, panel (b) shows the
original data, where sample points ①~④ are the data
to be classified. Figure 1 (a) shows the result
obtained by using closed sets,
and symbols "⊕" and "⊖" represent different
categories. Singularities ② and ④ clearly do not fit
into any category, yet the closed set approach still
assigns them to one. If open set recognition is
adopted, the result is shown in Figure 1 (c).
Singularities ② and ④ fall outside the boundary
constraints and are left unassigned, which eliminates
this problem.
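The contrast between closed set and open set assignment described for Figure 1 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the procedure of this paper: correntropy is taken as the sample mean of a Gaussian kernel over pointwise differences, the boundary constraint is reduced to a single similarity threshold, and the names closed_set_label, open_set_label and min_similarity, the kernel width and the toy curves are all hypothetical.

```python
import math


def correntropy(x, y, sigma=1.0):
    """Sample correntropy of two equal-length curves with a Gaussian kernel."""
    total = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y)
    )
    return total / len(x)


def closed_set_label(segment, centers, sigma=1.0):
    """Closed set: always return the index of the most similar center."""
    sims = [correntropy(segment, c, sigma) for c in centers]
    return max(range(len(centers)), key=lambda k: sims[k])


def open_set_label(segment, centers, sigma=1.0, min_similarity=0.5):
    """Open set: return the best center, or -1 if even it is too dissimilar."""
    sims = [correntropy(segment, c, sigma) for c in centers]
    best = max(range(len(centers)), key=lambda k: sims[k])
    return best if sims[best] >= min_similarity else -1


# Toy data (assumption): two cluster centers and two segments to classify.
centers = [
    [0.0, 1.0, 2.0, 3.0, 4.0],  # rising trend
    [4.0, 3.0, 2.0, 1.0, 0.0],  # falling trend
]
regular = [0.1, 1.0, 2.1, 2.9, 4.0]         # close to the rising center
singular = [0.0, 10.0, -10.0, 10.0, -10.0]  # resembles neither center

for name, segment in (("regular", regular), ("singular", singular)):
    print(name, closed_set_label(segment, centers), open_set_label(segment, centers))
```

With these toy curves, the closed set rule forces the singular segment into the rising cluster because it has no option to decline, whereas the open set rule returns -1 for it and still assigns the regular segment to the rising cluster. In practice the threshold would have to be chosen from the data, and a per-cluster boundary may be more appropriate than one global value.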
2 RELATED METHODS
This section introduces the methods related to this
paper, including similarity measures and
autoregressive models. This paper uses uppercase
letters (e.g. X, Y) to represent time series data,
lowercase letters with subscripts (e.g. x_i, y_i) to
represent individual data points, uppercase bold
letters (e.g. 𝑿, 𝒀) to represent matrices, and the
letter d with a superscript to represent distances
(e.g. the Euclidean distance).