Research on Sales Forecast of Fresh Food Industry Based on
ARIMA-Transformer Model
Xiaoli Zhang¹, Huailiang Zhang², Yanyu Gong¹, Xue Zhang¹ and Haifeng Wang¹*
¹ Linyi University, Linyi, China
² Xinfa Group, Liaocheng, China
Keywords: Transformer, Time Series, Sales Forecast, Fresh Food.
Abstract: Fresh food is highly perishable and strongly time-sensitive. To reduce costs and improve efficiency,
enterprises need to predict fresh food sales volume accurately. This paper addresses the imbalance between
order planning and production output in the sales process of the fresh food industry. It presents an
ARIMA-Transformer combined forecasting model for high-frequency time series trading data and evaluates it
quantitatively with the MAPE and RMSPE indexes. In the experiments, the ARIMA-Transformer model achieves
lower errors than the LSTM, ARIMA, and Transformer models: its MAPE is 0.171 percentage points and its
RMSPE 0.306 percentage points lower than those of ARIMA, the best-performing baseline, which supports its
rationality and superiority in predicting fresh food sales volumes.
1 INTRODUCTION
Fresh food is currently produced and sold in a
non-standardized manner. Perishability and timeliness
are its defining characteristics, so sales depend
heavily on timing. Using high-frequency trading data,
this paper develops a machine-learning sales
forecasting model that predicts the sales of various
fresh foods from the patterns of change in sales
volume in the fresh food industry. With the
distribution of order quantities given by the sales
profile, production plans can be scheduled
dynamically, enabling enterprises to develop logistics,
distribution and sales strategies, optimize resource
allocation, reduce costs and increase productivity.
2 RELATED WORK
The prediction accuracy of traditional models rarely
meets the needs of major industries. Considering the
characteristics of fresh vegetables, Lu Wang (2021)
proposed improving the support vector machine model
by combining fuzzy information granulation with an
optimized particle swarm optimization algorithm.
However, the model considers only a limited set of
factors affecting sales volume and cannot deal
effectively with uncertainty. To improve the accuracy
of retail sales forecasting, Huo Jiazhen et al. (2023)
developed a model based on Ensemble Empirical Mode
Decomposition (EEMD), Holt-Winters, and the Gradient
Boosting Decision Tree (GBDT). Experimental results
indicate that the model performs well on multi-step
predictions. However, it needs a large amount of
training data, so it cannot be applied where the
sample size is small. Xu Yingzhuo et al. (2023)
established a game sales forecasting model based on
the GBDT algorithm. The experimental results show
that this model has a higher goodness of fit than
other forecasting models. However, the model does not
consider the influence of external factors on sales
volume, and the application scenario is relatively
simple.
3 RESEARCH CONTENT
An ARIMA-Transformer model based on time series
data is presented in this paper. The model has two
main parts: ARIMA and Transformer. Combining the
ARIMA model predictions with the Transformer model
makes further predictions possible. The research
object of this experiment is sales data of livestock
products in the slaughter industry. Modeling and
forecasting the time series with ARIMA yields a data
set recorded as Db. The model then transforms the
real data set Da and the data set Db into
<Source, Target> format, which serves as the input
for the Transformer model to model and predict, and
the MAPE and RMSPE evaluation indexes are calculated
on the results.
Zhang, X., Zhang, H., Gong, Y., Zhang, X. and Wang, H.
Research on Sales Forecast of Fresh Food Industry Based on ARIMA-Transformer Model.
DOI: 10.5220/0012284800003807
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 2nd International Seminar on Artificial Intelligence, Networking and Information Technology (ANIT 2023), pages 397-400
ISBN: 978-989-758-677-4
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
4 SYSTEM MODEL
LSTM is chosen as an important benchmark model in
this experiment, but its principle is not discussed in
detail.
Figure 1. ARIMA-Transformer Model Structure Diagram.
An ARIMA-Transformer combined forecasting
model is proposed in this paper to further improve the
accuracy of fresh food sales forecasts. According to
Fig. 1, the structure of the overall model is divided
into three parts: ARIMA, Time-Series-Transformer,
and Time-Series-Transformer-ARIMA. ARIMA and
the improved Transformer model are combined in this
model.
4.1 Data Preprocessing
Figure 2. Data preprocessing step diagram.
As shown in Fig. 2, the collected sales order data are
grouped by meat name, and customer order data are
randomly selected as the experimental data set, which
consists of two columns: Date and Value. In the
experiment, the data set is divided chronologically
into training and test sets in a ratio of 8:2.
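The chronological 8:2 split can be illustrated with a minimal sketch. The function name `chronological_split` and the toy records are assumptions made for illustration; they are not taken from the experiment.

```python
from datetime import date, timedelta


def chronological_split(records, train_ratio=0.8):
    """Split (date, value) records into train and test sets without shuffling."""
    ordered = sorted(records, key=lambda r: r[0])  # oldest record first
    cut = int(len(ordered) * train_ratio)          # index where the test set starts
    return ordered[:cut], ordered[cut:]


# Hypothetical toy data: 10 consecutive days with made-up values.
start = date(2023, 1, 1)
records = [(start + timedelta(days=i), 100.0 + i) for i in range(10)]
train, test = chronological_split(records)
```

Because the split is by time rather than random, every test record is later than every training record, which avoids leaking future sales into training.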
4.2 ARIMA-Based Sales Forecasting
Model Design
In the ARIMA(p,d,q) model, $d$ stands for the
difference operation, that is, the number of
differences required to transform the time series
into a stationary series:
$$\nabla^{d} y_t = \nabla^{d-1} y_t - \nabla^{d-1} y_{t-1}, \qquad \nabla^{0} y_t = y_t \tag{1}$$
In the model AR(p), the number of autoregressive
terms is $p$, and AR(p) is the autoregressive model.
The formula includes $y_t$ as the current value,
$\epsilon_t$ as the error term, $\mu$ as a constant,
and $\gamma_i$ as the autocorrelation coefficient. In
particular, the formula is as follows:
$$y_t = \mu + \sum_{i=1}^{p} \gamma_i y_{t-i} + \epsilon_t \tag{2}$$
MA(q) is the moving average model, in which
$\epsilon_t$ stands for white noise, $\mu$ is a
constant, and $\theta_i$ is the moving average
coefficient. In particular, the formula is as follows:
$$y_t = \mu + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} \tag{3}$$
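The three ingredients of ARIMA(p,d,q) can be illustrated with a short sketch: differencing, a one-step AR(p) forecast and a one-step MA(q) forecast. The function names and the coefficient values in the usage note are hypothetical. The sketch assumes the coefficients are already known; it does not estimate them from data.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing to a list of numbers."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def ar_one_step(history, mu, gammas):
    """One-step AR(p) forecast: mu + sum_i gamma_i * y_{t-i}.

    The error term is omitted because its expected value is zero.
    """
    p = len(gammas)
    recent = history[-p:][::-1] if p else []  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return mu + sum(g * y for g, y in zip(gammas, recent))


def ma_one_step(residuals, mu, thetas):
    """One-step MA(q) forecast: mu + sum_i theta_i * eps_{t-i}."""
    q = len(thetas)
    recent = residuals[-q:][::-1] if q else []  # eps_{t-1}, ..., eps_{t-q}
    return mu + sum(th * e for th, e in zip(thetas, recent))
```

For example, `difference([1, 4, 9, 16], d=2)` differences the series twice, and `ar_one_step([1.0, 2.0, 3.0], mu=0.5, gammas=[0.5, 0.25])` weights the two most recent observations.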
4.3 Transformer-Based Time Series
Sales Forecasting Model Design
To predict time series data with the traditional
Transformer model, the decoder is modified. The
<Source, Target> sequences of sales volume are
generated with a sliding window, and most of the data
in Target come from Source, so no attention mechanism
is needed on the Target side. The Decoder therefore
keeps only the fully connected layer. Fig. 3 shows
the specific structure.
Figure 3. Time-Series-Transformer Model.
In this model, the time series data set is converted
into <Source, Target> pairs with a sliding window of
length 15, i.e. input_window=15 and output_window=1,
so the sales data of the past 15 days are used to
predict the sales data of the next day.
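The sliding-window construction can be sketched as follows. The function name `make_windows` is hypothetical, and the sketch assumes that each Target holds only the next output_window values. Since the text notes that most of the Target data come from the Source, the actual Target may overlap the Source window instead.

```python
def make_windows(series, input_window=15, output_window=1):
    """Turn a series into (source, target) pairs with a sliding window."""
    pairs = []
    last_start = len(series) - input_window - output_window
    for i in range(last_start + 1):
        source = series[i:i + input_window]                  # past 15 days
        target = series[i + input_window:
                        i + input_window + output_window]    # next day
        pairs.append((source, target))
    return pairs


# Toy series of 20 daily values (0, 1, ..., 19), for illustration only.
pairs = make_windows(list(range(20)))
```

With 20 values and the default window sizes, the sketch yields 5 pairs; the first pair uses days 0 to 14 as Source and day 15 as Target.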
4.4 Combined ARIMA and Transformer
Forecasting Model Design
As shown in Fig. 1, the combined model takes the real
data set Da as the input to the Time-Series-Transformer
model and the prediction result Db from the ARIMA
model as the label value. All other parameters are
consistent with those in Section 4.3.
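One possible reading of this pairing is sketched below: each Source window is cut from the real data Da, and the matching label is taken from the ARIMA prediction Db at the following time step. The function name `make_combined_pairs` and the index alignment between Da and Db are assumptions made for illustration.

```python
def make_combined_pairs(real, arima_pred, input_window=15, output_window=1):
    """Pair windows of real data (Da) with ARIMA predictions (Db) as labels.

    Assumes real[i] and arima_pred[i] refer to the same time step.
    """
    if len(real) != len(arima_pred):
        raise ValueError("Da and Db must cover the same time steps")
    pairs = []
    last_start = len(real) - input_window - output_window
    for i in range(last_start + 1):
        source = real[i:i + input_window]                       # from Da
        target = arima_pred[i + input_window:
                            i + input_window + output_window]   # from Db
        pairs.append((source, target))
    return pairs


# Toy data: Da is 0..19 and Db is a made-up ARIMA output shifted by 0.5.
da = [float(x) for x in range(20)]
db = [x + 0.5 for x in da]
combined = make_combined_pairs(da, db)
```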
5 EXPERIMENT AND RESULT
ANALYSIS
5.1 ARIMA Model Construction
The ARIMA model is one of the most commonly used
time series prediction models. It assumes that the
data are stationary. Most time series are not, so one
or more rounds of differencing must be applied to
non-stationary data first; the number of rounds is
the parameter d in ARIMA(p,d,q).
Figure 4. Timing diagram of livestock product sales.
As the time series plot in Fig. 4 shows, the overall
sales volume of the product is stable, with no obvious
trend. To confirm whether the data set is stationary,
we run the ADF test. The p-values of both the original
data and the first-order difference are close to 0,
and both test statistics are below the 1% critical
value, so both series meet the stationarity condition.
To keep the prediction results comparable with those
of the other models, we apply a first-order difference
to the data set, that is, d=1.
Table 1. ADF test results.
                               Original Value      First Difference Value
Test Statistic Value           -5.865557           -9.066090
p-value                        3.332044 x 10^-7    4.428753 x 10^-15
Number of Observations Used    408                 398
Critical Value (1%)            -3.446479           -3.446887
Critical Value (5%)            -2.868650           -2.868829
Critical Value (10%)           -2.570557           -2.570653
To determine the values of parameters p and q in
ARIMA(p, d, q), the Bayesian Information Criterion
(BIC) is used as the standard. According to Fig. 5,
the cell with the minimum BIC lies at AR0 and MA1,
i.e., p=0 and q=1, so ARIMA(0,1,1) is used to model
the dataset.
Figure 5. BIC thermal diagram.
Fig. 6 illustrates the prediction result of
ARIMA(0,1,1) on the complete dataset. According to
the figure, the prediction obtained with the ARIMA
model is very close to the real data, with a MAPE of
1.815 and an RMSPE of 3.301.
Figure 6. ARIMA model prediction result diagram.
5.2 Construction of Combined
Forecasting Model Based on
ARIMA and Transformer
Section 4.4 describes the ARIMA-Transformer model.
For the prediction results shown in Fig. 7, MAPE is
1.644 and RMSPE is 2.995.
Figure 7. ARIMA-Transformer Model Prediction Result
Diagram.
5.3 Performance Evaluation Indicators
We evaluate the prediction results using Mean
Absolute Percentage Error (MAPE) and Root Mean
Square Percentage Error (RMSPE). The calculation
formulas are given below, where $y_i$ is the real
value of the sample at time $i$, $\hat{y}_i$ is its
predicted value at that time, $x_{min}$ is its minimum
value, $x_{max}$ is its maximum value, and $m$ is its
length.
1 Mean absolute percentage error (MAPE)
$$\mathrm{MAPE} = \frac{100\%}{m} \sum_{i=1}^{m} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \tag{4}$$
2 Root mean square percentage error (RMSPE)
$$\mathrm{RMSPE} = 100\% \times \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( \frac{\hat{y}_i - y_i}{y_i} \right)^{2}} \tag{5}$$
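The two indicators can be computed with a few lines of code. The sketch below assumes the standard definitions of MAPE and RMSPE, expressed in percent, with non-zero real values; the function names and the toy numbers in the usage note are illustrative.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    m = len(actual)
    return 100.0 / m * sum(abs((p - a) / a) for a, p in zip(actual, predicted))


def rmspe(actual, predicted):
    """Root mean square percentage error, in percent."""
    m = len(actual)
    mean_sq = sum(((p - a) / a) ** 2 for a, p in zip(actual, predicted)) / m
    return 100.0 * math.sqrt(mean_sq)
```

For instance, with real values `[100.0, 200.0]` and predictions `[110.0, 180.0]`, both relative errors have magnitude 0.1, so both indicators equal 10 percent. RMSPE penalizes large individual errors more heavily than MAPE, so the two values diverge when errors are uneven.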
Table 2 compares the forecast results of the
ARIMA-Transformer model with those of the other
models on the fresh food data. The ARIMA-Transformer
model has the smallest error between predicted and
real values. Compared with the ARIMA model, the
best-performing baseline, it reduces MAPE by 0.17051
and RMSPE by 0.30604, giving the best overall
performance.
Table 2. Model Evaluation Indicators
MODEL                      MAPE       RMSPE
LSTM                       4.87734    8.12778
ARIMA                      1.81493    3.30136
Time-Series-Transformer    4.57646    4.57646
ARIMA-Transformer          1.64442    2.99532
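The reductions quoted for the ARIMA-Transformer model can be checked directly against the values in Table 2. The dictionary layout and the helper name `reduction` are illustrative; the numbers are copied from the table.

```python
# (MAPE, RMSPE) for each model, copied from Table 2.
table2 = {
    "LSTM": (4.87734, 8.12778),
    "ARIMA": (1.81493, 3.30136),
    "Time-Series-Transformer": (4.57646, 4.57646),
    "ARIMA-Transformer": (1.64442, 2.99532),
}


def reduction(baseline, model="ARIMA-Transformer"):
    """Return (MAPE, RMSPE) differences between a baseline and the model."""
    return tuple(round(b - m, 5) for b, m in zip(table2[baseline], table2[model]))


vs_arima = reduction("ARIMA")  # differences relative to the ARIMA baseline
vs_lstm = reduction("LSTM")    # differences relative to the LSTM baseline
```

The differences of 0.17051 (MAPE) and 0.30604 (RMSPE) correspond to the ARIMA baseline; the gaps relative to LSTM and Time-Series-Transformer are considerably larger.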
6 CONCLUSION
This paper presents an ARIMA-Transformer forecasting
model for high-frequency time series trading data,
addressing the imbalance between order planning and
production output in the fresh food industry's sales
process. Experiments show that the predictions of
this model are more accurate than those of the other
models, which helps enterprises optimize supply chain
management and adjust production.
ACKNOWLEDGEMENTS
This project is supported by the Shandong Province
Science and Technology Small and Medium
Enterprises Innovation Ability Enhancement Project
of China (No. 2023TSGC0449).
REFERENCES
Shi Jiannan, Zou Junzhong, Zhang Jian, et al. Research on
stock price time series prediction based on DMD-LSTM
model (J). Computer Application Research, 2020,
37(3):5.
Lu Wang. Study on the forecast of the sales trend of fresh
vegetables based on improved SVM (D). Anhui
Agricultural University, 2021.
DOI:10.26919/d.cnki.gannu.2021.000101.
Huo Jiazhen, Xu Jun, Chen Mingzhou. Multi-step forecast
of retail sales based on EEMD-Holt-Winters-GBDT
model (J/OL). Industrial Engineering and Management:
1-14 (June 30, 2023). http://kns.cnki.net/kcms/detail/.
Xu Yingzhuo, Guo Bo, Wang Liupeng. Research on game
sales forecasting model based on GBDT algorithm (J).
Intelligent Computer and Application, 2023,
13(01):182-185.
Mostafa M, Zahra A, Poneh Z, et al. Time series analysis of
cutaneous leishmaniasis incidence in Shahroud based on
ARIMA model (J). BMC Public Health, 2023, 23(1).
Atul S, Kumar P J. A multi-model forecasting approach for
solid waste generation by integrating demographic and
socioeconomic factors: a case study of Prayagraj, India
(J). Environmental Monitoring and Assessment, 2023,
195(6).
Muriithi B M, Samuel W. Time Series Analysis and
Forecasting of Household Products Prices (A Case
Study of Nyeri County) (J). Mathematical Modelling and
Applications, 2023, 7(2).
Yuhong J, Lei H, Yushu C. A Time Series Transformer based
method for the rotating machinery fault diagnosis (J).
Neurocomputing, 2022, 494.
Shengchun P, Xian Y, Qianqian L, et al. Time series
prediction of shallow water sound speed profile in the
presence of internal solitary wave trains (J). Ocean
Engineering, 2023, 283.
Liyanage D R. Inflation Forecasting Using Automatic
ARIMA Model in Sri Lanka (J). International Journal
of Economic Behavior and Organization, 2023, 11(2).