Prediction of the West Texas Intermediate Crude Oil Price Using

ARIMA Model and ARIMA-GARCH Model

Minzhao Li

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200082, China

Keywords: Prediction, WTI Crude, ARIMA Model, GARCH Model.

Abstract: Crude oil stands as a pivotal energy source and raw material indispensable for modern life and various

production activities. The fluctuations in its price are intricately linked to the seamless functioning of the

macroeconomy, the healthy development of the capital market, and the choices of individual investors.

However, under the influence of various external factors, international crude oil price changes are

characterized by complexity and are very difficult to predict. This study delves into the monthly fluctuations

of West Texas Intermediate (WTI) crude oil prices, employing the ARIMA and GARCH models to shed light

on its future trajectory. Methodically, it subjects the data to normality tests, stationarity tests, and white noise

tests, while leveraging the AIC information criterion and minimum MSE criterion to fine-tune the model.

Through this analysis, the study establishes an ARIMA(1,1,0) model and an ARIMA(1,1,0)-GARCH(1,1) model, both of which forecast a modest uptick in oil prices over the next three months: May, June, and July of 2024. Compared with the traditional ARIMA model, the ARIMA-GARCH model added in this article takes conditional heteroskedasticity into account and can give more accurate and comprehensive predictions, which can provide a reference for investors.

1 INTRODUCTION

In recent years, driven by economic globalization,

crude oil has asserted increasing dominance within

the commodity market. Given its unique

characteristics, the fluctuations and trajectories of

international oil prices have garnered significant

attention. These fluctuations reverberate across

various sectors, impacting government policies,

economic activities, portfolio management, risk

mitigation, and more. Bastianin et al. asserted that

economic policies and financial regulatory measures

aimed at alleviating the adverse consequences of

unforeseen oil price fluctuations should prioritize

addressing the underlying causes of such shocks

(Bastianin et al. 2016). Additionally, the impact of

WTI crude oil price on Shanghai crude oil futures

prices is asymmetric in both intensity and direction in

the short and long term, with a positive effect in the

short term and a negative effect in the long term

(Ding, 2024).

Especially since the advent of the 21st century,

with the advancement of global trade liberalization

and the deepening of world economic globalization,

the crude oil market has witnessed heightened

frequency of fluctuations, resulting in significant

volatility in crude oil prices. This market exhibits

perpetual volatility, marked by conspicuous non-

linearity and pronounced price swings. As a

commodity, crude oil's financial characteristics have

gained prominence, attributed to its pricing

mechanism and the proliferation of derivatives in

financial markets. The dual nature of crude oil renders

its price susceptible not only to market fundamentals

such as supply and demand dynamics and inventory

changes but also to various other influencing factors,

including fluctuations in the U.S. dollar, geopolitical

tensions, economic policies, speculative forces, and

more. However, many factors affect crude oil prices, and the relationships among them are complex. Determining which factors are the main and long-term drivers is a difficult problem.

Sari et al. found that long-term trends in oil prices

are significantly impacted by global risk perceptions

(Sari et al., 2011). Additionally, Le et al. posited that

increases in Covid-19 cases, uncertainties

surrounding U.S. economic policies, and anticipated

stock market volatility collectively contributed to the

decline in WTI crude oil prices in April 2020. Despite


Li, M.

Prediction of the West Texas Intermediate Crude Oil Price Using ARIMA Model and ARIMA-GARCH Model.

DOI: 10.5220/0012999200004601

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Innovations in Applied Mathematics, Physics and Astronomy (IAMPA 2024), pages 110-118

ISBN: 978-989-758-722-1

these challenges, losses in global stock markets

appeared to have been notably mitigated (Le et al.,

2021). Wang et al. argued that in extreme scenarios,

there exists a robust causal relationship between

investor sentiment and the crude oil futures market

(Wang et al., 2021).

Therefore, how to accurately predict WTI crude oil

prices has become a top priority, and many complex

and innovative models have been built to predict WTI

crude oil prices. Traditional econometric models play

a crucial role across diverse economic domains,

including the prediction of WTI crude oil prices.

Herrera et al. employed RiskMetrics and GARCH models for short-term forecasts, Exponential GARCH (EGARCH) for medium-term horizons, and Markov-switching GARCH (MS-GARCH) for long-term predictions (Wang and Liu, 2016; Herrera, Hu and Pastor, 2018). Machine learning methods and

hybrid models have gained significant traction in the

realm of crude oil price prediction. A multitude of

scholars have conducted extensive research in this

area. Wu et al. leveraged a Convolutional Neural

Network (CNN) model to extract text features from

news media texts and Google Trends data, assessing

their efficacy in explaining crude oil price predictions

(Wu et al., 2021). Li et al. investigated the enduring

impacts of global crude oil production and economic

activities on crude oil prices. They devised a hybrid

model incorporating Genetic Algorithm Optimized

Support Vector Machine (GASVM) and Back

Propagation Neural Network (BPNN) to analyze

monthly oil price data for predictive purposes (Li,

Zhu and Wu, 2019). Wang and Wang fused a multi-layer perceptron with a neural network to develop an Elman Recurrent Neural Network (ERNN) model for empirical crude oil price forecasting (Wang and Wang, 2016).

Overall, these studies highlight the need for

accurate prediction of the global price of WTI crude.

Econometric models can predict short-term crude oil prices fairly accurately, but the nonlinear, complex and non-stationary characteristics of crude oil prices limit their performance. Machine learning approaches combine linear and nonlinear models and set up various scenarios to improve the applicability of the model. More complex deep learning models and numerous neural network algorithms have since been added to forecasting, as they can extract useful information and focus on trends and changes in time series. This paper

will primarily concentrate on utilizing the ARIMA

and GARCH models for crude oil price prediction.

The aim is to offer a valuable reference for crude oil

futures investors, aiding them in making informed

decisions and conducting risk mitigation transactions.

By leveraging these models, investors can potentially

reduce their losses to a considerable extent.

2 METHODOLOGY

2.1 Data Source

The data used in this investigation come from the global WTI crude oil price series published by FRED (Federal Reserve Economic Data). This dataset

comprises monthly average prices of crude oil in U.S.

dollars. It is meticulously documented, with no

instances of missing values or outliers. For analysis,

the paper has selected price data spanning from

January 2000 to April 2024, amounting to 292

observations. The first 276 observations, covering the

period from January 2000 to December 2022,

constitute the training set, while the remaining 16

observations, spanning from January 2023 to April

2024, are designated as the test set. Ultimately, a

rolling forecast approach is adopted to predict the

WTI crude oil price for May, June, and July 2024.
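The sample sizes quoted above can be checked with a short sketch. It uses only the Python standard library; `month_range` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```python
def month_range(start_year, start_month, end_year, end_month):
    """Return 'YYYY-MM' labels for every month in the inclusive range."""
    labels = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


months = month_range(2000, 1, 2024, 4)  # full sample period
train_months = months[:276]             # January 2000 to December 2022
test_months = months[276:]              # January 2023 to April 2024

print(len(months), len(train_months), len(test_months))
print(train_months[-1], test_months[0], test_months[-1])
```

Splitting by position rather than at random keeps the test set strictly after the training set, which is what a forecasting evaluation needs.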

The internationally recognized crude oil benchmark prices are the WTI and Brent crude oil prices. This paper selects the price of WTI crude oil, which holds a leading position in global commodity futures trading volume because of its transparent quotations and high liquidity, as well as the status of the United States as a major crude oil buyer and the global influence of the New York Mercantile Exchange, where WTI futures are traded. This paper also uses prices rather than yields, average prices rather than closing prices, and monthly data from January 2000 onward rather than the full price history.

2.2 Variable Selection

Crude oil prices are clearly volatile and cyclical. They can rise or fall substantially within relatively brief time frames, and they can also move through rising and falling cycles spanning several years or even a decade. Changes in oil prices are often affected by geopolitics, technological progress, and the macroeconomic environment, as illustrated in Figure 1.


Figure 1: Global WTI crude price.

Figure 1 shows that WTI oil prices changed little from 2000 to 2004 and then gradually increased, peaking in the summer of 2008 amid geopolitical tensions before collapsing rapidly under the impact of the 2008 financial crisis. Prices then rose slowly from 2009 to 2014, dropped rapidly due to excess production, climbed gradually from 2016 to 2020, and plummeted in 2020 because of the effects of the COVID-19 pandemic. Prices have risen rapidly in the past three years and declined rapidly in the past year. This article uses a univariate time series. The selection of variables is shown in Table 1: month is used as the time index and price is used as the variable.

Table 1: List of Variables.

Variable   Type     Meaning
Month      Date     Month
WTI        Double   WTI crude oil price (dollar)

2.3 Model Selection

This article uses Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models to process the time series of crude oil prices. ARIMA consists of three parts: autoregression (AR), differencing (I), and moving average (MA). The AR part captures the impact of observations from past periods on the current value. The I part transforms a non-stationary time series into a stationary one by removing trends and seasonal characteristics through differencing. The MA part accounts for the impact of past forecast errors on the current value. The GARCH model is also called the generalized ARCH model. Like ARCH, it accounts for the volatility clustering caused by the heteroskedasticity of the series, and it additionally includes lagged terms of the conditional variance to capture more heteroskedasticity information.
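For reference, the two components can be written in their standard textbook form for the orders used later in this paper. The notation is assumed here for illustration and does not appear in the original text: $y_t$ is the price, $c$ an optional constant, $\phi_1$ the AR coefficient, $z_t$ a standardized innovation, and $\omega$, $\alpha_1$, $\beta_1$ the variance parameters.

```latex
\begin{aligned}
\Delta y_t &= y_t - y_{t-1}, \\
\Delta y_t &= c + \phi_1 \, \Delta y_{t-1} + \varepsilon_t, \\
\varepsilon_t &= \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1), \\
\sigma_t^2 &= \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2,
\qquad \omega > 0,\ \alpha_1 \ge 0,\ \beta_1 \ge 0 .
\end{aligned}
```

The first two lines are ARIMA(1,1,0): an AR(1) model applied to the first difference. The last two lines are GARCH(1,1). Setting $\beta_1 = 0$ recovers ARCH(1), which is the sense in which GARCH generalizes ARCH.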

3 RESULTS AND DISCUSSION

3.1 Data Processing

The data used to build the model need to be stationary, so the first step is to determine whether the original series is stationary. The time series plot in Figure 1, the ACF plot (Figure 2), and the ADF unit root test all indicate that the original series is not stationary. It therefore has to be made stationary by differencing or by a logarithmic transformation. First-order differencing is applied first. The time series plot (Figure 3) and the ACF and PACF plots (Figure 4) of the first-order differenced series indicate that it is already stationary, so there is no need for second-order differencing or a logarithmic transformation. In addition, according to Table 2, the ADF test with lag order 1 shows that the differenced series is stationary in all three cases: with time trend and intercept, with intercept only, and with neither (the first case already indicates stationarity; the other two are tested for completeness). The lag order of 1 was determined step by step, starting from 10. The AIC criterion was not used to choose the lag order because it cannot guarantee that serial correlation will be eliminated.
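The two quantities used in this step, the first-order difference and the sample autocorrelation behind an ACF plot, can be sketched as follows. This is a minimal standard-library illustration on made-up numbers, not the WTI data; the function names are hypothetical, and the ADF test itself is not reproduced here.

```python
def difference(series):
    """First-order difference: d[t] = x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, lag):
    """Sample autocorrelation at the given lag (lag 0 gives 1.0)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numer / denom


# Toy trending series (made-up numbers, not WTI prices).
prices = [10.0, 12.0, 11.0, 14.0, 16.0, 15.0, 18.0, 20.0, 19.0, 22.0]
diffs = difference(prices)

print("lag-1 ACF of levels:     ", round(sample_acf(prices, 1), 3))
print("lag-1 ACF of differences:", round(sample_acf(diffs, 1), 3))
```

For a trending series, the autocorrelation of the levels stays strongly positive, which is the pattern that motivates differencing before fitting the AR and MA parts.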

Figure 2: The ACF plot of original sequence.


Figure 3: First-order differential timing diagram.

Figure 4: First-order differential ACF plot and PACF plot.

Table 2: The ADF test.

Type                      Dickey-Fuller   Lag order   p-value
Intercept + time trend    -9.285          1           0.0708
Intercept only            -9.298          1           0.6143
No intercept and trend    -9.309          1           0.3368

3.2 Model Evaluation

The p and q parameters can be determined from the Autocorrelation Function (ACF) plot and the Partial Autocorrelation Function (PACF) plot (Figure 4). The ACF plot is used to choose the order q of the MA part, while the PACF plot is used to choose the order p of the AR part. In Figure 4, both plots exceed the critical value at lags 1 and 6. The candidate models are therefore initially identified as ARIMA(0,1,1), ARIMA(0,1,6), ARIMA(1,1,0), and ARIMA(6,1,0). The Akaike Information Criterion (AIC) is an assessment criterion based on the notion of information entropy. It balances a statistical model's complexity against its goodness of fit, and a smaller AIC indicates a better model. The four models are fitted first, and the two with the smallest AIC, ARIMA(1,1,0) and ARIMA(6,1,0), are selected for further analysis. An overfitting check is then carried out: for these two preliminary models, the MA order q is increased from 0 to 1 and the two new models are fitted. The AIC becomes larger in both cases, so the additional MA term is not justified and the larger models are discarded. The AIC of each model is shown in Table 3.

Table 3: Model determination.

ARIMA Model        AIC
(0,1,1)            1721.40
(0,1,6)            1717.89
(1,1,0)            1716.31
(6,1,0)            1716.48
Additional model
(6,1,1)            1718.21
(1,1,1)            1718.31
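The selection logic applied to Table 3 can be sketched in a few lines. The AIC values are copied from Table 3; the `aic` function shows the usual definition, AIC = 2k - 2 ln L, and the variable names are assumptions made for this illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# AIC values copied from Table 3, keyed by (p, d, q).
initial_candidates = {
    (0, 1, 1): 1721.40,
    (0, 1, 6): 1717.89,
    (1, 1, 0): 1716.31,
    (6, 1, 0): 1716.48,
}
overfit_checks = {
    (6, 1, 1): 1718.21,
    (1, 1, 1): 1718.31,
}

# Rank the four initial candidates and keep the two with the smallest AIC.
ranked = sorted(initial_candidates, key=initial_candidates.get)
shortlist = ranked[:2]

# Overfitting check: add one MA term to each shortlisted model and see
# whether the AIC goes up (extra parameter not worthwhile) or down.
for (p, d, q) in shortlist:
    bigger = (p, d, q + 1)
    increased = overfit_checks[bigger] > initial_candidates[(p, d, q)]
    print((p, d, q), "->", bigger, "AIC increases:", increased)
```

Because the AIC penalizes each extra parameter by 2, a larger model is preferred only when it raises the log-likelihood by more than 1 per added parameter.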

3.3 Residual Analysis

Residual analysis is needed to test the quality of the fitted model. If the model is correctly identified, the residuals behave like a white noise sequence, that is, like independent and identically distributed normal random variables. First, the time series plot of the residuals is inspected for any trend not explained by the fitted model. Normality is then checked with the QQ plot and a normality test, and finally autocorrelation is checked with the ACF plot, the PACF plot, and the Ljung-Box test.

Figure 5 shows that the ARIMA(1,1,0) residuals fluctuate around 0. In both the ACF plot and the PACF plot, the values exceed the critical value at lag 6 and slightly exceed it at a few larger lags, which is negligible. The QQ plot shows a few outlying points that depart from the normal distribution. The p-value of the Shapiro-Wilk (SW) test in Table 4 is small (0.000013), so the residuals do not follow a normal distribution. All but one of the Ljung-Box (LB) p-values in Table 5 are greater than 0.2, so the residuals can be considered free of autocorrelation.

Table 4: Shapiro-Wilk test.

Model          W        p-value
ARIMA(1,1,0)   0.9711   0.000013
ARIMA(6,1,0)   0.9748   0.000051

Figure 6 shows that the ARIMA(6,1,0) residuals also fluctuate around 0. In both the ACF plot and the PACF plot, the values slightly exceed the critical value only at lags larger than 15, which is negligible. The QQ plot again shows a few outlying points that depart from the normal distribution. Because the p-value of the SW test in Table 4 is small (0.000051), the residuals do not follow a normal distribution. The LB p-values in Table 5 are all higher than 0.2, so the residuals can be considered free of autocorrelation.

Figure 5: ARIMA (1,1,0) residuals plot, ACF plot, PACF plot, QQ plot.


Figure 6: ARIMA (6,1,0) residuals plot, ACF plot, PACF plot, QQ plot.

Figure 7: Forecasts from ARIMA (1,1,0).

Table 5: Ljung-Box test.

Model          Q*       df   p-value
ARIMA(1,1,0)   10.160   6    0.071
               11.737   12   0.384
               18.955   18   0.331
               27.399   24   0.239
ARIMA(6,1,0)   1.4920   12   0.960
               10.019   18   0.614
               19.929   24   0.337
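The Q* statistic reported in Table 5 follows the usual Ljung-Box definition, Q = n(n+2) Σ r_k² / (n − k), summed over lags k = 1, ..., h. A minimal standard-library sketch on made-up residuals is given below; the function names are hypothetical, and the p-value step (a chi-square tail probability) is left out because the standard library has no chi-square distribution.

```python
def sample_acf(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numer / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(series)
    total = sum(
        sample_acf(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Made-up residuals, used only to exercise the function.
residuals = [0.5, -1.2, 0.3, 0.9, -0.4, -0.7, 1.1, -0.2, 0.6, -0.9]
q = ljung_box_q(residuals, 3)
print(round(q, 3))
```

A small Q relative to the chi-square reference distribution means the residual autocorrelations are jointly close to zero. When the test is applied to the residuals of a fitted ARIMA model, the degrees of freedom are commonly reduced by the number of estimated AR and MA coefficients.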

3.4 Forecasting

The next step is to validate the models and make predictions. This article uses a rolling one-step-ahead forecast with an expanding window instead of a multi-step-ahead forecast, so that the latest data points are incorporated and the forecasts are more accurate. Because the ARIMA model is only suitable for short-term forecasting, multi-step-ahead forecasts often perform poorly. First, the two models above are fitted to the training set and a one-step-ahead prediction is made. A new test set data point is then added (without deleting old data points), the model is refitted, and the next prediction is made. This process is repeated 16 times. Mean squared error (MSE) and mean absolute error (MAE) are then used to evaluate the forecasts; smaller values of both metrics indicate better performance on the test set. The MSE and MAE of ARIMA(1,1,0) and ARIMA(6,1,0) are shown in Table 6:

Table 6: Model validation.

Model          MSE      MAE
ARIMA(1,1,0)   27.615   4.676
ARIMA(6,1,0)   31.894   5.015
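The rolling procedure described above can be sketched as follows. As a simplifying assumption, the ARIMA(1,1,0) fit is replaced here by a least-squares AR(1) regression on the first differences without an intercept, which is not the estimator used in the paper; the toy series and all function names are likewise made up for illustration.

```python
def fit_ar1_on_differences(series):
    """Least-squares AR(1) coefficient for the first differences (no intercept)."""
    d = [b - a for a, b in zip(series, series[1:])]
    numer = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    denom = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return numer / denom


def rolling_one_step(series, n_train):
    """Expanding-window one-step-ahead forecasts for series[n_train:]."""
    forecasts = []
    for end in range(n_train, len(series)):
        history = series[:end]                 # every point seen so far
        phi = fit_ar1_on_differences(history)  # refit at every step
        last_diff = history[-1] - history[-2]
        forecasts.append(history[-1] + phi * last_diff)
    return forecasts


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy series whose differences halve at every step (made-up numbers).
prices = [0.0, 8.0, 12.0, 14.0, 15.0, 15.5, 15.75]
n_train = 4
preds = rolling_one_step(prices, n_train)
actual = prices[n_train:]
print(preds, mse(actual, preds), mae(actual, preds))
```

Each forecast uses only data available before the month being predicted, so the resulting MSE and MAE measure genuine out-of-sample performance.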


The MSE and MAE of ARIMA(1,1,0) are smaller, indicating better forecasting performance, so it is used as the final model to predict the WTI crude price. The last step is to predict the latest unknown data points. The crude oil price forecast for the next three months is shown in Figure 7 and Table 7.

Table 7: Forecasts from ARIMA (1,1,0).

Time      Point forecast   L 80     H 80      L 95     H 95
2024/05   86.705           79.754   93.656    76.075   97.336
2024/06   87.173           75.525   98.821    69.360   104.987
2024/07   87.334           71.871   102.798   63.685   110.984

In Figure 7, the blue points show the forecast values and the grey areas show the model's prediction limits at the 80% and 95% confidence levels. These results indicate that the oil price is predicted to rise slightly over the next three months.
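The 80% and 95% limits in Table 7 are related in a simple way if they are assumed to be symmetric normal intervals of the form point forecast ± z × standard error. That form is an assumption made here, since the text does not state how the limits were computed. Under it, the standard error implied by the 80% limits for May 2024 reproduces the 95% limits to within rounding. The helper name `normal_interval` is hypothetical.

```python
from statistics import NormalDist


def normal_interval(point, std_error, level):
    """Symmetric normal prediction interval at the given coverage level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * std_error, point + z * std_error


# May 2024 row of Table 7.
point, lo80, hi80 = 86.705, 79.754, 93.656

# Back out the implied forecast standard error from the 80% limits.
z80 = NormalDist().inv_cdf(0.90)
std_error = (hi80 - lo80) / (2 * z80)

# Rebuild the 95% limits from that standard error.
lo95, hi95 = normal_interval(point, std_error, 0.95)
print(round(std_error, 3), round(lo95, 3), round(hi95, 3))
```

The widening of the limits from May to July in Table 7 reflects the accumulation of forecast error as the horizon grows.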

3.5 GARCH Model

It should be noted that the ARIMA model assumes that the data are conditionally homoscedastic. However, for many financial time series, such as the WTI crude oil price analysed in this article, the conditional variance depends on present and past values, so the series exhibits conditional heteroskedasticity. Such data show volatility clustering: large fluctuations tend to be followed by further large fluctuations, while small fluctuations tend to be followed by further small fluctuations. It is therefore necessary to use the GARCH model to model the variance of the data.

To determine whether the series exhibits heteroskedasticity, the ACF and PACF plots of the squared differenced series and of the squared residuals of the fitted model are examined. Significant correlation in either indicates conditional heteroskedasticity. As shown in Figure 8 and Figure 9, the squared differenced series exceeds the critical value several times at lags 1 to 5, and the squared residual series also significantly exceeds the critical value at lag 2 in both plots. Conditional heteroskedasticity can therefore be considered present. In addition, the McLeod-Li test, a white noise test, is performed on the squared series. According to Figure 10, the p-values at all lags are below 0.05, which also indicates conditional heteroskedasticity.
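The idea behind the squared-series check can be illustrated with a small sketch: the McLeod-Li statistic is the Ljung-Box statistic computed on the squared values. The example uses only the standard library and a made-up series with a calm stretch followed by a turbulent one; the function names are hypothetical, and 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.

```python
def sample_acf(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numer / denom


def mcleod_li_q(series, max_lag):
    """Ljung-Box Q statistic computed on the squared series."""
    squared = [x * x for x in series]
    n = len(squared)
    total = sum(
        sample_acf(squared, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Toy price changes: small moves first, then large moves (made-up numbers).
changes = [1.0, -1.0, 1.0, -1.0, 5.0, -5.0, 5.0, -5.0]
squared = [x * x for x in changes]

r1 = sample_acf(squared, 1)   # lag-1 autocorrelation of the squares
q1 = mcleod_li_q(changes, 1)  # McLeod-Li statistic at lag 1
CHI2_95_DF1 = 3.841           # 5% critical value, chi-square with 1 df
print(round(r1, 3), round(q1, 3), q1 > CHI2_95_DF1)
```

Positive autocorrelation in the squares means that large moves tend to follow large moves regardless of sign, which is the volatility clustering that a GARCH model is designed to capture.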

Figure 8: The ACF plot and PACF plot of difference squared.

Figure 9: The ACF plot and PACF plot of ARIMA (1,1,0) residual squared.


Figure 10: McLeod-Li test p-values for the squared series.

To account for conditional heteroskedasticity, an ARIMA-GARCH model is established: the ARIMA(1,1,0) model determined previously is combined with GARCH(1,1), the specification most commonly used for financial time series, to fit the WTI crude oil price data. Residual analysis is then carried out, with the Ljung-Box test applied to the residuals and the squared residuals to assess the fit of the ARIMA part and the GARCH part respectively. As Table 8 shows, all p-values are greater than 0.1, so the fit can be considered good.

Table 8: Ljung-Box test for ARIMA-GARCH model.

Object             Q*      df   p-value
Residual           2.305   1    0.129
                   2.307   2    0.128
                   2.846   5    0.471
Residual squared   0.074   1    0.785
                   2.475   5    0.511
                   4.093   9    0.573

The prediction results are displayed in Table 9. The model predicts that oil prices will continue to rise marginally over the next three months.

Table 9: Forecasts from ARIMA (1,1,0)-GARCH (1,1).

Time      Point forecast   Sigma
2024/05   86.140           4.989
2024/06   86.299           5.067
2024/07   86.331           5.144
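The Sigma column in Table 9 is the forecast conditional standard deviation. The sketch below shows how a GARCH(1,1) variance forecast evolves over the horizon. The parameter values and starting values are hypothetical and are not the estimates behind Table 9; the function name is likewise an assumption.

```python
import math


def garch11_sigma_forecast(omega, alpha, beta, last_resid, last_var, horizon):
    """Forecast conditional standard deviations for 1..horizon steps ahead."""
    # One step ahead uses the last observed shock and the last variance.
    var = omega + alpha * last_resid ** 2 + beta * last_var
    sigmas = [math.sqrt(var)]
    # Further ahead, the unknown squared shock is replaced by its
    # expectation, which equals the forecast variance itself.
    for _ in range(horizon - 1):
        var = omega + (alpha + beta) * var
        sigmas.append(math.sqrt(var))
    return sigmas


# Hypothetical parameter values, NOT the estimates behind Table 9.
omega, alpha, beta = 2.0, 0.10, 0.85
long_run_var = omega / (1 - alpha - beta)  # unconditional variance

sigmas = garch11_sigma_forecast(
    omega, alpha, beta, last_resid=3.0, last_var=25.0, horizon=3
)
print([round(s, 3) for s in sigmas], round(math.sqrt(long_run_var), 3))
```

When the current variance is below its long-run level, the forecast sigma rises with the horizon and converges toward the long-run value. The slowly increasing Sigma values in Table 9 are consistent with this kind of mean reversion.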

3.6 Critical Thinking

This article models the data in detail and combines ARIMA and GARCH, two very commonly used time series models, so the predictions might be expected to be accurate within this framework. In practice, however, the true data-generating process is unknown, especially for financial time series, which are characterized by very high volatility and uncertainty and are extremely susceptible to external interference. This article does not consider extensions of GARCH such as Threshold GARCH (TGARCH) or Asymmetric Power GARCH (APARCH), and so does not take the asymmetric effect or the Taylor effect into account. Machine learning algorithms such as CNN and SVM are not integrated into the oil price predictions either, and external drivers of the oil price, such as oil supply and demand and geopolitical risks, are not modelled.

4 CONCLUSION

This article forecasts monthly WTI crude oil prices using the ARIMA and ARIMA-GARCH models, and conducts stationarity, normality, and white noise tests together with goodness-of-fit checks. The final models, ARIMA(1,1,0) and ARIMA(1,1,0)-GARCH(1,1), both predict that oil prices will rise slightly in the next three months, that is, in May, June, and July 2024.

However, a variety of factors will affect the price of crude oil amid the current global economic downturn. Oil prices will become more difficult to anticipate due to the production strategies of crude oil producing countries, geopolitical events, the development of new energy technologies, and the activities of futures markets and commodity exchange-traded funds (ETFs) in the financial market. It should be noted that the research in this article does not take such realistic and complex scenarios into account. For macro managers who hope that the crude oil futures market will operate effectively with stable futures prices, and for speculators who hope to earn excess profits, the model selected in this article is relatively simple. It does not constitute investment advice and can only serve as a reference to a certain extent.

REFERENCES

Bastianin A, Conti F and Manera M 2016 The impacts of

oil price shocks on stock market volatility: Evidence

from the G7 countries. Energy Policy, 98 160-169.

Ding X 2024 Research on the asymmetric impact of

international crude oil prices on Shanghai crude oil

futures prices. China-Arab Science and Technology

Forum, 43-47.

Sari R, Soytas U and Hacihasanoglu E 2011 Do global risk

perceptions influence world oil prices? Energy

Economics, 33(3) 515-524.

Le T-H, Le A T and Le H-C 2021 The historic oil price

fluctuation during the Covid-19 pandemic: What are the

causes? Research in International Business and

Finance, 58.

Wang L, Ma F, Niu T J and Liang C 2021 The importance

of extreme shock: Examining the effect of investor

sentiment on the crude oil futures market. Energy

Economics, 99.

Wang Y D and Liu L 2016 Crude oil and world stock

markets: Volatility spillovers, dynamic correlations,

and hedging. Empirical Economics, 50(4) 1481-1509.

Herrera A M, Hu L and Pastor D 2018 Forecasting crude

oil price volatility. International Journal of Forecasting,

34(4) 622-635.

Wu B, Wang L, Lv S, et al. 2021 Effective crude oil price

forecasting using new text-based and big-data driven

model. Measurement, 168.

Li J, Zhu S and Wu Q 2019 Monthly crude oil spot price

forecasting using variational mode decomposition.

Energy Economics, 83 240-253.

Wang J and Wang J 2016 Forecasting energy market

indices with recurrent neural networks: Case study of

crude oil price fluctuations. Energy, 102 365-374.
