forget gates. The gates use sigmoid activations, which compress their inputs to values between 0 and 1. Because any number multiplied by 0 equals 0 and any number multiplied by 1 is unchanged, the sigmoid activation lets the network determine which information is crucial to store and which is insignificant and can be thrown away.
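This multiplicative gating can be illustrated with a minimal NumPy sketch; the candidate vector and gate pre-activations below are hypothetical values chosen only to show how near-0 gates discard information and near-1 gates pass it through.

```python
import numpy as np

def sigmoid(x):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical candidate memory vector and gate pre-activations.
candidate = np.array([0.8, -1.5, 2.0])
gate_logits = np.array([-6.0, 0.0, 6.0])

gate = sigmoid(gate_logits)   # ~ [0.0025, 0.5, 0.9975]
kept = gate * candidate       # gate near 0 -> discard, gate near 1 -> keep

print(np.round(gate, 4))
print(np.round(kept, 4))
```

The first component is almost fully suppressed while the last passes through nearly unchanged, which is exactly the keep-or-discard behavior described above.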
Three distinct gates control the flow of information through the LSTM cells. The first is the input gate which, as its name suggests, handles incoming data: based on its sigmoid activation, it determines whether new input should be added to the cell state. The forget gate is responsible for deciding which information to discard, likewise using a sigmoid activation to select what to retain. The last is the output gate, which generates the LSTM network's output from the cell state, using a tanh activation to produce a vector representing the current state. In general, an LSTM network evaluates new information, processes it, and stores it in the cell state. After passing through the forget gate, where some information may be dropped, the cell state is passed through the output gate to produce the desired output. The recurrent loop's ability to maintain an internal state over time is what allows the network to learn long-term dependencies.
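A single LSTM time step, with the three gates described above, can be sketched in NumPy as follows. This is a simplified illustration with random toy weights, not the model used in the study; the weight layout (one matrix producing all four pre-activations) is an implementation convenience.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [h_prev, x] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # forget old info, add gated new info
    h = o * np.tanh(c)                            # gated output / hidden state
    return h, c

hidden, inp = 4, 3
W = rng.standard_normal((4 * hidden, hidden + inp)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(5):                                # run over a toy input sequence
    h, c = lstm_step(rng.standard_normal(inp), h, c, W, b)
print(h.shape, c.shape)
```

Note how the forget gate `f` scales the previous cell state and the output gate `o` filters the tanh of the new cell state, matching the gate roles in the text.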
LSTM is well suited to data in which multiple variables are correlated with one another, that is, data sets with significant correlation across time-series changes. The CO2 emission prediction in this experiment uses emissions from previous years to forecast future emissions, making LSTM a natural model for this task. In this study, the LSTM model first analyzes and trains on the input years and CO2 emissions before predicting CO2 emission values for the next few years for testing.
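The sliding-window framing implied above, using the previous years' emissions as input and the next year's as the target, can be sketched as follows; the emission values here are hypothetical placeholders, not the study's data.

```python
import numpy as np

def make_windows(series, lookback):
    """Turn a 1-D series into (samples, lookback) inputs and next-step targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

# Hypothetical annual CO2 emissions (arbitrary units), one value per year.
emissions = np.array([10.2, 10.8, 11.1, 11.9, 12.4, 13.0, 13.3, 14.1])
X, y = make_windows(emissions, lookback=3)
print(X.shape, y.shape)   # (5, 3) (5,)
# X[0] = [10.2, 10.8, 11.1] predicts y[0] = 11.9, and so on.
```

Each row of `X` would then be fed to the LSTM as a short sequence, with the corresponding entry of `y` as the supervised target.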
2.2.2 Loss Function
Selecting the loss function for training is crucial. For this task of predicting CO2 emissions, the MSE loss function, which is widely used in machine learning, is the most suitable option. Regression is one of the fundamental machine learning models and plays a vital role in modeling and analyzing the relationships between variables; by leveraging regression models, researchers and analysts can model the dynamics of CO2 emissions and uncover underlying patterns and trends. The MSE loss function is commonly employed in regression tasks, where the goal is to estimate continuous values from input variables, so it is particularly pertinent to this study's objective of forecasting CO2 emission levels. MSE quantifies the discrepancy between the predicted and actual CO2 emission values by computing the average squared difference. This choice aligns well with the objective of accurate forecasting, since squaring places greater emphasis on larger prediction errors and thus favors more precise models.
Moreover, the MSE metric offers several advantages in evaluating the performance of the CO2 emission forecasting model. It provides an easily interpretable measure of prediction accuracy, enabling researchers and stakeholders to assess the reliability of the model's forecasts, and its squared form ensures that larger deviations from the actual values incur heavier penalties, prioritizing accurate predictions. By incorporating the MSE loss function into the training process, this study aims to develop a robust and reliable forecasting model for CO2 emissions, and the regression-based approach it supports offers insights relevant to mitigating climate change and shaping effective environmental policies. In regression problems, a precise value is typically predicted, such as this study's predicted annual CO2 emissions, as follows:
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2    (1)
Given n training samples, let y_i denote the actual output of the i-th sample and \hat{y}_i its predicted value. The formula above then defines the MSE loss produced by the model over the n training samples.
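Eq. (1) translates directly into code; the actual and predicted values below are hypothetical, used only to demonstrate the computation.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals, as in Eq. (1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical actual vs. predicted annual emissions.
actual    = [11.9, 12.4, 13.0]
predicted = [12.1, 12.2, 13.5]
print(mse(actual, predicted))   # ~ 0.11
```

Because each residual is squared before averaging, the error of 0.5 on the last sample contributes far more to the loss than the two errors of 0.2, which is the weighting behavior discussed below.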
The MSE function measures the quality of the model by computing the distance between the predicted and actual values, that is, the square of the error. When summing over samples, MSE squares each error so that positive and negative errors cannot cancel each other out.
This method's distinguishing feature is that it