criterion for the i-th pixel was satisfied by several images at once, then the observation with the largest NDVI value was chosen. For the Su and Au seasonal mosaics, the training sample was formed from the following channels: Band 4 (red), Band 8 (NIR), Band 11 (SWIR1), Band 12 (SWIR2); the band ratios Band 4/Band 8, Band 4/Band 12, Band 8/Band 11, Band 8/Band 12, Band 11/Band 12; and the NDVI index. For the ApOc mosaic, the metrics were derived from the following statistics: the median and the 1st and 3rd quartiles of Band 4, Band 8, Band 11, Band 12 and of the NDVI index.
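The paper does not reproduce the GEE script itself; the sketch below, using the rgee package (an R interface to Google Earth Engine), shows one possible way to assemble such a seasonal feature image. The collection ID, date range, study-area point, cloud filter and variable names are illustrative assumptions, not the authors' exact settings.

```r
library(rgee)
ee_Initialize()

# Hypothetical study-area point in Kyiv region
aoi <- ee$Geometry$Point(c(30.5, 50.4))

# Add an NDVI band to each image
add_ndvi <- function(img) {
  img$addBands(img$normalizedDifference(c("B8", "B4"))$rename("NDVI"))
}

# Summer (Su) collection: per pixel, the observation with the largest NDVI is kept
su <- ee$ImageCollection("COPERNICUS/S2")$
  filterBounds(aoi)$
  filterDate("2017-06-01", "2017-09-01")$            # assumed summer window
  filter(ee$Filter$lt("CLOUDY_PIXEL_PERCENTAGE", 20))$
  map(add_ndvi)$
  qualityMosaic("NDVI")

# Predictors for the Su mosaic: spectral bands, band ratios and NDVI
su_feats <- su$select(c("B4", "B8", "B11", "B12", "NDVI"))$
  addBands(su$select("B4")$divide(su$select("B8"))$rename("B4_B8"))$
  addBands(su$select("B4")$divide(su$select("B12"))$rename("B4_B12"))$
  addBands(su$select("B8")$divide(su$select("B11"))$rename("B8_B11"))$
  addBands(su$select("B8")$divide(su$select("B12"))$rename("B8_B12"))$
  addBands(su$select("B11")$divide(su$select("B12"))$rename("B11_B12"))
```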
The ApOc period approximately corresponds to the growing season in the Kyiv region. We therefore aimed to capture the dynamics of the spectral features of different tree species within this timeframe (April to October). From the mosaics we extracted three metrics corresponding to the start of the season (1st quartile), its middle (median) and its end (3rd quartile). The minimum and maximum values were not applied
because they would potentially highlight extreme pixel values. We believe this set of predictors is
effective in detecting the seasonal variability of the spectral properties of various categories of the
forest cover, particularly in identifying the dominant tree species.
In the GEE computing environment, all characteristics (metrics) of the seasonal composite mosaics were combined into one multichannel image at 10 m resolution. The total number of channels we selected was 45. Prior to collating the images, each Sentinel-2 image was converted to Top-of-Atmosphere (TOA) reflectance values to enable supervised classification.
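Under the same assumptions, and continuing the sketch above, the ApOc metrics and the final multichannel stack could be derived roughly as follows (the Au features would be built and added in the same way as the Su features):

```r
# April-October (ApOc) collection with the same NDVI band added
apoc <- ee$ImageCollection("COPERNICUS/S2")$
  filterBounds(aoi)$
  filterDate("2017-04-01", "2017-11-01")$
  filter(ee$Filter$lt("CLOUDY_PIXEL_PERCENTAGE", 20))$
  map(add_ndvi)$
  select(c("B4", "B8", "B11", "B12", "NDVI"))

# Per-pixel 1st quartile, median and 3rd quartile of every band over the period
apoc_metrics <- apoc$reduce(ee$Reducer$percentile(c(25, 50, 75)))

# Stack the seasonal feature images into one multichannel predictor image;
# au_feats (built like su_feats for the autumn window) would be added here too
predictors <- su_feats$addBands(apoc_metrics)
predictors$bandNames()$getInfo()   # list of channel names
```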
2.4. Settings of the classification algorithm
Given the large number of predictors, exclusively non-parametric methods for classifying satellite images were considered. In recent years the Random Forest (RF) machine learning algorithm, an ensemble of many decision trees that enhances the traditional decision-tree approach, has been widely used [12]. For the construction of each decision tree, an individual bootstrap sample (usually about two thirds of the observations) was drawn from the original dataset (i.e. sampling with replacement). The remaining observations were used to estimate out-of-bag (OOB) classification errors. Bagging was repeated n times, after which the results from all classification trees were aggregated; the predicted class of an observation was determined by the majority vote of all the decision trees grown within the RF [13].
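As an illustration of this scheme, a minimal sketch with the randomForest R package is given below; it uses a small synthetic stand-in for the training sample, whereas in the actual workflow the sample would be extracted from the multichannel image at the reference plots.

```r
library(randomForest)
set.seed(42)

# Synthetic stand-in for the training sample: 300 reference pixels,
# 10 numeric predictors and a 3-class label (purely illustrative)
train_df <- data.frame(matrix(rnorm(300 * 10), ncol = 10))
train_df$class <- factor(sample(c("pine", "oak", "birch"), 300, replace = TRUE))

# Grow 500 bootstrapped trees; importance = TRUE keeps permutation importance
rf <- randomForest(class ~ ., data = train_df, ntree = 500, importance = TRUE)

# OOB error: each tree is evaluated on the ~1/3 of observations left out of
# its bootstrap sample; the last row is the aggregate estimate over all trees
rf$err.rate[nrow(rf$err.rate), "OOB"]

# The predicted class of a new observation is the majority vote of all trees:
# predict(rf, newdata = new_pixels)
```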
In the RF method, the decisions of the individual classification trees are weakly correlated because random selection is applied twice: when the training subsets are formed and when predictors are selected for splitting at each node. Nevertheless, optimizing the training sample when a large set of predictors is used remains important [8].
In order to evaluate the relative impact of each predictor on the accuracy of the RF model, the %IncMSE indicator was used. It shows by how many percent the mean squared error of the classification would increase if the corresponding variable were excluded from the model, and it is the most commonly used indicator for interpreting the accuracy of RF classifiers [14], [15].
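Continuing the sketch above, the permutation importance can be inspected with the importance function. Note that in the randomForest package the type = 1 measure is labelled %IncMSE for regression forests and mean decrease in accuracy for classification forests; both express the loss of accuracy caused by permuting a predictor.

```r
# Scaled permutation importance (type = 1) from the forest fitted above;
# for a classification forest the column is MeanDecreaseAccuracy
imp <- importance(rf, type = 1, scale = TRUE)

# Rank predictors from the most to the least influential
imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
```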
In order to select the optimal values of the RF model parameters, we used the tuneRF function from the randomForest statistical package in the R programming language. The relative influence of the predictors on the classification accuracy was estimated by the arithmetic mean of the OOB error, calculated over 50 repeated runs of the randomForest algorithm. Each variable was then assigned a rank according to decreasing %IncMSE.
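A rough sketch of this tuning and averaging, under the same synthetic-data assumptions as in the earlier sketch (the mtry search settings shown are illustrative, not the authors' exact values):

```r
# Predictor matrix and class labels from the synthetic sample above
x <- train_df[, setdiff(names(train_df), "class")]
y <- train_df$class

# Search for the optimal mtry (number of predictors tried at each split)
tuned <- tuneRF(x, y, ntreeTry = 500, stepFactor = 1.5,
                improve = 0.01, trace = TRUE, plot = FALSE)
best_mtry <- tuned[which.min(tuned[, "OOBError"]), "mtry"]

# Arithmetic mean of the OOB error over 50 repeated randomForest runs
oob <- replicate(50, {
  fit <- randomForest(x, y, ntree = 500, mtry = best_mtry)
  fit$err.rate[nrow(fit$err.rate), "OOB"]
})
mean(oob)
```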
In the first stage, we analyzed how the classification accuracy differed between the individual seasonal mosaics (Figure 2). The smallest error was obtained for the classification of the ApOc period images (an OOB error of approximately 1%), whereas the classifications of the Su and Au seasonal mosaics showed lower accuracy (OOB errors of approximately 2% and 8%, respectively). For some predictors the %IncMSE values were negative, indicating that they should be excluded from the calculations.
In the second stage we used the entire list of predictors to estimate the classification OOB error. With the 35 variables used, the lowest error was obtained (about 0.1%). This enabled us to