A Game Theoretic Approach Based on Differential Evolution to

Ensemble Learning for Classiﬁcation

Rodica Ioana Lung

Center for the Study of Complexity, Babes¸ -Bolyai University, T. Mihali 58-60, Cluj Napoca, Romania

Keywords:

Ensemble Learning, Classiﬁcation, Game Theory, Differential Evolution.

Abstract:

Aggregating results of several learners known to each perform well on different data types is a challenging

task that requires ﬁnding intelligent, trade-off solutions. A simple game-theoretic approach to this problem

is proposed. A non-cooperative game is used to aggregate the results of different classiﬁcation methods.

The Nash equilibrium of the game is approximated by using a Differential Evolution algorithm. Numerical

experiments indicate the potential of the approach for a set of synthetic and real-world data.

1 INTRODUCTION

The evaluation and aggregation of the results of sev-

eral classiﬁers continue to be a challenging task in

machine learning. It is well-known that different

methods produce different results on different data

sets, making it difﬁcult to decide which method to use

for prediction purposes or how to aggregate results in

an optimal manner. Stacking ensemble methods use

another learner to aggregate results and make predic-

tions (Polikar, 2012). The main goal of aggregating

the results of different learners is to ﬁnd a good trade-

off between the results to reduce bias and variability

(Mienye and Sun, 2022). The main goal of using an-

other learner is to create an intelligent and explainable

aggregation.

One of the ﬁelds that specialize in deﬁning and

computing different types of trade-offs is game the-

ory (Maschler et al., 2020). In game-theoretic models,

agents interact in strategic and conﬂicting situations,

and equilibria are deﬁned. The term equilibrium is

used because, most of the time, the solution’s deﬁni-

tions include some notion of stability. One of the most

popular solution concepts in game theory is the Nash

equilibrium, which ensures stability against unilateral

deviations.

In this paper, we propose using the Nash equilib-

rium as a solution for a meta-learner of a stacking

ensemble method. In a stacking ensemble approach,

several base learners are trained on the data set, and

their results are aggregated by using another learner,

called a meta learner. A game among base learners is

deﬁned, in which each of them aims to maximize its

marginal contribution to the accuracy of the ensem-

ble. The Nash equilibrium of the game is approxi-

mated using a differential evolution algorithm.

The proposed method is called Nash stacking dif-

ferential evolution and is described in Section 3, after

a short review of related work in Section 2. Numerical

experiments performed on synthetic and real-world

data are used to illustrate the behavior and potential

of the approach in Section 4. The paper ends with

conclusions.

2 RELATED WORK

The binary classiﬁcation problem consists of ﬁnding a

rule to assign one of two labels, or classes, to data in-

stances, based on known information about the data.

This means that we are given a data set X ⊂ R

n×p

con-

taining n instances x

∈ R

and their corresponding

labels Y ∈ {0, 1}

from which to learn the rule. The

goal may be to use the rule to explore or explain the

data or to make predictions about instances from the

same distribution for which the labels are unknown.

The binary classiﬁcation problem is one of the

most studied in the literature, with many proposed

methods varying in their approach (Hastie et al., 2016;

Zaki and Meira Jr., 2014). One of the reasons so

many approaches exist is that different methods per-

form well on different data sets, and it is a difﬁcult

task to predict which method will perform better on

new data. Ensemble methods aim to provide ways to

combine the results of several learners to overcome

258

Lung, R.

A Game Theoretic Approach Based on Differential Evolution to Ensemble Learning for Classiﬁcation.

DOI: 10.5220/0012192700003595

In Proceedings of the 15th International Joint Conference on Computational Intelligence (IJCCI 2023), pages 258-264

ISBN: 978-989-758-674-3; ISSN: 2184-3236

this shortcoming. There are two main categories of

ensemble learning: homogeneous and heterogeneous

(Mienye and Sun, 2022); homogeneous methods use

the same learner several times on different instances

of the data, while heterogeneous methods use differ-

ent learners on the same data and aggregate results.

Stacking generalization is a heterogeneous ensem-

ble learning method that uses the results of several,

and typically very different learners by further train-

ing another model to combine them(Wolpert, 1992;

Pavlyshenko, 2018).

Stacking models use a set of base algorithms,

called level-0 models, trained directly on the data set

and (another) model, called meta-learner, or level-1

model to learn how to combine their predictions best

(Polikar, 2012). They are widely used for various

applications and settings. For example, a stacking

model considering different feature combinations for

the default risk of small enterprises is presented in

(Chi et al., 2023). Medical applications include de-

tecting heart irregularities and predicting cardiovas-

cular disease in (Mohapatra et al., 2023). A variation

of stacking, called A-stacking is used for spoof ﬁn-

gerprint detection in (Agarwal and Chowdary, 2020).

In (Dong et al., 2021) a stacking ensemble is used

to predict wind power, and in (Sun and Trevor, 2018)

it is used for annual river ice breakup dates. Rock

deformation predictions are approached with stacking

models in (Koopialipoor et al., 2022). There are also

multiple applications in sentiment analysis (Wang

et al., 2014; AlGhamdi et al., 2022; Zhang et al.,

2021; Chen et al., 2022; Agarwal and Chowdary,

2021).

3 NASH-STACKING

CLASSIFICATION (NS)

Consider a set of k base learners ML

, ML

, . . . , ML

Each of them is trained using X,Y , resulting in a set

of predictions

, j = 1, . . . , k, and the corresponding

probabilities P

, j = 1, . . . , k. Thus, ˆy

i j

is the class

predicted by model ML

for the instance x

∈ X , and

i j

is the probability that the instance x

is classiﬁed

as 1 by the learner ML

. Predictions are made based

on probabilities p

i j

: if p

i j

> 0.5, then instance x

labeled as 1, otherwise as 0.

The goal of ensemble learning is to combine the

results of the k learners so that those that performed

well are used, and those that performed poorly are dis-

carded. A simple and effective meta-learner would be

linear model, as it also provides a possible interpreta-

tion of the parameters. In this approach the goal is to

ﬁnd parameter α

= (α

, . . . , α

) such that ensemble

probabilities are computed as:

(α

, P) =

∑

j=1

i j

, ∀i = {1, . . . , n}, (1)

with

∑

j=1

= 1, can be used to make reliable pre-

dictions.

3.1 Ensemble Learning Game

A new method for learning the parameter α as an

equilibrium of a non-cooperative game is proposed.

A non-cooperative game is deﬁned by three elements:

a set of players K, a set of actions available to them A,

and a set of payoff functions U that take into account

the actions of all the players. The game Γ(K, A,U)

proposed in this paper consists of:

• The set of players K is represented by the k base

learners;

• The set of actions A: each player j chooses a α

parameter to contribute to the ensemble model.

Thus A ⊂ R

; an element α ∈ A is called the strat-

egy proﬁle or a situation of the game, in which

player j has chosen α

• The payoff functions U = (u

)

j=1,k

are computed

as a marginal contribution of each player to the

accuracy of the model:

(α) = ACC(α|P,Y ) − ACC(α

− j

,Y ), (2)

and

U(α) = (u

(α), u

(α), . . . , u

(α)), (3)

where

ACC(α|P,Y ) =

∑

i=1

I(I(EP

> 0.5) = y

). (4)

is given in Eq. (1) and I(·) is the identity func-

tion taking the value one if the argument is true

and zero otherwise. In Eq. (2), the index − j in

− j

and P

− j

indicates that the j

component of α

and P, respectively, is removed.

Thus, the payoff each player tries to maximize is

represented by its contribution to the overall accu-

racy of the model. A higher payoff indicates that the

learner contributes more to the overall accuracy of the

ensemble. Non-cooperative game theory offers many

solution concepts. Among the most popular is the

Nash equilibrium, which can be described as a strat-

egy proﬁle of the game from which none of the play-

ers has a unilateral incentive for deviation, i.e. none

of the players can improve their payoffs by changing

their strategies while all others maintain theirs. For

game Γ, the Nash equilibrium may indicate that none

A Game Theoretic Approach Based on Differential Evolution to Ensemble Learning for Classiﬁcation

259

Algorithm 1: NS-DE: Nash Stacking - DE outline.

1: Generate initial population A = {α

, . . . , α

popsize

}

using the standard uniform distribution;

2: Evaluate population using payoffs U in Eq. (3)

and (5);

3: nrgen=0;

4: while (nrgen<MaxGen) and (no changes in Max-

Gen/10 iterations) do

5: for each i = {1, ..., popsize} do

6: create offspring o

from parent β

using the

DE/rand/1/exp scheme (Alg. 2);

7: if (o

Nash ascends (Alg. 3) parent α

) then

8: o

replaces parent α

;

9: end if

10: end for

11: end while

12: return non-dominated individuals;

of the methods can contribute more to the ensemble

by unilaterally changing its parameter in α. While at

this stage it is not possible to assess if the game has

an equilibrium, by using a stochastic search method

that detects equilibria we can ﬁnd solutions that may

present equilibrium properties that may be of interest

to a decision maker.

3.2 Differential Evolution for Nash

Equilibria Detection

Differential evolution (DE) is a simple and efﬁ-

cient stochastic search and optimization method that

evolves a population of potential solutions to the

problem (Storn and Price, 1997). It has been adapted

to approximate the Nash equilibria of a game us-

ing the Nash ascendancy relation in (Lung and Du-

mitrescu, 2008; Lung et al., 2010) during the selec-

tion phase. We further adapt the DE to approximate

the NE of the game Γ, and we call this version the

Nash Stacking Differential Evolution (NS-DE). The

outline of NS-DE is presented in Algorithm 1.

Population and Initialization. The NS-DE popu-

lation consists of individuals α ∈ R

. In the ﬁrst iter-

ation, they are randomly generated following a stan-

dard uniform distribution.

Evaluation. The ﬁtness of each individual is eval-

uated as the payoff function U in Eq. (3). However,

to ensure that the right side of equation (1) is a prob-

ability, only during the evaluation of the payoffs, the

values of α

are normalized to add to 1 by dividing

them by their sum. Thus, when evaluating an individ-

ual α = (α

, . . . , α

, its values are ﬁrst modiﬁed to:

Algorithm 2: DE - the DE/rand/1/exp scheme to create off-

spring o

from parent α

1: o

= α

;

2: randomly select parents α

, α

, where i

6= i

6= i;

3: w = U (0, k)

;

4: for j = 0; j < k ∧U(0, 1) < CR; j = j + 1 do

5: o

= α

+ F(α

− α

w);

6: w = (w + 1)%k;

7: end for

U(0, k) is a discrete uniform value between 0 and k.

+ α

+ . . . + α

(5)

and α

are used within the payoff functions.

Variation Operators. NS-DE uses the

DE/rand/1/exp scheme, presented in Algorithm

2, to create an offspring (Thomsen, 2004). With a

probability CR, some components of the offspring

are modiﬁed using values from three different parents

from the current population by adding the difference

of two, multiplied by a scaling factor F, to the third.

CR and F are parameters of the DE.

Nash Ascendancy. Each offspring replaces its par-

ent if it is better than it. In the game-theoretic con-

text, the concept of better is implemented by using

a relation among strategy proﬁles that permits their

comparisons for selection purposes. Thus, to guide

the search of the DE population towards the equilib-

rium of game Γ, we use the Nash ascendancy relation

from (Lung and Dumitrescu, 2008), described in Al-

gorithm 3. The comparison is made by counting how

many players can improve their strategies by unilat-

erally changing from one individual to the other. The

individual with fewer such players is considered bet-

ter from the Nash ascendancy point of view. If there

is an equal number of players that can improve their

payoffs from each side, then the two individuals are

considered indifferent to each other.

Output. NS-DE provides the individual that is bet-

ter than most other individuals in the population based

on the Nash ascendancy relation.

NS-DE Parameters. NS-DE uses speciﬁc DE pa-

rameters: population size popsize, maximum number

of iterations MaxGen, crossover rate CR, and scaling

factor F. If 10% of the maximum number of itera-

tions elapse without any replacements being made in

the population, the search stops.

ECTA 2023 - 15th International Conference on Evolutionary Computation Theory and Applications

260

Algorithm 3: Nash ascendancy test to compare offspring o

to parent β.

1: k

= k

= 0;

2: for j = 0; j < p; j = j + 1 do

3: if o

<> β

then

4: O

= O, β

= β;

5: o

= β

, β

= o

;

6: if u

) > u

(o) then

7: k

+ +;

8: end if

9: if u

(β

) > u

(β) then

10: k

+ +;

11: end if

12: end if

13: end for

14: if k

< k

then

15: return o Nash ascends β is TRUE (1);

16: else

17: if k

> k

then

18: return o Nash ascends β is FALSE (-1);

19: else

20: return o is INDIFFERENT to β (0);

21: end if

22: end if

4 NUMERICAL EXPERIMENTS

Numerical experiments are performed on a set of syn-

thetic and real-world data to illustrate the behavior of

the approach.

4.1 Experimental Set-up

Data. Synthetically generated datasets can be used

to assess the behavior of a method for various param-

eters and degrees of difﬁculty. While good results on

these datasets do not guarantee similar performance

on real-world data, they do indicate the approach’s

potential. In this paper, 36 datasets combining all of

the following parameters were generated:

• number of instances: 100, 500, 1000;

• number of features: 3, 10,20,30;

• class separation: 0.1, 1, 10.

The data were generated by using the function

make classification form the sklearn library

(Pedregosa et al., 2011) available in Python. The class

separation parameter controls the overlapping of the

classes, with a smaller value indicating a larger over-

lap and a more difﬁcult classiﬁcation problem.

The following real-world data sets from the UCI

machine learning repository (Dua and Graff, 2017)

are used:

• R1 arcene , with 200 instances and 10001 at-

tributes;

• R2 banana , with 5300 instances and 3 attributes;

• R3 hill valley with noise , with 1212 instances and

101 attributes;

• R4 hill valley without noise , with 1212 instances

and 101 attributes;

• R5 Kr vs kp, with 3196 instances and 37 attributes;

• R6 LSVT Voice Rehabilitation , with 126 instances

and 311 attributes;

• R7 Madelon , with 2600 instances and 501 at-

tributes;

• R8 Monks3 , with 554 instances and 7 attributes;

• R9 Phoneme , with 5404 instances and 6 at-

tributes;

• R10 Ringnorm , with 7400 instances and 21 at-

tributes;

• R11 Thyroid Sick Euthyroid. , with 3163 instances

and 26 attributes;

• R12 Wilt , with 4839 instances and 6 attributes.

Base Learners. The following base learners (Zaki

and Meira Jr., 2014) are used for numerical experi-

ments :

• ML

: Logit - logistic regression,

• ML

: RF - random forests,

• ML

: KNN - k Nearest neighbor,

• ML

: NaiveB - Naive Bayes,

• ML

: SVM - Support Vector Machine.

For each method, we used their implementation from

the sklearn (Pedregosa et al., 2011) library, with

default parameters. The results provided by the ﬁve

methods were used to construct the ﬁve-player game

Γ, whose equilibrium is approximated by using NS-

DE.

Performance Evaluation. For all synthetic and

real-world data sets, 10-fold cross-validation is used

to evaluate the performance of NS-DE. This means

that each data set is divided into 10 equal-sized folds,

and experiments are performed 10 times; each fold is

used once for testing purposes and the other nine for

training purposes. Three performance indicators are

reported for test folds: the area under the curve AUC

(Melo, 2013), the accuracy ACC (Zaki and Meira Jr.,

2014), and log-loss LL (Vovk, 2015). AUC and ACC

take values between 0 and 1; the higher, the better.

A Game Theoretic Approach Based on Differential Evolution to Ensemble Learning for Classiﬁcation

261

Figure 1: Synthetic datasets, box-plots of log-loss values reported by NS-DE, compared to the base approach. Lower values

are considered better.

LL represents an error, and the smaller the value, the

better the results can be considered.

For each data set, the ten values reported for each

fold were compared between the two methods by us-

ing a non-parametric Mann Whitney U test (MWU),

testing the null hypothesis that the NS-DE results

were worse. Whenever the null hypothesis is rejected,

for p-values smaller than 0.05 we can consider NS-

DE results better.

Parameter Settings. NS-DE uses speciﬁc DE pa-

rameters. The experiments presented in this paper

were carried out with a population size of 30 indi-

viduals, evolving for 30 iterations, with F = 0.5 and

CR = 0.8. For each test data fold, 10 independent

runs of NS-DE are performed, and the average of the

performance indicators is reported.

4.2 Results and Discussions

Synthetic Data Sets. A total of 17 of the 108 com-

parisons performed indicated a signiﬁcant difference

in the results in favor of NS-DE. The rest of the re-

sults, although showing minor improvements, could

not be considered signiﬁcantly better, but also not

worse. The most signiﬁcant differences were in the

values of the Log-loss indicator. Figure 1 presents

box-plots of log loss results reported for each data set

for the ten-folds. In the ﬁrst row of plots, datasets with

the smallest class separator value (with overlapping

classes) are the most difﬁcult, while in the last row,

we have the datasets with best-separated data. The

ﬁrst column presents results for data sets with 100 in-

stances, the second one data sets with 500 instances

and the last one with 1000 instances. The boxes are

grouped based on the number of attributes. We ﬁnd

that most signiﬁcant differences appear for data-sets

with lower class separator values and a higher num-

ber of instances. The number of attributes in the data

set does not seem to inﬂuence the results, which is

maybe due to the fact that both methods work with

the results of the base learners and not directly with

the data sets.

Example 4.1. As an example of a result, consider a

synthetic dataset with 500 instances, 3 attributes, and

class separation 0.1. Table 2 presents results reported

by the base learners, the base ensemble, and NS-DE

ECTA 2023 - 15th International Conference on Evolutionary Computation Theory and Applications

262

Table 1: Average and standard deviation values for the three indicators for the real-world datasets. Results marked with an (*)

indicate that the difference between NS-DE and the base approach can be considered signiﬁcant according to the MWU test.

Data AUC NS-DE AUC base ACC NS-DE ACC base LL NS-DE LL base

set mean std mean std mean std mean std mean std mean std

R1 0.937 0.033 0.920 0.047 0.893 0.045 0.850 0.075 0.353 0.049 0.399 0.056

R2 0.955 0.009 0.952 0.010 0.891 0.015 0.897 0.009 0.316* 0.021 0.426 0.007

R3 0.951* 0.009 0.831 0.044 0.892* 0.025 0.704 0.031 0.404* 0.034 0.592 0.018

R4 0.983* 0.037 0.926 0.020 0.966* 0.063 0.769 0.028 0.311* 0.070 0.520 0.011

R5 0.986 0.015 0.987 0.015 0.929 0.042 0.913 0.066 0.241* 0.061 0.325 0.060

R6 0.794 0.191 0.762 0.216 0.768 0.095 0.674 0.150 0.557* 0.057 0.611 0.077

R7 0.778* 0.024 0.728 0.022 0.707* 0.016 0.663 0.026 0.567* 0.018 0.606 0.015

R8 0.992 0.011 0.981 0.023 0.967 0.025 0.964 0.034 0.179* 0.035 0.233 0.041

R9 0.945* 0.010 0.917 0.012 0.892* 0.009 0.846 0.015 0.276* 0.020 0.349 0.014

R10 0.996 0.002 0.995 0.002 0.955 0.021 0.971 0.006 0.222* 0.014 0.265 0.005

R11 0.947 0.022 0.953 0.022 0.949 0.007 0.953 0.007 0.286 0.018 0.250 0.014

R12 0.989 0.009 0.987 0.012 0.983 0.010 0.980 0.009 0.058* 0.026 0.094 0.019

Table 2: Example 4.1, results reported by the base learners,

the base ensemble, and NS-DE.

AUC ACC LL

Logit 0.429 0.460 0.725

RF 0.615 0.580 0.658

KNN 0.885 0.800 1.018

NaiveB 0.562 0.520 0.654

SVM 0.822 0.700 0.509

base 0.853 0.740 0.539

NS-DE 0.887 0.774 0.480

on one test fold. The solution provided by NS-DE

(averaged over 10 DE runs) is

α = (0.4663, 0.6825, 1.2417, 0.1787, 1.1179).

α reasonably identiﬁes the contribution of the base

learners to the ensemble, with the highest coefﬁcient

assigned to KNN and SVM. In this instance, KNN re-

ports the best results when we look at AUC and ACC,

but the worst log loss value.

As a remark, the sum of the coefﬁcients is not one,

but when used in the ensemble, the values are normal-

ized, resulting in

= (0.1264, 0.1851, 0.3367, 0.0484, 0.3031).

Real-World Data. Table 1 presents the numerical

results reported by NS-DE and the base method on

the real world data for the three performance mea-

sures. Results for which the MWU test indicates a

better performance than the base method are marked

with an (*). In a similar manner to synthetic data, log

loss values are signiﬁcantly improved in most cases.

In most cases, standard deviations reported by NS-DE

are lower. For four datasets, AUC and ACC values are

also signiﬁcantly better.

5 CONCLUSIONS

A game-theoretic approach to evolving parameters of

an ensemble meta-learner is proposed. As an initial

attempt to use the Nash equilibrium to estimate pa-

rameters for a meta-learner, the marginal contribution

to the accuracy of the training data is used as a pay-

off function in a non-cooperative game. A differen-

tial evolution algorithm is adapted to approximate the

Nash equilibrium of the game.

The preliminary results presented here show that

the approach may be competitive, compared to a base-

line model that averages the probabilities of the base

models. Future models can explore existing stacking

generalization models and the possibility of introduc-

ing the Nash equilibrium as a solution for different

meta-learners in order to offer decision-makers dif-

ferent types of options.

ACKNOWLEDGEMENTS

This work was supported by a grant of the Romanian

Ministry of Education and Research, CNCS - UEFIS-

CDI, project number PN-III-P4-ID-PCE-2020-2360,

within PNCDI III.

REFERENCES

Agarwal, S. and Chowdary, C. R. (2020). A-Stacking and

A-Bagging: Adaptive versions of ensemble learning

algorithms for spoof ﬁngerprint detection. Expert Sys-

tems with Applications, 146:113160.

Agarwal, S. and Chowdary, C. R. (2021). Combating hate

speech using an adaptive ensemble learning model

A Game Theoretic Approach Based on Differential Evolution to Ensemble Learning for Classiﬁcation

263

with a case study on COVID-19. Expert Systems with

Applications, 185:115632.

AlGhamdi, N., Khatoon, S., and Alshamari, M. (2022).

Multi-Aspect Oriented Sentiment Classiﬁcation: Prior

Knowledge Topic Modelling and Ensemble Learning

Classiﬁer Approach. Applied Sciences, 12(8).

Chen, J., Li, Z., and Qin, S. (2022). Ensemble Learning for

Assessing Degree of Humor. In 2022 International

Conference on Big Data, Information and Computer

Network (BDICN), pages 492–498.

Chi, G., Huang, X., Zhou, Y., and Guo, X. (2023). Discrim-

inating the default risk of small enterprises: Stacking

model with different optimal feature combinations.

Expert Systems with Applications, 229:120494.

Dong, Y., Zhang, H., Wang, C., and Zhou, X. (2021). Wind

power forecasting based on stacking ensemble model,

decomposition and intelligent optimization algorithm.

Neurocomputing, 462:169–184.

Dua, D. and Graff, C. (2017). Uci machine learning reposi-

tory.

Hastie, T., Tibshiran, R., and Friedman, J. (2016). The Ele-

ments of Statistical Learning: Data Mining, Inference,

and Prediction. Springer.

Koopialipoor, M., Asteris, P. G., Mohammed, A. S., Alex-

akis, D. E., Mamou, A., and Armaghani, D. J. (2022).

Introducing stacking machine learning approaches for

the prediction of rock deformation. Transportation

Geotechnics, 34:100756.

Lung, R. I. and Dumitrescu, D. (2008). Computing

nash equilibria by means of evolutionary computa-

tion. Int. J. of Computers, Communications & Con-

trol, III(suppl.issue):364–368.

Lung, R. I., Mihoc, T. D., and Dumitrescu, D. (2010). Nash

equilibria detection for multi-player games. In Pro-

ceedings of the IEEE Congress on Evolutionary Com-

putation, CEC 2010, Barcelona, Spain, 18-23 July

2010, pages 1–5. IEEE.

Maschler, M., Zamir, S., Solan, E., Hellman, Z., and Borns,

M. (2020). Game Theory. Cambridge University

Press.

Melo, F. (2013). Area under the ROC Curve, pages 38–39.

Springer New York, New York, NY.

Mienye, I. D. and Sun, Y. (2022). A Survey of Ensemble

Learning: Concepts, Algorithms, Applications, and

Prospects. IEEE Access, 10:99129–99149.

Mohapatra, S., Maneesha, S., Mohanty, S., Patra, P. K.,

Bhoi, S. K., Sahoo, K. S., and Gandomi, A. H.

(2023). A stacking classiﬁers model for detecting

heart irregularities and predicting Cardiovascular Dis-

ease. Healthcare Analytics, 3:100133.

Pavlyshenko, B. (2018). Using stacking approaches for ma-

chine learning models. In 2018 IEEE Second Interna-

tional Conference on Data Stream Mining & Process-

ing (DSMP), pages 255–258.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,

Thirion, B., Grisel, O., Blondel, M., Prettenhofer,

P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,

A., Cournapeau, D., Brucher, M., Perrot, M., and

Duchesnay, E. (2011). Scikit-learn: Machine learning

in Python. Journal of Machine Learning Research,

12:2825–2830.

Polikar, R. (2012). Ensemble Learning. In Zhang, C.

and Ma, Y., editors, Ensemble Machine Learning:

Methods and Applications, pages 1–34. Springer New

York, New York, NY.

Storn, R. and Price, K. (1997). Differential evolution – a

simple and efﬁcient heuristic for global optimization

over continuous spaces. J. of Global Optimization,

11(4):341–359.

Sun, W. and Trevor, B. (2018). A stacking ensemble learn-

ing framework for annual river ice breakup dates.

Journal of Hydrology, 561:636–650.

Thomsen, R. (2004). Multimodal optimization using

crowding-based differential evolution. In Proceedings

of the 2004 Congress on Evolutionary Computation

(IEEE Cat. No.04TH8753), volume 2, pages 1382–

1389 Vol.2.

Vovk, V. (2015). The Fundamental Nature of the Log Loss

Function, pages 307–318. Springer International Pub-

lishing, Cham.

Wang, G., Sun, J., Ma, J., Xu, K., and Gu, J. (2014). Sen-

timent classiﬁcation: The contribution of ensemble

learning. Decision Support Systems, 57:77–93.

Wolpert, D. H. (1992). Stacked generalization. Neural Net-

works, 5(2):241–259.

Zaki, M. J. and Meira Jr., W. (2014). Data Mining and Anal-

ysis: Fundamental Concepts and Algorithms. Cam-

bridge University Press.

Zhang, Y., Lu, H., Jiang, C., Li, X., and Tian, X. (2021).

Aspect-Based Sentiment Analysis of User Reviews in

5G Networks. IEEE Network, 35(4):228–233.

ECTA 2023 - 15th International Conference on Evolutionary Computation Theory and Applications

264