Towards the Automatic Creation of Optimized Classifier Ensembles

Julius Voggesberger¹, Peter Reimann¹,² and Bernhard Mitschang¹

¹Institute for Parallel and Distributed Systems, University of Stuttgart, Universitätsstr. 38, Stuttgart, Germany
²Graduate School of Excellence advanced Manufacturing Engineering, University of Stuttgart, Nobelstr. 12, Stuttgart, Germany
Keywords:
Classifier Ensembles, Classifier Diversity, Decision Fusion, AutoML, Machine Learning.
Abstract:
Classifier ensemble algorithms allow for the creation of combined machine learning models that are more
accurate and generalizable than individual classifiers. However, creating such an ensemble is complex, as
several requirements must be fulfilled. An expert has to select multiple classifiers that are both accurate and
diverse. In addition, a decision fusion algorithm must be selected to combine the predictions of these classifiers
into a consensus decision. Satisfying these requirements is challenging even for experts, as it requires a lot
of time and knowledge. In this position paper, we propose to automate the creation of classifier ensembles.
While there already exist several frameworks that automatically create multiple classifiers, none of them meet
all requirements to build optimized ensembles based on these individual classifiers. Hence, we introduce and
compare three basic approaches that tackle this challenge. Based on the comparison results, we propose one of
the approaches that best meets the requirements to lay the foundation for future work.
1 INTRODUCTION
Classification algorithms make it possible to automate decision making and to make predictions by extracting insights from data. However, often a single classification model is not sufficient for the task at hand. The reasons can be manifold: small data sets may lead to overfitting, the data may have high dimensionality, or the underlying task may be too complex for a single classifier, resulting in inaccurate predictions (Dietterich, 2000; Polikar, 2006; Hirsch et al., 2019; Wilhelm et al., 2020; Hirsch et al., 2023).
For such tasks, a classifier ensemble can be used
to achieve more accurate results. Ensembles have the
advantage of being less sensitive to noise and gen-
eralizing better to new, unseen data (Polikar, 2006;
Dietterich, 2000). Ensemble algorithms train multiple
classifiers and combine their predictions with decision
fusion algorithms to reach a consensus decision. The
goal is to achieve more accurate predictions in com-
parison to a single classifier (Polikar, 2006; Kuncheva,
2004; Dietterich, 2000). To this end, the classifiers
should be diverse in their predictions, i. e., they should
make both errors and correct predictions for different
data instances (Hansen and Salamon, 1990; Dietterich,
2000; Brown et al., 2005).
However, creating an ensemble for a given classification task is not straightforward. First, a set of classifier algorithms has to be selected and their hyperparameters have to be optimized. Thereby, the training and optimization have to result in a set of not only accurate, but also diverse classifiers (Hansen and Salamon, 1990; Dietterich, 2000). Second, a decision fusion algorithm and its hyperparameters have to be selected to ensure that the ensemble prediction is more accurate than the predictions of its single classifiers. Both the optimization of the classifiers and of the decision fusion algorithm are complex and require expert knowledge and time (Wilhelm et al., 2023).
A possible solution to these challenges, i. e., to reduce the amount of knowledge and time needed for the ensemble creation, is the use of Automated Machine Learning (AutoML) (Zöller and Huber, 2021; He et al., 2021). AutoML frameworks automatically create optimized machine learning models with minimal user interaction, so that even novice users can utilize machine learning. However, currently no existing AutoML framework solves all challenges of creating an ensemble. Current frameworks optimize classifiers w. r. t. their accuracy, but not their diversity. Furthermore, pre-selected decision fusion
algorithms are often used and optimizing the decision
fusion algorithm is rarely considered.
In this position paper, we address the above-
mentioned challenges by introducing three approaches
to automatically create an optimized classifier ensem-
ble. The approaches create a set of classifiers that is
optimized in terms of classifier diversity and classifica-
tion performance, while the decision fusion algorithm
is optimized w. r. t. classification performance to fur-
ther increase the ensemble accuracy. We discuss and
compare these approaches regarding specific require-
ments, e. g., regarding the complexity of the configura-
tion space from which an optimal classifier ensemble
has to be selected. We use the comparison results to
identify the approach that best meets the requirements.
The rest of the paper is structured as follows: Sec-
tion 2 introduces the problem statement and the re-
quirements for an automatic creation of classifier en-
sembles. Related work of the paper is discussed in Sec-
tion 3. The three proposed approaches to the ensemble
creation problem and their comparison are discussed
in Section 4. Lastly, Section 5 concludes the paper and
discusses directions for future work.
2 PROBLEM STATEMENT
In this section, the problem of creating an optimized
classifier ensemble is described. As shown in Figure 1,
a classifier ensemble consists of a set of classifica-
tion models (i. e., classifiers) and a decision fusion
algorithm which combines the predictions of the clas-
sifiers into a consensus prediction. In the following,
we discuss requirements for the ensemble creation so
that the resulting ensemble is able to make accurate
predictions:
R1: Classification Performance
The first require-
ment in the creation of an ensemble is the classification
performance. If the individual classifiers have a clas-
sification performance better than random, a higher
performance can be achieved for the ensemble (Hansen
and Salamon, 1990).
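To see why this requirement matters, consider the classic argument by Hansen and Salamon (1990): if k classifiers erred independently and each were correct with probability p > 0.5, a majority vote over them would be correct more often than any individual classifier. The following minimal sketch computes this majority-vote accuracy; the independence assumption is an idealization for illustration, not a property of real ensembles:

```python
from math import comb

def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent classifiers,
    each correct with probability p, predicts the correct label (odd k)."""
    m = k // 2 + 1  # smallest strict majority
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(m, k + 1))

# Five independent classifiers that are each only 60% accurate already
# beat any single one of them under majority voting:
print(majority_vote_accuracy(0.6, 5))   # ~0.683
print(majority_vote_accuracy(0.6, 21))  # ~0.83
```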
R2: Classifier Diversity
The second requirement
is the classifier diversity. In the context of ensem-
bles, high classifier diversity means that individual
classifiers make both errors and correct predictions
for different data instances (Polikar, 2006; Kuncheva,
2004). Several classifiers that have high diversity thus
complement each other w. r. t. their correct predictions,
so that a decision fusion algorithm may reduce the
overall error. As a consequence, having high diver-
sity leads to a higher classification performance of the
ensemble. In addition, it increases the generalization
Figure 1: Representation of an ensemble. The input x_i is a data instance, y_{i,1}, ..., y_{i,k} are the predictions of the individual classifiers, and the output y_i is the combined ensemble prediction obtained via decision fusion.
capability of the ensemble, as the risk of overfitting
is reduced by compensating for individual errors (Po-
likar, 2006). Two kinds of methods exist to generate
diversity: Implicit and explicit methods (Brown et al.,
2005). Implicit methods induce diversity by manipu-
lating the training processes of the classifiers, e. g., by
using different feature subsets for each classifier. Ex-
plicit methods instead optimize the classifier diversity
directly, e. g., by utilizing diversity measures.
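As an illustration of an explicit diversity metric, the sketch below computes the pairwise disagreement measure (see, e. g., Kuncheva, 2004): the fraction of instances on which two classifiers predict different labels, averaged over all classifier pairs. The function names are our own and merely illustrate the idea:

```python
import numpy as np

def disagreement(preds_a: np.ndarray, preds_b: np.ndarray) -> float:
    """Fraction of instances on which two classifiers predict
    different labels; higher values mean more diversity."""
    return float(np.mean(preds_a != preds_b))

def mean_pairwise_disagreement(all_preds: np.ndarray) -> float:
    """Average disagreement over all classifier pairs, where all_preds
    has shape (n_classifiers, n_instances)."""
    k = all_preds.shape[0]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return float(np.mean([disagreement(all_preds[i], all_preds[j])
                          for i, j in pairs]))
```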
R3: Decision Fusion Optimization
The decision
fusion algorithm has to be selected in such a way that
the resulting classification performance and general-
ization is high (Polikar, 2006). This results in a second
optimization problem that has to be solved in addi-
tion to the optimization of the individual classifiers.
Multiple decision fusion algorithms exist that employ
different combination strategies in order to derive a
consensus decision from the diverse predictions of the
individual classifiers. They can be divided into three
groups: utility-based, evidence-based and trainable
algorithms (Wilhelm et al., 2021). Utility-based algo-
rithms heuristically combine the classifier predictions
without using any prior information, e. g., via a major-
ity voting scheme. Evidence-based algorithms make
use of prior information about the classifiers, i. e., their
evidence, to combine their predictions. Usually, the
classifier performance is used as evidence, e. g., the
accuracy on a validation data set. Trainable algorithms
train a decision fusion algorithm by using predictions
of the classifiers as training data. Furthermore, stacked
generalization (Wolpert, 1992) can be seen as a kind
of trainable decision fusion algorithm.
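To make these groups concrete, the following sketch implements one utility-based fusion (majority voting) and one evidence-based fusion (votes weighted by validation accuracy); a trainable fusion would instead fit a meta-classifier on the stacked classifier predictions, as in stacked generalization. The helper names and the label encoding (non-negative integers) are assumptions for illustration:

```python
import numpy as np

def majority_vote(preds: np.ndarray) -> np.ndarray:
    """Utility-based fusion: per instance, return the most frequent
    label; preds has shape (n_classifiers, n_instances)."""
    return np.array([np.bincount(col).argmax() for col in preds.T])

def weighted_vote(preds: np.ndarray, accuracies: np.ndarray,
                  n_classes: int) -> np.ndarray:
    """Evidence-based fusion: each classifier's vote is weighted by its
    accuracy on a validation set (its 'evidence')."""
    n_instances = preds.shape[1]
    scores = np.zeros((n_instances, n_classes))
    for clf_preds, weight in zip(preds, accuracies):
        scores[np.arange(n_instances), clf_preds] += weight
    return scores.argmax(axis=1)
```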
R4: Automatic Optimization
The last require-
ment is that the ensemble has to be created automat-
ically, i. e., with minimal user interaction. As such,
the above requirements have to be fulfilled automat-
ically, e. g., by using an AutoML approach to create
and optimize the ensemble.
3 RELATED WORK
In this section, we discuss literature related to the automatic creation of classifier ensembles. We first review literature that deals with ensembles, decision fusion and diversity, followed by an analysis of existing AutoML frameworks. Table 1 gives an overview of the key findings.
3.1 Ensemble Algorithms
The most prominent ensemble algorithms are boost-
ing (Schapire, 1990) and bagging (Breiman, 1996), in
particular their representatives AdaBoost (Freund and
Schapire, 1997) and Random Forest (Breiman, 2001).
Random Forests consist of a diverse set of decision
trees. Here, diversity is induced by using bootstrap
samples for training, i. e., the individual decision trees
are trained with different subsets of the training data.
Thus, the classification performance of each tree is
optimized w.r.t. its data subset. The predictions of the
decision trees are aggregated using majority voting or
averaging.
AdaBoost, as a prominent example for boosting
algorithms, trains a diverse set of classifiers itera-
tively. In each iteration, the training data instances
are reweighted according to the errors of the classi-
fier trained in the previous iteration in order to reduce
the classification error. The decision fusion is accom-
plished by using a weighted averaging scheme over
the individual classifiers.
Both algorithms use fixed decision fusion algo-
rithms and thus do not consider finding an optimal one.
Thus, automatic optimization is neither performed for
the decision fusion algorithms nor for the individual
classifiers. Furthermore, bagging does not explicitly
optimize diversity and instead assumes that randomly
changing the training data via bootstrapping is suffi-
cient to induce implicit diversity. On the other hand,
AdaBoost does explicitly optimize its classifier diver-
sity by reweighting the training data. This however
entails that the classifiers have to be trained sequen-
tially and that the resulting ensemble tends to overfit
on noisy data (Dietterich, 2000).
Some of the more recent literature focuses on
the creation of ensembles through the use of multi-
objective optimization. Examples are multi-objective
ensemble generation (MEG) (Moussa et al., 2022) or
ensembles for imbalanced data (Wegier et al., 2022).
The latter method uses multi-objective optimization
to optimize the weights of the majority voting fusion
w. r. t. precision and recall. However, the set of classi-
fiers is not optimized, no further decision fusion meth-
ods are considered, and diversity is only implicitly in-
duced. MEG, on the other hand, uses multi-objective
optimization to explicitly optimize the ensemble w. r. t.
both diversity and classification performance (Moussa
et al., 2022). However, the pool of available classifiers
and decision fusion methods is highly limited to only
four classification algorithms with 15 hyperparameter
variations and to four decision fusion methods.
A framework that allows for optimizing decision
fusion algorithms is PUSION (Wilhelm et al., 2023).
While PUSION enables the automatic selection of the
best performing decision fusion algorithm, it does
not optimize their hyperparameters. Furthermore,
PUSION
assumes that a sufficiently diverse set of clas-
sifiers already exists.
3.2 AutoML
Automated Machine Learning (AutoML) denotes the
automated creation of a machine learning model for
given data within a given budget (Feurer et al., 2015).
The goal of an AutoML framework is to solve the prob-
lem of searching for an algorithm and its hyperparam-
eters that minimize a loss function L. This problem is formalized as the "Combined Algorithm Selection and Hyperparameter optimization" (CASH) problem (Thornton et al., 2013):

λ* ∈ argmin_{λ_j ∈ Λ} L(λ_j, D_train, D_valid).   (1)

Here, Λ is a configuration space consisting of machine learning algorithms and their hyperparameters. Each point λ_j in this space is a configuration, i. e., an algorithm and its assigned hyperparameters. Then, λ* ∈ Λ is the optimal algorithm and its optimal hyperparameters w. r. t. the loss function.
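As an illustration, the CASH problem of Equation (1) can be approximated by a simple random search over a configuration space. The toy space below (two scikit-learn algorithms with one hyperparameter each) and the budget are assumptions for illustration, not the setup of any particular AutoML framework:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def sample_configuration(rng):
    """Draw one configuration lambda_j from a toy configuration space:
    an algorithm together with one hyperparameter assignment."""
    if rng.random() < 0.5:
        return KNeighborsClassifier(n_neighbors=int(rng.integers(1, 16)))
    return DecisionTreeClassifier(max_depth=int(rng.integers(1, 11)),
                                  random_state=0)

def random_search_cash(X_train, y_train, X_valid, y_valid,
                       budget=50, seed=0):
    """Approximate Equation (1) by random search: evaluate sampled
    configurations on the validation data and keep the one with the
    lowest loss (here: the error rate)."""
    rng = np.random.default_rng(seed)
    best_model, best_loss = None, np.inf
    for _ in range(budget):
        model = sample_configuration(rng).fit(X_train, y_train)
        loss = 1.0 - model.score(X_valid, y_valid)
        if loss < best_loss:
            best_model, best_loss = model, loss
    return best_model, best_loss
```

In practice, AutoML frameworks replace the random search by more sample-efficient strategies such as Bayesian optimization (Thornton et al., 2013).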
Several existing AutoML frameworks tackle the CASH problem, but only a few of them create ensembles, e. g., Auto-sklearn (Feurer et al., 2015), H2O (LeDell and Poirier, 2020), ESMBO (Lacoste et al., 2014), Bayesian Optimization for Ensembles (Lévesque et al., 2016) and AutoGluon-Tabular (Erickson et al., 2020). However, they do not consider optimizing diversity explicitly, but instead assume that using different algorithms and hyperparameters leads to sufficiently diverse ensembles. Furthermore, most of them use preselected decision fusion algorithms and therefore do not consider optimizing them. H2O is the only AutoML framework that optimizes the decision fusion algorithm, but it restricts the set of possible decision fusion algorithms to stacked generalization (Villanueva Zacarias et al., 2021).
Table 1: Comparison of related work by requirements for automatically creating an ensemble. ✓ = the method fulfills the requirement, (✓) = the method partially fulfills the requirement, ✗ = the requirement is not met. R1: classification performance, R2: classifier diversity, R3: decision fusion optimization, R4: automatic optimization.

Methods                                                          R1    R2    R3    R4
Ensemble Algorithms
  Bagging (Breiman, 1996)                                        ✓     (✓)   ✗     ✗
  Boosting (Schapire, 1990)                                      ✓     (✓)   ✗     ✗
  Ensemble for Imbalanced Data (Wegier et al., 2022)             ✓     (✓)   (✓)   ✗
  MEG (Moussa et al., 2022)                                      ✓     (✓)   (✓)   (✓)
  PUSION (Wilhelm et al., 2023)                                  ✗     ✗     (✓)   (✓)
Automated Machine Learning
  Auto-Sklearn (Feurer et al., 2015)                             ✓     (✓)   ✗     ✓
  H2O (LeDell and Poirier, 2020)                                 ✓     (✓)   (✓)   ✓
  ESMBO (Lacoste et al., 2014)                                   ✓     (✓)   ✗     ✓
  Bayesian Optimization for Ensembles (Lévesque et al., 2016)    ✓     (✓)   ✗     ✓
  AutoGluon-Tabular (Erickson et al., 2020)                      ✓     (✓)   ✗     ✓
4 OPTIMIZATION PROBLEM
The discussion in the previous section shows that none
of the existing frameworks allows the automatic cre-
ation of ensembles that optimize both their diversity
and decision fusion. Therefore, we investigate basic
approaches to address these problems. We propose
three approaches that extend AutoML to not only op-
timize several classifiers w. r. t. classification perfor-
mance, but also regarding classifier diversity. These
approaches are capable of automatically optimizing
decision fusion algorithms and their hyperparameters.
In the following, we introduce and compare the
three approaches according to the requirements de-
fined in Section 2. The requirements R1 and R3 are
not discussed in detail, as they are solved by all ap-
proaches in a similar way. To compare the approaches
in terms of their ability to create diverse classifiers,
the requirement R2 is split into two parts: Creating
diversity implicitly and explicitly. Finally, the require-
ment R4 for automatic optimization of the ensemble
is compared according to the configuration space com-
plexity of an approach. The higher the complexity of
a configuration space, the more resources, e. g., time
and memory, are needed to find an optimal ensemble
configuration. A summary of the comparison is given
in Table 2.
4.1 Combined Configuration Space
In this approach, the ensemble is viewed as one unit
which allows us to formulate the problem defined in Sec-
tion 2 as a single optimization problem (see Figure 2a)
with one configuration space. This entails that the
components of the ensemble, i. e., the individual clas-
sifiers and the decision fusion algorithm, are optimized
together. Thus, the configuration space of the optimiza-
tion problem consists of both the classification algo-
rithms for all individual classifiers and the decision
fusion algorithms.
To fulfill the requirements specified in Section 2,
the loss function for the AutoML problem must be
modeled to include both a metric for classification per-
formance and an explicit metric for classifier diversity.
Another method to increase classifier diversity with-
out using explicit diversity metrics is to use implicit
Table 2: Comparison of the presented solutions. ● = fully supported, ◐ = partly supported, ○ = not supported; C(n, k) denotes the binomial coefficient.

                                 Combined                Separated               Simplified
                                 Configuration Space     Configuration Space     Configuration Space
Classification Performance       ●                       ●                       ●
Implicit Diversity               ◐                       ◐                       ◐
Explicit Diversity               ● (via loss function)   ● (via loss function)   ● (via model selection)
Decision Fusion Optimization     ●                       ●                       ●
Configuration Space Complexity   O(C(|Λ|, k) · |Γ|)      O(C(|Λ|, k) + |Γ|)      O(|Λ| + |Γ|)
diversity methods. This can be done by extending
the configuration space with data transformation algo-
rithms for each classifier, such as bootstrapping or a
random subspace method (Tin Kam Ho, 1998), with
the drawback of increasing its complexity. Note that
this extension is also possible for the following ap-
proaches.
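One conceivable way to model such a loss function is to add a penalty for low diversity to the ensemble error, e. g., using the pairwise disagreement measure sketched in Section 2. The additive form and the trade-off weight alpha below are assumptions for illustration, not a prescription of this paper:

```python
import numpy as np

def ensemble_loss(y_true, y_ensemble, member_preds, alpha=0.3):
    """Sketch of a combined loss for the combined configuration space:
    ensemble error plus a penalty for low diversity among the k member
    classifiers; member_preds has shape (k, n_instances)."""
    error = float(np.mean(y_ensemble != y_true))
    # mean_pairwise_disagreement as sketched in Section 2
    diversity = mean_pairwise_disagreement(member_preds)
    return error + alpha * (1.0 - diversity)
```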
Since the creation of an ensemble is viewed as a single optimization problem, each point in the configuration space must represent an ensemble configuration. Hence, each ensemble configuration has to consist of k classifiers and one decision fusion algorithm, where k is the ensemble size. The optimization problem is then to search for one ensemble configuration that minimizes the loss function. It is necessary that all k classifier configurations are selected simultaneously, since the decision fusion algorithm can only be trained using a set of classifiers. As such, the configuration space CS must consist of the cross product of k classifier configuration spaces Λ and of one decision fusion configuration space Γ:

CS = Λ × ··· × Λ (k times) × Γ.   (2)
As the order of the classifiers in the ensemble is not relevant, there are C(|Λ|, k) possible classifier combinations for an ensemble of size k, where C(·, ·) denotes the binomial coefficient and |Λ| is the size of a single classifier configuration space. With the addition of the decision fusion algorithm, this results in a final complexity in O(C(|Λ|, k) · |Γ|). Using this kind of configuration space, we can select the ensemble that exhibits the best overall performance.
To illustrate the complexity, we assume a simple scenario with a configuration space consisting of three classification algorithms (e. g., decision tree, kNN, Random Forest) and two decision fusion algorithms (e. g., majority voting, decision template). Assuming that there are ten possible hyperparameter assignments for each classification algorithm and five for each decision fusion algorithm, we get |Λ| = 3 · 10 = 30 and |Γ| = 2 · 5 = 10 possible configurations. If we now want to create an ensemble of size k = 5, the size of the combined configuration space is C(|Λ|, k) · |Γ| = C(30, 5) · 10 = 1 425 060, i. e., around 1.5 million possible configurations.
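These counts can be checked directly, e. g., with Python's math.comb:

```python
from math import comb

n_clf_configs = 3 * 10    # |Λ| = 30 classifier configurations
n_fusion_configs = 2 * 5  # |Γ| = 10 decision fusion configurations
k = 5                     # ensemble size

combined = comb(n_clf_configs, k) * n_fusion_configs
print(combined)  # 1425060, i.e., around 1.5 million configurations
```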
4.2 Separated Configuration Space
In the second approach, the optimization of the de-
cision fusion algorithm is considered as a separate
problem, i. e., independent from the classifier set op-
timization (see Figure 2b). Thus, the configuration
space has a lower complexity, while it is still possible
to evaluate multiple ensemble configurations. To this
end, the classifier set has to be optimized first, fol-
lowed by the decision fusion in a separate step. Here,
each classifier set configuration consists of
k
individ-
ual classifier configurations. These are then evaluated
w. r. t. classifier performance and diversity. In the
subsequent optimization, multiple ensemble configura-
tions, i. e., combinations of classifier set and decision
fusion configurations, are evaluated. This can be done
by iterating over the previously evaluated classifier
set configurations and performing the decision fusion
optimization for each of these configurations.
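A high-level sketch of this two-step procedure is shown below; all four arguments are hypothetical placeholders standing in for a concrete AutoML backend:

```python
def separated_optimization(set_configs, fusion_configs,
                           evaluate_set, evaluate_ensemble):
    """Step 1 scores classifier set configurations (e.g., w.r.t.
    classification performance and diversity); step 2 optimizes the
    decision fusion for each evaluated classifier set."""
    # Step 1: classifier set optimization (lower loss first).
    scored_sets = sorted(set_configs, key=evaluate_set)

    # Step 2: decision fusion optimization per classifier set.
    best_config, best_loss = None, float("inf")
    for clf_set in scored_sets:
        for fusion in fusion_configs:
            loss = evaluate_ensemble(clf_set, fusion)
            if loss < best_loss:
                best_config, best_loss = (clf_set, fusion), loss
    return best_config, best_loss
```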
The creation of classifier diversity is the same as
in the first approach, since one configuration in the
configuration space is still a set of
k
classifiers. Conse-
quently, they are optimized w. r. t. classification perfor-
mance and explicit diversity so that the same diversity
creation methods apply.
On the other hand, the complexity of this approach
decreases compared to the first approach, since the
decision fusion optimization is viewed as a separate
optimization problem. This results in two configu-
ration spaces: One for the classifier set and one for
the decision fusion. The configuration space for the
classifier set optimization is the same as for the first
approach, but without Γ, i. e., the complexity is in O(C(|Λ|, k)). For the decision fusion optimization, the configuration space then consists only of the algorithms in Γ, i. e., its complexity is in O(|Γ|). The overall complexity is then the sum of the complexities of both individual configuration spaces.
Adapting the example of Section 4.1 for this approach, we get C(|Λ|, k) = C(30, 5) = 142 506 as the number of possible configurations for the classifier set optimization and ten possible configurations for optimizing the decision fusion algorithm. Thus, the resulting configuration space is about ten times smaller than that of the previous approach.
Figure 2: Representations of the three basic approaches to optimize classifier ensembles. (a) Combined configuration space: the classifier set and the decision fusion algorithm are optimized jointly, and the best ensemble w. r. t. the loss is returned. (b) Separated configuration space: the classifier set is optimized first; then, iterating over the classifier set configurations, the decision fusion is optimized and the best decision fusion is selected w. r. t. the loss, yielding the best ensemble w. r. t. the loss. (c) Simplified configuration space: single classifiers are optimized, a set of classifiers is selected w. r. t. diversity, and the decision fusion is optimized to obtain the best decision fusion w. r. t. the loss.
4.3 Simplified Configuration Space
Instead of directly retrieving a configuration consisting
of
k
classifiers, we can first address the optimization
problem for
CS = Λ
and then select a subset of size
k
from all evaluated configurations (see Figure 2c).
However, in this approach it is no longer possible
to integrate a diversity metric into the loss function, as
these metrics have to be computed by comparing mul-
tiple classifiers. Alternatively, the diversity can be ex-
plicitly optimized using the above-mentioned classifier
selection step. This can be done by using an ”overpro-
duce and select” approach, i. e., creating a large set of
classifiers (overproduce) and then selecting a diverse
and accurate subset of these (select) (Kuncheva, 2004).
The classifier optimization of the current approach can
be seen as the overproduce step, as multiple configura-
tions of classification algorithms are evaluated when
optimizing a classifier. Out of all evaluated configu-
rations, a subset of
k
classifiers can then be selected
using their classification performance and diversity.
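A simple instantiation of the select step is a greedy forward selection that trades off validation accuracy against disagreement with the classifiers selected so far; the helper below and the weight alpha are illustrative assumptions:

```python
import numpy as np

def greedy_select(pool_preds, pool_accs, k, alpha=0.5):
    """Overproduce-and-select sketch: from an overproduced pool, greedily
    pick k classifiers by a weighted sum of validation accuracy and
    disagreement with the members chosen so far.
    pool_preds has shape (n_pool, n_instances)."""
    selected = [int(np.argmax(pool_accs))]  # seed with the most accurate one
    while len(selected) < k:
        best_i, best_score = None, -np.inf
        for i in range(len(pool_preds)):
            if i in selected:
                continue
            div = np.mean([np.mean(pool_preds[i] != pool_preds[j])
                           for j in selected])
            score = (1 - alpha) * pool_accs[i] + alpha * div
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected
```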
The separation of classifier optimization and clas-
sifier set selection leads to a far lower complexity of
the configuration space compared to the previously in-
troduced approaches. Since only single classifiers are
optimized in the first step, the configuration space of
this step is
Λ
, i. e., its complexity is in
O(|Λ|)
and thus
only grows linearly with the size of the classifier configuration space. The configuration space complexity for the decision fusion optimization is again in O(|Γ|). So, the overall complexity is in O(|Λ| + |Γ|). Continuing the example from Section 4.1, the configuration space for classifiers contains 30 possible configurations, while the one for decision fusion contains ten, i. e., we have 40 possible configurations in total. Compared to the second approach, the size of the configuration space is thus smaller by approximately a factor of 3 500, and by a factor of 36 000 compared to the first approach.
4.4 Final Discussion
As shown in Table 2, all three approaches fulfill most
of the requirements defined in Section 2 in a similar
way. For example, creating classifiers with high classi-
fication performance is solved by all approaches via
a loss function for decreasing prediction errors. Simi-
larly, inducing implicit diversity is partly possible for
all approaches, since they all create a set of classifiers
consisting of different algorithms and hyperparameters.
Furthermore, the classifier configuration spaces of all
approaches can easily be extended to include implicit
diversity methods such as bootstrapping. However,
this would result in a higher complexity of the config-
uration space
Λ
. Explicit diversity can also be created
by all approaches, but they differ in the way to realize
this. The approaches described in Section 4.1 and 4.2
can incorporate a diversity measure into the loss func-
tion. The third approach instead uses an overproduce
and select approach, i. e., it creates multiple classifiers
and then utilizes explicit diversity measures to select
the
k
most diverse classifiers among them. Finally, all
three approaches fulfill the requirement to optimize
the decision fusion algorithm to the same extent.
The requirement in which the three approaches dif-
fer most is the complexity of the configuration space,
which depends strongly on how the ensemble is opti-
mized. In the first approach introduced in Section 4.1,
the entire ensemble, including the classifier set and
the decision fusion algorithm, is optimized simultane-
ously. Consequently, the complexity of the associated
configuration space, which lies in O(C(|Λ|, k) · |Γ|), is high. Due to the complexity exploding with increasing k, the optimization is not feasible in practice.
The second approach introduced in Section 4.2
has a slightly smaller configuration space with a com-
plexity in O(C(|Λ|, k) + |Γ|)
, as it carries out the classifier
set optimization and the decision fusion optimization
in two separate steps. However, the first step of op-
timizing the classifier set still shows the same high
complexity as in the first approach. It thus accounts
for the largest part of the overall complexity.
Finally, the third approach proposed in Section 4.3
exhibits the lowest complexity of the configuration
space. This is possible by reducing the optimization
problem of creating an optimized set of
k
classifiers
to the problem of creating single classifiers that are
optimized independently of each other. Consequently,
the complexity of the configuration space is only in
O(|Λ|+|Γ|)
and is thus independent of
k
. By applying
this approach, the creation of an automatically opti-
mized ensemble becomes feasible in an efficient way.
The complexity of the configuration space has the
highest impact on the ensemble creation, as having a
high complexity results in an inefficient optimization.
Here, the third approach presents the greatest advan-
tage, as its complexity is considerably lower than the
complexity of the other two approaches. Therefore,
we suggest that the approach presented in Section 4.3
should be explored in more detail in future work.
5 CONCLUSION AND OUTLOOK
Classifier ensembles allow for generating accurate clas-
sification models with high generalizability. However,
the creation of such an ensemble is complex: An ac-
curate and diverse set of classifiers has to be created
and a decision fusion algorithm has to be selected and
optimized. To simplify the creation of an ensemble,
we propose to automate it. To this end, we first iden-
tified the requirements for such a solution. We then
introduced three basic approaches for the automatic
creation and optimization of classifier ensembles and
discussed the extent to which these approaches satisfy
the requirements. One of the approaches surpassed the
other two, as its configuration space is of considerably
lower complexity than the complexities of the other
two approaches.
In future work, we intend to implement the third
approach as an extended AutoML framework. For
this purpose, several questions remain to be addressed,
such as which metrics have to be used to measure clas-
sification performance and classifier diversity, as well
as which optimization method is able to solve the op-
timization problem in the most efficient and effective
way. The extended AutoML framework should then be
evaluated on several real-world datasets, e. g., based on
the predictive performance of the resulting optimized
classifier ensembles.
In addition, the optimization w. r. t. diversity is of
particular interest for future research. While diversity
is known to be an important requirement for ensembles,
it is still unclear how it influences the classification
performance. Moreover, the automatic selection of
decision fusion algorithms already showed promise
when using the PUSION framework (Wilhelm et al.,
2023). However, the combination with AutoML and
the optimization of the hyperparameters of decision
fusion algorithms are still unexplored.
While we presented three basic approaches, vari-
ants of these are possible. In particular, the third ap-
proach may be studied more closely. Finally, com-
pletely new approaches to solve the presented chal-
lenge of automatic ensemble creation may be explored
in future work.
ACKNOWLEDGEMENTS
Parts of this work were financially supported by the
Ministry of Science, Research and the Arts of the
State of Baden-Württemberg within the sustainability support of the projects of the Excellence Initiative II.
REFERENCES
Breiman, L. (1996). Bagging predictors. Machine Learning,
24(2):123–140.
Breiman, L. (2001). Random Forests. Machine Learning,
45(1):5–32.
Brown, G., Wyatt, J., Harris, R., and Yao, X. (2005). Diver-
sity Creation Methods: A Survey and Categorisation.
Information Fusion, 6(1):5–20.
Dietterich, T. G. (2000). Ensemble Methods in Machine
Learning. In Goos, G., Hartmanis, J., and van Leeuwen,
J., editors, Multiple Classifier Systems, volume 1857,
pages 1–15. Springer-Verlag, Berlin, Heidelberg.
Erickson, N., Mueller, J., Shirkov, A., Zhang, H., Larroy,
P., Li, M., and Smola, A. (2020). AutoGluon-Tabular:
Robust and Accurate AutoML for Structured Data.
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J.,
Blum, M., and Hutter, F. (2015). Efficient and Robust
Automated Machine Learning. In Cortes, C., Lawrence,
N., Lee, D., Sugiyama, M., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 28. Curran Associates, Inc.
Freund, Y. and Schapire, R. E. (1997). A Decision-Theoretic
Generalization of On-Line Learning and an Applica-
tion to Boosting. Journal of Computer and System
Sciences, 55(1):119–139.
Hansen, L. and Salamon, P. (1990). Neural Network
Ensembles. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 12(10):993–1001.
He, X., Zhao, K., and Chu, X. (2021). AutoML: A Survey of
the State-of-the-Art. Knowledge-Based Systems, 212.
Hirsch, V., Reimann, P., and Mitschang, B. (2019). Data-
Driven Fault Diagnosis in End-of-Line Testing of Com-
plex Products. In Proc. of the 6th IEEE International
Conference on Data Science and Advanced Analytics
(DSAA), Washington, D.C., USA. IEEE.
Hirsch, V., Reimann, P., Treder-Tschechlov, D., Schwarz, H.,
and Mitschang, B. (2023). Exploiting Domain Knowl-
edge to Address Class Imbalance and a Heterogeneous
Feature Space in Multi-Class Classification. The VLDB
Journal.
Kuncheva, L. I. (2004). Combining Pattern Classifiers:
Methods and Algorithms. J. Wiley, Hoboken, NJ.
Lacoste, A., Larochelle, H., Marchand, M., and Laviolette, F.
(2014). Sequential Model-Based Ensemble Optimiza-
tion. In Proc. of the 13th Conference on Uncertainty in Artificial Intelligence, UAI'14, pages 440–448, Arling-
ton, Virginia, USA. AUAI Press.
LeDell, E. and Poirier, S. (2020). H2O AutoML: Scalable
Automatic Machine Learning. In Proc. of the AutoML
Workshop at ICML.
Lévesque, J.-C., Gagné, C., and Sabourin, R. (2016).
Bayesian Hyperparameter Optimization for Ensemble
Learning. In Ihler, A. and Janzing, D., editors, Proc.
of the 32nd Conference on Uncertainty in Artificial
Intelligence, New York City, NY, USA. AUAI Press.
Moussa, R., Guizzo, G., and Sarro, F. (2022). MEG: Multi-
objective Ensemble Generation for Software Defect
Prediction. In ACM / IEEE International Symposium
on Empirical Software Engineering and Measurement
(ESEM), pages 159–170, Helsinki Finland. ACM.
Polikar, R. (2006). Ensemble Based Systems in Decision
Making. IEEE Circuits and Systems Magazine, 6(3):21–
45.
Schapire, R. E. (1990). The Strength of Weak Learnability.
Machine Learning, 5(2):197–227.
Thornton, C., Hutter, F., Hoos, H. H., and Leyton-Brown, K.
(2013). Auto-WEKA: Combined Selection and Hyper-
parameter Optimization of Classification Algorithms.
In Proc. of the 19th ACM SIGKDD International Con-
ference on Knowledge Discovery and Data Mining,
pages 847–855, Chicago, IL, USA. ACM.
Tin Kam Ho (1998). The Random Subspace Method
for Constructing Decision Forests. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
20(8):832–844.
Villanueva Zacarias, A. G., Weber, C., Reimann, P., and
Mitschang, B. (2021). AssistML: A Concept to Rec-
ommend ML Solutions for Predictive Use Cases. In
Proc. of the 8th International Conference on Data Sci-
ence and Advanced Analytics (DSAA), pages 148–155.
IEEE.
Wegier, W., Koziarski, M., and Wozniak, M. (2022). Multi-
criteria Classifier Ensemble Learning for Imbalanced
Data. IEEE Access, 10:16807–16818.
Wilhelm, Y., Reimann, P., Gauchel, W., Klein, S., and
Mitschang, B. (2023). PUSION - A Generic and Automated Framework for Decision Fusion. In Proc. of the 39th International Conference on Data Engineering
(ICDE), Anaheim, CA, USA. IEEE.
Wilhelm, Y., Reimann, P., Gauchel, W., and Mitschang,
B. (2021). Overview on Hybrid Approaches to Fault
Detection and Diagnosis: Combining Data-driven,
Physics-based and Knowledge-based Models. Pro-
cedia CIRP, 99:278–283.
Wilhelm, Y., Schreier, U., Reimann, P., Mitschang, B., and
Ziekow, H. (2020). Data Science Approaches to Qual-
ity Control in Manufacturing: A Review of Problems,
Challenges and Architecture. In Proc. of the 14th Sym-
posium on Service-Oriented Computing (SummerSOC),
Communications in Computer and Information Science
(CCIS), pages 45–65. Springer-Verlag.
Wolpert, D. H. (1992). Stacked Generalization. Neural
Networks, 5(2):241–259.
Zöller, M.-A. and Huber, M. F. (2021). Benchmark and
Survey of Automated Machine Learning Frameworks.
Journal of Artificial Intelligence Research, 70:409–
472.