PurGE: Towards Responsible Artificial Intelligence Through Sustainable
Hyperparameter Optimization
Gauri Vaidya (1,2,a), Meghana Kshirsagar (1,2,b) and Conor Ryan (1,2,c)
(1) Department of Computer Science and Information Systems, University of Limerick, Ireland
(2) Lero, the Research Ireland Centre for Software, Ireland
{gauri.vaidya, meghana.kshirsagar, conor.ryan}@ul.ie
Keywords:
Grammatical Evolution, Hyperparameter Optimization, Machine Learning, Deep Learning, Search Space
Pruning, Energy Efficient Computing.
Abstract:
Hyperparameter optimization (HPO) plays a crucial role in enhancing the performance of machine learning
and deep learning models, as the choice of hyperparameters significantly impacts their accuracy, efficiency,
and generalization. Despite its importance, HPO remains a computationally intensive process, particularly for
large-scale models and high-dimensional search spaces. This leads to prolonged training times and increased
energy consumption, posing challenges in scalability and sustainability. Consequently, there is a pressing de-
mand for efficient HPO methods that deliver high performance while minimizing resource consumption. This
article introduces PurGE, an explainable search-space pruning algorithm that leverages Grammatical Evolu-
tion to efficiently explore hyperparameter configurations and dynamically prune suboptimal regions of the
search space. By identifying and eliminating low-performing areas early in the optimization process, PurGE
significantly reduces the number of required trials, thereby accelerating the hyperparameter optimization pro-
cess. Comprehensive experiments conducted on five benchmark datasets demonstrate that PurGE achieves
test accuracies that are competitive with or superior to state-of-the-art methods, including random search, grid
search, and Bayesian optimization. Notably, PurGE delivers an average computational speed-up of 47x, reduc-
ing the number of trials by 28% to 35%, and achieving significant energy savings, equivalent to approximately
2,384 lbs of CO2e per optimization task. This work highlights the potential of PurGE as a step toward sustain-
able and responsible artificial intelligence, enabling efficient resource utilization without compromising model
performance or accuracy.
1 INTRODUCTION
Optimizing hyperparameters is essential to maxi-
mize the performance of Machine Learning (ML)
and Deep Learning (DL) models in numerous high-
impact applications, including healthcare, object de-
tection, and image classification (Simonyan and Zis-
serman, 2015). Effective tuning can improve model
accuracy, efficiency, and robustness, allowing ML
models to better generalize across complex datasets
and real-world environments. Despite this potential,
determining the best hyperparameter configurations
is often challenging, with manual tuning requiring
considerable expertise, time, and computational re-
sources (Diaz et al., 2017; Yu and Zhu, 2020).
(a) https://orcid.org/0000-0002-9699-522X
(b) https://orcid.org/0000-0002-8182-2465
(c) https://orcid.org/0000-0002-7002-5815
The energy consumption and environmental im-
pact of HPO are becoming increasingly significant
concerns. As ML and DL models grow in size
and complexity, their training and optimization re-
quire substantial computational resources, leading to
considerable carbon emissions. For example, opti-
mizing a natural language processing pipeline can
produce approximately 78,468 lbs of CO2e (carbon
dioxide equivalent), while neural architecture search
techniques can generate up to 626,155 lbs of emis-
sions (Strubell et al., 2019). These figures underscore
the urgency of developing more resource-efficient
HPO methods that balance computational demands
with environmental sustainability.
Traditional HPO methods aim to automate hy-
perparameter selection, reducing manual effort and
improving model performance. For example, Ran-
dom Search (RS) (Bergstra and Bengio, 2012) and
Grid Search (GS) are two widely used model-free
Figure 1: Traditional Hyperparameter Optimization process.
methods that blindly sample configurations from the
search space. While straightforward to implement,
these methods are computationally expensive and of-
ten waste resources by evaluating many suboptimal
configurations. Such inefficiencies make them im-
practical for large, high-dimensional search spaces.
To address these limitations, more advanced
model-based methods, such as Bayesian Optimization
(BO) and Evolutionary Algorithms (EA), employ it-
erative, feedback-driven strategies to guide the search
for promising configurations (Yang and Shami, 2020).
These techniques balance exploration and exploita-
tion, reducing the number of trials required to identify
near-optimal hyperparameters. Although more effi-
cient than their model-free counterparts, they still in-
cur significant computational costs, particularly when
applied to complex models with vast hyperparame-
ter spaces. The resource intensity of these methods
highlights the need for optimization strategies that are
not only effective but also computationally sustain-
able.
Recent advancements have focused on multifi-
delity optimization strategies, such as Bayesian Op-
timization and Hyperband (BOHB) (Falkner et al., 2018) and Differential Evolution and Hyperband (DEHB) (Awad et al., 2021). These hybrid meth-
ods integrate model-based HPO with techniques like
Hyperband, which allocate computational resources
more efficiently by prioritizing promising candidates
and terminating evaluations of underperforming con-
figurations early. Although these approaches improve
efficiency, they still require a significant number of
evaluations due to the inherent vastness of the hyper-
parameter search space.
One promising avenue for addressing the com-
putational demands of HPO is search space prun-
ing. This technique aims to reduce resource con-
sumption by focusing computational efforts on the
most promising regions of the search space, thereby
minimizing evaluations of suboptimal configurations.
For example, PriorBand (Mallik et al., 2023) integrates expert knowledge to prioritize high-potential regions, adaptively eliminating less promising areas. Similarly, techniques such as Successive Halving (Li et al., 2016) dynamically allocate resources to configurations with better intermediate performance, effectively pruning the search space. Other approaches, such as Learning Search Spaces for Bayesian Optimization (Perrone et al., 2019) and Hyperparameter Transfer Learning (Horváth et al., 2021), leverage his-
torical HPO data to refine search spaces across related
tasks, thus reducing computational overhead for sim-
ilar models or datasets.
However, these pruning techniques often face
practical limitations. Many rely on extensive prior
data, which may not always be available, or make
task-specific assumptions that limit their generaliz-
ability. Furthermore, heuristic-based methods or pre-
trained models used to predict promising regions may
struggle to adapt to novel or highly complex archi-
tectures. These challenges emphasize the need for
a robust, adaptive approach to search space pruning
that is domain-agnostic and dynamically responsive
to evolving observations during the optimization pro-
cess.
This paper addresses these challenges by intro-
ducing PurGE, an innovative two-staged framework
driven by Grammatical Evolution (GE). PurGE dy-
namically prunes the hyperparameter search space to
optimize both efficiency and performance. In Stage
1, PurGE systematically narrows the search space by
eliminating low-potential regions based on learned
patterns. In Stage 2, it focuses on fine-tuning within
the refined space to identify the optimal hyperparame-
ter configuration. By leveraging GE, PurGE achieves
a balance between exploration and exploitation, re-
ducing computational costs without sacrificing model
accuracy.
The remainder of this paper is organized as fol-
lows: Section 2 provides an overview of HPO, GE,
and search space pruning techniques. Section 3 de-
tails the PurGE framework, while Section 4 outlines
the experimental setup. Section 5 presents compara-
tive results, and Section 6 discusses implications and
directions for future research.
2 BACKGROUND
This section provides an overview of GE and its ap-
plication in tuning the hyperparameters. It discusses
recent advancements in HPO for reducing computa-
tional cost, including model pruning, dataset sam-
pling, and search space pruning.
2.1 Grammatical Evolution
GE (Ryan et al., 1998) is a grammar-based variant of
Genetic Programming that employs binary strings to
represent candidate solutions. GE can evolve com-
puter programs in any arbitrary language, provided
that the language is defined using Backus-Naur Form
(BNF) grammar. The process begins by mapping the
genotype (binary strings) to the phenotype (computer
program). The genetic operators of crossover and mu-
tation are applied to the population of binary strings,
with evolution progressing across successive genera-
tions.
A key strength of GE lies in its mapping mech-
anism, which offers the flexibility to easily incorporate various grammatical structures according to specific requirements. GE seeks the optimal solution to a
problem by maximizing or minimizing an objective
function, with the grammar determining the set of le-
gal structures that can evolve. Furthermore, domain
knowledge can be integrated through grammar. For
example, the optimization of hyperparameters in Con-
volutional Neural Networks (CNNs) can be expressed
within the same BNF grammar used to evolve CNN
architectures.
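To make the mapping mechanism concrete, the sketch below shows a minimal genotype-to-phenotype mapping in Python. The toy grammar and hand-picked codon values are purely illustrative assumptions; the grammars actually used in this work appear in Figure 6.

```python
# Minimal, illustrative sketch of GE's genotype-to-phenotype mapping.
# The toy grammar and codon values are hypothetical; see Figure 6 for the
# grammars actually used in this work.
TOY_GRAMMAR = {
    "<config>": [["<lr>", " ", "<optimizer>"]],
    "<lr>": [["0.001"], ["0.01"], ["0.1"]],
    "<optimizer>": [["adam"], ["sgd"], ["rmsprop"]],
}

def map_genotype(codons, grammar, start="<config>", max_wraps=2):
    """Repeatedly expand the leftmost non-terminal, choosing production
    number (codon mod number-of-productions) for that non-terminal."""
    derivation, idx, wraps = [start], 0, 0
    while any(sym in grammar for sym in derivation):
        if idx == len(codons):            # wrap the genotype if codons run out
            idx, wraps = 0, wraps + 1
            if wraps > max_wraps:
                return None               # mapping failed: invalid individual
        pos = next(i for i, s in enumerate(derivation) if s in grammar)
        productions = grammar[derivation[pos]]
        chosen = productions[codons[idx] % len(productions)]
        derivation[pos:pos + 1] = chosen
        idx += 1
    return "".join(derivation)

print(map_genotype([7, 4, 11], TOY_GRAMMAR))  # -> "0.01 rmsprop"
```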
2.2 Hyperparameter Optimization
The HPO problem involves selecting the optimal set of hyperparameters to maximize the performance of a model, given a learning algorithm (inducer) and a dataset. Let A represent the learning algorithm, which induces a model M based on a set of hyperparameters h from a search space H. Given a dataset D, we aim to find the hyperparameters h* that maximize the performance f of the model M = A(D; h) induced by A on D:

h^{*} = \operatorname{argmax}_{h \in \mathcal{H}} f\left(M = A(D; h)\right), \qquad (1)

subject to constraint functions that define the feasible region of H:

a_i(h) \leq 0, \quad i = 1, 2, \ldots, m,
b_j(h) = 0, \quad j = 1, 2, \ldots, n.

In this formulation, the learning algorithm A acts as the inducer that generates the model M from D and h, with a_i(h) and b_j(h) representing inequality and equality constraints, respectively, that define the boundaries of the hyperparameter search space H.
2.3 Related Works
Various approaches have been proposed to mitigate
the overall computational cost of HPO (Vaidya et al.,
2022; Li et al., 2016; Jamieson and Talwalkar, 2015).
These strategies can broadly be categorized into three
main types: model pruning, dataset sampling, and hy-
perparameter search space pruning.
2.3.1 Model Pruning
Neural Network Model pruning has been extensively
explored since it was introduced as a solution to
over-parameterized networks by LeCun et al. (LeCun et al., 1989). One widely studied technique is
the Connection Sensitivity Score (SNIP) (Lee et al.,
2019), which employs an initialization-based pruning
method. SNIP has demonstrated the ability to prune
networks effectively without significantly degrading
model performance.
In addition to SNIP, Lee and Yim (Lee and Yim,
2022) proposed an alternative pruning method known
as Synflow. They demonstrated that pruning can be
seamlessly integrated into the HPO process, show-
ing that the depth of neural networks does not signif-
icantly affect hyperparameter configurations. More-
over, their work highlighted that hyperparameters op-
timized for smaller or pruned models can be suc-
cessfully transferred to larger models within the same
family, such as from ResNet8 to ResNet50.
2.3.2 Dataset Sampling
Another approach to reducing the computational
overhead in HPO is using subsets of datasets, rather
than the full dataset, during the optimization process.
DeCastro-García et al. (DeCastro-García et al., 2019)
conducted a study comparing various data sampling
techniques on image classification benchmarks. Their
results showed that this strategy enhanced computa-
tional efficiency and maintained comparable perfor-
mance to full dataset training.
Similarly, the HyperEstimator (Vaidya et al., 2022) and HyperGE (Vaidya et al., 2023) frameworks
demonstrated that fine-tuning CNNs using dataset
subsets could yield results that are competitive with
state-of-the-art methods, further validating the effec-
tiveness of this approach.
2.3.3 Pruning the Hyperparameter Search
Space
Reducing the hyperparameter search space has been
a key focus in HPO research. Hyperband (Li et al.,
2016), a well-known HPO framework, prunes the
search space by employing early stopping of trials.
In this approach, a predefined threshold is set for the
number of trials, and if a trial’s performance does not
improve within this threshold, it is halted. Resources
are then reallocated to more promising trials, allowing
the system to focus its computational budget on the
more fruitful configurations. This technique is partic-
ularly effective in reducing unnecessary computations
and improving the overall efficiency of the search pro-
cess.
Another popular technique for pruning the search
space is the successive-halving (Jamieson and Tal-
walkar, 2015) algorithm. This method runs a set num-
ber of trials within a specified budget and over sev-
eral iterations, evaluating their performance and dis-
carding the worst-performing half. This process is re-
peated until only one trial remains. Successive halving
efficiently narrows the search space by incrementally
focusing on the most promising configurations, thus
ensuring that computational resources are allocated
effectively.
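A minimal sketch of this idea is given below; the equal per-round budget and the evaluate callable are simplifying assumptions for illustration, not the exact schedule of Jamieson and Talwalkar (2015).

```python
import math

def successive_halving(configs, evaluate, budget_per_round):
    """Illustrative successive halving: score every surviving configuration,
    keep the better half, and repeat until one configuration remains.
    `evaluate(config, budget)` is an assumed callable returning a
    validation score (higher is better)."""
    survivors = list(configs)
    while len(survivors) > 1:
        scored = sorted(survivors,
                        key=lambda c: evaluate(c, budget_per_round),
                        reverse=True)
        survivors = scored[:math.ceil(len(scored) / 2)]
    return survivors[0]
```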
PriorBand (Mallik et al., 2023), a recent exten-
sion of Hyperband, improves upon the early stop-
ping mechanism by using prior knowledge about the
search space. This approach dynamically adjusts the
budget allocation for each trial based on historical
data or expert knowledge, allowing for more intelli-
gent pruning. This enables the system to allocate re-
sources more effectively based on prior performance,
further enhancing the efficiency of the HPO process.
BOHB (Falkner et al., 2018) combines the
strengths of Bayesian optimization and Hyperband to
achieve more efficient search space pruning. BOHB
leverages Bayesian optimization to model the per-
formance of hyperparameter configurations and iter-
atively narrows the search space, while Hyperband allocates resources to the most promising con-
figurations. This hybrid approach improves the explo-
ration and exploitation of the search space, making it
more suitable for complex models and large datasets.
Similarly, DEHB (Awad et al., 2021) integrates
differential evolution with Hyperband, providing an
efficient way to handle large-scale HPO problems.
DEHB optimizes hyperparameters using differential
evolution, while utilizing Hyperband for resource al-
location. This combination enables more efficient
search space exploration, especially for challenging
optimization tasks.
Wistuba et al. (Wistuba et al., 2015) proposed an-
other search space pruning strategy that analyzes the
performance of HPO based on related datasets. By
identifying non-promising areas through this analy-
sis, the irrelevant regions of the search space can be
pruned. This approach was tested on machine learn-
ing models with 19 different classifiers and showed
promising results in reducing computational costs by
narrowing the search to more relevant areas.
Despite the numerous advancements in reducing
computational costs in HPO, the research area re-
mains highly significant due to the complexity of ma-
chine learning models and datasets. As DL mod-
els become more sophisticated and large-scale, the
search for optimal hyperparameters grows exponen-
tially, making efficient HPO essential for practical
applications. Existing methods such as pruning,
early stopping, and dynamic resource allocation have
shown promising results but often suffer from lim-
itations, such as lack of dynamic adaptation to di-
verse model types or dataset variations. Therefore,
the continued development of more adaptive and effi-
cient search space pruning methods remains a crucial
challenge in HPO.
3 PurGE
This article presents PurGE, a two-stage approach to
automatically tuning hyperparameters using GE. The
primary objective of PurGE is to reduce the compu-
tational burden associated with large hyperparameter
search spaces by focusing on high-potential regions.
An overview of the proposed framework is illustrated
in Figure 2.
3.1 Stage 1: Pruning the Search Space
The primary objective of Stage 1 is to systemati-
cally narrow the hyperparameter search space, fo-
cusing computational resources on the most promis-
ing regions. This stage utilizes a grammar-guided
approach, leveraging 60% of the total trial budget
to identify and eliminate low-performing configura-
tions. A trial is defined as a single evaluation of a hy-
perparameter configuration for a specific model and
dataset.
The pruning process is driven by two comple-
mentary statistical techniques: the Pearson Correla-
tion Coefficient (r) and Individual Conditional Ex-
pectation (ICE) functions. These techniques ana-
lyze the relationships between hyperparameters and
model performance, enabling the identification of
high-potential regions within the search space.
The Pearson Correlation Coefficient, defined in
Equation 2, quantifies the linear relationship between
individual hyperparameters and validation accuracy.
Specifically, for each hyperparameter configuration h_i and its corresponding objective function value o_i (e.g., validation accuracy), the correlation provides insights into the influence of that hyperparameter on model performance:

r = \frac{\sum_{i=1}^{n} (h_i - \bar{h})(o_i - \bar{o})}{\sqrt{\sum_{i=1}^{n} (h_i - \bar{h})^2 \; \sum_{i=1}^{n} (o_i - \bar{o})^2}} \qquad (2)

where n represents the total number of trials, \bar{h} is the mean of the hyperparameter values, and \bar{o} is the mean of the objective values.
Hyperparameters with a high correlation to validation accuracy are selected for further analysis.

Figure 2: Architecture of PurGE, a two-staged Grammatical Evolution driven approach for automatically evolving hyperparameters with search space pruning.
In parallel, ICE functions are employed to extract
high-performance regions for each hyperparameter.
These functions identify the specific ranges within the
hyperparameter space that yield superior validation
accuracy, refining the focus of the search. Together,
the Pearson correlation and ICE functions guide the
evolutionary process by defining optimal ranges for
each hyperparameter.
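As a concrete illustration of the correlation step, the sketch below computes r between each (numerically encoded) hyperparameter and the validation accuracies observed over the Stage 1 trials, and keeps those whose absolute correlation exceeds a threshold; the trial history and threshold value are hypothetical placeholders rather than results from this work.

```python
import numpy as np

def select_correlated_hyperparameters(trials, accuracies, delta=0.5):
    """trials: dict of hyperparameter name -> numeric value per Stage 1 trial.
    accuracies: validation accuracy per trial.
    Returns H1, the hyperparameters with |r| >= delta and their correlations."""
    va = np.asarray(accuracies, dtype=float)
    h1 = {}
    for name, values in trials.items():
        r = np.corrcoef(np.asarray(values, dtype=float), va)[0, 1]
        if abs(r) >= delta:
            h1[name] = r
    return h1

# Hypothetical Stage 1 history for three hyperparameters over six trials.
history = {
    "momentum":   [0.5, 0.6, 0.7, 0.8, 0.9, 0.99],
    "dropout":    [0.9, 0.7, 0.5, 0.4, 0.2, 0.1],
    "batch_size": [32, 64, 256, 128, 64, 128],
}
val_acc = [0.61, 0.64, 0.68, 0.70, 0.72, 0.74]
print(select_correlated_hyperparameters(history, val_acc))
```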
For instance, with a total budget of 80 trials, Stage
1 allocates 48 trials (60% of the budget) to explore the
hyperparameter space, represented as a Backus-Naur
Form (BNF) grammar (see Figure 6). From an ini-
tial space of 103,680 potential configurations, Stage
1 evolves 72 configurations, reducing the solution set
by approximately 90%.
The pruning process operates in two phases:
1. Correlation Analysis. Compute the correlation
between each hyperparameter and validation ac-
curacy, producing a set H1, which defines pre-
liminary bounds for promising hyperparameter
ranges.
2. Interdependence Refinement. Identify pairs of
hyperparameters with significant mutual correla-
tion, forming set H2. These interdependencies
further refine the bounds in H1, ensuring that
promising configurations account for interactions
between hyperparameters.
The combination of H1 and H2 results in a fo-
cused and compact search space. Figures 4 and 5 il-
lustrate this process, while Figure 3 depicts a heatmap
of hyperparameter correlations for the EfficientNet
model on the CIFAR10 dataset.
3.2 Stage 2: Optimization Within the
Pruned Space
Following the search space pruning in Stage 1, Stage
2 focuses on refining the search for the optimal hy-
perparameter configuration. By narrowing the scope
to high-potential regions, computational resources are
concentrated on configurations with the greatest like-
lihood of yielding superior performance.
This stage employs an iterative process, utilizing
Grammatical Evolution to explore and evolve config-
urations within the pruned space. The compact search
space allows for more intensive evaluation of individ-
ual configurations, enabling finer-grained optimiza-
tion without incurring the computational overhead of
the original space.
The algorithm dynamically balances exploration
and exploitation within the reduced space, ensuring
that both promising configurations and less-explored
regions are considered. By iteratively evolving con-
figurations, Stage 2 converges on the optimal hyper-
parameter set that maximizes validation accuracy.
The complete algorithm, summarized below, com-
bines the pruning strategy of Stage 1 with the focused
optimization of Stage 2, offering a robust framework
for hyperparameter tuning that reduces computational
overhead while maintaining performance.
Algorithm 1: PurGE: Automated Hyperparameter Search Space Pruning.

Input: Hyperparameter set H = {h_1, h_2, ..., h_n}, objective function f (validation accuracy, VA), dataset D
Output: Optimal configuration h* that maximizes validation accuracy (VA)

1:  Step 1 - Initialize Search Space: define the full search space H_0 from H
2:  Step 2 - Stage 1, Statistical Pruning:
3:    Filter low-performing configurations:
4:      H_1 <- {h in H_0 | f(h) >= 0.5}
5:    Hyperparameter-objective correlations:
6:      for each h_i in H, compute the Pearson correlation r(h_i, VA)
7:      define H1 <- {h_i in H : |r(h_i, VA)| >= delta}
8:    Inter-hyperparameter dependencies:
9:      for each pair (h_i, h_j) in H x H, compute r(h_i, h_j)
10:     define H2 <- {(h_i, h_j) : |r(h_i, h_j)| >= gamma}
11:   Extract high-performance ranges:
12:     for each h_i in H1, set R_{h_i} to its top-performing values
13:     for each pair (h_i, h_j) in H2, set optimal ranges for both h_i and h_j
14: Step 3 - Obtain Pruned Search Space:
      H_2 <- union over h_i in H1 of R_{h_i}, combined with union over (h_i, h_j) in H2 of R_{h_i} x R_{h_j}
15: Step 4 - Stage 2, Iterative Optimization:
16:   Evolve pruned configurations: conduct trials over H_2
17:   Convergence check: stop when VA stabilizes or the budget B is reached
18: Step 5 - Output: return h* = argmax_{h in H_2} f(h)
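The sketch below mirrors Steps 3-5 in plain Python: Stage 1's retained ranges R_h are combined into the pruned space, over which a best configuration is selected. For brevity the Stage 2 search is shown as a simple scoring loop rather than the GE run used by PurGE, and the ranges and scoring function are placeholders.

```python
from itertools import product

def build_pruned_space(ranges):
    """ranges: dict of hyperparameter name -> list of retained values (the
    R_h sets from Stage 1). Returns every configuration in the pruned space."""
    names = list(ranges)
    return [dict(zip(names, combo))
            for combo in product(*(ranges[name] for name in names))]

def stage2_select(ranges, score, budget):
    """Simplified stand-in for Stage 2: score at most `budget` configurations
    from the pruned space and return the argmax (PurGE evolves these
    configurations with GE instead of enumerating them)."""
    space = build_pruned_space(ranges)
    return max(space[:budget], key=score)

# Placeholder ranges and scoring function for illustration only.
pruned_ranges = {"momentum": [0.5, 0.6, 0.7], "layers": [96, 128]}
best = stage2_select(pruned_ranges,
                     score=lambda cfg: cfg["momentum"] * cfg["layers"],
                     budget=6)
print(best)
```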
3.3 Example of PurGE for EfficientNet
on CIFAR-10
The procedure for pruning the search space when tun-
ing the EfficientNet model on the CIFAR-10 dataset
across 48 trials is outlined. The hyperparameters
under consideration include: {batch size, optimizer,
learning rate (lr), momentum, dropout, layers}, with
the objective function being validation accuracy (VA).
The BNF grammar used to generate the hyperpa-
rameter configurations automatically is shown in Fig-
ure 6a. The possible combinations of each hyperpa-
rameter in Figure 6a lead to a search space of 103,680
configurations (the product of all combinations), out
of which Stage 1 yields 72 configurations which are
fed into Stage 2.
Figure 3: Heatmap depicting the correlation between hyper-
parameters and validation accuracy for EfficientNet model
on CIFAR10 dataset.
Initially, hyperparameter configurations with VA
below 50% are discarded. PurGE then computes
the correlation between the remaining configura-
tions. Figure 3 illustrates the correlation heatmap
between the hyperparameters and VA, revealing
a negative correlation between {lr, dropout, layers}
and VA. Based on these correlations, three hy-
perparameters are selected for further exploration:
{batch size, optimizer, momentum} (H1). Addition-
ally, hyperparameter pairs with significant mu-
tual correlation are identified, including {optimizer-
momentum, lr-dropout, momentum-layers} (H2),
which are also considered for further refinement.
Figure 4 is a visual representation of the pruning
algorithm, PurGE. The Y-axis in the plots represents
the dependence of VA on the hyperparameter config-
urations, with higher values indicating better perfor-
mance. The shaded regions in the plots correspond
to areas with high VA for each hyperparameter within
H1. For example, the interpretation of the ICE plot
for optimizer in Figure 4 suggests that higher partial
dependence values for RMSProp indicate a potential
for higher VA, while lower values for Adam and SGD
correspond to lower VA.
PurGE yields the following restricted ranges for
the hyperparameters:
Optimizer: {Adam, RMSProp}
Momentum: {0.5, 0.6, 0.7, 0.8}
Batch Size: 128
PurGE refines the set H1 based on mutual cor-
relation amongst hyperparameter pairs momentum-
optimizer, dropout-lr, layers-momentum, as shown
in Figure 5 to yield H2. The heatmap for the
pair layers-momentum reveals that momentum values
{0.5, 0.6, 0.7} and layer sizes {96, 128} exhibit high
correlation. As a result, the search space for momen-
tum is narrowed to {0.5, 0.6, 0.7}.
Figure 4: Explainable PurGE for search space pruning, visualized with Individual Conditional Expectation (ICE) plots. The plot illustrates the effect of hyperparameters of the EfficientNet model on the CIFAR-10 dataset, with validation accuracy on the Y-axis. Higher values on the Y-axis indicate better performance. The shaded region represents the Region of Interaction (ROI), highlighting the area where optimal performance is achieved.
Figure 5: Heatmaps for hyperparameter pairs {optimizer-momentum, lr-dropout, momentum-layers} for the EfficientNet model on the CIFAR-10 dataset.
The final pruned search space for each hyperpa-
rameter is as follows:
Batch size: 128
Momentum: {0.5, 0.6, 0.7}
Optimizer: RMSProp
Learning rate: {1e-05, 5e-05, 0.0001}
Layers: {96, 128}
Dropout rate: {0.1, 0.3, 0.4, 0.8}
This results in only 72 unique hyperparameter
combinations (the product of all possible values for
each hyperparameter), representing a significant re-
duction of the solution set by approximately 90%.
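Both counts quoted in this example follow directly from the products of the per-hyperparameter option counts, as the short check below illustrates (the counts are read off the grammar in Figure 6a and the pruned ranges above).

```python
import math

# Options per hyperparameter in the full grammar of Figure 6a:
# batch_size (4), dropout_rate (9), num_layers (16), optimizer (3),
# learning_rate (10), momentum (6)
full_space = math.prod([4, 9, 16, 3, 10, 6])    # 103,680 configurations

# Options per hyperparameter after pruning (listed above):
# batch_size (1), momentum (3), optimizer (1), learning_rate (3),
# layers (2), dropout_rate (4)
pruned_space = math.prod([1, 3, 1, 3, 2, 4])    # 72 configurations

print(full_space, pruned_space)                 # 103680 72
```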
In Stage 2, the remaining budget of 32 trials is al-
located in order to determine the final hyperparameter
configuration from the 72 configurations identified in
Stage 1. This configuration is then used to train the
model.
In this way, given a model and dataset, PurGE
automatically yields optimal configurations, while its
two-stage approach ensures explainability, effectively
eliminating the black-box nature of the process.
4 EXPERIMENTAL SETUP
The goal of the experimental setup is to address the
following research questions:
RQ1: How does pruning the search space and mod-
els impact the performance of hyperparameter opti-
mization?
RQ2: How does pruning the search space and mod-
els affect resource utilization during hyperparame-
ter optimization?
4.1 Datasets Details
We conducted experiments using two standard bench-
marks in image classification: CIFAR10 and CI-
FAR100. These datasets are widely used in the
deep learning community for evaluating model per-
formance. CIFAR10 consists of 60,000 RGB images
categorized into ten classes, while CIFAR100 con-
tains the same number of images divided into 100
classes. Each dataset was partitioned into training,
validation, and testing subsets in a 60:20:20 ratio.
In addition to image datasets, we included tabu-
lar datasets to evaluate PurGE on non-image tasks.
These datasets were selected to have no missing val-
ues, ensuring the HPO process was not influenced by
<model> ::= <hyperparameters>
<hyperparameters> ::= <batch_size> <dropout_rate> <num_layers> <optimizer>
<learning_rate> <momentum>
<batch_size> ::= 32 | 64 | 128 | 256
<dropout_rate> ::= 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9
<num_layers> ::= 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 |
288 | 320 | 352 | 384 | 416 | 448 | 480 | 512
<optimizer> ::= adam | sgd | rmsprop
<learning_rate> ::= 0.00001 | 0.0001 | 0.001 | 0.01 | 0.1 |
0.00005 | 0.0005 | 0.005 | 0.05 | 0.5
<momentum> ::= 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.99
(a)
<model> ::= <hyperparameters>
<hyperparameters> ::= <learning_rate> <gamma> <max_depth>
<colsample_bylevel> <subsample>
<learning_rate> ::= 0.025 | 0.05 | 0.1 | 0.2 | 0.3
<gamma> ::= 0 | 0.1 | 0.2 | 0.3 | 0.4 | 1.0 | 1.5 | 2.0
<max_depth> ::= 2 | 3 | 5 | 7 | 10 | 100
<colsample_bylevel> ::= 0.25 | 0.5 | 0.75 | 1.0
<subsample> ::= 0.15 | 0.5 | 0.75 | 1.0
(b)
Figure 6: Search space represented as BNF grammar for (a)
EfficientNet and ResNet; (b) XGBoost.
Table 2: Experimental Settings.
Parameter Value
Runs 5
Search Algorithm GA
Initialisation PI grow
Selection Tournament
Tournament Size 2
Crossover Type Variable one point
Crossover Probability 0.95
Mutation Type Integer flip per codon
Mutation Probability 0.01
Population Size 10
Total Generations Stage 1: 5, Stage 2: 3
imputation techniques. The tabular datasets were also
split into training, validation, and testing subsets in
a 60:20:20 ratio. Table 1 summarizes the details of
the datasets, including the number of instances and
classes.
4.2 Models and Hyperparameters
The experiments involved three models: ResNet, Effi-
cientNet, and XGBoost. The first two are CNNs, rep-
resenting different trade-offs between performance
and computational efficiency, while XGBoost is a
gradient-boosting algorithm widely used in tabular
data classification.
The hyperparameter search space for the models,
including XGBoost and CNNs, is represented in the
BNF grammar, as illustrated in Figure 6. These hy-
perparameters are treated as a discrete search space,
which PurGE explores and compares with traditional
hyperparameter optimization methods.
4.3 GE Parameters
For the PurGE framework, we utilize GE with key pa-
rameters as presented in Table 2: Genetic Algorithm
(GA) for the search process, Tournament selection
with a size of 2, and a variable one-point crossover
with a 95% probability. Mutation occurs with a 1%
probability per codon, and the population size is set
to 10, spanning five generations in Stage 1 and 3 gen-
erations in Stage 2 for efficient optimization.
4.4 Baseline Methods
To benchmark the performance of PurGE, we used
three popular hyperparameter optimization tech-
niques as baselines using the Optuna (Akiba et al.,
2019) framework:
1. Random Search (RS). A basic search method
where configurations are randomly sampled from
the defined search space.
2. Grid Search (GS). A more exhaustive approach
that evaluates all possible combinations of hyper-
parameter values within a predefined grid.
3. Tree-structured Parzen Estimator (TPE). A
BO method (Bergstra et al., 2011) that builds a
probabilistic model to estimate the performance
of hyperparameter configurations, guiding the
search for optimal configurations more efficiently.
These baselines were used to compare PurGE’s accu-
racy, computational efficiency, and resource utiliza-
tion performance.
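As an indication of how the TPE baseline can be configured, the following sketch uses Optuna's standard TPESampler; the train_and_validate stub stands in for the actual per-trial training routine described in Section 4.5, and the search space shown here is a truncated placeholder for the grammar in Figure 6a rather than the exact script used in the experiments.

```python
import optuna

def train_and_validate(params):
    # Stub standing in for the real per-trial routine (train for five epochs
    # and return accuracy on the validation split); replace with model code.
    return 0.5

def objective(trial):
    params = {
        "batch_size": trial.suggest_categorical("batch_size", [32, 64, 128, 256]),
        "optimizer": trial.suggest_categorical("optimizer", ["adam", "sgd", "rmsprop"]),
        "momentum": trial.suggest_categorical("momentum", [0.5, 0.6, 0.7, 0.8, 0.9, 0.99]),
    }
    return train_and_validate(params)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=80)   # same 80-trial budget used in the experiments
print(study.best_params, study.best_value)
```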
4.5 Training Budget
In image classification tasks, a trial is defined as a sin-
gle hyperparameter configuration trained on the entire
dataset using a pruned model as a surrogate for five
Table 1: Dataset Details.

Modality | Dataset  | Abbrv. | #classes | #instances | Models Employed | Abbrv.
Image    | CIFAR10  | C10    | 10       | 60000      | EfficientNet7   | EN
Image    | CIFAR100 | C100   | 100      | 60000      | ResNet50        | RN
Tabular  | Segment  | SG     | 7        | 2310       | XGBoost         | XB
Tabular  | Waveform | WV     | 3        | 5000       | XGBoost         | XB
Tabular  | Bank     | BK     | 2        | 11163      | XGBoost         | XB
epochs. A trial corresponds to training an XGBoost
model using one hyperparameter configuration over
the entire dataset for tabular data classification. The
fitness score for each trial is based on the model’s per-
formance on the validation split.
The budget for each experiment was fixed at 80
trials as suggested in literature (Bergstra et al., 2011).
After completing the 80 trials, the best hyperparam-
eter configuration was selected and used to train the
model for an additional 50 epochs. This configuration
was applied to both PurGE and the baseline models.
For PurGE, the experiments were performed in
two distinct stages:
Stage 1. A population size of 10 with a generation
count of 5, leading to 50 trials.
Stage 2. A population size of 10 with a generation
count of 3, resulting in 30 trials.
The number of trials in each stage is calculated as:
Total trials = Pop size × Gen count (3)
5 RESULTS AND DISCUSSIONS
This section presents the results of the experiments
designed to address the research questions (RQ1 and
RQ2). Specifically, the impact of search space prun-
ing on hyperparameter optimization is evaluated in
terms of performance (accuracy) for RQ1. The effect
of pruning on resource utilization, such as computa-
tional time, is examined for RQ2. The results are an-
alyzed through comparisons with baseline methods,
including RS, GS, and TPE.
Table 3: P-values and Significance Interpretation (S = Significant, NS = Not Significant) against PurGE.

Model | Dataset | GS | RS | TPE
XB    | SG      | S  | S  | NS
XB    | WV      | S  | S  | S
XB    | BK      | S  | S  | S
EN    | C10     | S  | S  | NS
RN    | C100    | S  | S  | S
PonyGE2 (Fenton et al., 2017), a GE implemen-
tation in Python, was adapted to run all the experi-
ments with the PyTorch framework. All experiments were run in parallel on an Intel Xeon Silver 4215R CPU @ 3.20 GHz with a Quadro RTX 8000 GPU.
Figure 7: Benchmarking the performance of PurGE across
all datasets with respect to the allocated budget, compared
against baseline methods.
5.1 Impact on Performance
On the tabular datasets, PurGE demonstrated com-
petitive performance compared to baseline methods.
For the SG dataset, PurGE achieved an accuracy of
97.92%, while RS, GS, and TPE reported slightly
higher accuracies of 98.26%, 98.44%, and 98.61%,
respectively. On the WV dataset, PurGE obtained
an accuracy of 87.04%, performing comparably to
RS (87.6%) and GS (88.04%), while outperform-
ing TPE, which reported an accuracy of 84.96%.
For the BK dataset, PurGE achieved an accuracy of
85.34%, which was slightly lower than RS (85.77%),
GS (85.45%), and TPE (85.99%).
For the image datasets, PurGE showed varying
performance across tasks. On the C10 dataset, PurGE
achieved an accuracy of 75.57%, which was com-
parable to TPE (75.57%) but slightly lower than RS
(79.01%). GS, however, reported a significantly lower
accuracy of 60.88%, highlighting the inefficiency of
grid-based methods in this context. On the C100
dataset, PurGE outperformed the baseline methods,
achieving an accuracy of 17%, while RS, GS, and
TPE reported substantially lower accuracies in the
range of 4-6%.
The relatively weak performance of all methods
on the C100 dataset can be attributed to the limited
number of samples available per class, which presents
a significant challenge for hyperparameter optimiza-
tion. Despite this, PurGE’s ability to achieve higher
accuracy on C100 underscores its potential for tack-
ling complex, high-dimensional search spaces more
effectively than traditional methods.
5.1.1 Statistical Significance
Mann-Whitney U tests with Beck and Hollern’s cor-
rection were conducted to evaluate the statistical sig-
nificance of the observed performance differences, as
shown in Table 3.

Figure 8: Violin plots depicting the performance of various optimizers during the exploratory search phase: (a) XGBoost-Segment, (b) XGBoost-Waveform, (c) XGBoost-Bank, (d) EfficientNet-CIFAR10, (e) ResNet-CIFAR100.

The results indicate that PurGE significantly outperforms baseline methods (GS, RS, and
TPE) in many cases. Specifically, PurGE consistently
achieves statistically significant improvements over
GS and RS across most datasets. While the differ-
ences between PurGE and TPE vary across datasets,
significant improvements are observed for SG, WV,
and BK, whereas the differences for C100 are less
pronounced.
Figure 8 shows violin plots of validation accuracy
across multiple runs for each method. An interest-
ing observation is the low standard deviation of vali-
dation accuracy during Stage 2 of PurGE. This indi-
cates that PurGE focuses on a high-yielding region of
the hyperparameter space, where multiple configura-
tions yield near-optimal or optimal performance. The
narrow distribution suggests that PurGE consistently
converges on effective configurations, enhancing the
reliability of the optimization process. This observa-
tion highlights the potential for further research into
decision-making frameworks that can select among
multiple optimal configurations based on factors such
as hardware efficiency or energy consumption.
5.2 Impact on Resource Utilization
PurGE demonstrates substantial improvements in re-
source utilization by reducing the number of trials
required for hyperparameter optimization compared
to baseline methods. As shown in Figure 7, PurGE
achieves a consistent reduction in the number of trials
across all datasets. For example, on tabular datasets
such as SG and WV, PurGE reduces the required
trials by approximately 20-25%, while maintaining
competitive performance. On image datasets, such
as C10, PurGE achieves a similar reduction in tri-
als, with a more significant improvement observed on
C100, where baseline methods require substantially
more trials to achieve lower accuracy.
The reduction in trials directly impacts computa-
tional efficiency. For instance, on datasets like SG and
WV, the savings in trials translate to a reduction of
computational effort by approximately 20-30%. On
more complex datasets like C100, where training and
evaluation are resource-intensive, PurGE completes
the optimization process with fewer trials, reducing
energy consumption while still achieving competitive
accuracy.
Figure 9 highlights the speed-ups achieved by
PurGE. On tabular datasets, PurGE achieves notable
speed-ups, such as 319.91x on SG and 47.39x on
WV, which are attributed to the early pruning of low-
performing configurations. For image datasets, while
the speed-ups are more modest (2.06x on ResNet-
C100 and 2.23x on EfficientNet-C10), they are sig-
nificant given the computational complexity of these
tasks.
PurGE’s approach of dynamically pruning the hy-
perparameter space allows it to concentrate resources
on promising regions, reducing unnecessary evalu-
ations. This reduction in computational overhead,
combined with the consistent performance across
datasets, highlights the utility of PurGE for resource-
conscious hyperparameter optimization tasks.
5.3 Energy Savings with PurGE
The energy savings provided by PurGE over traditional methods, expressed as reduced carbon emissions, are estimated by considering the average speed-up and the reduction in the number of trials. It has been
reported that optimizing an NLP pipeline generates
approximately 78,468 lbs of CO2e (Strubell et al., 2019).
An average speed-up of 47x is assumed for PurGE,
meaning that the optimization task can be completed
in 1/47th of the time required by traditional methods,
assuming energy consumption is proportional to time
spent. Additionally, PurGE is reported to reduce the
number of trials by approximately 28% to 35% on av-
erage. Since energy consumption per trial is assumed
to be constant, this trial reduction further lowers the
computational load and energy consumption. If a 47x
speed-up and a 30% reduction in trials are achieved,
the total Energy Reduction Factor (ERF) is approxi-
mated as:
\mathrm{ERF} = 47 \times (1 - 0.30) = 47 \times 0.70 = 32.9 \qquad (4)

Thus, the Energy Savings (ES) in terms of CO2e emissions can be calculated as:

\mathrm{Savings} = \frac{78{,}468 \text{ lbs of CO}_2\mathrm{e}}{32.9} \approx 2{,}384.3 \text{ lbs of CO}_2\mathrm{e} \qquad (5)
Based on the NLP example, PurGE could save ap-
proximately 2,384 lbs of CO2e per optimization task compared to traditional methods.
In summary, the findings show that PurGE suc-
cessfully addresses both research questions by im-
proving the efficiency and performance of hyperpa-
rameter optimization through systematic pruning of
the search space. The results confirm that pruning
boosts model performance and significantly reduces
resource use, making PurGE a practical solution for
resource-efficient hyperparameter optimization.
6 CONCLUSIONS
This article introduces PurGE, a two-stage approach
for automatically tuning hyperparameters of ML and
DL models through search space pruning driven by
GE. PurGE achieves test accuracies that are compet-
itive with or superior to state-of-the-art methods, in-
cluding RS, GS, and BO, across all tested datasets.
Notably, PurGE delivers an average computational
speed-up of 47x and reduces the number of trials by
28% to 35%. Furthermore, it results in significant en-
ergy savings, equivalent to approximately 2,384 lbs
of CO2e per optimization task. These findings high-
light PurGE’s ability to enhance both model perfor-
mance and resource utilization, positioning it as an
efficient and environmentally responsible approach to
hyperparameter optimization. Future work will in-
volve benchmarking PurGE across a broader set of
domains to further assess its scalability and applica-
bility.
Figure 9: Speedup with PurGE against RS, GS and TPE.
ACKNOWLEDGEMENTS
This publication has emanated from research con-
ducted with the financial support of Taighde Éireann – Research Ireland under Grant No. 18/CRT/6223.
REFERENCES
Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M.
(2019). Optuna: A next-generation hyperparameter
optimization framework.
Awad, N., Mallik, N., and Hutter, F. (2021). Dehb: Evo-
lutionary hyperband for scalable, robust and efficient
hyperparameter optimization.
Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. (2011).
Algorithms for hyper-parameter optimization. In Pro-
ceedings of the 24th International Conference on Neu-
ral Information Processing Systems, NIPS’11, page
2546–2554, Red Hook, NY, USA. Curran Associates
Inc.
Bergstra, J. and Bengio, Y. (2012). Random search for
hyper-parameter optimization. J. Mach. Learn. Res.,
13:281–305.
DeCastro-García, N., Castañeda, Á. L. M., García, D. E.,
and Carriegos, M. V. (2019). Effect of the sampling
of a dataset in the hyperparameter optimization phase
over the efficiency of a machine learning algorithm.
Complex., 2019:6278908:1–6278908:16.
Diaz, G., Fokoue, A., Nannicini, G., and Samulowitz, H.
(2017). An effective algorithm for hyperparameter op-
timization of neural networks.
Falkner, S., Klein, A., and Hutter, F. (2018). Bohb: Robust
and efficient hyperparameter optimization at scale.
Fenton, M., McDermott, J., Fagan, D., Forstenlechner, S.,
Hemberg, E., and O’Neill, M. (2017). Ponyge2:
grammatical evolution in python. In Proceedings of
the Genetic and Evolutionary Computation Confer-
ence Companion, GECCO ’17. ACM.
Horváth, S., Klein, A., Richtárik, P., and Archambeau, C.
(2021). Hyperparameter transfer learning with adap-
tive complexity.
Jamieson, K. G. and Talwalkar, A. (2015). Non-stochastic
best arm identification and hyperparameter optimiza-
tion. CoRR, abs/1502.07943.
LeCun, Y., Denker, J., and Solla, S. (1989). Optimal brain
damage. In Touretzky, D., editor, Advances in Neural
Information Processing Systems, volume 2. Morgan-
Kaufmann.
Lee, K. and Yim, J. (2022). Hyperparameter optimization
with neural network pruning.
Lee, N., Ajanthan, T., and Torr, P. (2019). SNIP: Single-
shot pruning based on connection sensitivity. In Inter-
national Conference on Learning Representations.
Li, L., Jamieson, K. G., DeSalvo, G., Rostamizadeh, A., and
Talwalkar, A. (2016). Efficient hyperparameter opti-
mization and infinitely many armed bandits. CoRR,
abs/1603.06560.
Mallik, N., Bergman, E., Hvarfner, C., Stoll, D., Janowski,
M., Lindauer, M., Nardi, L., and Hutter, F. (2023).
Priorband: Practical hyperparameter optimization in
the age of deep learning.
Perrone, V., Shen, H., Seeger, M., Archambeau, C., and Je-
natton, R. (2019). Learning search spaces for bayesian
optimization: Another view of hyperparameter trans-
fer learning.
Ryan, C., Collins, J., and Neill, M. O. (1998). Grammati-
cal evolution: Evolving programs for an arbitrary lan-
guage. In Banzhaf, W., Poli, R., Schoenauer, M., and
Fogarty, T. C., editors, Genetic Programming, pages
83–96, Berlin, Heidelberg. Springer Berlin Heidel-
berg.
Simonyan, K. and Zisserman, A. (2015). Very deep con-
volutional networks for large-scale image recognition.
In International Conference on Learning Representa-
tions.
Strubell, E., Ganesh, A., and McCallum, A. (2019). Energy
and policy considerations for deep learning in NLP.
Vaidya, G., Ilg, L., Kshirsagar, M., Naredo, E., and Ryan,
C. (2022). Hyperestimator: Evolving computationally
efficient cnn models with grammatical evolution. In
Proceedings of the 19th International Conference on
Smart Business Technologies. SCITEPRESS - Science
and Technology Publications.
Vaidya, G., Kshirsagar, M., and Ryan, C. (2023). Gram-
matical evolution-driven algorithm for efficient and
automatic hyperparameter optimisation of neural net-
works. Algorithms, 16(7).
Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2015).
Hyperparameter search space pruning a new compo-
nent for sequential model-based hyperparameter op-
timization. In Appice, A., Rodrigues, P. P., San-
tos Costa, V., Gama, J., Jorge, A., and Soares, C., ed-
itors, Machine Learning and Knowledge Discovery in
Databases, pages 104–119, Cham. Springer Interna-
tional Publishing.
Yang, L. and Shami, A. (2020). On hyperparameter opti-
mization of machine learning algorithms: Theory and
practice. Neurocomputing, 415:295–316.
Yu, T. and Zhu, H. (2020). Hyper-parameter optimization:
A review of algorithms and applications.