Text-Based Feature-Free Automatic Algorithm Selection

Amanda Salinas-Pinto¹ (https://orcid.org/0009-0007-2216-4371), Bryan Alvarado-Ulloa¹ (https://orcid.org/0009-0008-7468-5723), Dorit Hochbaum² (https://orcid.org/0000-0002-2498-0512), Matías Francia-Carramiñana¹ (https://orcid.org/0009-0000-8680-7347), Ricardo Ñanculef¹ (https://orcid.org/0000-0003-3374-0198) and Roberto Asín-Achá¹ (https://orcid.org/0000-0002-1820-9019)

¹Universidad Técnica Federico Santa María, Chile
²University of California, Berkeley, U.S.A.
{amanda.salinas, bryan.alvarado}@usm.cl, dhochbaum@berkeley.edu,
Keywords:
Algorithm Selection, Deep Learning, SAT, CSP.
Abstract:
Automatic Algorithm Selection involves predicting which solver, among a portfolio, will perform best for a
given problem instance. Traditionally, the design of algorithm selectors has relied on domain-specific fea-
tures crafted by experts. However, an alternative approach involves designing selectors that do not depend on
domain-specific features, but receive a raw representation of the problem’s instances and automatically learn
the characteristics of that particular problem using Deep Learning techniques. Previously, such raw represen-
tation was a fixed-sized image, generated from the input text file specifying the instance, which was fed to
a Convolutional Neural Network. Here we show that a better approach is to use text-based Deep Learning
models that are fed directly with the input text files specifying the instances. Our approach improves on the
image-based feature-free models by a significant margin and furthermore matches traditional Machine Learn-
ing models based on basic domain-specific features, known to be among the most informative features.
1 INTRODUCTION
Automatic Algorithm Selection (AAS) aims to pre-
dict the optimal solver for a given problem instance
from a portfolio. Traditionally, this process relies on
domain-specific features crafted by experts, which,
while effective, limits scalability and transferability
due to the need for extensive domain knowledge and
labor-intensive analysis.
Recent advances in Deep Learning (DL) (Vaswani
et al., 2017), where models learn from raw data, offer
a compelling alternative to feature-based models. Pre-
vious work (Loreggia et al., 2016) in AAS has trans-
formed raw data into fixed-sized images processed by
Convolutional Neural Networks (CNNs), but this still
requires image-processing techniques.
Our study introduces a novel text-based deep
learning approach that directly processes raw tex-
tual files specifying problem instances, simplifying
the computational pipeline, and enhancing represen-
tation.
In this paper, we present our text-based deep
learning framework for AAS and evaluate its perfor-
mance against traditional image-based and feature-
based models. Our analysis shows that text-based
models are superior in capturing complex informa-
tion in problem descriptions, leading to more effective
and adaptable algorithm selection strategies as com-
pared to image-based methods. Nevertheless, a performance gap remains with respect to specialized feature-based models, and closing this gap will be the basis of future research in the area of feature-free algorithm selection.
Our contributions include demonstrating the feasi-
bility of text-based deep learning for AAS and provid-
ing a thorough analysis of how these techniques out-
perform existing feature-free methods. We establish
new benchmarks, advancing the field of feature-free
AAS, and offer insights into the performance gap be-
tween feature-free and feature-based methodologies.
The subsequent sections review relevant literature,
define key terms and criteria, outline our text-based
AAS framework, present empirical assessments, and
conclude with findings and future research directions.
2 RELATED WORK
2.1 Algorithm Selection Systems
Automatic Algorithm Selection (AAS), introduced by
(Rice, 1976), optimizes computational processes by
selecting the most suitable algorithm for a given prob-
lem instance. This approach is rooted in the “No Free
Lunch” theorem (Adam et al., 2019), which posits
that no single algorithm universally excels across all
scenarios.
AAS typically employs a training phase to asso-
ciate problem instance features with algorithm per-
formance. The trained model then evaluates new in-
stances to predict the most effective algorithm. Re-
cent literature has explored AAS in various domains,
including timetabling (Seiler et al., 2020; Bossek and Neumann, 2022), SAT (Xu et al., 2008), and Multi-Agent Path-Finding (Bulitko, 2016; Achá et al., 2022).
Kerschke et al. (Kerschke et al., 2019) provide a
comprehensive survey of algorithm selection and con-
figuration, introducing a taxonomy that distinguishes
between “per-set” and “per-instance” methods. Our
focus is on “per-instance” AAS, which considers each
problem instance individually.
While many AAS systems employ complex strate-
gies, such as the hybrid methodology of semi-static
solver schedules (3S) (Kadioglu et al., 2011) or Aut-
ofolio (Lindauer et al., 2015), our study concentrates
on straightforward approaches. We assume an ML
model receives an instance characterization and se-
lects a single solver to execute until completion or
the time limit is reached.
Most AAS research relies on domain-specific,
expert-crafted features. However, an alternative ap-
proach involves developing ML methods that uti-
lize raw/generic instance representations, allowing
the learning process to identify relevant features au-
tonomously. This approach was first explored by
(Loreggia et al., 2016).
2.2 Deep Learning for Algorithm
Portfolios
(Loreggia et al., 2016) introduced a groundbreaking
approach to Automatic Algorithm Selection (AAS)
based on deep learning. Unlike traditional AAS tech-
niques that use hand-crafted, domain-specific fea-
tures, this method leverages generic raw data: the text file contents describing the problems.
The process transforms text files into a fixed-size
image format suitable for Convolutional Neural Net-
work (CNN) analysis:
1. Convert the textual input into a vector of ASCII codes.
2. Reorganize the vector into a $\sqrt{N} \times \sqrt{N}$ matrix, where $N$ is the total character count.
3. Resize the resulting "ASCII image" to a uniform scale.
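A minimal sketch of this conversion (our reimplementation; OpenCV handles the resizing, and the 128×128 target size is an illustrative assumption):

```python
import numpy as np
import cv2  # OpenCV, used for the final resizing step

def text_to_ascii_image(text: str, side: int = 128) -> np.ndarray:
    """Turn a raw instance file into a fixed-size 'ASCII image'."""
    codes = np.frombuffer(text.encode("ascii", errors="replace"), dtype=np.uint8)
    n = int(np.ceil(np.sqrt(codes.size)))        # side of the square matrix
    padded = np.zeros(n * n, dtype=np.uint8)     # zero-pad the tail
    padded[:codes.size] = codes
    return cv2.resize(padded.reshape(n, n), (side, side),
                      interpolation=cv2.INTER_AREA)
```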
The CNN can be trained as a multi-class classifier,
multi-label classifier, or regressor. Evaluated using
SAT and Constraint Satisfaction Problems (CSP) in-
stances, this method showed potential to outperform
the Single Best Solver (see Subsection 2.3).
Despite its successes, this approach may not per-
form as well as methods utilizing domain-specific fea-
tures.
2.3 Performance Metric for
Meta-Solvers
We define an algorithm-selection-based meta-solver
as a system comprising a portfolio of solvers. It ana-
lyzes an input instance and runs one or more solvers to
resolve it. A solver solves an instance if it can decide
its satisfiability (for decision problems) or find and
certify the optimal solution (for optimization prob-
lems) within a time limit.
All our meta-solvers here operate uniformly:
1. Accept an input instance.
2. Use an ML model to predict the most efficient
solver, identify capable solvers, or estimate solv-
ing times.
3. Select and run one solver based on these predic-
tions.
We evaluate the meta-solver’s performance using two
baselines:
Single Best Solver (SBS): The solver performing
best on average across all training instances.
Virtual Best Solver (VBS): A hypothetical meta-
solver always choosing the most effective algo-
rithm for each instance.
Performance is measured using the PAR10 metric (Lindauer et al., 2019). For a solver $s$ on instance $i$:
$$
m_s(i) = \begin{cases} t_s(i) & \text{if } t_s(i) \le \tau \\ 10\tau & \text{otherwise,} \end{cases}
$$
where $\tau$ is the timeout constant and $t_s(i)$ is the solving time.
We use the performance measure $\hat{m}$ (Lindauer et al., 2019) to evaluate meta-solvers:
$$
\hat{m}_{ms} = \frac{m_{ms} - m_{VBS}}{m_{SBS} - m_{VBS}}. \tag{1}
$$
Values of $\hat{m}_{ms}$ close to 0 indicate performance near the VBS, while values close to 1 suggest performance similar to the SBS. Values above 1 indicate that the meta-solver is less effective than the SBS.
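As a concrete reference, here is a minimal sketch of both computations, assuming PAR10 scores are stored as a solvers-by-instances NumPy array (the array layout and function names are our own):

```python
import numpy as np

def par10(times: np.ndarray, tau: float) -> np.ndarray:
    """PAR10 score: the runtime if within the timeout tau, 10*tau otherwise."""
    return np.where(times <= tau, times, 10.0 * tau)

def m_hat(scores: np.ndarray, selected: np.ndarray) -> float:
    """Normalized measure: ~0 means close to the VBS, ~1 close to the SBS.

    scores:   (n_solvers, n_instances) matrix of PAR10 values
    selected: for each instance, the solver index chosen by the meta-solver
    """
    m_ms = scores[selected, np.arange(scores.shape[1])].mean()
    m_vbs = scores.min(axis=0).mean()   # best solver on each instance
    m_sbs = scores.mean(axis=1).min()   # solver with the best average
    return (m_ms - m_vbs) / (m_sbs - m_vbs)
```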
3 TEXT-BASED FEATURE-FREE
AAS
We follow the approach of (Loreggia et al., 2016), working directly with raw problem-instance representations. Our Deep Learning models are fed raw
text representations, rather than pre-processed image-
like inputs.
3.1 Architecture Overview
Figure 1: Overall architecture of our text-based Deep Learning Model for AAS.
Our architecture (Figure 1) is a modified Transformer neural network that uses only the encoder component, similar to that of (Vaswani et al., 2017). The input text is truncated, tokenized, and converted into embeddings $x = \langle x_1, x_2, \ldots, x_n \rangle$. The encoder's outputs $z = \langle z_1, z_2, \ldots, z_n \rangle$ are fused into a global descriptor $\bar{z}$ using Global Max Pooling (Christlein et al., 2019), then mapped to a prediction through a fully connected output layer.
3.2 Tokenizers and Embeddings
We explore two tokenization approaches:
Pre-trained Tokenization: Using SentencePiece
(Kudo and Richardson, 2018).
Trained Tokenization: Using Charformer (Tay
et al., 2021).
3.3 Encoder Architecture
Our encoder computes $M = 4$ hierarchical transformations $Z^{(k)} = \mathrm{EBlock}(Z^{(k-1)})$. Each block includes a self-attention mechanism and a position-wise feed-forward net. The self-attention mechanism computes:
$$
P = \mathrm{SelfAttention}(Z) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d'}}\right) Z, \tag{2}
$$
where $Q$, $K$, and $Z$ are learnable matrices that project $Z^{(k-1)}$ into a $d'$-dimensional latent space. We use multi-head attention with $H = 4$ heads. The final block's output $Z^{(k)}$ is obtained after applying a residual connection (He et al., 2016) and layer normalization (Ba et al., 2016) around each sublayer. We did not use positional embeddings.
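A minimal PyTorch sketch of this architecture under the stated hyperparameters (M = 4 blocks, H = 4 heads, d = 128, no positional embeddings); the feed-forward width and layer names are our assumptions, and the output head follows one of the framings of Section 3.4:

```python
import torch
import torch.nn as nn

class TextEncoderAAS(nn.Module):
    """Encoder-only Transformer + Global Max Pooling + prediction head."""

    def __init__(self, vocab_size: int, n_solvers: int, d: int = 128,
                 heads: int = 4, blocks: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)   # no positional embeddings
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=heads,
                                           dim_feedforward=4 * d,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=blocks)
        self.head = nn.Linear(d, n_solvers)        # one output per solver

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        z = self.encoder(self.embed(tokens))       # (batch, seq, d)
        z_bar = z.max(dim=1).values                # Global Max Pooling
        return self.head(z_bar)                    # activation/loss per Sec. 3.4
```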
3.4 Problem Framing Strategies
We explore three strategies:
Multi-Class Classification: Identifies the most suit-
able solver and the meta-solver runs it. The output
layer is a softmax function, and the loss function
is categorical cross-entropy.
Multi-Label Classification: Identifies all solvers ca-
pable of solving the instance within the defined
time limit τ. Each solver corresponds to an ele-
ment in the output vector, with a sigmoid func-
tion applied element-wise. The loss is measured
through the Hamming loss function. Since the
probabilities here are not complementary, they de-
termine the likelihood that a solver will be fit for
the problem instance. The meta-solver executes
the solver that exhibits the highest likelihood.
Regression: Estimates normalized log delta runtime
for each solver. The mean squared error function
serves as the loss function, and the output layer is
linear. The meta-solver runs the solver predicted
to have the shortest runtime.
$$
r_{s,i} = \log\left(1 + m_s(i) - \min_{s \in S} m_s(i)\right), \qquad
y_{s,i} = \frac{r_{s,i} - \mathrm{mean}(r_{s,i})}{\mathrm{std}(r_{s,i})}
$$
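A sketch of this target construction from a PAR10 score matrix; standardizing per solver (across instances) is our reading of mean($r_{s,i}$) and std($r_{s,i}$):

```python
import numpy as np

def regression_targets(scores: np.ndarray) -> np.ndarray:
    """Normalized log-delta runtimes y[s, i] from PAR10 scores m[s, i]."""
    # Delta w.r.t. the best solver on each instance, log-compressed.
    r = np.log1p(scores - scores.min(axis=0, keepdims=True))
    # Standardization; using per-solver statistics is an assumption.
    return (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)
```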
4 EXPERIMENTAL SETUP AND
BASELINES
4.1 Libraries and Hardware
We implemented our Deep Learning models in
Python 3.10, using PyTorch 2.0.0. For the text-
based models, we used the Charformer tokenizer 0.0.4 (as implemented in https://github.com/lucidrains/charformer-pytorch) and SentencePiece 0.2.0. For the image scaling needed by the image-based models, we used OpenCV 4.7.0.72. The feature-based models were implemented using scikit-learn 1.4.2.
The experiments were carried out on a machine
with an Intel Xeon Skylake (2x16 @2.1 GHz) pro-
cessor and an Nvidia A40 GPU. The machine runs
Scientific Linux 7 and has 48GB of RAM.
4.2 Benchmark Sets
To evaluate our approach, we first aimed to use the same benchmark sets as (Loreggia et al., 2016). However, the precise sets of instances and partitions used in that study were not disclosed publicly and could not be provided by the authors when asked via personal communication. We then searched for benchmarks of a similar nature for which the instance files and the hand-crafted features used in the AAS community were available. Unfortunately, we could not find meaningful benchmark sets similar to the ones named "SAT Random" and "SAT Crafted" in (Loreggia et al., 2016). However, we were able to collect the most interesting benchmark sets reported in that study, "SAT Industrial" and "CSP". These benchmark sets are the most interesting because of their diversity in size and complexity and the complementarity of their solvers.
SAT Industrial. This benchmark includes in-
stances used in the SAT competition between 2003
and 2016 in the industrial/application categories. The
performance of the solvers in these competitions was
retrieved from ASLib, specifically from the SAT03-16-
INDU-ALGO scenario. We removed 269 instances
that could not be solved by any solver in the port-
folio within the given τ time limit. After filtering,
the dataset contains 1,730 instances and 10 different solvers.
CSP. We used the benchmark from the 2009 CSP competition (https://www.cril.univ-artois.fr/CSC09/results/globalbybench.php?idev=30&idcat=38&idSubCat=60). The performance data for each solver was obtained from the PROTEUS-2014 scenario (Hurley et al., 2014) in ASLib. We filtered the instances by removing the "easy" instances that could be solved by all solvers within a time limit equivalent to that needed to compute the instance's features, as well as the "difficult" instances that were not solved by any of the solvers within the given time limit τ. This resulted in a total of 1,613 instances and 22 different solvers.
4.3 Data Partitioning and Evaluation
Criteria
We split each benchmark into train and test datasets.
For the training dataset we used 80% of the instances, and the remaining 20% was reserved as the test
dataset. The training dataset is used for training and
model selection, while the test dataset is used to com-
pare the in-production performance of the best text-
based, image-based, and feature-based approaches.
To select the best model for each approach, we
performed 10-fold cross-validation with the training
set. We compared the models based on the $\hat{m}$ metric associated with a meta-solver using them. We then selected the best model based on the mean $\hat{m}$ metric
across the different folds.
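This selection protocol can be sketched as follows; `train_fn` and `pick_fn` are hypothetical callbacks for fitting a candidate and extracting its per-instance solver choices, and `m_hat` is the metric sketched in Section 2.3:

```python
import numpy as np
from sklearn.model_selection import KFold

def select_best(candidates, X, scores, train_fn, pick_fn):
    """Return the candidate with the lowest mean m-hat over 10 folds."""
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    mean_scores = []
    for make_model in candidates:
        fold_vals = []
        for train_idx, val_idx in kf.split(X):
            model = train_fn(make_model(), X, scores, train_idx)
            picks = pick_fn(model, X, val_idx)        # solver index per instance
            fold_vals.append(m_hat(scores[:, val_idx], picks))
        mean_scores.append(np.mean(fold_vals))
    return candidates[int(np.argmin(mean_scores))]
```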
4.4 Feature-Based Models
To offer a comprehensive view of our study on
feature-free models, we also implement and evalu-
ate feature-based models employing both state-of-the-
art crafted features and basic informative features,
using Random Forest models. The comparison of
feature-free models with these feature-based counter-
parts serves a dual purpose: firstly, to analyze and
document the performance disparities between these
two paradigms, and secondly, to provide the research
community with a benchmark on the effectiveness of
applying state-of-the-art crafted features in a straight-
forward manner on ASLib scenarios that are widely
used.
Basic Features: Two basic features extracted from
the text describing a problem instance are: the
number of variables and the number of con-
straints. The motivation for these two features is that instance size usually ranks among the simplest and most informative characteristics. We expect that
training ML models on these two features estab-
lishes a baseline for the other methods.
We note here that, for the CSP benchmark, the number of variables and constraints in the text file differ from the direct_nvariables and direct_nclauses features of the ASLib scenario, since the latter seem to be computed after grounding the CSP formula to SAT.
Full Set of Features: These features represent the
state-of-the-art in domain-specific algorithm se-
lection, as provided in the corresponding scenar-
ios of ASLib. All these 483 SAT features were
introduced in (Xu et al., 2008), and constitute the de facto standard for AAS in SAT.
For CSP, ASLib provides the 198 domain-specific
features as proposed in (Hurley et al., 2014). We
note here that our evaluation does not consider the
time needed to compute all these features, even
though some of them are expensive to compute
and others are captured during runtime, from a
reference solver.
Although these features and scenarios are com-
monly referenced in the literature, we were unable
to find reported performance values ($\hat{m}$) for meta-
solvers that utilize these features directly. Con-
sequently, our aim is to document these values to
serve as a reference for future research.
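Returning to the basic features: for SAT instances in DIMACS CNF format, both values can be read directly from the problem line; a minimal sketch (the CSP format would need its own parser):

```python
def basic_features(path: str) -> tuple[int, int]:
    """Read (n_variables, n_constraints) from a DIMACS CNF problem line."""
    with open(path) as f:
        for line in f:
            if line.startswith("p cnf"):
                # DIMACS problem line: "p cnf <n_variables> <n_clauses>"
                _, _, n_vars, n_clauses = line.split()
                return int(n_vars), int(n_clauses)
    raise ValueError(f"no DIMACS problem line in {path}")
```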
4.5 Image-Based Models
We implemented the approach presented by (Loreg-
gia et al., 2016) carefully following the experimen-
tal setup described there. For the training, we used
Stochastic Gradient Descent (SGD) with Nesterov momentum of 0.9 and a learning rate of 0.03. As the first layer, we included a batch normalization layer, as proposed in (Ioffe and Szegedy, 2015). The output layer
changes depending on the learning task, as mentioned
in Section 3.4. We set a training batch size of 128 and
100 epochs.
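In PyTorch, this training configuration corresponds to the following sketch, where `cnn` stands for the image-based CNN being trained:

```python
import torch

# Sketch: lr and Nesterov momentum follow the setup described above.
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.03,
                            momentum=0.9, nesterov=True)
```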
4.6 Text-Based Models
Due to limitations on the hardware needed to train our model on arbitrarily sized instances, we truncated the instances to 10,000 characters. To avoid introducing biases into the model, we removed from the text files any comments and other kinds of meta-information, such as the folder name where the instance is located or the name of the instance's generator. In a preliminary evaluation, we noted that these meta-information fields may unfairly help the text-based models, and we decided not to consider this information.
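A minimal sketch of this preprocessing for DIMACS-style SAT files, where comment lines start with "c" (treating every such line as removable meta-information is an assumption; CSP files would need their own rule):

```python
def preprocess(path: str, max_chars: int = 10_000) -> str:
    """Drop comment/meta-information lines, then truncate the raw text."""
    with open(path) as f:
        kept = [line for line in f if not line.startswith("c")]
    return "".join(kept)[:max_chars]
```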
For training our text-based models, we used the AdamW optimizer (Loshchilov and Hutter, 2017) with a learning rate of $10^{-5}$. We set a batch size of 8 samples and an embedding size $d$ of 128. The training was set to take 100 epochs.
Since the sequence length produced by SentencePiece can vary among instances while our encoder accepts a fixed-length sequence, we computed the median length of SentencePiece's output, truncating longer sequences and padding shorter ones. The vocabulary size $v$ for SentencePiece was set to 1024. Charformer, which operates at the character level, had a vocabulary size of 257 (256 ASCII values plus one token reserved for padding). We set the max block size and the downsample factor to their default values (4). Additionally, we employed the block attention scores proposed in Section 2.1.4 of (Tay et al., 2021) to form latent subwords.
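A sketch of this fixed-length encoding with the sentencepiece Python API (the model file name and pad id are hypothetical, and training the tokenizer itself is elided):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm_v1024.model")  # hypothetical model

def encode_fixed(text: str, length: int, pad_id: int = 0) -> list[int]:
    """Encode to token ids, then truncate or pad to the median training length."""
    ids = sp.encode(text, out_type=int)
    return ids[:length] + [pad_id] * max(0, length - len(ids))
```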
5 RESULTS
5.1 Feature-Based Validation Results
Table 1: $\hat{m}$ metric values across 10-fold validation sets for different handcrafted-features-based meta-solvers on the CSP and SAT Industrial benchmark sets. Here, HF = handcrafted-features-based, F = full set of features, B = basic set of features, ML = multi-label model, Reg = regression model, MC = multi-class model.

Model     CSP            SAT Industrial
HF-F-ML   0.409 ± 0.064  0.680 ± 0.312
HF-F-Reg  0.416 ± 0.087  0.640 ± 0.291
HF-F-MC   0.546 ± 0.066  0.676 ± 0.228
HF-B-ML   0.638 ± 0.068  1.054 ± 0.361
HF-B-Reg  0.557 ± 0.066  0.939 ± 0.365
HF-B-MC   0.582 ± 0.075  1.22 ± 0.387
Table 1 shows the average and standard deviation
of the $\hat{m}$ values computed across 10-fold cross-
validation subsets for six feature-based meta-solvers.
The first three meta-solvers are based on the full set
of features provided in the ASLib, while the last three
meta-solvers only use the two basic features related
to the size of the instances. For a fair compari-
son with our feature-free model, these feature-based
meta-solvers can cast AAS as a multi-label task (ML),
a regression task (Reg), or a multi-class (MC) prob-
lem. As can be seen, for the CSP benchmark, the most
successful meta-solver using the full set of features
is the one based on multi-label classification (ML).
In contrast, for the SAT Industrial benchmark, the
best meta-solver, using the full set of features, is the
one based on regression (Reg). Nevertheless, we note
that even for these state-of-the-art crafted features, the
meta-solvers are quite sensitive to the test set in SAT,
as is evident from the considerable standard deviation.
Regarding the meta-solvers using only the two
basic features, the meta-solvers based on regression
show better performance in both benchmark sets. We
note that, on average, only using these two basic fea-
tures allows the meta-solvers to outperform the SBS.
For CSP, we found a considerable margin of advan-
tage, and for SAT Industrial, a smaller margin.
We report the performance of these feature-based
solvers on the test set in Subsection 5.4. All the results
reported here are consistent with the literature.
5.2 Image-Based Validation Results
Table 2: $\hat{m}$ metric values across 10-fold validation sets for different image-based meta-solvers on the CSP and SAT Industrial benchmark sets. Here, Im = image-based, ML = multi-label model, Reg = regression model, MC = multi-class model.

Model   CSP            SAT Industrial
Im-ML   0.640 ± 0.088  1.25 ± 0.407
Im-Reg  0.609 ± 0.104  1.14 ± 0.346
Im-MC   0.898 ± 0.109  1.66 ± 0.527
Table 2 shows the statistics of the $\hat{m}$ values com-
puted across 10-fold cross-validation subsets for three
image-based meta-solvers. For a fair comparison with
our text-based model, we trained image-based meta-
solvers based on multi-label, regression, and multi-
class formulations. The results in Table 2 demonstrate
that, although the regression approach was not con-
sidered in (Loreggia et al., 2016), the most successful
image-based meta-solver is the one based on regres-
sion for both benchmark sets.
The meta-solver for CSP outperforms CSP’s Sin-
gle Best Solver by a significant margin while main-
taining a considerable gap with the Virtual Best
Solver for CSP. These results are in line with the ones
reported in (Loreggia et al., 2016). However, an ex-
act match between our image-based results and those
in (Loreggia et al., 2016) is virtually impossible since
the training/validation/test partitions differ.
Image-based SAT meta-solvers cannot outperform
the Single Best Solver. This result diverges from
the results of (Loreggia et al., 2016), which reported
an image-based meta-solver that outperforms SBS on
SAT. This discrepancy may be due to differences
in the specific SAT industrial benchmark set used or
differences in the training/test partitions. However,
we also observe that the performance of the SAT
image-based meta-solver varies significantly depend-
ing on the training and validation set (standard devia-
tion of 0.346 among cross-validation folds).
5.3 Text-Based Validation Results
Table 3: $\hat{m}$ metric values across 10-fold validation sets for different text-based ML models on the CSP and SAT Industrial benchmark sets. Here, Txt = text-based, Cha = trained tokenizer (Charformer), Sen = pre-trained tokenizer (SentencePiece), ML = multi-label model, Reg = regression model, MC = multi-class model.

Model        CSP            SAT Industrial
Txt-Cha-ML   0.488 ± 0.047  0.952 ± 0.281
Txt-Cha-Reg  0.469 ± 0.050  0.889 ± 0.303
Txt-Cha-MC   0.581 ± 0.076  1.312 ± 0.354
Txt-Sen-ML   0.482 ± 0.082  1.078 ± 0.252
Txt-Sen-Reg  0.536 ± 0.100  1.119 ± 0.448
Txt-Sen-MC   0.608 ± 0.120  1.470 ± 0.319
Table 3 shows the average and standard deviation of
the $\hat{m}$ values for our text-based meta-solvers com-
puted by 10-fold cross-validation. The first three
meta-solvers are text-based models jointly trained
with the tokenizer (Charformer), while the last three
meta-solvers use the pre-trained tokenizer (Sentence-
Piece). As can be seen, the most successful meta-
solver is the one that uses a regression model jointly
trained with the tokenizer.
The CSP meta-solver significantly improves the
performance of the SBS for this domain. With an av-
erage $\hat{m}$ value equal to 0.469 and a small standard de-
viation, this meta-solver’s performance can be inter-
preted as closer to the VBS than to the SBS.
Regardless of the formulation, obtaining an $\hat{m}$ lower than 1 for SAT Industrial was impossible using image-based methods. Notably, our best text-based meta-solver outperforms the Single Best Solver with an average $\hat{m}$ value of 0.889 on this benchmark. Neverthe-
less, as for the previous models, the standard devi-
ation is high (0.303), which suggests that the meta-
solver’s performance varies considerably depending
on the validation instances used.
5.4 Test Set Results
Here we compare feature-based, image-based and
text-based meta-solvers on the test set of each bench-
mark. For each category, we selected the best ap-
proach using 10-fold cross-validation, and trained the
model with the whole training set. Again, we note that
the results for feature-based models are reported as a reference, both to provide perspective and to document the performance of meta-solvers built with straightforward models.
As anticipated, the meta-solvers that yield the best
Table 4: $\hat{m}$ metric values on the test set for the best model of each approach and benchmark set.

Model        CSP    SAT Industrial
HF-B-Reg     0.549  0.975
HF-F-ML      0.442  —
HF-F-Reg     —      0.674
Im-Reg       0.642  1.309
Txt-Cha-Reg  0.556  1.037
results are those that utilize expert-designed features
specific to the domain. In the case of CSP, the meta-
solver employing a multi-label classification model
achieves an $\hat{m}$ value of 0.442. This significantly nar-
rows the performance disparity between the SBS and
the VBS in CSP scenarios. Similarly, for the SAT In-
dustrial benchmark, the regression-based meta-solver
records an $\hat{m}$ value of 0.674. Considering the com-
plexity of this benchmark, this score is notably satis-
factory. These outcomes align with those from con-
temporary meta-solvers specialized for CSP and SAT
Industrial. It is important to note that this assess-
ment only gauges the effectiveness of the features in
a well-adjusted ML model. This overview omits the
consideration that many sophisticated features, while
beneficial, are computationally intensive and may not
be regularly employed in elaborate Algorithm Se-
lection Systems that utilize both a presolver and a
solver scheduler. Hence, the current $\hat{m}$ values of the
meta-solvers that incorporate these advanced features
likely represent a lower bound for any straightforward
methodology.
When comparing the two feature-free meta-
solvers, our text-based method significantly surpasses
the image-based method and nearly matches the per-
formance of the meta-solvers that incorporate the two
basic crafted features. This suggests that the image-
based models may fail to capture even basic infor-
mation, such as the size of the problem instance.
Conversely, the text-based models appear capable of
recognizing information akin to these features, even
though our system uses only basic vanilla encoders.
Converting these $\hat{m}$ scores to average running times
reveals that the expected average time for the text-
based model is approximately 13% lower than that
of the image-based model for the CSP benchmark.
For the SAT Industrial benchmark, this reduction
is about 20%. Collectively, these figures demon-
strate that our novel text-based feature-free frame-
work significantly decreases the performance gap be-
tween feature-free and feature-based Algorithm Selection systems.
6 CONCLUSIONS AND FUTURE
WORK
We present here a novel approach to Automatic Al-
gorithm Selection that leverages the capabilities of
text-based deep learning models. Our results clearly
demonstrate that this method not only simplifies the
feature extraction process (by eliminating the need for image-based preprocessing) but also significantly
enhances the performance of existing feature-free al-
gorithm selection paradigms. By directly processing
raw textual descriptions of problem instances, our ap-
proach has shown a marked improvement over tradi-
tional, image-based CNN approaches in terms of both
performance and robustness across benchmarks.
The effectiveness of our method was validated
through extensive experiments on benchmarks con-
taining a variety of problem instances. The experi-
mental results underscore the potential of deep learn-
ing techniques that operate directly on raw data, pro-
viding a more scalable and flexible end-to-end solu-
tion for the field of AAS.
Our experiments confirm that, to date, no feature-free algorithm selection approach can outperform meta-solvers based on validated domain-specific features crafted by experts. However, results also
show that text-based feature-free models can match
the performance of meta-solvers based on basic in-
formative features. This finding suggests that deep
learning methods can learn problem representations
beyond the crudest and most elementary characterization.
While our study has made significant strides in the
application of text-based models to algorithm selec-
tion, several avenues remain open for further explo-
ration. Future work may include:
More Complex AAS Systems: Our proposal can
serve as the basis for more complex AAS systems, in-
cluding dynamic portfolios and schedulers.
More Complex ML Models: More complex
transformer architectures can also be tested. Be-
sides, AAS can be framed in a more sophisticated
way to leverage advances in ranking, metric learn-
ing, and recommender systems.
Handling the Whole Text Files: A plethora of ar-
chitectures have been proposed for long text mod-
eling in deep learning. These methods should be
systematically evaluated to overcome the limita-
tions of our text-based meta-solver.
Anytime AAS: Extending our method to Any-
time Algorithm Selection could significantly ben-
efit environments where decisions should be made
based on the available computational resources.
Transfer Learning: Exploring transfer learning
techniques to adapt models trained on one set
of problem instances to handle others effectively
could contribute to a general purpose AAS.
Interpretable AI Models: Enhancing the inter-
pretability of deep learning models used in AAS
to provide insights into why certain algorithms are
preferred for specific instances could help refine the models further and gain users' trust.
Benchmarks and Datasets: Applying our frame-
work to other domains, possibly including opti-
mization problems whose domain metrics $\hat{m}$ in-
volve the values of the objective function.
In conclusion, the research presented in this paper sets
a new benchmark in the field of feature-free AAS and
opens up numerous possibilities for the evolution of
more intelligent and autonomous algorithm selection
systems. Our future efforts will focus on expanding
the capabilities of our framework and exploring these
promising directions to further enhance the field of
algorithm selection.
ACKNOWLEDGEMENTS
The 1st, 2nd, 3rd, 4th, and 6th authors are supported in part by the NSF AI Institute award 2112533.
REFERENCES
Achá, R. A., López, R., Hagedorn, S., and Baier, J. A. (2022). Multi-agent path finding: A new boolean encoding. Journal of Artificial Intelligence Research, 75:323–350.
Adam, S. P., Alexandropoulos, S.-A. N., Pardalos, P. M.,
and Vrahatis, M. N. (2019). No free lunch theorem: A
review. Approximation and optimization: Algorithms,
complexity and applications, pages 57–82.
Ba, J. L., Kiros, J. R., and Hinton, G. E. (2016). Layer
normalization. arXiv preprint arXiv:1607.06450.
Bossek, J. and Neumann, F. (2022). Exploring the feature
space of tsp instances using quality diversity. In Pro-
ceedings of the Genetic and Evolutionary Computa-
tion Conference, pages 186–194.
Bulitko, V. (2016). Evolving real-time heuristic search al-
gorithms. In Artificial Life Conference Proceedings
13, pages 108–115. MIT Press.
Christlein, V., Spranger, L., Seuret, M., Nicolaou, A., Král, P., and Maier, A. (2019). Deep generalized max pooling. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1090–1096. IEEE.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Hurley, B., Kotthoff, L., Malitsky, Y., and O’Sullivan, B.
(2014). Proteus: A hierarchical portfolio of solvers
and transformations. In Integration of AI and OR
Techniques in Constraint Programming: 11th Inter-
national Conference, CPAIOR 2014, Cork, Ireland,
May 19-23, 2014. Proceedings 11, pages 301–317.
Springer.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Ac-
celerating deep network training by reducing internal
covariate shift. In International conference on ma-
chine learning, pages 448–456. PMLR.
Kadioglu, S., Malitsky, Y., Sabharwal, A., Samulowitz, H.,
and Sellmann, M. (2011). Algorithm selection and
scheduling. In International Conference on Principles
and Practice of Constraint Programming, pages 454–
469. Springer.
Kerschke, P., Hoos, H. H., Neumann, F., and Trautmann, H.
(2019). Automated algorithm selection: Survey and
perspectives. Evolutionary computation, 27(1):3–45.
Kudo, T. and Richardson, J. (2018). Sentencepiece: A sim-
ple and language independent subword tokenizer and
detokenizer for neural text processing. arXiv preprint
arXiv:1808.06226.
Lindauer, M., Hoos, H. H., Hutter, F., and Schaub, T.
(2015). Autofolio: An automatically configured al-
gorithm selector. Journal of Artificial Intelligence Re-
search, 53:745–778.
Lindauer, M., van Rijn, J. N., and Kotthoff, L. (2019). The
algorithm selection competitions 2015 and 2017. Ar-
tificial Intelligence, 272:86–100.
Loreggia, A., Malitsky, Y., Samulowitz, H., and Saraswat,
V. (2016). Deep learning for algorithm portfolios. In
Thirtieth AAAI Conference on Artificial Intelligence.
Loshchilov, I. and Hutter, F. (2017). Decoupled weight de-
cay regularization. arXiv preprint arXiv:1711.05101.
Rice, J. R. (1976). The algorithm selection problem. In Ad-
vances in computers, volume 15, pages 65–118. Else-
vier.
Seiler, M., Pohl, J., Bossek, J., Kerschke, P., and Traut-
mann, H. (2020). Deep learning as a competitive
feature-free approach for automated algorithm selec-
tion on the traveling salesperson problem. In Interna-
tional Conference on Parallel Problem Solving from
Nature, pages 48–64. Springer.
Tay, Y., Tran, V. Q., Ruder, S., Gupta, J., Chung, H. W.,
Bahri, D., Qin, Z., Baumgartner, S., Yu, C., and Met-
zler, D. (2021). Charformer: Fast character transform-
ers via gradient-based subword tokenization. arXiv
preprint arXiv:2106.12672.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Xu, L., Hutter, F., Hoos, H. H., and Leyton-Brown, K.
(2008). Satzilla: portfolio-based algorithm selection
for sat. Journal of artificial intelligence research,
32:565–606.