TX-Gen: Multi-Objective Optimization for Sparse Counterfactual Explanations for Time-Series Classification
Qi Huang (https://orcid.org/0009-0007-4989-135X), Sofoklis Kitharidis (https://orcid.org/0009-0005-8404-0724), Thomas Bäck (https://orcid.org/0000-0001-6768-1478) and Niki van Stein (https://orcid.org/0000-0002-0013-7969)
Institute of Advanced Computer Science, Leiden University, Einsteinweg 55, Leiden, The Netherlands
{q.huang, s.kitharidis, t.h.w.baeck, n.van.stein}@liacs.leidenuniv.nl
Keywords:
Explainable Artificial Intelligence, Counterfactuals, Time-Series Classification, Evolutionary Computation.
Abstract:
In time-series classification, understanding model decisions is crucial for their application in high-stakes do-
mains such as healthcare and finance. Counterfactual explanations, which provide insights by presenting
alternative inputs that change model predictions, offer a promising solution. However, existing methods
for generating counterfactual explanations for time-series data often struggle with balancing key objectives
like proximity, sparsity, and validity. In this paper, we introduce TX-Gen, a novel algorithm for generating
counterfactual explanations based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II). TX-Gen
leverages evolutionary multi-objective optimization to find a diverse set of counterfactuals that are both sparse
and valid, while maintaining minimal dissimilarity to the original time series. By incorporating a flexible
reference-guided mechanism, our method improves the plausibility and interpretability of the counterfactuals
without relying on predefined assumptions. Extensive experiments on benchmark datasets demonstrate that
TX-Gen outperforms existing methods in generating high-quality counterfactuals, making time-series models
more transparent and interpretable.
1 INTRODUCTION
The increasing adoption of machine learning mod-
els for time-series classification (TSC) in critical do-
mains such as healthcare (Morid et al., 2023) and fi-
nance (Chen et al., 2016) has raised the demand for
interpretable and transparent decision-making pro-
cesses. However, the black-box nature of many high-
performing classifiers, such as neural networks or en-
semble models, limits their interpretability, making it
difficult for practitioners to understand why a partic-
ular decision is made. In this context, counterfactual
explanations have emerged as a valuable approach in
Explainable AI (XAI) (Theissler et al., 2022), pro-
viding instance-specific insights by identifying alter-
native inputs that lead to different classification out-
comes. Despite the growing body of work in this area,
generating meaningful counterfactuals for time-series
data presents unique challenges due to its sequential
nature, dependency on temporal structure, and multi-
dimensional complexity.
While counterfactual generation has been exten-
sively explored for tabular and image data, methods
tailored to time-series classification remain scarce and
underdeveloped. The few existing methods often fail
to balance key properties such as proximity, spar-
sity, and validity. Moreover, most approaches require
high computational resources or rely on rigid heuris-
tics, limiting their applicability in real-world scenar-
ios. This gap motivates the development of an ef-
ficient, model-agnostic method capable of generat-
ing high-quality counterfactual explanations for time-
series classifiers.
In this paper, we propose TX-Gen, a novel al-
gorithm for generating counterfactual explanations in
time-series classification tasks using a modified Non-
dominated Sorting Genetic Algorithm II (NSGA-II).
Unlike previous approaches that combine evolution-
ary computing with explainable AI (Zhou et al.,
2024), TX-Gen leverages the power of evolutionary
multi-objective optimization to find a diverse set of
Pareto-optimal counterfactual solutions that simulta-
neously minimize multiple objectives, such as dis-
similarity to the original time-series and sparsity of
changes, while ensuring the classifier’s decision is al-
tered. Additionally, our approach incorporates a flex-
ible reference-based mechanism, which guides the
counterfactual generation process without relying on
restrictive assumptions or predefined shapelets.
Contributions. The key contributions of this work
are as follows:
• We introduce TX-Gen, a novel framework for generating counterfactual explanations specifically tailored to time-series classification, which efficiently balances proximity, sparsity, and validity through multi-objective optimization.
• Our method employs a reference-guided approach to select informative subsequences, improving the plausibility and interpretability of counterfactuals.
• We demonstrate the effectiveness of TX-Gen across multiple benchmark datasets, showing that our approach outperforms existing methods in terms of sparsity, validity, and proximity of the counterfactual examples.
In the following sections, we detail the method-
ology behind TX-Gen, present our experimental re-
sults, and discuss the implications of our findings for
advancing explainability in time-series classification.
1.1 Problem Statement
We begin our problem formulation by first introduc-
ing the context of this research.
Consider a collection (set) of univariate time series, denoted as $S = \{T_1, T_2, \dots, T_n\}$, where each $T = \{t_i \in \mathbb{R}\}_{i=1}^{m}$ consists of observations orderly recorded across $m$ timestamps. In the context of a time series classification (TSC) task, each time series $T$ is associated with a true descriptive label $L \in \{L_1, \dots, L_k\}$ from $k$ unique categories. The objective of TSC is to develop an algorithmic predictor $f(\cdot)$ that can maximize the probability $P(f(T) = L)$ for $T \in S$ and also for any unseen $T \notin S$ (provided that the true label of such a $T \notin S$ also equals $L$). Here, we presume the classifier provides probabilistic or logit outputs. Moreover, assume a function $g: \mathbb{R}^m \to \mathbb{R}^m$ that can element-wise perturb or transform any given time series $T$ into a new in-distribution time series instance $\hat{T}$. We then say that $\hat{T}$ is a counterfactual of $T$ regarding a TSC predictor $f$ if $f(\hat{T}) \neq f(T)$ while the dissimilarity between $T$ and $\hat{T}$ is minimized (Wachter et al., 2017; Delaney et al., 2021). It is noteworthy that counterfactual explanations are fundamentally instance-based explanations aimed at interpreting the predictions of the classifier being explained, rather than the true labels, regardless of whether the real labels are known or not.
Goal. Considering a robust time series classification predictor $f$ trained on the dataset $S$, and a time series instance $T_e$ to be explained, this work proposes an explainable AI method, i.e., the aforementioned function $g$, which can identify the counterfactual of $T_e$ concerning $f$. Notably, $T_e$ may be either part of the dataset $S$ or external to it.
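To make the validity condition concrete, the following minimal Python sketch (our own illustrative code; the function name and the argmax-over-probabilities convention are assumptions, not from the paper) checks whether a candidate $\hat{T}$ flips the prediction of a probabilistic classifier $f$:

```python
import numpy as np

def is_counterfactual(f, T, T_hat):
    """Return True if T_hat flips the prediction of f on T.

    Assumes f maps a series to a vector of class probabilities
    (or logits); the predicted label is taken as the argmax.
    """
    return int(np.argmax(f(T_hat))) != int(np.argmax(f(T)))
```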
2 BACKGROUND
2.1 Evaluation Metrics
The evaluation criteria for our counterfactual genera-
tion method are articulated through several key met-
rics, each addressing specific aspects of counterfac-
tual quality and effectiveness:
Proximity. Measures the element-wise similarity between the counterfactual instance and the target time series, typically using $L_p$ metrics.
Validity. Assesses whether the counterfactuals
successfully alter the original class label, thereby
reflecting their effectiveness.
Diversity. Measures the variety in the generated
counterfactual solutions for each to-be-explained
time series instance.
Sparsity. Quantifies the simplicity of the counter-
factual by counting the number of element-wise
differences between the generated counterfactual
and the original series.
These metrics, widely recognized in explainable
AI for time series, provide a robust framework for
evaluating counterfactual explanations, emphasizing
minimalism, diversity, validity, and closeness to orig-
inal instances. Our selection of metrics follows the
methodologies of key studies in the field, including
(Delaney et al., 2021; Li et al., 2022; Li et al., 2023; Höllig et al., 2022; Refoyo and Luengo, 2024).
2.2 Related Work
The evolution of explainable artificial intelligence
(XAI) for generating counterfactuals for time se-
ries has been marked by significant advances in
methods that provide clear, actionable insights into
model decisions. This line of research was initiated by (Wachter et al., 2017), who introduced an optimization-based approach that focuses on minimizing a loss function to modify decision outcomes while maintaining minimal Manhattan distance from the original input. This foundational method encouraged further development of algorithms that enhance
the quality and effectiveness of counterfactual expla-
nations through additional modifications to the loss
function.
Building upon the concept of influencing specific
features within time series data, the introduction of
local tweaking via a Random Shapelet Forest clas-
sifier (Karlsson et al., 2020) targets specific, influ-
ential shapelets for localized transformations within
time series data. In contrast, global tweaking (Karls-
son et al., 2018) uses a k-nearest neighbor classifier to
guide broader transformations ensuring minimal yet
meaningful alterations to change class outcomes.
Advancing these concepts, (Delaney et al., 2021) developed the Native Guide Counterfactual Explanation (NG-CF), which utilizes Dynamic Barycenter Averaging with the nearest unlike neighbor. Sub-
sequently, (Li et al., 2022) introduced the Shapelet-
Guided Counterfactual Explanation (SG-CF), exploit-
ing time series subsequences that are maximally
representative of a class to guide the perturbations
needed for generating counterfactual explanations.
This method was further elaborated upon with the
introduction of the Attention-Based Counterfactual
Explanation for Multivariate Time Series (AB-CF)
by (Li et al., 2023) which incorporates attention
mechanisms to further refine the selection of shapelets
and optimize the perturbations.
Moreover, TSEvo by (Höllig et al., 2022) utilizes
the Non-Dominated Sorting Genetic Algorithm to
strategically balance multiple explanation objectives
such as proximity, sparsity, and plausibility while in-
corporating three distinct mutations. Most recently,
Sub-SpaCE (Refoyo and Luengo, 2024) employs ge-
netic algorithms too, with customized mutation and
initialization processes, promoting changes in only
a select few subsequences. This method focuses on
generating counterfactual explanations that are both
sparse and plausible without extensive alterations.
Despite the progress in generating counterfactual
explanations for time-series classification, these ex-
isting approaches still face several significant lim-
itations. Many methods, such as optimization-
based (Refoyo and Luengo, 2024) or shapelet-guided
(Karlsson et al., 2018; Karlsson et al., 2020; Li et al.,
2022) approaches, rely heavily on predefined assump-
tions about the data, such as the importance of specific
subsequences or the necessity for local perturbations.
These assumptions often reduce the generalizability
of the methods across diverse datasets and classifiers.
Furthermore, most methods struggle to balance the
trade-offs between proximity, sparsity, and diversity
of the generated counterfactuals, often prioritizing
one metric at the expense of others (Delaney et al.,
2021). This can lead to counterfactuals that are ei-
ther too similar to the original instance to be mean-
ingful or excessively altered, making them unrealistic
or hard to interpret. Additionally, many existing ap-
proaches lack scalability, becoming computationally
expensive when applied to larger or more complex
time-series data (Höllig et al., 2022). In contrast,
our proposed method, TX-Gen, addresses these lim-
itations by leveraging the flexibility of evolutionary
multi-objective optimization to dynamically explore
the solution space and by guiding the search using
reference samples from the training data. This allows
TX-Gen to generate diverse, sparse, and valid coun-
terfactuals, making it more applicable to real-world
time-series classification tasks.
Among the discussed works, TSEvo (Höllig et al., 2022), Sub-SpaCE (Refoyo and Luengo, 2024), and
our proposed TX-Gen all generate candidate coun-
terfactual instances by applying evolutionary heuris-
tics to optimize objective functions based on the cri-
teria outlined in Section 2.1. However, while both
TSEvo and Sub-SpaCE focus on heuristically and di-
rectly manipulating the variables of the instance be-
ing explained to generate counterfactuals, our method
instead models the differences between the desired
counterfactual instances and the original instance.
This represents a fundamental deviation from the
other two strategies, despite their similarities in form.
3 METHODOLOGY
The Framework. Our proposed method, TX-Gen, aims to jointly locate the subsequence of interest in a to-be-explained time series T while also transforming T into its counterfactual. This is achieved by mutually minimizing two objective functions under the guidance of reference samples, using a customized iterative optimization heuristic based on the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al., 2002). A high-level pseudocode of our algorithm is given in Algorithm 1, assuming that readers are familiar with the concepts of evolutionary algorithms (Back et al., 1997; Bäck et al., 2023). Following the self-explanatory high-level pseudocode, the key components proposed in our algorithm are further explained.
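The environmental selection in Algorithm 1 below (lines 14-15) is the standard NSGA-II machinery of Deb et al. (2002). For concreteness, here is a compact, self-contained Python sketch of non-dominated sorting and crowding-distance survivor selection for a minimization problem; this is generic NSGA-II code written by us for illustration, not the authors' implementation, and `F` is assumed to be an (n_points × n_objectives) NumPy array, e.g., of $(F_1, F_2)$ values:

```python
import numpy as np

def nondominated_sort(F):
    """Split the objective matrix F (minimisation) into Pareto fronts,
    returned as lists of row indices, best front first."""
    n = len(F)
    dominates = [[] for _ in range(n)]   # indices that i dominates
    dom_count = np.zeros(n, dtype=int)   # how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if i != j and np.all(F[i] <= F[j]) and np.any(F[i] < F[j]):
                dominates[i].append(j)
            elif i != j and np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                dom_count[i] += 1
    fronts, current = [], [i for i in range(n) if dom_count[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in dominates[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    nxt.append(j)
        current = nxt
    return fronts

def crowding_distance(F, front):
    """Crowding distance of each member of a single front."""
    d = np.zeros(len(front))
    for k in range(F.shape[1]):
        order = np.argsort(F[front, k])
        d[order[0]] = d[order[-1]] = np.inf  # boundary points are kept
        span = F[front[order[-1]], k] - F[front[order[0]], k] or 1.0
        for r in range(1, len(front) - 1):
            d[order[r]] += (F[front[order[r + 1]], k]
                            - F[front[order[r - 1]], k]) / span
    return d

def environmental_selection(F, N):
    """Pick N survivor indices by front rank, then crowding distance."""
    survivors = []
    for front in nondominated_sort(F):
        if len(survivors) + len(front) <= N:
            survivors.extend(front)
        else:
            d = crowding_distance(F, front)
            keep = np.argsort(-d)[: N - len(survivors)]
            survivors.extend(int(front[i]) for i in keep)
            break
    return survivors
```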
Algorithm 1: High-level pseudocode for our counterfactual discovery algorithm TX-Gen.
Data: population size $N$; crossover probability $p_c$; mutation probability $p_m$; number of generations $G$; to-be-explained time series $T$; classifier $f$; a set of references $S$; number of final reference instances $K$.
Result: set of non-dominated counterfactual candidate solutions.
1  Initialize population $P_0$ of size $N$ as described in Section 3.3.1
   // Selection of references using Algorithm 2
2  $\psi \leftarrow$ SelectReference($T$, $S$, $f$, $K$)
3  Generate the counterfactual candidate set $\Theta_0 = \bigcup_{i=1}^{N} \{\hat{T}_i\}$ for all individuals in $P_0$ using the Generate function from Algorithm 5
4  Evaluate the fitness of each individual in $P_0$ ($\Theta_0$) using the two objective functions (see Section 3.2)
5  for $t = 1$ to $G$ do
6    Generate offspring $Q_t$ from $P_{t-1}$ by:
7      - selecting the parents for mating using binary tournament selection
8      - crossing over the parents (with probability $p_c$) to obtain $Q_t$, using Algorithm 3
9      - mutating the individuals in $Q_t$ in place (with probability $p_m$), using Algorithm 4
10     - enumeratively expanding each $X_i \in Q_t$ into $K$ chromosomes and merging them into a new set $Q_t = \bigcup_{i=1}^{2N} \bigcup_{j=1}^{K} X_{i,j}$, where each $X_{i,j} = (x_{i,1}, x_{i,2}, j)$
11   Obtain the counterfactual candidates $\Theta_t$ for all samples in $Q_t$ using the Generate function
12   Evaluate the fitness of each candidate in $\Theta_t$ and assign the resulting values to the respective individuals in $Q_t$
13   Combine parent population $P_{t-1}$ and offspring population $Q_t$ into a combined population $R_t$ of size $(2K+1)N$
14   Perform non-dominated sorting on $R_t$ and assign crowding distances to individuals within each front
15   Select individuals for the next population $P_t$ based on rank and crowding distance, ensuring the population size is $N$
16 end
17 Return the non-dominated individuals as well as their corresponding counterfactual candidates from the final population $P_G$

3.1 Model-Based Selection of References

In TSC, consider a time series instance $T$ that belongs to category $C$. A counterfactual-based explainer can be viewed as a reference-guided method if it leverages example instances—originating from the same
task but known to belong to categories other than $C$—to assist in transforming $T$ into its counterfactual. Existing literature on reference-guided methods primarily identifies the reference instances by selecting those with low shape-based or distance-based similarities to the given instance to be explained, e.g., the nearest unlike neighbors as seen in (Delaney et al., 2021; Höllig et al., 2022). In contrast, our work proposes selecting reference samples solely based on the classifier's outputs.
Following the settings in Section 1.1, let $f(\cdot)$ be a probabilistic classifier for a TSC task with $k$ classes. That is, instead of predicting the exact label, $f$ predicts the probability distribution over all candidate class labels for any given target time series $T$.
Definition 1 (Distance in the Classifier). For any pair of time series $(T_i, T_j)$ of the TSC task, we define their Distance in the Classifier $f$ as the Jensen-Shannon distance between $f(T_i)$ and $f(T_j)$:

$$D_f(T_i, T_j) = \sqrt{\frac{\mathrm{KL}\left(f(T_i) \,\|\, P_m\right) + \mathrm{KL}\left(f(T_j) \,\|\, P_m\right)}{2}}, \qquad (1)$$

where $P_m$ is the element-wise mean of $f(T_i)$ and $f(T_j)$, and KL denotes the well-known Kullback-Leibler (KL) divergence (Endres and Schindelin, 2003). Apart from the properties that a valid metric holds, this distance is also bounded, $D_f(T_i, T_j) \in [0, 1]$, if we use 2 as the logarithm base in the KL divergence.
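Definition 1 maps directly onto SciPy, whose scipy.spatial.distance.jensenshannon already returns the Jensen-Shannon distance (the square root of the JS divergence). A minimal sketch, assuming $f$ returns a probability vector over the $k$ classes (the function name is ours):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def distance_in_classifier(f, T_i, T_j):
    """Distance in the Classifier (Definition 1): the Jensen-Shannon
    distance between the predicted class distributions f(T_i) and
    f(T_j). With base-2 logarithms the value is bounded in [0, 1]."""
    return float(jensenshannon(np.asarray(f(T_i)),
                               np.asarray(f(T_j)), base=2))
```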
With this definition, as described in Algorithm 2, we can retrieve reference instances for a to-be-explained time series $T$ from a candidate set that contains samples that are both predicted to belong to different categories than $T$ and close to $T$ with respect to the distance in the classifier.

Algorithm 2: Select references for a time series.
Input: target time series $T$; the classifier $f$; reference set $S$; number of reference samples $K$.
Output: a set of reference samples $\psi \subseteq S$.
1  Function SelectReference($T$, $S$, $f$, $K$):
2    foreach $T_i \in S$ do
3      if $f(T_i) \neq f(T)$ then
4        compute $D_f(T, T_i)$ as in Definition 1
5      else
6        set $D_f(T, T_i) \leftarrow 1.01$
7      end
8    end
9    $\psi \leftarrow \arg\min_{T_i \in S,\, |\psi| = K} D_f(T, T_i)$, i.e., the $K$ samples with the smallest distances
10   return $\psi$
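A short Python sketch of Algorithm 2, reusing the distance_in_classifier helper from the previous snippet (function and variable names are ours):

```python
import numpy as np

def select_references(T, S, f, K):
    """Sketch of Algorithm 2: among series predicted to belong to a
    class other than T's, return the K closest ones under the distance
    in the classifier; same-class series receive the out-of-range
    penalty 1.01."""
    label = np.argmax(f(T))
    dists = [distance_in_classifier(f, T, T_i)
             if np.argmax(f(T_i)) != label else 1.01
             for T_i in S]
    return [S[i] for i in np.argsort(dists)[:K]]
```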
3.2 The Objective Functions
To address the idea (see Section 1.1) that a decent counterfactual $\hat{T} = \{\hat{t}_i \in \mathbb{R}\}_{i=1}^{m}$ shall closely resemble the target time series $T = \{t_i \in \mathbb{R}\}_{i=1}^{m}$ while flipping the prediction of a given classifier $f$, we propose to determine feasible candidates $\hat{T}$ by jointly minimizing two real-valued objectives, as follows:

• The minimum Distance in the Classifier between a counterfactual candidate $\hat{T}$ and the samples in the reference set $\psi$:

$$F_1(\hat{T}, \psi) = \begin{cases} \min_{T_i \in \psi} D_f(\hat{T}, T_i) & \text{if } f(\hat{T}) \neq f(T), \\ 1.01 & \text{if } f(\hat{T}) = f(T). \end{cases}$$

The second case penalizes candidates $\hat{T}$ that fail to flip the prediction of the classifier.

• Joint sparsity and proximity between $\hat{T}$ and $T$:

$$F_{2,1}(\hat{T}, T) = \frac{\sum_{i=1}^{m} \mathbb{1}_{\hat{t}_i \neq t_i}}{m}, \qquad F_{2,2}(\hat{T}, T) = \frac{\lVert \hat{T} - T \rVert_2}{\lVert \hat{T} \rVert_2 + \lVert T \rVert_2},$$

$$F_2(\hat{T}, T) = \frac{1}{2}\left(F_{2,1}(\hat{T}, T) + F_{2,2}(\hat{T}, T)\right).$$

Here we normalize and combine the two indicators into one objective ($F_2$), where the sparsity term ($F_{2,1}$) measures the degree of element-wise perturbation and the second indicator ($F_{2,2}$) quantifies the scale of the change between $\hat{T}$ and $T$.

In summary, the two objective functions jointly ensure that the found counterfactual solutions are valid and minimally perturbed. Moreover, both objective functions are bounded: $F_2 \in [0, 1]$, and $F_1 \in [0, 1]$ for valid candidates (with 1.01 as the penalty value otherwise).
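The two objectives translate into a few lines of NumPy. The sketch below is illustrative rather than the authors' code and reuses the distance_in_classifier helper from Section 3.1:

```python
import numpy as np

def objective_f1(T_hat, T, psi, f):
    """F1: minimum distance-in-classifier to the reference set psi when
    the candidate flips the prediction; otherwise the 1.01 penalty."""
    if np.argmax(f(T_hat)) == np.argmax(f(T)):
        return 1.01
    return min(distance_in_classifier(f, T_hat, T_i) for T_i in psi)

def objective_f2(T_hat, T):
    """F2: mean of the sparsity term F_{2,1} and the normalised L2
    proximity term F_{2,2}, both bounded in [0, 1]."""
    T_hat, T = np.asarray(T_hat, float), np.asarray(T, float)
    f21 = float(np.mean(T_hat != T))
    f22 = float(np.linalg.norm(T_hat - T)
                / (np.linalg.norm(T_hat) + np.linalg.norm(T)))
    return 0.5 * (f21 + f22)
```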
3.3 Locating the Sequences of Interest
3.3.1 Chromosome Representation
Previous works that utilize evolutionary computation predominantly adopt a unified chromosome representation that encodes the entire counterfactual instance. In other words, the number of variables in each solution is at least as large as the number of timestamps in the target time series. Furthermore, these approaches typically apply crossover, mutation, and other evolutionary operators directly to the chromosomes to generate the next set of feasible candidates in one go (Höllig et al., 2022; Refoyo and Luengo, 2024). One advantage of encoding the entire time series into the individual is that it can find multiple Segments or Subsequences of Interest (SoI) at the same time.
In contrast, our approach reduces the complexity of candidate solutions by representing each one with only three mutable integer variables, i.e., $X = [x_1, x_2, x_3]$, regardless of the number of timestamps in the target time series. Specifically, $x_1$ and $x_2$ denote the starting and ending indices of the single subsequence of interest considered in this individual, respectively. Meanwhile, $x_3$ represents the index of the reference sample in the set $\psi$, which is used to guide the discovery of potential counterfactual observations for the SoI (see Section 3.4).
Similar to previous approaches, our method can also identify multiple SoIs for each target time series by generating various counterfactual solutions. Furthermore, by leveraging the concept of Pareto efficiency in multi-objective optimization, our approach ensures that the diverse SoIs discovered are all equally significant.
Initialization. Several existing works explicitly incorporate low-level XAI strategies to identify the SoI in the target instance, thereby facilitating the counterfactual search process. These strategies include feature attribution methods, such as the GradCAM family (Selvaraju et al., 2017), employed in works like (Delaney et al., 2021; Refoyo and Luengo, 2024), and subsequence mining methods, such as shapelet extraction (Ye and Keogh, 2009), utilized in studies like (Li et al., 2022; Huang et al., 2024). Unlike these XAI techniques, we do not use such a (possibly biased) initialization; instead, the first population in our search algorithm is randomly sampled. Given a to-be-explained time series $T = \{t_i \in \mathbb{R}\}_{i=1}^{m}$ and its reference sample set $\psi$ of size $K$, the variables in an individual solution $X_j = [x_{j,1}, x_{j,2}, x_{j,3}]$ are step-by-step uniformly randomly sampled as:

$$x_{j,1} \sim U(1, m-1), \qquad x_{j,2} \sim U(x_{j,1}, m), \qquad x_{j,3} \sim U(1, K).$$
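A minimal sketch of this random initialization with NumPy (the 1-based index convention follows the paper; names and the seed are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(N, m, K):
    """Random initialisation of N chromosomes X = [x1, x2, x3] for a
    series of length m and K references (1-based, as in the paper)."""
    pop = []
    for _ in range(N):
        x1 = int(rng.integers(1, m))        # x1 ~ U(1, m - 1)
        x2 = int(rng.integers(x1, m + 1))   # x2 ~ U(x1, m)
        x3 = int(rng.integers(1, K + 1))    # x3 ~ U(1, K)
        pop.append([x1, x2, x3])
    return pop
```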
3.3.2 Crossover
Crossover is a crucial operator in evolutionary algo-
rithms, designed to create new offspring by combin-
ing decision variables from parent solutions. This
process helps the algorithm exploit the search space
within the current population. The parent solutions
are chosen through a selection mechanism. In this
work, we employ the well-known tournament selec-
tion method to select mated parents, and we refer in-
terested readers to the literature (Goldberg and Deb,
1991) for more details.
Bearing in mind the idea behind crossover, our customized crossover operator is dedicated to fulfilling two additional criteria: (i) minimizing the intersection between the SoIs of offspring; (ii) minimizing the total lengths of the SoIs tracked by offspring.
The proposed crossover is presented in Algorithm 3. The crossover operation by default produces two offspring. The function GetUnique returns the unique, non-duplicate values of its inputs in an array, sorted in ascending order.

Algorithm 3: The crossover operator.
Input: the parent individuals $X_1 = (x_{1,1}, x_{1,2}, x_{1,3})$ and $X_2 = (x_{2,1}, x_{2,2}, x_{2,3})$; crossover probability $P_{cx}$.
Output: the two offspring $Y_1 = (y_{1,1}, y_{1,2}, y_{1,3})$ and $Y_2 = (y_{2,1}, y_{2,2}, y_{2,3})$.
1  Function GetUnique($a_1$, $a_2$, $a_3$, $a_4$):
2    $b_1, b_2, b_3, b_4 \leftarrow$ AscendingSort($a_1$, $a_2$, $a_3$, $a_4$)
3    $\beta \leftarrow (b_1)$
4    if $b_2 \neq b_1$ then
5      $\beta \leftarrow \beta \cup (b_2)$   // concatenate $b_2$ to $\beta$
6    end
7    if $b_3 \neq b_2$ then
8      $\beta \leftarrow \beta \cup (b_3)$
9    end
10   if $b_4 \neq b_3$ then
11     $\beta \leftarrow \beta \cup (b_4)$
12   end
13   return $\beta$
14 Function Crossover($X_1$, $X_2$, $P_{cx}$):
15   $p_c \sim U(0, 1)$
16   $\alpha \leftarrow$ GetUnique($x_{1,1}$, $x_{1,2}$, $x_{2,1}$, $x_{2,2}$)
17   $Y_1, Y_2 \leftarrow X_1, X_2$
18   if $p_c \leq P_{cx}$ then
19     if $|\alpha| = 4$ then
20       $A \leftarrow |x_{1,1} - x_{2,1}| + |x_{1,2} - x_{2,2}|$
21       $B \leftarrow |x_{1,1} - x_{2,2}| + |x_{1,2} - x_{2,1}|$
22       if $A \leq B$ then
23         $y_{1,1} \leftarrow \min(x_{1,1}, x_{2,1})$
24         $y_{1,2} \leftarrow \max(x_{1,1}, x_{2,1})$
25         $y_{2,1} \leftarrow \min(x_{1,2}, x_{2,2})$
26         $y_{2,2} \leftarrow \max(x_{1,2}, x_{2,2})$
27       else
28         $y_{1,1} \leftarrow \min(x_{1,1}, x_{2,2})$
29         $y_{1,2} \leftarrow \max(x_{1,1}, x_{2,2})$
30         $y_{2,1} \leftarrow \min(x_{1,2}, x_{2,1})$
31         $y_{2,2} \leftarrow \max(x_{1,2}, x_{2,1})$
32       end
33     else if $|\alpha| = 3$ then
34       $y_{1,1}, y_{1,2} \leftarrow \alpha_1, \alpha_2$
35       $y_{2,1}, y_{2,2} \leftarrow \alpha_2, \alpha_3$
36     else
37       $y_{1,1}, y_{2,2} \leftarrow \alpha_1, \alpha_2$
38       $y_{1,2} \sim U(\alpha_1, \alpha_2)$
39       $y_{2,1} \leftarrow y_{1,2}$
40     end
41   end
42   return ($Y_1$, $Y_2$)

Notably, in line 22 of the pseudocode, we compare the total lengths of the SoIs under the two feasible crossover options and intentionally set the relational operator to $\leq$ instead of $<$. This ensures that, in cases where the SoIs of two parents do not overlap, their offspring will have minimal intersections. Between lines 33 and 40, we address the corner cases where the indices recorded in the parents are partially or entirely the same.
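For concreteness, a Python sketch of Algorithm 3 (our own transcription of the pseudocode; chromosomes are plain [x1, x2, x3] lists):

```python
import random

def get_unique(a1, a2, a3, a4):
    """GetUnique in Algorithm 3: sorted unique values of the inputs."""
    return sorted(set((a1, a2, a3, a4)))

def crossover(X1, X2, p_cx, rng=random):
    """One application of Algorithm 3 on chromosomes [x1, x2, x3]."""
    Y1, Y2 = list(X1), list(X2)
    if rng.random() <= p_cx:
        alpha = get_unique(X1[0], X1[1], X2[0], X2[1])
        if len(alpha) == 4:
            A = abs(X1[0] - X2[0]) + abs(X1[1] - X2[1])
            B = abs(X1[0] - X2[1]) + abs(X1[1] - X2[0])
            if A <= B:  # '<=' keeps offspring SoIs disjoint when possible
                Y1[0], Y1[1] = min(X1[0], X2[0]), max(X1[0], X2[0])
                Y2[0], Y2[1] = min(X1[1], X2[1]), max(X1[1], X2[1])
            else:
                Y1[0], Y1[1] = min(X1[0], X2[1]), max(X1[0], X2[1])
                Y2[0], Y2[1] = min(X1[1], X2[0]), max(X1[1], X2[0])
        elif len(alpha) == 3:  # parents share exactly one endpoint
            Y1[0], Y1[1] = alpha[0], alpha[1]
            Y2[0], Y2[1] = alpha[1], alpha[2]
        else:  # endpoints coincide: split the shared interval at random
            Y1[0], Y2[1] = alpha[0], alpha[-1]
            Y1[1] = rng.randint(alpha[0], alpha[-1])
            Y2[0] = Y1[1]
    return Y1, Y2
```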
3.3.3 Mutation
In general, the mutation operators help to explore un-
foreseen areas of the solution space and prevent pre-
mature convergence. Unlike the crossover operator
that mixes the SoIs from different parent solutions,
the mutation operator proposed in this work aims to
adjust the length of the SoI tracked by each individual
selected for mutation. The pseudocode of our muta-
tion strategy is provided in Algorithm 4.
Algorithm 4: The mutation operator.
Input: the individual $X = (x_1, x_2, x_3)$; mutation probability $P_{mu}$; tolerable SoI length ratio $\tau$; $M$, the number of timestamps (length) of the target time series $T$.
Output: the offspring $Y = (y_1, y_2, y_3)$ after mutation.
1  Function Mutate($X$, $P_{mu}$, $\tau$, $M$):
2    $Y \leftarrow X$
3    $p_u \sim U(0, 1)$
4    if $p_u \leq P_{mu}$ then
       // configure the $p$ of a $\mathrm{Bin}(n, p)$
5      if $\tau \in (0, 1)$ then
6        $\sigma \leftarrow \frac{\log(0.5)}{\tau} \cdot \frac{x_2 - x_1}{M}$
7        $p_b \leftarrow e^{\sigma}$
8      else
9        $p_b \leftarrow 0.5$
10     end
       // determine the direction to scale
       $p_s \sim U(0, 1)$
       // sample the new SoI length
       $l \sim \mathrm{Bin}(2(x_2 - x_1), p_b)$
11     if $p_s \leq 0.5$ then
12       $y_2 \leftarrow \min(\max(y_1 + l, y_1 + 1), M)$
13     else
14       $y_1 \leftarrow \max(\min(y_2 - l, y_2 - 1), 1)$
15     end
16   end
17   return $Y$
The strategy employs a dynamic scaling of the SoI length for the new instance $Y$ based on the SoI length of the previous instance $X$. The new length is randomly sampled from a binomial distribution $\mathrm{Bin}(n, p)$, where the number of trials $n$ is twice the length of the SoI in $X$. This approach allows the SoI length to potentially increase almost exponentially between adjacent generations during optimization, effectively simulating the idea of a binary search. The other vital parameter of a binomial distribution is $p$, the success rate of a trial, which can here be interpreted as the probability of extending ($p > 0.5$) or shrinking ($p < 0.5$) the SoI. In lines 5 to 10 of Algorithm 4, given $\tau$, a tolerable (or ideal) ratio of the SoI length to the length of the time series, the value of $p$ is determined such that the peak of the probability mass function of the binomial distribution is close to $M \cdot \tau$. Empirical illustrations of this concept are provided in Figure 1. As shown in Figure 1a, under the same $\tau$, the success rate decreases (increases) as the SoI length increases (decreases). Additionally, for the same SoI length, an increase (decrease) in $\tau$ leads to a corresponding increase (decrease) in the likelihood of extending the SoI. Furthermore, Figure 1b demonstrates that the probability of sampling a new length close to the tolerable length is increased, enabling the adaptive adjustment of the SoI length and particularly discouraging excessively long SoIs. This can be observed as the peak for $n = 80$ (SoI length 40) is shifted further left of the tolerable length compared to that for $n = 40$ (SoI length 20).
(a) An empirical plot of multiple success rates $p_b$ versus the length of the SoI in to-be-mutated individuals $X$, under various tolerable SoI length ratios $\tau$.
(b) An example probability mass function (PMF) of a binomial distribution under our configuration. The tolerable SoI length ratio is set to $\tau = 0.4$, which results in a tolerable length of 16. The five displayed PMFs are calculated for SoI lengths 5, 10, 20, 30, and 40, respectively.
Figure 1: In both subfigures, the length of the to-be-explained time series is $M = 40$.
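A Python sketch of Algorithm 4's length resampling (our transcription; the clamping mirrors lines 11-15 of the pseudocode, and the seed is arbitrary):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def mutate(X, p_mu, tau, M):
    """One application of Algorithm 4: resample the SoI length from a
    binomial whose success rate p_b makes the PMF peak near M * tau."""
    Y = list(X)
    if rng.random() <= p_mu:
        length = X[1] - X[0]
        if 0.0 < tau < 1.0:
            # p_b = 0.5 ** (length / (M * tau)): equals 0.5 at the
            # tolerable length, > 0.5 below it, < 0.5 above it
            p_b = math.exp((math.log(0.5) / tau) * (length / M))
        else:
            p_b = 0.5
        new_len = int(rng.binomial(2 * length, p_b))
        if rng.random() <= 0.5:   # rescale from the right end
            Y[1] = min(max(Y[0] + new_len, Y[0] + 1), M)
        else:                     # rescale from the left end
            Y[0] = max(min(Y[1] - new_len, Y[1] - 1), 1)
    return Y
```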
3.4 Modeling of the New SoI
It is noteworthy that several previous works also sep-
arate counterfactual generation into two stages: the
discovery of SoIs (or other forms of sensitivity anal-
ysis) and the subsequent generation or search for the
actual values. While the crossover and mutation op-
erators described above facilitate the search for op-
timal SoIs within the time series, the specific values
corresponding to these subsequences must still be de-
termined to find true counterfactual explanations. Ex-
isting approaches to determining the values for SoIs
can be broadly categorized into three types: 1. heuris-
tic methods, which iteratively optimize specific crite-
ria (Wachter et al., 2017; Höllig et al., 2022); 2. di-
rect use of (sub)sequences from reference data, typ-
ically selected from the training set (Delaney et al.,
2021; Li et al., 2022); 3. generation of values using
machine learning models, such as generative mod-
els (Lang et al., 2023; Huang et al., 2024). Interest-
ingly, the second strategy essentially complements the
last. By combining these two approaches, it becomes
feasible to obtain an SoI efficiently without being re-
stricted to values from the reference data, which is the
foundation of our proposed approach.
The proposed method relies on two assumptions. Consider a univariate time series $T$ of length $m$ and one of its counterfactual candidates $\hat{T}$, and suppose the located subsequence of interest lies between timestamps $i$ and $j$, where $i < j$. The first assumption is then as follows:

Assumption 1. The time series $T = \{t_k \in \mathbb{R}\}_{k=1}^{m}$ and its counterfactual $\hat{T} = \{\hat{t}_k \in \mathbb{R}\}_{k=1}^{m}$ differ only in the subsequence of interest, i.e., $t_k = \hat{t}_k,\ \forall k \notin [i, j]$.

While this assumption aims to ensure the true importance of the found SoI, the second assumption is developed to further restrict the values of the SoI to a simplified form for efficient modeling and generation.
Assumption 2. The difference between the SoI of the counterfactual candidate $\hat{T} = \{\hat{t}_k \in \mathbb{R}\}_{k=1}^{m}$ and that of the target time series $T = \{t_k \in \mathbb{R}\}_{k=1}^{m}$, denoted as $Z = \{\hat{t}_k - t_k\}_{k=i}^{j}$, is a stationary process whose current values are linearly dependent on its past values. Consequently, $Z$ can be modeled by an autoregressive model of order $p$ (AR($p$)) with a Gaussian error term.

This assumption is straightforward and aims to define the nature of the difference between the SoIs. To the best of our knowledge, in contrast to previous approaches that directly obtain the observations (values) of counterfactuals, we are the first to model the difference between the counterfactual and the to-be-explained time series instance. Moreover, in Section 3.1, it is mentioned that this method is reference-guided. By utilizing the reference instances found using Algorithm 2 and the chromosome representations, we now describe the methodology for generating counterfactual instances in Algorithm 5.

Algorithm 5: Generation of the SoI of a counterfactual candidate.
Input: the target time series $T = \{t_k \in \mathbb{R}\}_{k=1}^{m}$; the individual $X = (x_1, x_2, x_3)$; a reference sample $\tilde{T} = \{\tilde{t}_k \in \mathbb{R}\}_{k=1}^{m}$; the order (lags) $p$ of the autoregressive model.
Output: the counterfactual candidate $\hat{T}$.
1  Function Generate($T$, $X$, $\psi$, $p$, $m$):
2    $\hat{T} \leftarrow T$
3    $l \leftarrow x_2 - x_1$
4    $i, j \leftarrow \max(x_1 - p, 1), \min(x_2 + p, m)$
5    $\zeta \leftarrow \{\tilde{t}_k - t_k\}_{k=i}^{j}$
6    Fit an autoregressive model AR($p$) on $\zeta$
7    Predict $\zeta' \leftarrow \{t'_k\}_{k=1}^{j-i+1}$ using the AR($p$)
8    $i' \leftarrow \min(x_1, p)$
9    $j' \leftarrow i' + l$
10   $\hat{T}[x_1 : x_2] \leftarrow \zeta'[i' : j'] + T[x_1 : x_2]$
11   return $\hat{T}$

Given a target time series $T$ and a chromosome representation of the SoI $X$, the algorithm generates a counterfactual candidate based on a reference instance $\tilde{T}$ chosen from the reference set. In lines 4 and 5, the element-wise differences between the reference and the target are computed as a new time series $\zeta$. Further, from lines 6 to 10, an autoregressive model of order $p$ is fitted to learn the distribution of $\zeta$ through conditional maximum likelihood estimation. The in-sample predictions are then added to the SoI values of $T$ to form the counterfactual SoI. In this context, the autoregressive model essentially acts as a denoising and smoothing process.
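Under Assumption 2, the fitting and in-sample prediction of Algorithm 5 can be sketched with statsmodels' AutoReg. The snippet below is an illustrative, 0-based transcription written by us; the index bookkeeping near the series start is approximated (see lines 8-9 of the pseudocode for the paper's exact handling):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def generate_candidate(T, X, ref, p):
    """Illustrative 0-based sketch of Algorithm 5.

    `ref` is the reference series selected via the chromosome's third
    gene and `p` is the AR order. An AR(p) model is fitted on the
    reference-minus-target difference over the SoI window padded by p
    lags; its in-sample fitted values (a denoised, smoothed difference)
    are added back onto the SoI of T.
    """
    T, ref = np.asarray(T, float), np.asarray(ref, float)
    m, (x1, x2) = len(T), (X[0], X[1])
    lo, hi = max(x1 - p, 0), min(x2 + p, m)
    zeta = ref[lo:hi] - T[lo:hi]                # difference process Z
    fitted = AutoReg(zeta, lags=p).fit().fittedvalues  # covers zeta[p:]
    start = max((x1 - lo) - p, 0)  # offset of x1 within `fitted`,
                                   # clamped when the window hits index 0
    T_hat = T.copy()
    seg = np.asarray(fitted)[start:start + (x2 - x1)]
    T_hat[x1:x1 + len(seg)] += seg
    return T_hat
```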
4 EXPERIMENTAL SETUP
The experiments are performed using a diverse selec-
tion of time-series classification benchmarks and two
popular time-series classifiers.
The UCR datasets (Bagnall et al., 2017) from the
time-series classification archive (Chen et al., 2015)
chosen in this study represent a diverse array of ap-
plication domains in time-series classification, all of
which are one-dimensional. Furthermore, they are
categorized in Table 1 based on their domain speci-
ficity and characteristics, providing a comprehensive
assessment of the proposed method’s robustness and
adaptability across varied types of time series data.
Table 1: Summary of the datasets, along with the accuracy of the trained classifiers used in the experiments.

Dataset      Length   Train + Test Size   Balanced   Classes   Catch22   STSF
ECG200           96   100 + 100           No               2      0.83    0.87
GunPoint        150    50 + 150           Yes              2      0.94    0.94
Coffee          286     28 + 28           Yes              2      1.00    0.964
CBF             128    30 + 900           Yes              3      0.96    0.98
Beef            470     30 + 30           Yes              5      0.60    0.66
Lightning7      319     70 + 73           No               7      0.69    0.75
ACSF1          1460   100 + 100           Yes             10      0.87    0.82

For the classification tasks within our framework,
each dataset was trained and tested using two spe-
cific classifiers: the Catch22 classifier (C22) (Lubba
et al., 2019) and the Supervised Time Series Forest
(TSF) (Cabello et al., 2020). These classifiers were
meticulously chosen not only for their computational
efficiency and ease of implementation but also for
their ability to produce probabilistic outputs. The
generation of probabilistic outputs, rather than mere
binary labels, is crucial for our methodology. It al-
lows for the nuanced detection of subtle variations in
the probability distributions across different classes.
This feature is integral to our approach as it supports
the generation of counterfactual explanations that are
highly sensitive to minor but significant shifts in the
data.
4.1 Hyperparameters Setup
The customized NSGA-II algorithm central to our ap-
proach is designed with several hyperparameters that
can influence its operation and performance. The
standard NSGA-II hyper-parameters (with their settings in brackets), namely population size (50), number of generations (50), crossover probability (0.7), and mutation probability (0.7), are very common in evolutionary optimization algorithms and are set based on the results of a few trials.
parameter, Number of Reference Instances, is set
to 4 in this work. This parameter specifies both the
number of reference cases used to evaluate the per-
formance of a candidate solution and the number
of teachers employed to guide the search for find-
ing counterfactual examples. These hyper-parameters
could be further optimized in future work.
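For reference, these settings can be collected in a small configuration dictionary (the layout and key names are ours; the values are those stated above):

```python
# Hyper-parameter settings used in the experiments (Section 4.1).
TXGEN_PARAMS = {
    "population_size": 50,
    "n_generations": 50,
    "crossover_probability": 0.7,
    "mutation_probability": 0.7,
    "n_reference_instances": 4,  # K in Algorithms 1 and 2
}
```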
4.2 Metrics and Baselines
As introduced in Section 2.1, five criteria can be used to assess the effectiveness of counterfactual explanation algorithms. Given a to-be-explained time series $T = \{t_i \in \mathbb{R}\}_{i=1}^{m}$, a set of $N$ counterfactual candidates $\Theta = \{\hat{T}_j = \{\hat{t}_i \in \mathbb{R}\}_{i=1}^{m}\}_{j=1}^{N}$, and the classifier $f$, we define the five evaluation metrics as follows:

1. $\text{L1-Proximity}(T, \hat{T}) = \frac{\lVert \hat{T} - T \rVert_1}{\lVert \hat{T} \rVert_1 + \lVert T \rVert_1}$

2. $\text{L2-Proximity}(T, \hat{T}) = \frac{\lVert \hat{T} - T \rVert_2}{\lVert \hat{T} \rVert_2 + \lVert T \rVert_2}$

3. $\text{Validity}(T, \hat{T} \mid f) = \mathbb{1}_{f(T) \neq f(\hat{T})}$

4. $\text{Sparsity}(T, \hat{T}) = \frac{\sum_{i=1}^{m} \mathbb{1}_{\hat{t}_i \neq t_i}}{m}$

5. $\text{Diversity}(T, \Theta \mid f) = \sum_{i=1}^{N-1} \left( \sum_{j=i+1}^{N} \mathbb{1}_{\hat{T}_i \neq \hat{T}_j} \right) \cdot \mathbb{1}_{f(T) \neq f(\hat{T}_i)}, \quad \hat{T}_i, \hat{T}_j \in \Theta$
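The pairwise metrics and Diversity translate directly into NumPy; a brief illustrative sketch (function names are ours, and $f$ is again assumed to return class probabilities):

```python
import numpy as np

def validity(T, T_hat, f):
    """Metric 3: 1 if the candidate flips the predicted label, else 0."""
    return int(np.argmax(f(T)) != np.argmax(f(T_hat)))

def sparsity(T, T_hat):
    """Metric 4: fraction of timestamps that were changed."""
    return float(np.mean(np.asarray(T_hat) != np.asarray(T)))

def lp_proximity(T, T_hat, ord=2):
    """Metrics 1-2: normalised L1/L2 distance to the target series."""
    T, T_hat = np.asarray(T, float), np.asarray(T_hat, float)
    return float(np.linalg.norm(T_hat - T, ord)
                 / (np.linalg.norm(T_hat, ord) + np.linalg.norm(T, ord)))

def diversity(T, Theta, f):
    """Metric 5: counts distinct candidate pairs, weighted by validity."""
    n = len(Theta)
    return sum(int(not np.array_equal(Theta[i], Theta[j]))
               * validity(T, Theta[i], f)
               for i in range(n - 1) for j in range(i + 1, n))
```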
Among the five metrics, the first four are pairwise metrics. In contrast, Diversity measures the number of unique and valid counterfactuals generated by the explainer for each target time series in a single run. In this work, we evaluate the performance of TX-Gen against four well-established algorithms, which can be separated into two groups: w-CF (Wachter et al., 2017) and Native-Guide (Delaney et al., 2021), the hall-of-fame baselines; and TSEvo (Höllig et al., 2022) and AB-CF (Li et al., 2023), the state-of-the-art counterfactual explainers.
Table 2: The Validity metric for each counterfactual XAI method on two different classifiers (TSF and C22) and seven different benchmark datasets ("–": no result).

Explainer       ECG200   Coffee   GunPoint   CBF    Lightning7   ACSF1   Beef
NG (C22)        0.56     0.46     0.43       0.59   0.83         –       –
w-CF (C22)      0.03     0.01     0.01       0.11   0.13         –       –
AB-CF (C22)     0.93     1.00     1.00       0.99   0.99         1.00    1.00
TSEvo (C22)     0.39     0.54     0.45       0.50   0.56         0.51    0.07
TX-Gen (C22)    1.00     1.00     1.00       1.00   1.00         1.00    1.00
NG (TSF)        0.35     0.50     0.50       0.10   –            –       –
w-CF (TSF)      0.01     0.11     0.17       –      –            –       –
AB-CF (TSF)     0.74     1.00     1.00       1.00   0.99         0.99    0.97
TSEvo (TSF)     0.35     0.50     0.50       0.35   0.44         0.22    0.20
TX-Gen (TSF)    1.00     1.00     1.00       1.00   1.00         1.00    1.00
5 RESULTS AND DISCUSSIONS
Our experimental results demonstrate that TX-Gen consistently outperforms competing methods across several key metrics, as shown in Tables 2-5. In terms of validity, TX-Gen achieves a 100% success rate in generating valid counterfactuals across all datasets, regardless of the classifier used (Table 2). This significantly outperforms baseline methods like w-CF, which struggles to generate valid counterfactuals, achieving a mere 1%-17% validity across different datasets and classifiers. TSEvo, while better than w-CF, also lags behind TX-Gen in validity, particularly when using the Catch22 classifier. In terms of validity, AB-CF is only marginally worse than our proposed solution.
Sparsity is a key strength of TX-Gen, as demon-
strated in Table 3. While methods like TSEvo also
optimize for sparsity, TX-Gen achieves lower sparsity
values across most datasets. In terms of proximity,
Tables 4 and 5 show that TX-Gen performs compet-
itively with state-of-the-art methods such as AB-CF
and TSEvo, which also optimize for proximity. How-
ever, TX-Gen’s use of multi-objective optimization
allows it to maintain a strong proximity score while
simultaneously achieving higher validity and diver-
sity. The L1-Proximity and L2-Proximity results re-
flect that TX-Gen generates counterfactuals that are
closer to the original time series, ensuring more inter-
pretable and actionable insights.
As shown in the examples of Figure 2 and also in Table 6, our method generates a diverse set of Pareto-optimal solutions, allowing for a wide range of plausible counterfactuals. All other methods provide only one counterfactual example per instance. This is particularly important in real-world applications, where generating multiple plausible alternatives can provide more comprehensive insights for end-users. The number of sub-sequences in the counterfactual examples produced by TX-Gen is always 1, while TSEvo produces 13.8 sub-sequences on average, Native-Guide 1.8, w-CF 1.7, and AB-CF 1.8 sub-sequences on average over all datasets. In general, fewer sub-sequences are more informative for counterfactual examples. It could be seen as a limitation that the proposed method provides only one sub-sequence per counterfactual; however, since TX-Gen provides multiple examples per instance, one could combine the sub-sequences from these examples to alleviate this limitation. All results and source code can be found in our Zenodo repository (https://doi.org/10.5281/zenodo.13711886).
In summary, TX-Gen’s use of evolutionary multi-
objective optimization and its reference-guided mech-
anism make it more effective than existing methods
across key metrics. The method not only generates
more valid and sparse counterfactuals but does so
with a high degree of proximity and diversity, offering
a remarkable advancement in generating explainable
counterfactuals for time-series classification.
6 CONCLUSIONS
In this paper, we presented TX-Gen, a novel frame-
work for generating counterfactual explanations for
time-series classification tasks using evolutionary
multi-objective optimization. Through extensive ex-
perimentation, we demonstrated that TX-Gen consis-
tently outperforms state-of-the-art methods in terms
of validity, sparsity, proximity, and diversity across
multiple benchmark datasets. By leveraging the flex-
1
https://doi.org/10.5281/zenodo.13711886
Table 3: The Sparsity metric, as mean (std.), for each counterfactual XAI method on two different classifiers (TSF and C22) and seven different benchmark datasets ("–": no result).

Explainer      ECG200         Coffee         GunPoint       CBF            Lightning7     ACSF1          Beef
NG (C22)       0.403 (0.310)  0.962 (0.131)  0.052 (0.065)  0.989 (0.067)  1.000 (0.000)  –              –
w-CF (C22)     0.010 (0.000)  0.007 (0.000)  0.669 (0.468)  0.092 (0.287)  0.252 (0.432)  –              –
AB-CF (C22)    0.430 (0.254)  0.437 (0.106)  0.525 (0.296)  0.461 (0.276)  0.350 (0.269)  0.534 (0.334)  0.467 (0.267)
TSEvo (C22)    0.408 (0.154)  0.280 (0.189)  0.237 (0.197)  0.485 (0.213)  0.347 (0.271)  0.558 (0.291)  0.698 (0.302)
TX-Gen (C22)   0.113 (0.137)  0.016 (0.016)  0.121 (0.161)  0.043 (0.052)  0.044 (0.056)  0.041 (0.086)  0.040 (0.091)
NG (TSF)       0.967 (0.096)  0.932 (0.169)  0.963 (0.061)  0.667 (0.470)  –              –              –
w-CF (TSF)     0.006 (0.000)  0.818 (0.385)  0.207 (0.397)  –              –              –              –
AB-CF (TSF)    0.589 (0.218)  0.871 (0.030)  0.307 (0.063)  0.410 (0.284)  0.580 (0.298)  0.589 (0.265)  0.528 (0.324)
TSEvo (TSF)    0.154 (0.105)  0.180 (0.072)  0.170 (0.084)  0.194 (0.076)  0.140 (0.072)  0.512 (0.338)  0.128 (0.040)
TX-Gen (TSF)   0.113 (0.131)  0.054 (0.037)  0.051 (0.046)  0.049 (0.029)  0.030 (0.027)  0.053 (0.081)  0.033 (0.044)
Table 4: The L1-Proximity metric, as mean (std.), for each counterfactual XAI method on two different classifiers (TSF and C22) and seven different benchmark datasets ("–": no result).

Explainer      ECG200           Coffee         GunPoint         CBF                  Lightning7         ACSF1          Beef
NG (C22)       0.121 (0.086)    0.098 (0.260)  0.010 (0.011)    0.383 (0.220)        1.000 (0.000)      –              –
w-CF (C22)     0.001 (0.000)    0.001 (0.000)  0.001 (0.001)    0.00003 (0.00002)    0.00003 (0.0002)   –              –
AB-CF (C22)    0.132 (0.081)    0.024 (0.007)  0.069 (0.059)    0.236 (0.144)        0.222 (0.114)      0.060 (0.084)  0.103 (0.195)
TSEvo (C22)    0.311 (0.161)    0.021 (0.011)  0.036 (0.023)    0.275 (0.097)        0.240 (0.139)      0.090 (0.075)  0.036 (0.023)
TX-Gen (C22)   0.062 (0.076)    0.002 (0.002)  0.039 (0.067)    0.032 (0.037)        0.045 (0.062)      0.008 (0.017)  0.047 (0.133)
NG (TSF)       0.164 (0.118)    0.026 (0.011)  0.062 (0.029)    0.515 (0.365)        –                  –              –
w-CF (TSF)     0.0001 (0.0000)  0.035 (0.061)  0.0003 (0.0001)  –                    –                  –              –
AB-CF (TSF)    0.161 (0.067)    0.045 (0.009)  0.043 (0.037)    0.168 (0.110)        0.251 (0.169)      0.065 (0.073)  0.092 (0.150)
TSEvo (TSF)    0.067 (0.054)    0.016 (0.005)  0.015 (0.012)    0.073 (0.029)        0.083 (0.041)      0.067 (0.074)  0.143 (0.110)
TX-Gen (TSF)   0.056 (0.060)    0.007 (0.004)  0.012 (0.019)    0.030 (0.020)        0.039 (0.062)      0.012 (0.020)  0.020 (0.036)
Table 5: The L2-Proximity metric, as mean (std.), for each counterfactual XAI method on two different classifiers (TSF and C22) and seven different benchmark datasets ("–": no result).

Explainer      ECG200         Coffee         GunPoint       CBF            Lightning7     ACSF1          Beef
NG (C22)       0.182 (0.068)  0.105 (0.259)  0.046 (0.034)  0.478 (0.216)  1.000 (0.000)  –              –
w-CF (C22)     0.006 (0.001)  0.008 (0.000)  0.007 (0.005)  0.001 (0.000)  0.004 (0.002)  –              –
AB-CF (C22)    0.193 (0.086)  0.041 (0.008)  0.118 (0.083)  0.349 (0.130)  0.376 (0.127)  0.145 (0.167)  0.132 (0.228)
TSEvo (C22)    0.440 (0.145)  0.045 (0.011)  0.096 (0.047)  0.430 (0.067)  0.430 (0.161)  0.200 (0.155)  0.045 (0.031)
TX-Gen (C22)   0.134 (0.105)  0.017 (0.008)  0.076 (0.098)  0.121 (0.076)  0.117 (0.127)  0.036 (0.061)  0.084 (0.207)
NG (TSF)       0.175 (0.113)  0.036 (0.016)  0.091 (0.046)  0.561 (0.395)  –              –              –
w-CF (TSF)     0.001 (0.000)  0.058 (0.128)  0.058 (0.128)  0.002 (0.000)  –              –              –
AB-CF (TSF)    0.209 (0.068)  0.055 (0.010)  0.101 (0.090)  0.283 (0.094)  0.332 (0.160)  0.137 (0.149)  0.111 (0.172)
TSEvo (TSF)    0.169 (0.094)  0.043 (0.007)  0.051 (0.038)  0.181 (0.046)  0.210 (0.113)  0.125 (0.105)  0.357 (0.255)
TX-Gen (TSF)   0.133 (0.107)  0.027 (0.011)  0.047 (0.057)  0.112 (0.062)  0.100 (0.102)  0.045 (0.060)  0.060 (0.104)
Table 6: The Diversity metric, as mean (std.), for TX-Gen on two different classifiers (TSF and C22) and seven different benchmark datasets. (For all other methods this is always 1.00.)

Explainer      ECG200       Coffee       GunPoint      CBF           Lightning7    ACSF1         Beef
TX-Gen (C22)   6.95 (2.73)  7.46 (3.52)  10.70 (4.08)  14.45 (4.39)  25.07 (8.93)  24.86 (8.49)  20.80 (6.60)
TX-Gen (TSF)   4.89 (0.92)  4.79 (0.82)  4.12 (1.26)   5.27 (1.00)   5.89 (1.67)   6.58 (2.28)   6.13 (1.28)
(a) Pareto-efficient counterfactuals for test sample 20 of the binary classification task Coffee, using Catch22 (left) and Supervised Time-series Forest (right).
(b) Pareto-efficient counterfactuals for test sample 10 of the binary classification task CBF, using Catch22 (left) and Supervised Time-series Forest (right).
(c) Pareto-efficient counterfactuals for test sample 35 of the multi-class classification task Lightning7, using Catch22 (left) and Supervised Time-series Forest (right).
(d) Pareto-efficient counterfactuals for test sample 10 of the multi-class classification task Beef, using Catch22 (left) and Supervised Time-series Forest (right).
Figure 2: Selection of examples of TX-Gen on different datasets and using different classifiers. In each subfigure, the left part shows the distribution of objective values ($F_1$ on the x-axis and $F_2$ on the y-axis) of the Pareto front. The right part displays the counterfactual examples and corresponding labels. The time series being explained is highlighted in green, while the counterfactual subsequences of interest (SoIs) are color-coded according to their position on the Pareto front.
The strong performance of TX-Gen, particularly in achieving 100% validity and maintaining high diversity while optimizing for proximity and sparsity, highlights its potential for real-world applications where
model transparency is critical. Our proposed method
strikes an effective balance between multiple conflict-
ing objectives, offering a robust solution for generat-
ing meaningful counterfactuals in time-series classifi-
cation.
Future work can explore several promising direc-
tions. First, further tuning of the hyper-parameters,
particularly the number of reference instances, could
lead to even greater improvements in performance.
Additionally, extending TX-Gen to handle multivari-
ate time-series data and real-time counterfactual gen-
eration could broaden its applicability to more com-
plex, real-world scenarios.
ACKNOWLEDGEMENT
This publication is part of the project XAIPre (project number 19455) of the research program Smart Industry 2020, which is (partly) financed by the Dutch Research Council (NWO).
REFERENCES
Back, T., Fogel, D. B., and Michalewicz, Z. (1997). Hand-
book of Evolutionary Computation. IOP Publishing
Ltd., GBR, 1st edition.
Bäck, T. H., Kononova, A. V., van Stein, B., Wang, H., Antonov, K. A., Kalkreuth, R. T., de Nobel, J., Vermetten, D., de Winter, R., and Ye, F. (2023). Evolutionary algorithms for parameter optimization—thirty years later. Evolutionary Computation, 31(2):81–122.
Bagnall, A., Lines, J., Bostrom, A., Large, J., and Keogh,
E. (2017). The great time series classification bake
off: A review and experimental evaluation of recent
algorithmic advances. Data Mining and Knowledge
Discovery, 31(3):606–660.
Cabello, N., Naghizade, E., Qi, J., and Kulik, L. (2020).
Fast and accurate time series classification through su-
pervised interval search. In 2020 IEEE International
Conference on Data Mining (ICDM), pages 948–953.
IEEE.
Chen, J.-F., Chen, W.-L., Huang, C.-P., Huang, S.-H., and
Chen, A.-P. (2016). Financial time-series data analysis
using deep convolutional neural networks. In 2016 7th
International conference on cloud computing and big
data (CCBD), pages 87–92. IEEE.
Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., and Batista, G. (2015). The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002).
A fast and elitist multiobjective genetic algorithm:
NSGA-II. IEEE Transactions on Evolutionary Com-
putation, 6(2):182–197.
Delaney, E., Greene, D., and Keane, M. T. (2021). Instance-Based Counterfactual Explanations for Time Series Classification. In Sánchez-Ruiz, A. A. and Floyd, M. W., editors, Case-Based Reasoning Research and Development, pages 32–47, Cham. Springer International Publishing.
Endres, D. and Schindelin, J. (2003). A new metric for
probability distributions. IEEE Transactions on Infor-
mation Theory, 49(7):1858–1860.
Goldberg, D. E. and Deb, K. (1991). A Comparative Analy-
sis of Selection Schemes Used in Genetic Algorithms.
In Rawlins, G. J. E., editor, Foundations of Genetic
Algorithms, volume 1, pages 69–93. Elsevier.
Höllig, J., Kulbach, C., and Thoma, S. (2022). TSEvo: Evolutionary Counterfactual Explanations for Time Series Classification. In 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), pages 29–36.
Huang, Q., Chen, W., Bäck, T., and van Stein, N. (2024). Shapelet-based Model-agnostic Counterfactual Local Explanations for Time Series Classification. Presented at the AAAI 2024 Workshop on Explainable Machine Learning for Sciences (XAI4Sci).
Karlsson, I., Rebane, J., Papapetrou, P., and Gionis, A.
(2018). Explainable time series tweaking via irre-
versible and reversible temporal transformations.
Karlsson, I., Rebane, J., Papapetrou, P., and Gionis, A.
(2020). Locally and globally explainable time se-
ries tweaking. Knowledge and Information Systems,
62(5):1671–1700.
Lang, J., Giese, M. A., Ilg, W., and Otte, S. (2023). Generating Sparse Counterfactual Explanations for Multivariate Time Series. In Iliadis, L., Papaleonidas, A., Angelov, P., and Jayne, C., editors, Artificial Neural Networks and Machine Learning – ICANN 2023, pages 180–193, Cham. Springer Nature Switzerland.
Li, P., Bahri, O., Boubrahimi, S. F., and Hamdi, S. M.
(2022). SG-CF: Shapelet-Guided Counterfactual Ex-
planation for Time Series Classification. In 2022 IEEE
International Conference on Big Data (Big Data),
pages 1564–1569.
Li, P., Bahri, O., Boubrahimi, S. F., and Hamdi, S. M.
(2023). Attention-Based Counterfactual Explanation
for Multivariate Time Series. In Wrembel, R., Gam-
per, J., Kotsis, G., Tjoa, A. M., and Khalil, I., editors,
Big Data Analytics and Knowledge Discovery, pages
287–293, Cham. Springer Nature Switzerland.
Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R.,
Fulcher, B. D., and Jones, N. S. (2019). catch22:
CAnonical time-series CHaracteristics: Selected
through highly comparative time-series analysis. Data
Mining and Knowledge Discovery, 33(6):1821–1852.
Morid, M. A., Sheng, O. R. L., and Dunbar, J. (2023).
Time series prediction using deep learning methods
in healthcare. ACM Transactions on Management In-
formation Systems, 14(1):1–29.
Refoyo, M. and Luengo, D. (2024). Sub-SpaCE: Subsequence-Based Sparse Counterfactual Explanations for Time Series Classification Problems. In Longo, L., Lapuschkin, S., and Seifert, C., editors, Explainable Artificial Intelligence, pages 3–17, Cham. Springer Nature Switzerland.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-CAM: Vi-
sual Explanations from Deep Networks via Gradient-
Based Localization. In 2017 IEEE International Con-
ference on Computer Vision (ICCV), pages 618–626.
Theissler, A., Spinnato, F., Schlegel, U., and Guidotti, R. (2022). Explainable AI for time series classification: a review, taxonomy and research directions. IEEE Access, 10:100700–100724.
Wachter, S., Mittelstadt, B., and Russell, C. (2017). Coun-
terfactual explanations without opening the black box:
Automated decisions and the GDPR. SSRN Electronic
Journal.
Ye, L. and Keogh, E. (2009). Time series shapelets: A new
primitive for data mining. In Proceedings of the 15th
ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, KDD ’09, pages
947–956, New York, NY, USA. Association for Com-
puting Machinery.
Zhou, R., Bacardit, J., Brownlee, A., Cagnoni, S., Fyvie, M., Iacca, G., McCall, J., van Stein, N., Walker, D., and Hu, T. (2024). Evolutionary computation and explainable AI: A roadmap to transparent intelligent systems.