Union and Intersection K-Fold Feature Selection
Artur J. Ferreira 1,3 (https://orcid.org/0000-0002-6508-0932) and Mário A. T. Figueiredo 2,3 (https://orcid.org/0000-0002-0970-7745)
1 ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Portugal
2 IST, Instituto Superior Técnico, Universidade de Lisboa, Portugal
3 Instituto de Telecomunicações, Lisboa, Portugal
Keywords:
Explainability, Feature Selection, Filter, Interpretability, Intersection of Filters, K-Fold Feature Selection,
Union of Filters.
Abstract:
Feature selection (FS) is a vast research topic with many techniques proposed over the years. FS techniques
may bring many benefits to machine learning algorithms. The combination of FS techniques usually improves
the results as compared to the use of one single technique. Recently, the concepts of explainability and in-
terpretability have been proposed in the explainable artificial intelligence (XAI) framework. The recently
proposed k-fold feature selection (KFFS) algorithm provides dimensionality reduction and simultaneously
yields an output suitable for explainability purposes. In this paper, we extend the KFFS algorithm by perform-
ing union and intersection of the individual feature subspaces of two and three feature selection filters. Our
experiments performed on 20 datasets show that the union of the feature subsets typically attains better results
than the use of individual filters. The intersection also attains adequate results, yielding human-manageable (e.g., small) subsets of features, allowing for explainability and interpretability on medical domain data.
1 INTRODUCTION
The machine learning (ML) field is focused on learning from examples in a given dataset. The performance of ML techniques can be improved by reducing the dimensionality of the input data, keeping only the most relevant features; the key benefits are faster training and better generalization performance.
For dimensionality reduction, the use of feature
selection (FS) techniques has been found appropri-
ate. FS aims to identify the best performing set of
features on a given task (Guyon et al., 2006; Guyon
and Elisseeff, 2003; Bolon-Canedo et al., 2015). FS
has a long research history and work towards improv-
ing FS techniques still continues (Alipoor et al., 2022;
Chamlal et al., 2022; Huynh-Cam et al., 2022; Jeon
and Hwang, 2023; Xu et al., 2022). FS techniques
can be grouped into four categories: filters, wrappers,
embedded, and hybrid (Guyon et al., 2006; Bolon-
Canedo et al., 2015). In this paper, we use filter tech-
niques, which assess the quality of subsets of features
by using some metrics over the data, without resorting
to any learning algorithm. In this sense, filter techniques are referred to as agnostic. When dealing with
high-dimensional data, we often find that filters are
the only suitable approach, since the other techniques
are too time-consuming and their use becomes com-
putationally prohibitive (Hastie et al., 2009; Guyon
et al., 2006; Escolano et al., 2009). For recent
surveys on FS techniques, please see the publica-
tions by Remeseiro and Bolon-Canedo (2019), Pud-
jihartono et al. (2022a), and Dhal and Azad (2022).
In this work, we address the use of unsupervised
and supervised FS filter techniques for different types
of data. We propose to improve and extend the k-fold
feature selection (KFFS) algorithm proposed by Fer-
reira and Figueiredo (2023), using combinations of
heterogeneous filters. These combinations attain both
adequate dimensionality reduction and improved per-
formance. Moreover, the small dimensionality of the reduced feature subspace allows the human end user to focus on explainability and interpretability tasks.
1.1 Combination of Filters
Combinations of filters can be found in different applications. The problem of sleep disease diagnosis was addressed by Álvarez Estévez et al. (2011),
with the monitoring of bio-physiological signals of
patients during sleep, with polysomnography (PSG)
data. A dataset with PSG recordings of patients was used for the detection of arousals in sleep. From a set of 42 features extracted from the biosignals, methods to detect sleep events were developed. Using FS techniques, the goal was to remove redundant features, identifying the best subset of features that preserves classification accuracy. Wrapper and filter methods, and combinations of these obtained by union and intersection operations, were considered. By discarding the irrelevant features, a reduced-dimensionality dataset was obtained, improving the accuracy of the classifiers.
The heterogeneous ensemble feature selection
(HEFS) method proposed by Damtew et al. (2023)
fuses the output feature subsets of five FS filters with a union combination. It resorts to a merit-based evaluation to minimize the redundancy of the obtained ensemble of features. On a multi-class intrusion detection dataset, HEFS leads to better performance than the individual FS methods.
Mochammad et al. (2022) proposed the multi-
filter clustering fusion (MFCF) technique. A multi-
filter method combining filter methods is applied as
a first step for feature clustering; then, the key fea-
tures are selected. The union of key features is used to
find all potentially important features. An exhaustive
search finds the best combination of selected features,
to maximize the accuracy of the classification model.
For rotating machinery problems, the fault classification models using MFCF yield good accuracy.
The intersection of common features selected by
filter, wrapper, and embedded FS techniques was pro-
posed by Bashir et al. (2022). A support vector ma-
chines (SVM) classifier is then trained on medical do-
main data, attaining better results as compared to the
individual use of the FS methods.
Arya and Gupta (2023) introduced an ensemble
filter-based FS approach combining ANOVA, Pear-
son correlation coefficient, mutual information, and
Chi-square. The reduced feature sets are obtained
with the union and intersection operations. Using decision tree, random forest, XGBoost, and CatBoost classifiers on the Edge-IIoT dataset (cyber-attack detection), they report 97.84% and 99.61% accuracy using the intersection and union feature sets, respectively.
An ensemble FS approach was proposed by Seijo-
Pardo et al. (2017). The heterogeneous ensemble
combines the result of different FS methods, with the
same training data. The outputs of the base selec-
tors are combined with different aggregators to ob-
tain the resulting subset. In the experimental evaluation with the SVM classifier, the ensemble results on seven datasets achieve comparable or better performance than that attained by the individual methods.
For reviews on ensemble FS methods and their combination, please see the publications by Bolón-Canedo and Alonso-Betanzos (2019) and Pudjihartono et al. (2022b).
1.2 Paper Organization
The remainder of this paper is organized as follows.
In Section 2, we review some topics on feature selec-
tion. The proposed approach is detailed in Section 3.
The experimental evaluation is reported in Section 4.
Finally, Section 5 provides concluding remarks and
directions of future work.
2 FEATURE SELECTION
We introduce notation and review some details of FS
techniques in Section 2.1. An overview of the tech-
niques considered in this work is presented in Sec-
tion 2.2, including the k-fold feature selection (KFFS)
algorithm, which we propose to extend in this work.
2.1 Notation
Regarding the notation followed in this paper, let X = {x_1, ..., x_n} denote a dataset, represented as an n × d matrix (n instances on the rows and d features on the columns). Each instance x_i is a d-dimensional vector, with i ∈ {1, ..., n}. Each feature vector, a column of X, is denoted as X_j, with j ∈ {1, ..., d}. The number of classes is C, with c_i ∈ {1, ..., C} representing the class of instance x_i. Finally, y = {c_1, ..., c_n} represents the class labels of all the instances.
In this work, we consider both unsupervised and supervised FS filters. The former do not use the class label vector y, while the latter use the label of each instance to perform the feature assessment. Some FS
methods are based purely on the relevance of the fea-
tures; they rank the features according to some cri-
terion and then select the top-ranked ones. Other
methods are based on the relevance-redundancy (RR)
framework (Yu and Liu, 2004). In this case, the most
relevant features are kept and a redundancy analysis
is performed to remove redundant features.
2.2 Feature Selection Filters
We consider the three FS filters described next. The
first technique is the fast correlation-based filter
(FCBF) proposed by Yu and Liu (2003), based on
the RR framework, computing the feature-class and
feature-feature correlations. It starts by selecting a set
of features with correlation with the class label above
some threshold (the predominant features). In the sec-
ond step, redundancy analysis finds redundant fea-
tures among the predominant ones. These redundant
features are removed, keeping the ones that are the
most relevant to the class. FCBF resorts to the sym-
metrical uncertainty (SU) (Yu and Liu, 2003) mea-
sure, defined as
$$\mathrm{SU}(X_i, X_j) = \frac{2\, I(X_i; X_j)}{H(X_i) + H(X_j)}, \qquad (1)$$
where H(.) denotes the Shannon entropy and I(.)
denotes the mutual information (MI) (Cover and
Thomas, 2006). The SU is zero for independent ran-
dom variables and equal to one for deterministically
dependent random variables, i.e., if one is a bijective
function of the other.
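To make Eq. (1) concrete, the following is a minimal Python sketch of SU for discrete (or previously discretized) feature and class vectors; the function names are ours and this is not the FCBF implementation used in the experiments.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), as in Eq. (1)."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0.0:                 # both variables are constant
        return 0.0
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mi = h_x + h_y - h_xy                # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mi / (h_x + h_y)
```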
The second FS technique is the Fisher ratio (FiR), a supervised relevance-only method. For the i-th feature, with C = 2, it computes the rank of the feature according to

$$\mathrm{FiR}_i = \frac{\bigl|\overline{X}_i^{(1)} - \overline{X}_i^{(2)}\bigr|}{\sqrt{\mathrm{var}(X_i)^{(1)} + \mathrm{var}(X_i)^{(2)}}}, \qquad (2)$$

where $\overline{X}_i^{(1)}$, $\overline{X}_i^{(2)}$, $\mathrm{var}(X_i)^{(1)}$, and $\mathrm{var}(X_i)^{(2)}$ are the sample means and variances of feature X_i, computed over the instances of each of the two classes. It aims to measure how well each feature separates the two classes and is adequate as a relevance metric for FS purposes. For the multi-class case, C > 2, the FiR of feature X_i is given by (Duda et al., 2001; Zhao et al., 2010)

$$\mathrm{FiR}_i = \frac{\sum_{j=1}^{C} n_j^{(y)} \bigl(\overline{X}_i^{(j)} - \overline{X}_i\bigr)^2}{\sum_{j=1}^{C} n_j^{(y)}\, \mathrm{var}\bigl(X_i^{(j)}\bigr)}, \qquad (3)$$

where $n_j^{(y)}$ is the number of occurrences of class j in the n-length class label vector y, and $\overline{X}_i^{(j)}$ is the sample mean of the values of X_i whose class label is j; finally, $\overline{X}_i$ is the sample mean of feature X_i.
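A minimal sketch of the Fisher ratio relevance of Eqs. (2) and (3), computed for every column of a data matrix; the function name is ours and numerical safeguards (e.g., for constant features) are omitted.

```python
import numpy as np

def fisher_ratio(X, y):
    """Fisher ratio (FiR) relevance of each feature (column of X), given labels y."""
    classes, counts = np.unique(y, return_counts=True)
    if len(classes) == 2:                                  # two-class case, Eq. (2)
        a, b = X[y == classes[0]], X[y == classes[1]]
        return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(
            a.var(axis=0) + b.var(axis=0))
    num = np.zeros(X.shape[1])                             # multi-class case, Eq. (3)
    den = np.zeros(X.shape[1])
    overall_mean = X.mean(axis=0)
    for cls, n_j in zip(classes, counts):
        Xc = X[y == cls]
        num += n_j * (Xc.mean(axis=0) - overall_mean) ** 2
        den += n_j * Xc.var(axis=0)
    return num / den
```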
The third FS filter is the relevance-only unsuper-
vised mean-median (MM) criterion, which ranks fea-
tures according to
$$\mathrm{MM}_i = \bigl|\overline{X}_i - \mathrm{median}(X_i)\bigr|. \qquad (4)$$

The relevance of each feature is the absolute difference between the mean and the median of X_i. This criterion is based on the idea that the most relevant features are the ones with the most asymmetric distributions.
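Eq. (4) translates directly into a one-line relevance computation per feature; a sketch (the function name is ours):

```python
import numpy as np

def mean_median_relevance(X):
    """MM criterion of Eq. (4): |mean - median| of each feature (column of X)."""
    return np.abs(X.mean(axis=0) - np.median(X, axis=0))
```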
The k-fold feature selection (KFFS) filter, de-
scribed in Algorithm 1, was proposed by Ferreira and
Figueiredo (2023) and it can work with any unsuper-
vised or supervised FS filter.
KFFS follows the rationale that the importance of a feature is proportional to the number of times it is selected over the k folds of the training data. It requires two parameters: the number of folds k used to sample the training data and the threshold T_h on the percentage of folds in which a feature must be chosen by the filter.
3 PROPOSED APPROACH
In Section 3.1, we present our key insights regard-
ing the union and intersection of feature subspaces.
The details of the proposed technique are presented
in Section 3.2.
3.1 Union and Intersection
Our proposal is built upon the idea of the union and the intersection of feature subspaces, as depicted in Figure 1. Suppose that we have a feature space with d features and that, over this space, we apply three different FS filters. These filters return feature subspaces with dimensionality m_1, m_2, and m_3 features, respectively. In Figure 1, we observe the union and the intersection among these feature subspaces, using an analogy with the additive RGB color scheme. The subspaces selected by FS methods 1, 2, and 3 are assigned to the primary R, G, and B colors, respectively. The intersections of the filter subspaces are represented by the corresponding results of color addition in the RGB color space. To denote the number of features in common between the subspaces found by FS methods i and j, we use m_ij, with i, j ∈ {1, 2, 3}; in the case of three FS filters, we use the notation m_123.
Figure 1: Feature subspace analysis for the case of three
FS methods, on a d-dimensional space using a visual corre-
spondence with the three primary colors.
Over these feature subspaces, we can compute statistics to assess the relation and (dis)similarities between them.
Algorithm 1: k-Fold Feature Selection (KFFS) for filter FS, by Ferreira and Figueiredo (2023).
Require: X: n × d matrix, n patterns of a d-dimensional dataset.
         @filter: a FS filter (unsupervised or supervised).
         k: an integer stating the number of folds (k ∈ {2, ..., n}).
         T_h: a threshold (percentage) to choose the number of features.
         y: n × 1 class label vector (necessary only in the case of a supervised FS filter).
Ensure: idx: m-dimensional vector with the indexes of the selected features.
1: Allocate the feature counter vector (FCV), with dimensions 1 × d, such that each position refers to a specific feature.
2: Initialize FCV_i = 0, with i ∈ {0, ..., d-1}.
3: Compute the k data folds of the dataset (different splits into training and test data).
4: For each fold, apply @filter on the training data and update FCV_i with the number of times @filter selects feature i.
5: After the k data folds are processed, convert FCV to percentages: FCVP ← FCV / k.
6: Keep the indexes of the features that have been selected at least T_h times (expressed in percentage): idx ← FCVP ≥ T_h.
7: Return idx (the vector with the indexes of the features that have been selected at least T_h times).
The Jaccard index (JI) is one such metric, defined as

$$\mathrm{JI}(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad (5)$$

for sets A and B, where ∩ denotes intersection, ∪ denotes union, and |·| is the cardinality of a set. We have 0 ≤ JI(A,B) ≤ 1. In the extreme cases: if A ∩ B = ∅, then JI(A,B) = 0; if A = B, then JI(A,B) = 1. Other similar metrics are the Dice-Sorensen (DS) coefficient,

$$\mathrm{DS}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad (6)$$

and the overlap or Szymkiewicz–Simpson (SS) coefficient,

$$\mathrm{SS}(A,B) = \frac{|A \cap B|}{\min(|A|,|B|)}, \qquad (7)$$

both ranging from 0 (maximally different) to 1 (maximally similar, or one set being a subset of the other).
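A minimal sketch of these three measures for feature index sets (the handling of empty sets is our own convention):

```python
def jaccard(a, b):
    """Jaccard index, Eq. (5)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def dice_sorensen(a, b):
    """Dice-Sorensen coefficient, Eq. (6)."""
    a, b = set(a), set(b)
    return 2.0 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

def overlap(a, b):
    """Szymkiewicz-Simpson (overlap) coefficient, Eq. (7)."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b)) if (a and b) else 1.0
```

For example, jaccard(idx_fcbf, idx_fir) quantifies how much the subspaces selected by two filters agree.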
3.2 Union and Intersection KFFS
Our proposal extends the KFFS algorithm as follows:
- Set the T_h and k parameters of KFFS; by default, we set k = 10 and T_h = 1.
- Apply KFFS with two or three different FS filters, as described in Section 2.2. We apply KFFS (@filter_1 = FCBF), KFFS (@filter_2 = FiR), and KFFS (@filter_3 = MM) on the same k data folds, using the threshold T_h. Each filter will select a different subset of the input feature space, as depicted in Figure 1.
- Get the output indexes returned by each filter: idx_fcbf, idx_fir, and idx_mm.
- Combine the output indexes (idx_fcbf, idx_fir, and idx_mm) returned by the filters, with union and intersection of the indexes of the selected features (a code sketch of this combination step is given after the list).
- Return the two index vectors, given by
  idx_union = idx_fcbf ∪ idx_fir ∪ idx_mm;
  idx_intersection = idx_fcbf ∩ idx_fir ∩ idx_mm.
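A sketch of the combination step, reusing the hypothetical kffs helper sketched after Algorithm 1; fcbf_filter, fir_filter, and mm_filter are placeholders for the actual filter implementations, and the shared random_state keeps the k data folds identical across filters.

```python
import numpy as np

# run KFFS with each filter on the same k data folds
idx_fcbf = kffs(X, y, fcbf_filter, k=10, t_h=1.0, random_state=0)
idx_fir  = kffs(X, y, fir_filter,  k=10, t_h=1.0, random_state=0)
idx_mm   = kffs(X, y, mm_filter,   k=10, t_h=1.0, random_state=0)

# combine the selected feature indexes by union and by intersection
idx_union        = np.union1d(np.union1d(idx_fcbf, idx_fir), idx_mm)
idx_intersection = np.intersect1d(np.intersect1d(idx_fcbf, idx_fir), idx_mm)
```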
The rationale is that, by using and combining different filters, we are able to focus on different subsets of the original input feature space. We also expect that the combination of these feature subspaces will outperform each individual FS method. The union of the feature subspaces will yield (much) larger subspaces than their intersection. In the intersection of the two or three subspaces, we will have a small number of features that are really relevant, since they are always selected regardless of the FS filter.
The unsupervised MM filter and the supervised
Fisher and FCBF FS filters were described in Sec-
tion 2.2. The MM and Fisher filters are relevance-
based methods, which select the top m most relevant
features as follows:
- Compute the MM relevance by Equation (4) or the Fisher ratio relevance by Equation (2) or (3), denoted as R_i, for each feature X_i, i ∈ {1, ..., d}.
- Sort the relevance values in decreasing order.
- Compute the cumulative and normalized relevance values, leading to an increasing function whose values range up to a maximum of 1.
- Keep the first top-ranked m features, holding, say, 90% of the accumulated relevance given by R_i (a sketch of this step follows the list).
The FCBF filter is a relevance-redundancy based
method. We use its default parameter values.
4 EXPERIMENTAL EVALUATION
The proposed methods were evaluated with public do-
main datasets. Section 4.1 describes the datasets and
Table 1: Datasets with n instances, d features, and C classes.
Name n d C Problem/Task
Australian 690 14 2 Credit approval
Brain-Tumor-1 90 5920 5 Cancer detection
Brain-Tumor-2 50 10367 4 Cancer detection
Colon 62 2000 2 Cancer detection
Darwin 174 450 2 Alzheimer detection
Dermatology 366 34 6 Skin cancer detection
DLBCL 77 5469 2 B-cell malignancies
Drebin 15036 215 2 Malware detection
Heart 270 13 2 Heart disease
Hepatitis 155 19 2 Hepatitis survival
Ionosphere 351 34 2 Radar returns
Leukemia 72 7129 2 Leukemia detection
Leukemia-1 72 5328 3 Leukemia detection
Lymphoma 96 4026 9 Lymphoma detection
Prostate-Tumor 102 10509 2 Tumor detection
Sonar 208 60 2 Rock/Mine detection
Spambase 4601 57 2 Email spam
SRBCT 83 2308 4 Cancer detection
WDBC 569 30 2 Breast cancer
Wine 178 13 3 Wine cultivar
the evaluation metric. In Section 4.2, we report the
experimental results for the individual filters, their
union, and their intersection. In Section 4.3, we assess
the effect of changing the threshold and the number of
folds.
4.1 Datasets and Metrics
Table 1 describes the datasets used in our experi-
ments. We have gathered 20 datasets with differ-
ent types of data and problems, to assess the behav-
ior of our proposed method in different classification
task scenarios. The datasets are available from https://csse.szu.edu.cn/staff/zhuzx/Datasets.html, from the Arizona State University (ASU) repository (Zhao et al., 2010), from the University of California at Irvine (UCI) repository (Dua and Graff, 2019), https://archive.ics.uci.edu/ml/index.php, from the knowledge extraction based on evolutionary learning (KEEL) repository, https://sci2s.ugr.es/keel/datasets.php, and from https://jundongl.github.io/scikit-feature/datasets.html.
The microarray datasets for cancer detection have “large d, small n” data, that is, d ≫ n. Other datasets are in the opposite situation, with n ≫ d. We have also chosen both binary and multi-class datasets.
We use the FCBF and FiR implementations from the ASU repository. For the MM FS filter, we use our own implementation. We have considered the naïve Bayes (NB) classifier from the Waikato environment for knowledge analysis (WEKA). The NB classifier is sensitive to the presence of redundant features, suffering an increase in the test-set error rate in the presence of such features. Thus, it is useful to assess and compare
the quality of the feature subspaces found by each
method. Our key concern is to assess and compare the
adequacy of the several feature subspaces and not to
find the best classifier. For comparison purposes, we
have also used the support vector machines (SVM)
classifier.
As the evaluation metric, we consider the test set error rate, estimated with a 10-fold cross-validation (CV) procedure. We also analyze the size of the selected feature subsets.
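As an illustration of this protocol, the sketch below estimates the test set error rate on a selected feature subspace; scikit-learn's GaussianNB is used here as a stand-in for the WEKA NB classifier, which is our own assumption.

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def cv_error_rate(X, y, feature_idx, clf=None, folds=10):
    """Test set error rate (in %) of a classifier on a feature subspace, via k-fold CV."""
    clf = clf if clf is not None else GaussianNB()
    acc = cross_val_score(clf, X[:, feature_idx], y, cv=folds, scoring="accuracy")
    return 100.0 * (1.0 - acc.mean())
```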
4.2 Union and Intersection
Table 2 shows the average test set error rate (Err) and the average number of features m, over the ten folds, for the four possible unions among the subspaces selected by the three filters.
In seven out of the 20 datasets, the union of filters attains the best results. The best average global result is attained by KFFS(FCBF), closely followed by the union of the three filters. All FS filters lead to a large reduction of the dimensionality of the data.
Table 3 reports the average test set error rate (Err)
and the average number of features m, over the ten
folds, for all the possible combinations of intersec-
tions among these subspaces. The results of the indi-
vidual methods are the same as in Table 2.
In some cases, the intersection of the feature subspaces is an empty set. In four out of the 20 datasets, the intersection of the filters attains better results than the use of the individual filters. In generic terms, the intersection of filters also yields feature subspaces of reduced dimensionality.
4.3 Parameter Sensitivity
We analyze the effect of changing the threshold T_h in KFFS(FCBF), KFFS(FiR), and their union and intersection for the Prostate-Tumor dataset, in Figure 2. The FS approaches significantly improve the results of the baseline approach, with a consistent behavior. In KFFS, as we increase the threshold, the dimensionality of the selected feature space decreases.
Table 4 reports the best threshold value for each dataset. We have made a grid search over all the possible threshold values from 0 to 100 and, for each of the four configurations KFFS(FCBF), KFFS(Fisher), and their union and intersection, we have recorded the highest threshold (fewest features) attaining the lowest test set error rate with the SVM classifier.
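A sketch of this grid search, reusing the hypothetical kffs and cv_error_rate helpers from the previous sketches (fcbf_filter is again a placeholder for an actual filter implementation):

```python
import numpy as np
from sklearn.svm import SVC

best = (np.inf, -1, None)                        # (error rate, threshold, selected indexes)
for t_h in range(0, 101):                        # all threshold values from 0 to 100
    idx = kffs(X, y, fcbf_filter, k=10, t_h=t_h)
    if idx.size == 0:                            # no feature reaches this threshold
        continue
    err = cv_error_rate(X, y, idx, clf=SVC(), folds=10)
    # keep the highest threshold (fewest features) attaining the lowest error rate
    if err < best[0] or (err == best[0] and t_h > best[1]):
        best = (err, t_h, idx)
```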
We now analyze the effect of changing the number of folds k in KFFS, for a fixed threshold T_h. The goal is to assess the sensitivity of our proposed method to the number of sampling folds on the training data. In Figure 3, we assess the test set error rate of the SVM classifier with 10-fold CV, on the DLBCL dataset, with ten different values of k ∈ {n/10, 2n/10, ..., n} and a fixed T_h = 50.
Table 2: Union evaluation. The average values of the test set error rate (Err, in %) and the average number of features m for each individual FS filter and their unions, over the ten folds of 10-fold CV, for all the benchmark datasets. We use KFFS with k = 10 and T_h = 1, with @filter_1 = FCBF, @filter_2 = FiR, and @filter_3 = MM. The lowest Err is in boldface. Regarding the error rates, the Friedman test p-value is p = 0.0010340240 (≤ 0.05), thus having statistical significance.
Columns: Dataset; Baseline NB (Err, d); KFFS(FCBF) (Err, m); KFFS(FiR) (Err, m); KFFS(MM) (Err, m); Union 1∪2 (Err, m); Union 1∪3 (Err, m); Union 2∪3 (Err, m); Union 1∪2∪3 (Err, m).
Australian 23.48 14 24.06 8 23.48 9 24.93 9 23.33 10 23.91 11 23.62 12 23.62 12
Brain-Tumor-1 10.00 5920 12.22 473 18.89 205 43.33 1 12.22 617 12.22 474 18.89 206 12.22 618
Brain-Tumor-2 32.00 10367 28.00 359 24.00 205 34.00 121 22.00 535 24.00 474 32.00 303 20.00 630
Colon 40.48 2000 19.05 55 17.62 86 52.14 1 21.19 114 19.05 57 15.95 88 17.86 116
Darwin 12.68 450 12.06 112 13.20 57 14.38 62 11.50 139 14.41 151 12.65 104 11.50 174
Dermatology 2.80 34 3.63 19 26.53 8 2.80 29 3.63 23 2.80 31 2.80 30 2.80 31
DLBCL 18.21 5469 6.43 225 9.11 116 15.36 2 10.54 292 6.43 226 10.54 117 10.54 292
Drebin 16.64 215 8.73 17 19.89 28 21.56 48 18.24 42 20.12 62 20.69 52 19.14 66
Heart 15.56 13 16.67 8 15.19 10 17.78 10 15.56 11 15.19 11 15.19 12 15.19 12
Hepatitis 15.42 19 17.42 9 17.33 12 19.33 17 16.08 12 16.75 18 18.00 18 16.08 18
Ionosphere 18.81 34 8.83 15 18.25 16 19.38 23 16.25 23 17.10 27 19.10 23 16.81 27
Leukemia 1.43 7129 2.86 171 4.29 142 33.39 2 2.86 256 2.86 173 4.29 144 2.86 258
Leukemia-1 4.29 5327 5.71 204 4.29 149 54.46 2 4.29 301 5.71 207 4.29 152 4.29 304
Lymphoma 24.00 4026 23.11 848 18.00 128 15.56 112 19.89 904 22.00 921 12.67 218 20.89 964
Prostate-Tumor 37.09 10509 9.64 257 8.73 114 32.09 100 10.55 320 14.55 354 11.55 211 12.55 415
Sonar 32.74 60 34.67 18 34.67 20 31.64 14 35.62 25 30.81 31 33.21 32 32.74 36
Spambase 20.73 54 23.63 18 13.45 14 21.45 28 21.02 24 20.28 35 20.89 33 20.28 37
SRBCT 1.11 2308 0.00 203 1.11 145 6.11 118 1.11 267 0.00 297 1.11 239 1.11 351
WDBC 6.67 30 4.92 11 6.85 14 7.20 13 6.14 20 5.79 18 6.49 18 6.14 22
Wine 2.78 13 2.22 10 5.56 5 3.33 12 2.22 10 2.78 13 3.33 12 2.78 13
Table 3: Intersection evaluation. The average values of the test set error rate (Err, in %) and the average number of features m for each individual FS filter and their intersections, over the ten folds of 10-fold CV, for all the benchmark datasets. We use KFFS with k = 10 and T_h = 1, with @filter_1 = FCBF, @filter_2 = FiR, and @filter_3 = MM. The lowest Err is in boldface. Regarding the error rates, the Friedman test p-value is p = 0.0010340240 (≤ 0.05), thus having statistical significance.
Columns: Dataset; Baseline NB (Err, d); KFFS(FCBF) (Err, m); KFFS(FiR) (Err, m); KFFS(MM) (Err, m); Intersection 1∩2 (Err, m); Intersection 1∩3 (Err, m); Intersection 2∩3 (Err, m); Intersection 1∩2∩3 (Err, m). Columns where the intersection is empty are shown with m = 0 and no Err value.
Australian 23.48 14 24.06 8 23.48 9 24.93 9 24.06 7 25.65 6 24.78 6 25.80 5
Brain-Tumor-1 10.00 5920 12.22 473 18.89 205 43.33 1 20.00 62 0 0 0
Brain-Tumor-2 32.00 10367 28.00 359 24.00 205 34.00 121 30.00 30 34.00 6 38.00 23 46.00 3
Colon 40.48 2000 19.05 55 17.62 86 52.14 1 19.05 27 0 0 0
Darwin 12.68 450 12.06 112 13.20 57 14.38 62 12.58 30 13.73 23 13.17 15 16.50 11
Dermatology 2.80 34 3.63 19 26.53 8 2.80 29 27.09 5 5.03 17 30.16 7 30.71 4
DLBCL 18.21 5469 6.43 225 9.11 116 15.36 2 5.00 50 19.64 1 0 0
Drebin 16.64 215 8.73 17 19.89 28 21.56 48 11.34 3 11.79 2 21.14 24 11.79 2
Heart 15.56 13 16.67 8 15.19 10 17.78 10 16.67 8 17.41 7 18.52 9 18.15 7
Hepatitis 15.42 19 17.42 9 17.33 12 19.33 17 19.92 8 17.38 8 17.96 11 19.88 7
Ionosphere 18.81 34 8.83 15 18.25 16 19.38 23 12.83 8 11.12 10 19.10 15 13.69 7
Leukemia 1.43 7129 2.86 171 4.29 142 33.39 2 2.86 57 0 0 0
Leukemia-1 4.29 5327 5.71 204 4.29 149 54.46 2 4.11 52 0 0 0
Lymphoma 24.00 4026 23.11 848 18.00 128 15.56 112 19.11 72 13.67 39 16.67 22 20.89 9
Prostate-Tumor 37.09 10509 9.64 257 8.73 114 32.09 100 7.73 51 16.73 3 19.64 2 13.82 1
Sonar 32.74 60 34.67 18 34.67 20 31.64 14 35.60 14 40.90 2 40.86 2 41.86 1
Spambase 20.73 54 23.63 18 13.45 14 21.45 28 15.89 8 25.06 11 14.63 10 16.91 6
SRBCT 1.11 2308 0.00 203 1.11 145 6.11 118 1.25 81 4.86 24 3.61 24 7.22 14
WDBC 6.67 30 4.92 11 6.85 14 7.20 13 5.79 5 6.85 5 8.78 9 8.43 4
Wine 2.78 13 2.22 10 5.56 5 3.33 12 5.56 5 2.78 10 5.56 5 5.56 5
The number of folds k has a large impact on the
end result for all filters. For lower values of k, we have
a non-stationary behavior of the error rate curve. Af-
ter a sufficiently large value of k, we observe a more
stable behavior on the error rate. These results show
that, for a specific dataset and problem, one should fine-tune both the T_h and k parameters to obtain better results.
5 CONCLUSIONS
In this paper, we have extended the KFFS filter algo-
rithm by performing union and intersection of the in-
dividual feature subspaces of two and three heteroge-
neous FS filters. We have considered two supervised
FS filters (FCBF and FiR) and one unsupervised filter
(MM). Two of these filters are relevance based (FiR
Table 4: The best test set error rate (Err, in %), the corresponding average number of features m, and threshold T_h, for KFFS(FCBF), KFFS(Fisher), and their union and intersection, for all the benchmark datasets. We use KFFS with k = 10 and the SVM classifier. The best result is in boldface.
Columns: Dataset; Baseline SVM (Err, d); KFFS(FCBF) (Err, m, T_h); KFFS(FiR) (Err, m, T_h); Union 1∪2 (Err, m, T_h); Intersection 1∩2 (Err, m, T_h).
Australian 14.49 14 14.49 4 91 14.49 7 81 14.49 8 81 14.49 3 91
Brain-Tumor-1 10.00 5920 10.00 75 31 10.00 5920 0 10.00 158 31 10.00 5920 0
Brain-Tumor-2 20.00 10367 20.00 10367 0 18.00 92 21 16.00 219 11 18.00 30 1
Colon 13.10 2000 11.43 17 21 13.10 17 91 11.43 117 1 13.10 2000 0
Darwin 17.12 450 16.60 35 31 14.38 37 11 16.57 70 21 17.12 450 0
Dermatology 3.36 34 3.08 20 1 3.36 34 0 2.79 17 51 3.36 34 0
DLBCL 2.50 5469 2.50 42 41 2.50 5469 0 2.50 79 41 2.50 25 11
Drebin 2.23 215 2.23 215 0 2.23 215 0 2.23 215 0 2.23 215 0
Heart 15.93 13 14.07 8 1 14.07 10 1 14.07 10 1 14.07 8 1
Hepatitis 23.29 19 19.38 5 61 18.17 10 21 17.46 10 61 18.79 7 11
Ionosphere 11.42 34 11.42 34 0 11.42 34 0 11.13 22 1 11.42 34 0
Leukemia 1.43 7129 1.43 7129 0 1.43 7129 0 1.43 7129 0 1.43 7129 0
Leukemia-1 1.43 5327 1.43 5327 0 1.43 5327 0 1.43 5327 0 1.43 5327 0
Lymphoma 4.33 4026 4.33 80 61 4.33 4026 0 4.33 93 71 4.33 4026 0
Prostate-Tumor 8.00 10509 6.00 24 61 6.00 54 51 5.00 65 61 4.00 48 1
Sonar 21.71 60 21.19 9 41 21.24 18 11 21.24 21 11 21.71 60 0
Spambase 10.06 54 10.06 54 0 10.06 54 0 10.06 54 0 10.06 54 0
SRBCT 0.00 2308 0.00 54 41 0.00 56 91 0.00 61 91 0.00 31 31
WDBC 2.28 30 2.28 30 0 2.28 30 0 1.93 17 21 2.28 30 0
Wine 0.56 13 0.56 8 91 0.56 13 0 0.56 9 81 0.56 13 0
Figure 2: Test set error rate of the NB classifier with
10-fold CV, as a function of the threshold in KFFS, for
KFFS(FCBF), KFFS(Fisher), and their Union and Intersec-
tion, with k = 10 on the Prostate-Tumor dataset.
and MM) while FCBF follows the RR framework.
Our experiments on 20 datasets with diverse types
of data and problems show that the union of the fea-
ture subsets typically attains better results than the
individual filters. The intersection also attains adequate results, yielding human-manageable subsets of features, allowing for explainability and interpretability. By properly setting the threshold of the KFFS algorithm, we can control the dimensionality of the feature subspaces, reducing them in such a way that a domain expert (e.g., a medical doctor) can focus on the interpretation of the resulting variables.
Figure 3: Test set error rate of the SVM classifier with 10-fold CV, as a function of the number of folds in KFFS, for KFFS(FCBF), KFFS(Fisher), and their Union and Intersection, with T_h = 50 on the DLBCL dataset.
However, in some cases, the subspace intersection is
empty. The dimensionality of the subspace resulting
from the intersection is typically much lower, as com-
pared to the one from the union. When dealing with
high-dimensional data, it is often the case that FS fil-
ters select different regions of the feature subspace.
As future work, we aim to fine-tune the pa-
rameters of the method for each dataset or type of
data/problem, individually. We will also explore the
use of different thresholds per filter.
ACKNOWLEDGEMENTS
This research was supported by Instituto Politécnico de Lisboa (IPL) under Grant IPL/IDI&CA2024/ML4EP ISEL.
REFERENCES
Alipoor, G., Mirbagheri, S., Moosavi, S., and Cruz, S.
(2022). Incipient detection of stator inter-turn short-
circuit faults in a doubly-fed induction generator using
deep learning. IET Electric Power Applications.
Arya, L. and Gupta, G. P. (2023). Ensemble filter-based
feature selection model for cyber attack detection in
industrial internet of things. In 2023 9th International
Conference on Advanced Computing and Communi-
cation Systems (ICACCS), volume 1, pages 834–840.
Bashir, S., Khattak, I. U., Khan, A., Khan, F. H., Gani,
A., and Shiraz, M. (2022). A novel feature selec-
tion method for classification of medical data using
filters, wrappers, and embedded approaches. Com-
plexity, 2022(1):1–12.
Bolón-Canedo, V. and Alonso-Betanzos, A. (2019). Ensembles for feature selection: A review and future trends. Information Fusion, 52:1–12.
Bolon-Canedo, V., Sanchez-Marono, N., and Alonso-
Betanzos, A. (2015). Feature Selection for High-
Dimensional Data. Springer.
Chamlal, H., Ouaderhman, T., and Rebbah, F. (2022). A hy-
brid feature selection approach for microarray datasets
using graph theoretic-based method. Information Sci-
ences, 615:449–474.
Cover, T. and Thomas, J. (2006). Elements of information
theory. John Wiley & Sons, second edition.
Damtew, Y. G., Chen, H., and Yuan, Z. (2023). Hetero-
geneous ensemble feature selection for network intru-
sion detection system. Int. J. Comput. Intell. Syst.,
16(1).
Dhal, P. and Azad, C. (2022). A comprehensive survey on
feature selection in the various fields of machine learn-
ing. Applied Intelligence, 52(4):4543–4581.
Dua, D. and Graff, C. (2019). UCI machine learning repos-
itory.
Duda, R., Hart, P., and Stork, D. (2001). Pattern classifica-
tion. John Wiley & Sons, second edition.
Escolano, F., Suau, P., and Bonev, B. (2009). Information
Theory in Computer Vision and Pattern Recognition.
Springer.
Ferreira, A. and Figueiredo, M. (2023). Leveraging explain-
ability with k-fold feature selection. In 12th Inter-
national Conference on Pattern Recognition Applica-
tions and Methods (ICPRAM), pages 458–465.
Guyon, I. and Elisseeff, A. (2003). An introduction to vari-
able and feature selection. Journal of Machine Learn-
ing Research (JMLR), 3:1157–1182.
Guyon, I., Gunn, S., Nikravesh, M., and Zadeh (Editors), L.
(2006). Feature extraction, foundations and applica-
tions. Springer.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The El-
ements of Statistical Learning. Springer, 2nd edition.
Huynh-Cam, T.-T., Nalluri, V., Chen, L.-S., and Yang, Y.-
Y. (2022). IS-DT: A new feature selection method for
determining the important features in programmatic
buying. Big Data and Cognitive Computing, 6(4).
Jeon, Y. and Hwang, G. (2023). Feature selection with
scalable variational gaussian process via sensitivity
analysis based on L2 divergence. Neurocomputing,
518:577–592.
Mochammad, S., Noh, Y., Kang, Y.-J., Park, S., Lee, J.,
and Chin, S. (2022). Multi-filter clustering fusion for
feature selection in rotating machinery fault classifi-
cation. Sensors, 22(6).
Pudjihartono, N., Fadason, T., Kempa-Liehr, A., and
O’Sullivan, J. (2022a). A review of feature selection
methods for machine learning-based disease risk pre-
diction. Frontiers in Bioinformatics, 2:927312.
Pudjihartono, N., Fadason, T., Kempa-Liehr, A. W., and
O’Sullivan, J. M. (2022b). A review of feature selec-
tion methods for machine learning-based disease risk
prediction. Front. Bioinform., 2:927312.
Remeseiro, B. and Bolon-Canedo, V. (2019). A review
of feature selection methods in medical applications.
Computers in Biology and Medicine, 112:103375.
Seijo-Pardo, B., Porto-Díaz, I., Bolón-Canedo, V., and Alonso-Betanzos, A. (2017). Ensemble feature selection: Homogeneous and heterogeneous approaches. Knowledge-Based Systems, 118:124–139.
Xu, Y., Liu, Y., and Ma, J. (2022). Detection and de-
fense against DDoS attack on SDN controller based
on feature selection. In Chen, X., Huang, X., and
Kutyłowski, M., editors, Security and Privacy in So-
cial Networks and Big Data, pages 247–263, Singa-
pore. Springer Nature Singapore.
Yu, L. and Liu, H. (2003). Feature selection for high-
dimensional data: a fast correlation-based filter solu-
tion. In Proceedings of the International Conference
on Machine Learning (ICML), pages 856–863.
Yu, L. and Liu, H. (2004). Efficient feature selection via
analysis of relevance and redundancy. Journal of Ma-
chine Learning Research (JMLR), 5:1205–1224.
Zhao, Z., Morstatter, F., Sharma, S., Alelyani, S., Anand,
A., and Liu, H. (2010). Advancing feature selection
research - ASU feature selection repository. Techni-
cal report, Computer Science & Engineering, Arizona
State University.
Álvarez Estévez, D., Sánchez-Maroño, N., Alonso-Betanzos, A., and Moret-Bonillo, V. (2011). Reducing dimensionality in a database of sleep EEG arousals. Expert Systems with Applications, 38(6):7746–7754.