
Table 3: Two-step application of BORNFS with varying hop
parameter values.

                   Initial      h = inf            h = 10
                                t = 1.0   t = 1.0  t = 0.95  t = 0.9
Feature count      10,868,073   541       540      155       50
AUC-ROC (LGBM)     --           0.97      0.97     0.96      0.86
AUC-ROC (MLP)      --           0.98      0.98     0.98      0.88
Runtime (min.)     --           24.33     515.92   69.09     9.57
To determine the predictive capability of the feature subsets produced by the two-step approach, LGBM and MLP classifiers were trained on the refined dataset, and AUC-ROC scores were calculated. Remarkably, with the threshold parameter t set to 0.95, AUC-ROC scores remained high, reaching 0.96 for LGBM and 0.98 for MLP, despite the feature count being reduced from over 10 million to just 155. These results are detailed in Table 3.
The experiment was conducted on an AWS r5.4xlarge instance, which has 16 vCPUs and 128 GB of memory. The total time taken for the experiment with t = 0.95 was 93 minutes, comprising 24 minutes for the first step and an additional 69 minutes for the second step.
9 CONCLUSION
Beyond the established metrics for evaluating the goodness of feature selection, that is, class relevance and feature count, we introduced nuisance as a third metric. This metric measures the amount of irrelevant data that can distort understanding. Our µ_H score evaluates the balance between relevance and nuisance and has shown a positive correlation with classifier performance. Our method, BORNFS, harmonizes these metrics and has outperformed others, including LCC and mRMR, on large datasets, maintaining accuracy while reducing the number of features.
ACKNOWLEDGEMENTS
This work was supported by JSPS KAKENHI Grant
Numbers 21H05052 and 21H03775.
REFERENCES
Almuallim, H. and Dietterich, T. G. (1994). Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1-2).
Anderson, H. S. and Roth, P. (2018). EMBER: An open dataset for training static PE malware machine learning models. CoRR, abs/1804.04637.
Blake, C. S. and Merz, C. J. (1998). UCI repository of machine learning databases. Technical report, University of California, Irvine.
Hall, M. A. (2000). Correlation-based feature selection for discrete and numeric class machine learning. In ICML2000, pages 359–366.
KDD (1999). KDD Cup 1999: Computer network intrusion detection.
Kira, K. and Rendell, L. (1992). A practical approach to feature selection. In ICML1992, pages 249–256.
NIPS (2003). Neural Information Processing Systems Conference 2003: Feature selection challenge.
Peng, H. (2007). mRMR (minimum Redundancy Maximum Relevance) feature selection. http://home.penglab.com/proj/mRMR/. Online; accessed 29-February-2020.
Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information: Criteria of max-dependency, max-relevance and min-redundancy. IEEE TPAMI, 27(8).
Shin, K. and Maeda, K. (2023). Temporarily open at https://00m.in/Xud17.
Shin, K., Fernandes, D., and Miyazaki, S. (2011). Consistency measures for feature selection: A formal definition, relative sensitivity comparison, and a fast algorithm. In IJCAI2011, pages 1491–1497.
Shin, K., Kuboyama, T., Hashimoto, T., and Shepard, D. (2015). Super-CWC and super-LCC: Super fast feature selection algorithms. In IEEE BigData 2015, pages 61–67.
Shin, K., Kuboyama, T., Hashimoto, T., and Shepard, D. (2017). sCWC/sLCC: Highly scalable feature selection algorithms. Information, 8(4).
WCCI (2006). IEEE World Congress on Computational Intelligence 2006: Performance prediction challenge.
Yu, L. and Liu, H. (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution. In ICML2003.
Zhao, Z., Anand, R., and Wang, M. (2019). Maximum relevance and minimum redundancy feature selection methods for a marketing machine learning platform. In IEEE DSAA2019.
Zhao, Z. and Liu, H. (2007). Searching for interacting features. In IJCAI2007, pages 1156–1161.
BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets