BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets
Kilho Shin, Chris Liu, Katsuyuki Maeda, Hiroaki Ohshima
2024
Abstract
Feature selection poses two primary challenges: devising effective indices for evaluating selected feature subsets, and building scalable algorithms based on those indices. Our study addresses both. In addition to the size and class relevance of the selected features, we introduce a new index, nuisance, which captures class-uncorrelated information that can degrade downstream processing. Our experiments show that a good balance between class relevance and nuisance improves classification accuracy. To achieve this balance, we present the Balance-Optimized Relevance and Nuisance Feature Selection (BornFS) algorithm. It scales to large datasets and outperforms traditional methods by achieving a better balance among these indices. Notably, on a dataset of 800,000 Windows executables, with LCC as a preprocessing filter, BornFS reduces the feature count from 10 million to under 200 while maintaining high accuracy in malware detection. These findings shed light on the complexities of feature selection and point to ways forward.
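The abstract does not give formulas for the two indices, so the following is only an illustrative sketch under stated assumptions, not the paper's definitions or the BornFS algorithm. It assumes discrete features, takes class relevance to be the mutual information I(F; C) between the selected features F and the class C, and takes nuisance to be the remaining entropy H(F | C), that is, the information in F that is uncorrelated with the class. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def relevance_and_nuisance(rows, labels, subset):
    """Score a feature subset on discrete data.

    Assumed definitions (not taken from the paper):
      relevance = I(F; C) = H(F) + H(C) - H(F, C)
      nuisance  = H(F | C) = H(F) - I(F; C)
    where F is the joint value of the selected features and C is the class.
    """
    joint = [tuple(row[i] for i in subset) for row in rows]
    h_f = entropy(joint)
    h_c = entropy(labels)
    h_fc = entropy(list(zip(joint, labels)))
    relevance = h_f + h_c - h_fc
    nuisance = h_f - relevance
    return relevance, nuisance


# Toy data: feature 0 equals the class, feature 1 is independent of the class.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]

rel_0, nui_0 = relevance_and_nuisance(rows, labels, [0])
rel_01, nui_01 = relevance_and_nuisance(rows, labels, [0, 1])
```

Under these assumed definitions, adding the class-independent feature 1 to the subset leaves relevance unchanged and raises nuisance, which is the kind of trade-off the abstract describes balancing.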
Paper Citation
in Harvard Style
Shin K., Liu C., Maeda K. and Ohshima H. (2024). BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4, SciTePress, pages 1100-1107. DOI: 10.5220/0012436000003636
in Bibtex Style
@conference{icaart24,
author={Kilho Shin and Chris Liu and Katsuyuki Maeda and Hiroaki Ohshima},
title={BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={1100-1107},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012436000003636},
isbn={978-989-758-680-4},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI  - BornFS: Feature Selection with Balanced Relevance and Nuisance and Its Application to Very Large Datasets
SN  - 978-989-758-680-4
AU  - Shin K.
AU  - Liu C.
AU  - Maeda K.
AU  - Ohshima H.
PY  - 2024
SP  - 1100
EP  - 1107
DO  - 10.5220/0012436000003636
PB  - SciTePress