
In the following section, a brief review of the literature on the use of genetic algorithms for model optimization in different DL models, and on DL models for the music genre classification task, is presented. Section 3 details the proposed approach, including the data preprocessing steps, model selection, and the experimental protocol adopted. Section 4 discusses the experiments conducted, outlining the configurations and methodology employed. Section 5 presents the discussion and results, emphasizing how the genetic algorithm influenced hyperparameter selection and model performance. Finally, Section 6 concludes the study and highlights potential directions for future research on applying genetic algorithms to optimize hyperparameters for automatic music genre classification.
2 LITERATURE REVIEW
The study of CNNs for audio processing and audio signal classification has gained attention due to its importance in material retrieval and music recommendation tasks on digital platforms. Early studies focused on manual feature extraction of acoustic properties, such as Mel-Frequency Cepstral Coefficients (MFCCs), timbre, and rhythm, combined with classical machine learning algorithms, including Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) (Tzanetakis and Cook, 2002). While these methods showed some success, their effectiveness was limited by the challenge of capturing the more complex nuances of musical genres.
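As an illustration of this classical pipeline, the sketch below classifies toy MFCC-style summary vectors with a k-nearest-neighbors rule implemented directly in NumPy. The two Gaussian clusters stand in for per-genre feature distributions and are invented for illustration, not data from the cited works.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify one feature vector by majority vote among its k nearest
    neighbors under Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy stand-in for per-track MFCC summary features (e.g., the mean of
# 13 coefficients over a track); each cluster represents one genre.
rng = np.random.default_rng(0)
genre_a = rng.normal(0.0, 0.3, size=(20, 13))
genre_b = rng.normal(2.0, 0.3, size=(20, 13))
train_X = np.vstack([genre_a, genre_b])
train_y = np.array([0] * 20 + [1] * 20)

print(knn_predict(train_X, train_y, rng.normal(2.0, 0.3, size=13)))  # expected: 1
```

The same hand-crafted-features-plus-classifier structure applies when the KNN is swapped for an SVM; the limitation noted above is that the fixed features, not the classifier, bound what the model can capture.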
The representation of audio signals as images has become a widely adopted approach in audio analysis, particularly in the context of musical genre classification (Müller, 2015). One of the most common ways to achieve this representation is through spectrograms, which are visualizations that display the variation of sound frequencies over time. This type of representation transforms the audio signal into a two-dimensional format, enabling a more intuitive analysis of acoustic characteristics (Müller, 2015).
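A magnitude spectrogram of this kind can be computed with a short-time Fourier transform. The NumPy sketch below is a minimal illustration; the frame length, hop size, and Hann window are arbitrary choices for the example, not parameters from the cited work.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: Hann-windowed frames -> |FFT|,
    yielding a 2-D (frequency x time) image of the signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins;
    # transpose so rows = frequency, columns = time
    return np.abs(np.fft.rfft(frames, axis=1)).T

# A 440 Hz tone sampled at 8 kHz concentrates energy in one frequency row.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)                  # (129, 61): 129 frequency bins, 61 frames
print(spec.mean(axis=1).argmax())  # peak bin ~ 440 / (8000 / 256) ~ 14
```

In practice, the linear frequency axis is usually warped to the Mel scale before being fed to a CNN, but the time-frequency image structure is the same.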
With advancements in Deep Learning techniques, CNNs have emerged as powerful tools for audio analysis, utilizing visual representations like Mel spectrograms to identify patterns that distinguish genres (Choi et al., 2017a). For instance, Choi et al. (2017a) demonstrated the effectiveness of CNNs in genre classification, outperforming traditional methods by leveraging the networks’ ability to learn hierarchical features directly from raw data. Furthermore, Dieleman and Schrauwen (Dieleman et al., 2011) explored end-to-end approaches, enabling CNNs to operate directly on audio representations without requiring manual feature extraction.
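To make the idea of learned local patterns concrete, the sketch below runs one convolution, ReLU, and max-pooling step over a toy spectrogram in NumPy. The hand-picked edge kernel stands in for a filter a trained CNN might learn; real networks stack many such learned filters.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL frameworks)."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max-pooling, discarding any ragged border."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# An edge kernel responds to a sudden onset along the time axis --
# the kind of local pattern a learned filter can pick up in a spectrogram.
spec = np.zeros((8, 8))
spec[:, 4:] = 1.0                 # energy appears at time step 4
kernel = np.array([[-1.0, 1.0]])  # detects left-to-right increases
feat = max_pool(relu(conv2d(spec, kernel)))
print(feat.shape)                 # (4, 3)
```

The pooled feature map activates only where the onset occurs, illustrating how convolution plus pooling yields translation-tolerant local features.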
Another critical aspect of musical genre classification is model optimization, where hyperparameter selection, such as the network architecture, learning rate, and number of convolutional filters, plays a pivotal role. Traditional tuning methods, such as grid search or random search, often prove inefficient for deep networks due to high computational costs (Bergstra and Bengio, 2012). In this context, GAs have been explored as a promising alternative, enabling automated selection of optimal hyperparameter configurations. For example, Young et al. (2015) demonstrated the application of GAs for neural network architecture optimization, while Real et al. (2019) showcased advancements in using these techniques to achieve competitive performance on deep learning benchmarks.
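The sketch below illustrates the kind of GA loop such work relies on: a dictionary of hyperparameter settings forms the chromosome, and uniform crossover, per-gene mutation, and truncation selection evolve the population. The search space and the surrogate fitness function are invented for illustration; in practice, fitness would be the validation accuracy of a CNN trained with each configuration.

```python
import random

# Hypothetical discrete search space; real spaces depend on the model.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "n_filters": [16, 32, 64, 128],
    "n_conv_layers": [1, 2, 3, 4],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b):
    # Uniform crossover: each gene is inherited from either parent
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rate=0.2):
    # Each gene is independently resampled with probability `rate`
    return {k: (random.choice(v) if random.random() < rate else ind[k])
            for k, v in SEARCH_SPACE.items()}

def evolve(fitness, pop_size=12, generations=15, seed=0):
    random.seed(seed)
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection (elitist)
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

# Surrogate fitness standing in for validation accuracy: it prefers
# lr = 1e-3, 64 filters, and 3 convolutional layers (an arbitrary optimum).
def surrogate(ind):
    return -(abs(ind["learning_rate"] - 1e-3) * 100
             + abs(ind["n_filters"] - 64) / 64
             + abs(ind["n_conv_layers"] - 3))

best = evolve(surrogate)
print(best)
```

Because the top half of each generation is carried over unchanged, the best fitness found never decreases, which is the elitism property most GA-based tuners rely on.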
These studies highlight the evolution of the field, from manual feature-based methods to the application of deep learning and evolutionary algorithms. However, significant gaps remain in effectively integrating these technologies to capture the diversity and complexity of musical genres, motivating the proposal of this study.
3 PROPOSED APPROACH
This section outlines the methodology adopted for the task of musical genre classification. A step-by-step flowchart is presented in Figure 1, detailing the key processes, from the visual representation of audio signals to model optimization. First, audio data is converted into spectrograms to enable visual analysis. Next, CNNs are leveraged for feature extraction and classification. Finally, a GA is applied to optimize the model’s architecture and hyperparameters, ensuring the best configuration for the task.
3.1 Visual Representation of Audio Signals
By representing audio as an image, details such as rhythmic patterns, timbres, and harmonic transitions, often associated with different musical genres, can be captured (McFee et al., 2015). Additionally, many genres share similar characteristics, which can be directly observed in their visual representations. For example, genres like rock and blues may exhibit comparable frequency patterns due to the use of similar instruments, whereas electronic genres may stand out through frequency peaks generated by synthesizers (Pons et al., 2017). In Figures 2, 3, 4 and 5, we can
ICEIS 2025 - 27th International Conference on Enterprise Information Systems