Exploring Feature Extraction Techniques and SVM for Facial Recognition with Image Generation Using Diffusion Models

Nabila Daly¹, Faten Khemakhem² and Hela Ltifi³

¹ REsearch Groups in Intelligent Machine, Faculty of Sciences of Sfax, University of SFAX, Tunisia

² REsearch Groups in Intelligent Machine, University of SFAX, Tunisia

³ Department of Computer Science, Faculty of Sciences and Techniques of Sidi Bouzid, University of Kairouan, Tunisia

{nabiladaly8, khemakhem.faten}@gmail.com, hela.ltifi@ieee.org

Keywords:

Facial Recognition, Diffusion Models, Data Augmentation, Support Vector Machine, Feature Extraction,

Histogram of Oriented Gradients, Eigenfaces, Local Binary Patterns.

Abstract:

Facial recognition is a cornerstone of computer vision, with applications spanning security, personalization, and beyond. In this study, we enhance the widely used Labeled Faces in the Wild (LFW) dataset by generating additional images using a diffusion model, enriching its diversity and volume. The augmented dataset was then used to train Support Vector Machine (SVM) classifiers on features from three distinct extraction methods: Histogram of Oriented Gradients (HOG), Eigenfaces, and Local Binary Patterns (LBP), yielding the HOG-SVM, Eigenfaces-SVM, and LBP-SVM pipelines. Our investigation evaluates the impact of these hybrid approaches on facial recognition accuracy and computational efficiency when applied to the expanded dataset. Experimental results reveal the strengths and limitations of each method, providing valuable insights into the role of feature extraction and data augmentation in improving facial recognition systems.

1 INTRODUCTION

Facial recognition is a key area in computer vision,

with applications spanning across ﬁelds like security,

surveillance, and personalized services. The ability to

reliably identify individuals from images or videos is

crucial for tasks such as access control, forensic anal-

ysis, and customizing user experiences. Although sig-

niﬁcant progress has been made in facial recognition

technology, challenges like limited dataset diversity,

and variations in pose, lighting, and facial expressions

still hinder the creation of highly robust systems.

The quality and diversity of datasets play a crucial

role in training effective facial recognition models.

However, many widely used datasets, such as the La-

beled Faces in the Wild (LFW), are often constrained

in size and variability, limiting their utility for training

models capable of generalizing to unseen scenarios.

This limitation has spurred interest in leveraging gen-

erative models to augment datasets, enhancing both

their size and diversity.

Diffusion models have emerged as a state-of-the-art approach for data generation, known for their ability to produce high-quality, realistic synthetic images. By systematically introducing and then reversing noise in the data, these models excel in generating samples that closely resemble real-world data distributions. In this study, we apply diffusion models to augment the LFW dataset, generating a wide array of synthetic facial images. This enriched dataset, comprising both original and generated images, provides a robust foundation for training and evaluating facial recognition systems.

https://orcid.org/0009-0001-8932-8904

https://orcid.org/0000-0003-4386-4397

https://orcid.org/0000-0003-3953-1135

To assess the impact of dataset augmentation

on facial recognition performance, we employ Sup-

port Vector Machines (SVMs) integrated with three

feature extraction methods: Histogram of Ori-

ented Gradients (HOG) (Rajaa et al., 2021), Eigen-

faces (Safa Rajaa, 2021), and Local Binary Pat-

terns (LBP) (Shubhangi Patil, 2022). These hy-

brid approaches—HOG-SVM, Eigenfaces-SVM, and

LBP-SVM—offer diverse strategies for representing

facial features, each with distinct strengths in captur-

ing discriminative information from images.

Our experiments focus on training hybrid models using the augmented LFW dataset and assessing their performance in terms of accuracy, robustness to variations, and computational efficiency. By comparing the results systematically, we aim to gain insights into the effectiveness of combining diffusion-based data augmentation with hybrid SVM-based classification methods. Furthermore, we investigate how the generated data enhances model performance, particularly in overcoming challenges associated with the limited diversity of real-world data.

Daly, N., Khemakhem, F. and Ltifi, H.

Exploring Feature Extraction Techniques and SVM for Facial Recognition with Image Generation Using Diffusion Models.

DOI: 10.5220/0013439900003928

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 20th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2025), pages 240-251

ISBN: 978-989-758-742-9; ISSN: 2184-4895

This study not only demonstrates the potential of

diffusion models for dataset augmentation but also

underscores the importance of integrating robust fea-

ture extraction methods with SVM classiﬁers to en-

hance facial recognition performance. The ﬁndings

presented herein contribute to advancing the ﬁeld by

offering a practical approach to addressing data lim-

itations and improving system robustness. (Smith,

1998).

2 RELATED WORK

Facial recognition has witnessed signiﬁcant advance-

ments in recent years, driven by the proliferation

of deep learning techniques and large-scale datasets.

Deep convolutional neural networks (CNNs) have

emerged as state-of-the-art methods for facial feature

extraction and recognition, achieving remarkable per-

formance on benchmark datasets like LFW (Safa Ra-

jaa, 2021) and Celeb-Faces Attributes (CelebA)

(Rosebrock, 2021). In addition to deep learning ap-

proaches, traditional machine learning methods like

SVMs remain relevant in facial recognition tasks.

SVMs are particularly effective for binary classiﬁ-

cation tasks, including face/non-face discrimination,

and can be adapted to work with various feature rep-

resentations.

Among the traditional feature extraction methods,

the HOG algorithm has shown promising results in

capturing local texture and shape information from

facial images. HOG-based systems combined with

SVM classiﬁers have been successfully applied to

real-time face detection and recognition tasks.

Eigenfaces, based on principal component anal-

ysis (PCA), represent another classical approach to

facial recognition. By projecting facial images onto

a lower-dimensional subspace of eigenfaces, this

method reduces the complexity of face representation

and enables efﬁcient classiﬁcation with SVMs.

LBP (Jae Jeong Hwang, 2018) provides a texture-based representation of facial images by encoding local texture patterns. LBP-based feature descriptors,

coupled with SVM classiﬁers, have demonstrated ro-

bustness to variations in illumination and facial ex-

pressions, making them suitable for facial recognition

under non-ideal conditions.

While deep learning methods have dominated re-

cent progress in facial recognition, the compara-

tive analysis of traditional feature extraction meth-

ods like HOG, Eigenfaces (Cheng Quanhua, 2008),

and LBP combined with SVMs remains valuable.

Understanding the strengths and weaknesses of these approaches in terms of accuracy, computational efficiency, and robustness is essential for developing

practical and effective facial recognition systems.

This study aims to contribute to this comparative anal-

ysis by evaluating these methods on the LFW dataset

and providing insights into their performance charac-

teristics. (Moore and Lopes, 1999).

3 PROPOSED APPROACH

In this study, we propose a two-phase approach for

enhancing facial recognition performance. First, we

augment the Labeled Faces in the Wild (LFW) dataset

using a diffusion model to generate synthetic facial

images. Second, we explore three distinct models

for facial recognition using SVM classiﬁers in con-

junction with different feature extraction techniques:

HOG (Rajaa et al., 2021), Eigenfaces, and LBP

(Shubhangi Patil, 2022).

3.1 Data Generation Using Diffusion

Models

To address the limitations of the LFW dataset in terms

of size and diversity, we employ a diffusion model for

data augmentation. The diffusion process systemati-

cally adds noise to clean images and then reverses it

to generate new samples that closely resemble real fa-

cial data. This approach enhances the variability of

the dataset by introducing new samples with diverse

facial attributes, poses, and lighting conditions, pro-

viding a richer training set for the subsequent recog-

nition models.

The diffusion model architecture, speciﬁcally a

Context-Unet (Hilbert et al., 2020), is used for gen-

erating synthetic images. The model learns to itera-

tively denoise images by passing them through mul-

tiple layers of convolution, down-sampling, and up-

sampling blocks. Context and timestep embeddings

are incorporated to condition the generated images on

speciﬁc attributes, such as facial expressions or pose

variations. The ﬁnal output is a synthetic image that

maintains the essential characteristics of real facial

data, making the augmented dataset more diverse and

robust.

The augmented dataset, consisting of both real and synthetic images, forms the foundation for training the proposed hybrid models. This phase is crucial

for improving the robustness and generalization ca-

pabilities of facial recognition systems, especially in

scenarios with limited real-world data.

3.2 Hybrid Models for Facial

Recognition

Each model represents a unique approach to facial

feature representation and classiﬁcation, allowing for

a comparative analysis of their performance on the

LFW dataset.

1. HOG-SVM Model: The HOG method describes a facial image through the directions of its intensity gradients. It divides the image into small cells and, within each cell, builds a histogram of gradient orientations. The resulting descriptor captures both the shape and the texture of facial features. These histograms are concatenated into a feature vector and fed into an SVM, which learns to separate the classes by finding the maximum-margin hyperplane in this high-dimensional space. SVMs suit this task because they cope well with high-dimensional feature vectors such as HOG descriptors.

Figure 1: The proposed approach for HOG feature extrac-

tion and SVM approach.

2. Eigenfaces-SVM Model: Utilizes PCA to com-

pute eigenfaces, representing discriminative fea-

tures of facial images. SVM is trained on these

eigenface representations for recognition.

Figure 2: The proposed approach for Eigenfaces and SVM (Cheng Quanhua, 2008).

3. LBP-SVM Model: Incorporates LBP

(Kancherla Deepika, 2019) to encode tex-

ture patterns, enabling effective handling of

illumination and facial expression variations by

SVM.

Figure 3: The proposed approach for LBP feature extraction

and SVM approach.

Each diagram shows how raw facial images are processed through the respective feature extraction method (HOG, Eigenfaces, or LBP) to generate feature vectors, which are then fed into SVM classifiers for training and prediction. Together, Figures 1 to 3 give an overview of the proposed approach and of the feature extraction and classification stages in each model.

4 DATA GENERATION USING

DIFFUSION MODELS

In this study, we use a diffusion model built on a U-Net architecture (Kassel, 2021) to augment the Labeled Faces in the Wild (LFW) dataset, addressing the limitations of dataset size and diversity. The diffusion model generates high-quality synthetic facial images by progressively adding noise to real facial data and learning to reverse that process. These synthetic images enhance the dataset by introducing variations such as different facial expressions, lighting, and poses. This approach improves the robustness of facial recognition models by providing a richer and more diverse training set.

4.1 Model Architecture

Figure 4 illustrates the ContextUNet architecture (Mittal, 2024), which consists of a series of layers designed to process an input image and generate an output image.

Figure 4: ContextUNet Architecture.

The input image is processed through the follow-

ing steps:


1. Input Image

The initial input is an image represented by its dimen-

sions: batch size b, channels c, height h, and width w.

2. Initial Convolution (Init Conv)

The input image is processed by an initial convolu-

tional layer, which is typically used to extract low-

level features. This layer increases the number of

feature maps to 256, allowing the network to capture

more complex patterns and structures in the image.

3. Down-sampling Path

The feature maps are passed through a series of down-

sampling layers (Down1 and Down2). These layers

consist of convolutional and pooling operations that

reduce the spatial dimensions of the feature maps

while increasing the number of channels (feature

maps).

• Down1: Produces 256 feature maps.

• Down2: Produces 512 feature maps.

4. Up-sampling Path

After down-sampling, the feature maps are up-

sampled through a series of up-sampling layers (Up0,

Up1, and Up2). These layers involve transposed con-

volutions or other upsampling operations that gradu-

ally increase the spatial resolution of the feature maps

back to the original input size. The number of feature

maps is progressively reduced during this process.

5. Context Embedding

The context embedding layer processes external in-

formation, such as time steps or class labels, and gen-

erates a vector representation of the context. This

embedding is integrated into the up-sampling path to

condition the network’s generation process based on

the provided context.

6. Time Embedding

Similar to the context embedding, the time embedding layer takes the diffusion timestep and converts it into a vector representation. This tells the network how much noise the current input contains. The embedding is integrated into the up-sampling path so that each denoising step is conditioned on its position in the diffusion process.

7. Skip Connections

Skip connections are used to connect the outputs

of the down-sampling layers to the corresponding

up-sampling layers. These connections help pre-

serve ﬁne-grained details by directly passing high-

resolution features from the down-sampling path to

the up-sampling path, ensuring that important in-

formation is not lost during the spatial resolution

changes.

8. Output

The ﬁnal output is generated by a series of convolu-

tional layers, which produce an image with the same

dimensions as the input. This image is the result of

the network’s processing, incorporating both the low-

level features extracted by the initial convolution and

the high-level contextual and temporal information

from the embeddings.

4.2 Performance of the Diffusion Model

The adoption of the diffusion model signiﬁcantly im-

proves the diversity and quality of the synthetic im-

ages. The key performance improvements are out-

lined in the following table:

Table 1: Performance Comparison: Original vs. Augmented Dataset.

Metric                 Original LFW Dataset   Augmented Dataset (with Diffusion Model)
Image Diversity        Low                    High
Facial Variations      Limited                Extensive (e.g., pose, expression)
Lighting Conditions    Uniform                Varied (different light angles)
Image Quality          High                   High (close to real images)

As shown in Table 1, the diffusion model introduces substantial improvements in image diversity,

facial variations, and lighting conditions, providing

a more robust training dataset for facial recognition

models.

4.3 Hyperparameters

The training of the diffusion model relies heavily on

the selection of appropriate hyperparameters. These parameters govern various aspects of the model's architecture, noise schedule, and optimization process.

Below is a detailed description of the most critical hy-

perparameters used in the model.

4.3.1 Diffusion Model Hyperparameters

The diffusion process is central to the model’s ability

to generate high-quality samples. Key hyperparame-

ters related to the diffusion process include:

• Timesteps (500): The number of diffusion steps

the model uses to gradually introduce noise to the

image. A higher number of timesteps allows for

ﬁner control over the noise addition, but it also in-

creases computational complexity. In this model,

we use 500 timesteps for a balance between com-

putational efﬁciency and output quality.

• Beta Parameters ($\beta_1 = 10^{-4}$, $\beta_2 = 0.02$): These parameters control the noise schedule, which defines how noise is added to the data over time. $\beta_1$ represents the starting noise level, while $\beta_2$ determines the final noise level. The model uses a linear noise schedule that gradually increases the noise added to the input.
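As a rough illustration of the linear schedule described above, the sketch below builds the per-step noise levels and applies the standard DDPM closed-form forward step, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$. The variable names, the use of NumPy, and the random stand-in image are assumptions made for illustration; the exact discretization used in this work is not specified in the text.

```python
import numpy as np

# Values stated in the text; the variable names are illustrative.
timesteps = 500
beta_start, beta_end = 1e-4, 0.02

# Linear schedule: beta_t grows evenly from beta_start to beta_end.
betas = np.linspace(beta_start, beta_end, timesteps)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # fraction of signal variance kept up to step t


def add_noise(x0, t, rng):
    """Sample x_t from q(x_t | x_0) in closed form for a 0-based step t."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 16, 16))  # stand-in for a 16x16 RGB image
x_t, eps = add_noise(x0, t=timesteps - 1, rng=rng)
```

With these settings the cumulative product `alpha_bars` falls below 0.01 by the last step, so the final sample is almost pure noise, which is the starting point the reverse (denoising) process assumes.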

4.3.2 Network Architecture Hyperparameters

These parameters deﬁne the internal structure of the

neural network used in the model:

• Number of Hidden Features (n_feat = 64):

This hyperparameter deﬁnes the number of hid-

den features or channels in the network. It plays

a critical role in controlling the capacity of the

model. A higher number of features can capture

more intricate details but may lead to overﬁtting

or slower training.

• Context Vector Size (n_cfeat = 5): This refers

to the size of the context vector, which encodes

contextual information such as time steps or class

labels. It helps the model condition the generation

process based on this extra information. A context

vector size of 5 provides sufﬁcient capacity for en-

coding essential information without introducing

unnecessary complexity.

• Image Resolution (height = 16): The model op-

erates on 16x16 pixel images. Lower resolu-

tion speeds up training and reduces computational

costs, but it may limit the ﬁne-grained details that

can be captured. In this case, 16x16 resolution

is chosen to balance between computational efﬁ-

ciency and sufﬁcient visual information.

4.3.3 Training Hyperparameters

Training hyperparameters are critical to the conver-

gence and stability of the model during training:

• Number of Epochs (n_epoch = 50): This parameter defines the number of complete passes

through the dataset. A total of 50 epochs is used

to ensure the model has sufﬁcient opportunities to

learn and improve its performance. The choice

of 50 epochs allows for effective training without

excessive overﬁtting.

• Learning Rate (lrate = 1e-3): The learning rate controls the step size during optimization. A learning rate of 1e-3 is chosen to balance fast

convergence with model stability. The learning

rate is decayed linearly over epochs to prevent

large updates in the later stages of training, en-

suring that the model ﬁne-tunes its weights effec-

tively.
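The linear decay of the learning rate can be sketched as follows. The formula is an assumption (one common way to implement a linear decay over epochs); the text states only that the rate starts at 1e-3 and is decayed linearly over 50 epochs.

```python
lrate = 1e-3   # initial learning rate from the text
n_epoch = 50   # number of epochs from the text


def linear_decay(epoch, base_lr=lrate, total=n_epoch):
    """Learning rate for a 0-based epoch, shrinking linearly toward zero."""
    return base_lr * (1.0 - epoch / total)


# One learning rate per epoch; it would be assigned to the optimizer at the start of each epoch.
schedule = [linear_decay(ep) for ep in range(n_epoch)]
```

Under this assumed formula the rate halves by epoch 25 and stays strictly positive in the final epoch.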

4.3.4 Optimization

• Optimizer (Adam): The Adam optimizer is used for model training. It is well suited to models with many parameters and large datasets, as it adapts the learning rate for each parameter based on estimates of the first and second moments of the gradients. In practice, this often yields faster and more stable convergence than a single global learning rate.

The following table provides a summary of the

key hyperparameters used in this diffusion model:

Table 2: Summary of Hyperparameters for the Diffusion Model.

Hyperparameter                   Value
Timesteps                        500
$\beta_1$ (starting noise level) 1e-4
$\beta_2$ (final noise level)    0.02
Number of Features (n_feat)      64
Context Vector Size (n_cfeat)    5
Image Resolution (height)        16x16
Number of Epochs (n_epoch)       50
Learning Rate (lrate)            1e-3
Optimizer                        Adam

5 DATA PREPROCESSING

HOG Feature Extraction with SVM (HOG-SVM)

In the preprocessing step, we use the HOG technique to extract key facial features from the LFW dataset. HOG captures details about shapes and textures in each image by analyzing the directions of gradients, which helps highlight important facial structures. We calculate HOG descriptors for each image and then use these as inputs to train an SVM classifier that specializes in facial recognition.
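To make the HOG step concrete, here is a deliberately simplified sketch in NumPy: it computes gradient orientations, accumulates a magnitude-weighted orientation histogram per cell, and normalizes the concatenated vector. The cell size, the nine unsigned bins, and the single global normalization are assumptions chosen for brevity. Full HOG implementations add overlapping block normalization and vote interpolation, and the random array stands in for a grayscale face crop.

```python
import numpy as np


def hog_descriptor(image, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation."""
    img = image.astype(float)
    gy, gx = np.gradient(img)                       # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in degrees

    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    features = np.zeros((rows, cols, bins))
    bin_index = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            features[r, c] = np.bincount(
                bin_index[window].ravel(), weights=magnitude[window].ravel(), minlength=bins
            )
    vector = features.ravel()
    return vector / (np.linalg.norm(vector) + 1e-9)  # global L2 normalization


rng = np.random.default_rng(0)
face = rng.random((64, 64))        # stand-in for a grayscale face crop
descriptor = hog_descriptor(face)  # 8 x 8 cells x 9 bins = 576 values
```

The resulting vector is what would be passed to the SVM as the feature representation of one image.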

Eigenfaces Feature Extraction with SVM

(Eigenfaces-SVM)

Another approach we use is Eigenfaces, which re-

lies on PCA to identify the most distinctive features

in the dataset. PCA reduces the data’s dimensions by

focusing on the main features that differentiate faces.

We transform the images into these reduced represen-

tations (eigenfaces) and then use them to train another

SVM classiﬁer for facial recognition.
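The PCA projection behind Eigenfaces can be sketched with a singular value decomposition. The function names, the number of components, and the random stand-in data are illustrative assumptions, not settings from this study.

```python
import numpy as np


def fit_eigenfaces(faces, n_components):
    """Return the mean face and the top principal axes (the eigenfaces)."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]


def project(faces, mean_face, eigenfaces):
    """Coordinates of each face in the eigenface subspace."""
    return (faces - mean_face) @ eigenfaces.T


rng = np.random.default_rng(0)
faces = rng.random((40, 16 * 16))      # 40 flattened stand-in images
mean_face, eigenfaces = fit_eigenfaces(faces, n_components=10)
features = project(faces, mean_face, eigenfaces)   # shape (40, 10)
```

In a train/test setting, `fit_eigenfaces` would be called on the training images only, and the same mean face and eigenfaces would then be reused to project the test images.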

LBP Feature Extraction with SVM (LBP-SVM)

Lastly, we use LBP, which is effective in capturing

textures, making it useful for handling differences in

lighting and expressions. LBP encodes patterns found

in small regions of the face, providing features that are

resilient to such variations. We extract LBP features

from each image and use them as inputs for an SVM

classiﬁer focused on facial recognition.
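A basic 3x3 LBP operator can be sketched in a few lines of NumPy. This is the plain 8-neighbour variant with a single whole-image histogram, chosen here as an assumption for brevity. Practical pipelines often use uniform patterns and per-region histograms, and the text does not state which variant was used.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Basic 3x3 LBP codes for interior pixels, returned as a 256-bin histogram."""
    img = image.astype(float)
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        neighbour = img[1 + dr : img.shape[0] - 1 + dr, 1 + dc : img.shape[1] - 1 + dc]
        # Set this bit when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


rng = np.random.default_rng(0)
face = rng.integers(0, 200, size=(32, 32))   # stand-in for a grayscale face crop
features = lbp_histogram(face)
```

Because each code depends only on whether a neighbour is brighter than the centre, the histogram is unchanged by any monotonic brightness change, which is the property that makes LBP resilient to lighting variation.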

Each of these pipelines is followed by training and

evaluating the SVM model on performance metrics

like accuracy, precision, recall, and F1 score. We

then compare the results to see how well HOG, Eigen-

faces, and LBP enhance the accuracy and reliability

of facial recognition on the LFW dataset. This ap-

proach underscores how feature extraction improves

facial recognition model performance.

6 SVM CLASSIFIER

The SVM classiﬁer is a powerful tool for binary clas-

siﬁcation tasks, known for its ability to separate data

into two distinct classes. In our work, we use SVM

with a linear kernel to differentiate facial images from

non-facial elements within our dataset.

Our classiﬁcation process begins by training the

SVM with HOG features extracted from the images,

as these features capture important structural patterns

unique to faces. By learning these patterns, the SVM

can establish a clear decision boundary that maxi-

mizes the separation between facial and non-facial

classes.

During training, the SVM iteratively adjusts this

boundary to achieve the best possible accuracy in

classiﬁcation. This training enables the SVM to rec-

ognize and correctly classify regions containing faces

versus those without.

After training, the SVM model is incorporated

into our HOG-based classiﬁcation pipeline. For each

new image, we extract HOG features and input them

into the SVM, which classiﬁes each image based on

the learned boundary, helping ensure consistent facial

detection on new data.

Figure 5: SVM Architecture.

7 METHODS

In this section, we describe the methods used for fa-

cial recognition, focusing on three approaches: HOG-

SVM, Eigenfaces-SVM, and LBP-SVM. Each ap-

proach uses a distinct feature extraction technique

combined with SVM for classiﬁcation. We cover the

training process, ﬁne-tuning, and key hyperparame-

ters chosen for each model.

7.1 Base Training

To begin, we trained three models using different fea-

ture extraction methods:

• HOG-SVM: The HOG descriptor was used to

capture local gradient orientations from the facial

images, emphasizing important shapes and tex-

tures. These HOG features were then input into

an SVM classiﬁer.

• Eigenfaces-SVM: PCA was used to generate

eigenfaces, which capture the most important fa-

cial features in a lower-dimensional space. These

eigenfaces were then fed into an SVM classiﬁer.

• LBP-SVM: The LBP descriptor was applied to

encode texture patterns from facial images, help-

ing handle variations in lighting and expressions.

These LBP features were then classiﬁed using an

SVM.

The steps for each approach included:

1. Data Preparation: We loaded the LFW dataset

and split it into training and testing sets to ensure

balanced performance evaluation.

2. Model Training: Each SVM classiﬁer was

trained using GridSearchCV to optimize key hy-

perparameters such as the regularization param-

eter (C), kernel type, and gamma value for non-

linear kernels.


3. Evaluation: We assessed each model’s accuracy,

precision, recall, and F1-score on the test set to

compare their effectiveness in facial recognition.

7.2 Hyperparameter Optimization

After the base training, we optimized the hyperparam-

eters of the models to improve their performance fur-

ther. We employed the following procedure:

1. Hyperparameter Optimization: We performed

a grid search over hyperparameters to ﬁnd the best

conﬁguration for each model.

2. Model Reﬁnement: The models were retrained

using the best hyperparameters obtained from the

grid search.

3. Performance Evaluation: We evaluated the

models on the testing set using the same metrics

as before.

7.3 Hyperparameters

For the SVM classiﬁers, we used the following hyper-

parameter grid during grid search:

• Regularization parameter (C): [0.1, 1, 10, 100]

• Kernel type: ['linear', 'rbf', 'poly']

• Gamma parameter (γ): ['scale', 'auto']
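The grid listed above maps directly onto scikit-learn's `GridSearchCV`. The sketch below is a minimal illustration on synthetic feature vectors. The three-fold cross-validation, the accuracy scorer, the 75/25 split, and the synthetic data are assumptions, since the text does not specify these settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for extracted feature vectors (HOG, eigenface, or LBP).
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Search space taken from the list above.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf", "poly"],
    "gamma": ["scale", "auto"],
}

# cv=3 and the accuracy scorer are assumptions, not settings from the paper.
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.best_estimator_.score(X_test, y_test)
```

By default `GridSearchCV` refits the best configuration on the full training split, so `best_estimator_` can be scored on the held-out test set directly.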

The best hyperparameters found during grid

search were used to train the ﬁnal models. The

following table represents the best hyperparameters

for different feature extraction methods (HOG, PCA,

LBP) after performing hyperparameter optimization.

Table 3: Best Hyperparameters for Different Feature Extraction Methods.

Feature Extraction Method   Parameter   Values Tuned          Best Value
HOG                         C           [0.1, 1, 10, 100]     10
                            gamma       [scale, auto]         scale
                            kernel      [linear, rbf, poly]   rbf
PCA                         C           [0.1, 1, 10, 100]     10
                            gamma       [scale, auto]         scale
                            kernel      [linear, rbf, poly]   rbf
LBP                         C           [1, 10]               10
                            gamma       [scale]               scale
                            kernel      [linear, rbf]         rbf

8 EXPERIMENTAL ANALYSIS

AND COMPARISON

In this section, we present the results of two key ex-

periments: the image generation using the diffusion

model and the performance evaluation of various facial recognition models on the LFW dataset before and

after image generation.

8.1 Results

8.1.1 Results of Image Generation

In this part, we evaluate the performance of the im-

age generation process using the diffusion model ap-

plied to the LFW dataset. The goal was to augment

the original dataset by generating diverse images for

each individual, simulating variations in lighting con-

ditions, facial expressions, and poses. For each per-

son in the LFW dataset, we generated multiple images

with different facial expressions (such as happy, sad,

and neutral), different pose orientations, and varying

lighting conditions (e.g., different light angles). This

augmentation aimed to increase the dataset’s diver-

sity, improving the robustness of facial recognition

models trained on this dataset.

To assess the quality of the generated images, we

compared them to the original LFW dataset in terms

of visual ﬁdelity and diversity. The evaluation was

performed by inspecting the generated images for

realistic facial features, maintaining identity consis-

tency across generated samples, and preserving cru-

cial facial characteristics such as eye shape, nose po-

sition, and mouth expression, despite the variations in

lighting, pose, and expression.

Furthermore, we evaluated the impact of training

the diffusion model for different numbers of epochs.

The number of epochs played a signiﬁcant role in the

quality of the generated images. Initially, with fewer

epochs, the generated images exhibited lower quality,

with some distortion or unnatural features. However,

as the number of epochs increased, the images grad-

ually improved, showing more realistic and coherent

facial features. The model achieved optimal perfor-

mance after a certain number of epochs, where the

generated images closely resembled real LFW images

while maintaining sufﬁcient diversity.

The results demonstrated that with an increased

number of epochs, the diffusion model signiﬁcantly

enhanced the diversity and quality of the augmented

dataset. The generated images displayed diverse

lighting conditions, facial expressions, and poses,

which were not present in the original LFW images,

thus improving the generalization capability of facial


recognition models.

Figure 6: Generated Images Sample.

The following observations were made from Figure 6:

• Image Diversity: The augmented dataset exhib-

ited high diversity, with the generated images

capturing a broader range of poses, expressions,

and lighting conditions compared to the original

dataset.

• Facial Variations: The generated images demon-

strated extensive variations in facial expressions

(e.g., happy, sad, neutral) and pose orientations,

making the model more robust for facial recogni-

tion tasks.

• Lighting Conditions: The augmented dataset

showcased varied lighting conditions, simulating

different light angles, which was not present in the

original LFW images.

• Image Quality: The quality of the generated im-

ages was high, closely resembling real images

and retaining critical facial features, enhancing

their usability for further analysis and recognition

tasks.

These results show that the diffusion model effec-

tively augments the LFW dataset, providing enhanced

diversity and realism in the generated images.

8.1.2 Results of Face Recognition Models

This section presents the comparative analysis of three facial recognition models: Hybrid HOG-SVM, Eigenfaces-SVM, and LBP-SVM. These models were

evaluated on the original LFW dataset as well as

the augmented dataset generated using the diffusion

model. The primary metric used for comparison is

accuracy.

Table 4: Performance Comparison of Facial Recognition Models on LFW Dataset.

Model                     Accuracy (%) on LFW
Hybrid HOG-SVM            74.53
Hybrid Eigenfaces-SVM     77.33
Hybrid LBP-SVM            59

The accuracy results highlight the performance dif-

ferences among the feature extraction methods when

integrated with SVM for facial recognition tasks. The

Eigenfaces-SVM model achieved the highest accu-

racy among the three models, emphasizing the effec-

tiveness of eigenface representations in capturing fa-

cial variations. The HOG-SVM model also demon-

strated competitive performance, while the LBP-

SVM model showed lower accuracy, indicating po-

tential challenges in handling illumination and texture

variations in the dataset.

8.2 Evaluation

Confusion Matrix

After training the facial recognition models us-

ing different feature extraction techniques combined

with SVM classiﬁers, we proceeded to evaluate their

performance on the LFW dataset. The evaluation in-

cludes assessing accuracy, precision, recall, and F1

score, which provide a comprehensive measure of the

models’ ability to correctly identify individuals while

minimizing both false positives and false negatives.

Additionally, we analyzed confusion matrices to gain

deeper insights into the models’ effectiveness, partic-

ularly in identifying misclassiﬁcations between simi-

lar facial features, expressions, or lighting conditions.

These metrics and analyses were crucial in under-

standing the strengths and limitations of each model,

helping to identify the most reliable approach for ac-

curate facial recognition under real-world scenarios.
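For reference, the four metrics can be derived from confusion-matrix counts as sketched below. The function is a generic binary-case illustration and the counts are made up for the example; they are not results from this study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Made-up counts for illustration only; they are not results from this study.
accuracy, precision, recall, f1 = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

For a multi-class recognition task, the same quantities are usually computed per identity and then averaged. The averaging scheme is not stated in the text.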

Figures 7 and 8 show the confusion matrices of the HOG-SVM and Eigenfaces-SVM models.

Figure 7: Confusion Matrix of HOG-SVM Model.

Figure 8: Confusion Matrix of Eigenface-SVM Model.

It signiﬁes the proportion of correctly identiﬁed posi-

tive instances among all the actual positive instances.

Overall, all models performed reasonably well. The

HOG-SVM and Eigenfaces-SVM models achieved

higher accuracy, precision, recall, and F1 score com-

pared to the LBP-SVM model. However, further anal-

ysis and ﬁne-tuning may be required to improve the

performance of the LBP-SVM model.

Figure 9: Confusion Matrix of LBP-SVM Model.

Figure 10: Different metrics by architecture.

ROC Curve

The ROC (Swets and Pickett, 1988) curves below il-

lustrate the performance of different models in terms

of the true positive rate (sensitivity) against the false

positive rate (1-speciﬁcity).

Based on the ROC curves, we can observe that the HOG-SVM model
achieved the highest area under the curve (AUC), indicating
superior performance in distinguishing between positive and
negative samples.
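To make the relation between the ROC curve and the AUC concrete, here is a minimal sketch for a binary (for example one-vs-rest) problem. It is an illustration under assumptions, not the procedure used in the study: the scores and labels are toy values, both classes are assumed to be present, and the helper names `roc_points` and `trapezoid_auc` are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold from high to low and collect (FPR, TPR)."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Samples sharing the same score cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy classifier scores and ground-truth labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
area = trapezoid_auc(fpr, tpr)
print(f"AUC = {area:.4f}")
```

An AUC of 1.0 corresponds to a perfect ranking of positives above negatives, while 0.5 corresponds to a random ranking.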

Precision-Recall Curve

The Precision-Recall curves below illustrate the

trade-off between precision and recall for different

classiﬁcation models. Precision-Recall curves are

useful when the classes are imbalanced, as they pro-

vide insights into the classiﬁer’s performance across

different decision thresholds.
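As a minimal sketch of how such a curve is obtained, the snippet below lowers the decision threshold one sample at a time and records recall and precision at each step. It assumes a binary (for example one-vs-rest) setting with distinct scores and at least one positive sample; the data are toy values and the helper names are hypothetical, not taken from the study.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs as the threshold is lowered."""
    pos = sum(labels)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = []
    tp = fp = 0
    for idx in order:
        if labels[idx] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area under the precision-recall curve."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy classifier scores and ground-truth labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pr_points = precision_recall_points(scores, labels)
ap = average_precision(pr_points)
print(f"average precision = {ap:.4f}")
```

Unlike the ROC curve, precision depends on the ratio of positives to negatives, which is why this view is more informative for imbalanced classes.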

Based on the Precision-Recall curves, we can observe that the
HOG-SVM model achieved higher precision-recall values compared to
other models across various thresholds. This indicates that the
HOG-SVM model is better at identifying positive samples while
maintaining high precision, making it more suitable for the task.

Table 5: Summary of Model Performance Metrics.

Model            Accuracy  Precision  Recall  F1 Score
HOG-SVM          0.745     0.472      0.878   0.644
Eigenfaces-SVM   0.755     0.456      0.856   0.674
LBP-SVM          0.590     0.435      0.840   0.349

ENASE 2025 - 20th International Conference on Evaluation of Novel Approaches to Software Engineering

After Optimization Results

After ﬁne-tuning our models, we obtained the follow-

ing performance metrics:

Table 6: Performance metrics after fine-tuning.

Model            Accuracy  Precision  Recall  F1 Score
HOG-SVM          0.795     0.485      0.890   0.682
Eigenfaces-SVM   0.845     0.490      0.894   0.783
LBP-SVM          0.609     0.443      0.853   0.372

From the results in Table 6, we observe that the Eigenfaces-SVM
approach achieved the highest accuracy of 84.5%, together with the
best precision, recall, and F1 score of the three models. HOG-SVM
also performed reasonably well with an accuracy of 79.5%,
demonstrating good recall but lower precision. However, the
LBP-SVM approach showed lower performance across all metrics.

Figure 11: ROC Curves of Different Models after Hyperparameter Optimization.

8.3 Comparison of Results Before and After Hyperparameter Optimization

We compared three feature extractors (HOG, Eigenfaces, and LBP)
with the SVM classifier. The performance metrics before and after
hyperparameter optimization are summarized in Table 7.

Table 7: Performance comparison before and after hyperparameter optimization.

Model            Stage   Accuracy  Precision  Recall  F1 Score
HOG-SVM          Before  0.745     0.472      0.878   0.644
HOG-SVM          After   0.795     0.485      0.890   0.682
Eigenfaces-SVM   Before  0.755     0.456      0.856   0.674
Eigenfaces-SVM   After   0.845     0.490      0.894   0.783
LBP-SVM          Before  0.590     0.435      0.840   0.349
LBP-SVM          After   0.609     0.443      0.853   0.372

Figure 12: Precision-Recall Curves of Different Models after Hyperparameter Optimization.

The results demonstrate clear improvements in model performance
after hyperparameter optimization. Initially, the HOG-SVM model
achieved 74.5% accuracy with an F1 score of 0.644. After
fine-tuning, the Eigenfaces-SVM model showed the greatest
improvement, with accuracy rising from 75.5% to 84.5% and the F1
score increasing from 0.674 to 0.783. The HOG-SVM model also
improved, reaching 79.5% accuracy and an F1 score of 0.682. The
LBP-SVM model saw only slight gains in accuracy (0.590 to 0.609)
and F1 score (0.349 to 0.372) and still performed worse than the
other models. These results highlight the value of hyperparameter
optimization in boosting model accuracy and suggest that
Eigenfaces is the most effective feature extractor for SVM on the
LFW dataset.
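The gains discussed above can be recomputed directly from the values reported in Table 7. The short sketch below only performs that arithmetic; the dictionary layout and the name `table7` are illustrative choices, and no new results are introduced.

```python
METRICS = ("accuracy", "precision", "recall", "f1")

# Values copied from Table 7: (accuracy, precision, recall, F1 score).
table7 = {
    "HOG-SVM": {
        "before": (0.745, 0.472, 0.878, 0.644),
        "after": (0.795, 0.485, 0.890, 0.682),
    },
    "Eigenfaces-SVM": {
        "before": (0.755, 0.456, 0.856, 0.674),
        "after": (0.845, 0.490, 0.894, 0.783),
    },
    "LBP-SVM": {
        "before": (0.590, 0.435, 0.840, 0.349),
        "after": (0.609, 0.443, 0.853, 0.372),
    },
}


def gains(table):
    """Absolute change (after minus before) for every model and metric."""
    return {
        model: {
            name: round(after - before, 3)
            for name, before, after in zip(METRICS, stages["before"], stages["after"])
        }
        for model, stages in table.items()
    }


deltas = gains(table7)
best_model = max(deltas, key=lambda m: deltas[m]["accuracy"])
for model, change in deltas.items():
    print(model, change)
print("largest accuracy gain:", best_model)
```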

8.4 Evaluation of Facial Recognition

Models on LFW Dataset and

Augmented Dataset

We evaluate the performance of three different fa-

cial recognition models—HOG+SVM, LBP+SVM,

and Eigenfaces+SVM—using both the original LFW

dataset and the augmented LFW dataset generated

with the diffusion model. The aim is to assess how

the introduction of augmented images, which include

variations in lighting, facial expressions, and poses,

inﬂuences the accuracy of the models compared to

training solely on the original LFW dataset.


The augmentation process, which includes gen-

erating additional images for each individual in the

LFW dataset, allows the models to beneﬁt from a

more diverse range of facial variations, which typi-

cally improves their generalization capabilities. By

leveraging these augmented images, the models are

exposed to a wider variety of conditions, helping them

learn more robust feature representations.
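One way to combine original and generated images is sketched below. This is an assumption about a reasonable protocol, not a description of the authors' pipeline: generated images are added to the training split only, so that the test split contains real images exclusively. The paper does not state how its splits were built, and the function name, file names, and split fraction are hypothetical.

```python
import random


def build_splits(original, generated, test_fraction=0.25, seed=0):
    """Split real images per identity, then add generated images to training only.

    original / generated map an identity name to a list of image paths.
    """
    rng = random.Random(seed)
    train, test = [], []
    for person, images in original.items():
        shuffled = images[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test += [(img, person) for img in shuffled[:n_test]]
        train += [(img, person) for img in shuffled[n_test:]]
        # Assumption: synthetic images never enter the test split.
        train += [(img, person) for img in generated.get(person, [])]
    return train, test


# Hypothetical file names, for illustration only.
original = {
    "person_a": ["a_0.jpg", "a_1.jpg", "a_2.jpg", "a_3.jpg"],
    "person_b": ["b_0.jpg", "b_1.jpg", "b_2.jpg", "b_3.jpg"],
}
generated = {"person_a": ["a_gen_0.png", "a_gen_1.png"]}

train_set, test_set = build_splits(original, generated)
print(len(train_set), "training samples,", len(test_set), "test samples")
```

Keeping synthetic images out of the test split means the reported accuracy reflects performance on real photographs rather than on the generator's own output.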

We observe that training the models with the aug-

mented dataset yields better results compared to train-

ing on the original LFW dataset. This improvement

in accuracy demonstrates the beneﬁts of using gener-

ated images to enhance the diversity and complexity

of the training data. Table 8 summarizes the performance metrics
for each model on the augmented LFW dataset; the results on the
original dataset appear in Tables 5 and 6.

Table 8: Model Performance Metrics on LFW with Generated Images.

Model            Accuracy  Precision  Recall  F1 Score
HOG-SVM          0.782     0.77       0.896   0.661
Eigenfaces-SVM   0.961     0.94       0.984   0.877
LBP-SVM          0.957     0.712      0.970   0.642

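As shown in Table 8, the accuracy of the Eigenfaces-SVM and
LBP-SVM models improves markedly when trained on the augmented
dataset (from 0.845 to 0.961 and from 0.609 to 0.957,
respectively, relative to Table 6). The HOG-SVM model reaches
0.782, which is above its accuracy before optimization (0.745) but
slightly below its accuracy after optimization on the original
dataset (0.795).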

These results highlight the importance of diverse

and augmented data in improving the performance of

facial recognition models, especially in challenging

real-world scenarios where variations in facial expres-

sions, lighting, and poses are common. The augmen-

tation process through the diffusion model has proven

to be particularly beneﬁcial in this context, as it al-

lows the model to generalize better by exposing it to

more varied representations of facial features, which

may not be present in the original dataset.

Table 9 compares the best accuracies of the face recognition
methods obtained in this study with those from related work, all
evaluated on the LFW dataset.

Table 9: Comparison of Accuracies on LFW Dataset (Alamri et al., 2022).

Method                                  Accuracy
Related Work
  PCA (Yin et al., 2011)                0.8445
  PCA-SVM (Duan et al., 2019)           0.8413
  CNN (A et al., 2015)                  0.7998
  SIFT (Ahmed et al., 2018)             0.711
  HOG-SVM (Dadi and Pillutla, 2016)     0.644
  Eigenfaces-SVM (Aliyu et al., 2022)   0.831
  LBP-SVM (Shan, 2011)                  0.9481
  SIFT-SVM                              0.658
Current Study
  HOG-SVM                               0.782
  Eigenfaces-SVM                        0.961
  LBP-SVM                               0.957

The results reveal that the Eigenfaces-SVM method outperforms all
of the related-work methods listed, achieving the highest accuracy
of 0.961. This is a notable improvement over well-established
methods such as PCA-SVM and CNN, which achieved accuracies of
0.8413 and 0.7998, respectively. The HOG-SVM method reached an
accuracy of 0.782 in this study, surpassing the 0.644 reported for
HOG-SVM in a previous study (Dadi and Pillutla, 2016). The LBP-SVM
method achieved 0.957 in the current study, slightly above the
0.9481 reported in the related work (Shan, 2011).

These differences might be due to differences in data augmentation
strategies or model configurations used across studies. Overall,
these results indicate that Eigenfaces-SVM and LBP-SVM trained on
the augmented dataset are strong contenders for face recognition
tasks, that HOG-SVM improves on the previously reported HOG-SVM
accuracy, and that Eigenfaces-SVM is the most effective approach
among the models tested.

9 CONCLUSION

This work highlights the importance of both ad-

vanced data augmentation techniques, such as dif-

fusion models, and the selection of effective fea-

ture extraction methods for improving the perfor-

mance of facial recognition systems. By comparing

three well-established algorithms for feature extrac-

tion—Eigenfaces, Local Binary Patterns (LBP), and

Histogram of Oriented Gradients (HOG)—we were

able to assess their suitability when combined with a

Support Vector Machine (SVM) classiﬁer for facial

recognition tasks.

Through extensive experimentation and evalua-

tion on both the original and augmented LFW dataset,

it became evident that the choice of feature extrac-

tor plays a crucial role in the overall performance of

the facial recognition model. Among the three algorithms tested,
Eigenfaces-SVM demonstrated the highest accuracy and overall
performance. On the original LFW dataset it was followed by
HOG-SVM, with LBP-SVM achieving the lowest results, whereas on the
augmented dataset LBP-SVM (0.957) overtook HOG-SVM (0.782) in
accuracy. The Eigenfaces method, which captures

the global structure of faces through Principal Com-

ponent Analysis (PCA), was particularly effective in

distinguishing subtle variations in facial features, re-

sulting in superior accuracy, precision, recall, and F1


score. HOG, known for its ability to capture edge and

texture information, also showed strong performance

but was not as robust as Eigenfaces in handling var-

ied facial expressions and lighting conditions. On the

other hand, LBP, which is more sensitive to local tex-

ture variations, underperformed compared to the other

two methods, particularly in more complex scenarios

involving diverse lighting and poses.
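To illustrate what "local texture" means for LBP, the following minimal sketch computes the basic 3x3 LBP code of a pixel and a 256-bin histogram over an image. It shows the basic operator only; the neighbour ordering is one common convention, and the variant, radius, and parameters used in this study are not specified here, so they should be treated as assumptions.

```python
def lbp_code(patch):
    """8-bit LBP code for the centre pixel of a 3x3 grayscale patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


# Toy 3x3 patch of gray levels (illustrative values).
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch)
brighter = [[value + 10 for value in row] for row in patch]
print(code, lbp_code(brighter))
```

Because each bit depends only on a comparison with the centre pixel, adding a constant brightness offset to the whole patch leaves the code unchanged; the histogram of such codes is the feature vector that would be passed to the classifier.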

Additionally, the introduction of the diffusion

model for data augmentation signiﬁcantly contributed

to improving the performance of all three mod-

els. The synthetic images generated by the diffusion

model enhanced the diversity of the training data, pro-

viding the models with a broader range of facial vari-

ations. This led to a noticeable improvement in the

recognition accuracy, especially when compared to

training on the original LFW dataset alone. The aug-

mented data allowed the models to better generalize

to real-world conditions, which often involve diverse

facial expressions, poses, and lighting conditions.

REFERENCES

A, V., Hebbar, D., Shekhar, V. S., Murthy, K. N. B., and

Natarajan, S. (2015). Two novel detector-descriptor

based approaches for face recognition using sift and

surf. Procedia Computer Science, 70:185–197.

Ahmed, A., Guo, J., Ali, F., Deeba, F., and Ahmed, A.

(2018). Lbph based improved face recognition at low

resolution. In 2018 International Conference on Arti-

ﬁcial Intelligence and Big Data (ICAIBD), pages 144–

147.

Alamri, H., Alshanbari, E., Alotaibi, S., and AlGhamdi, M.

(2022). Face recognition and gender detection using

sift feature extraction, lbph, and svm. Engineering,

Technology & Applied Science Research, 12(2):8296–

8299.

Aliyu, I., Bomoi, M. A., and Maishanu, M. (2022). A com-

parative study of eigenface and ﬁsherface algorithms

based on opencv and sci-kit libraries implementations.

International Journal of Information and Electronics

Engineering, 14(3):35.

Cheng Quanhua, Liu Zunxiong, D. G. (2008). Facial gender

classiﬁcation with eigenfaces and least squares sup-

port vector machine. pages 28–33.

Dadi, H. S. and Pillutla, G. K. M. (2016). Improved face

recognition rate using hog features and svm classiﬁer.

IOSR Journal of Electronics and Communication En-

gineering (IOSR-JECE), 11(4):34–44.

Duan, Y., Lu, J., and Zhou, J. (2019). Uniformface: Learn-

ing deep equidistributed representation for face recog-

nition. In 2019 IEEE/CVF Conference on Computer

Vision and Pattern Recognition (CVPR), pages 3410–

3419.

Hilbert, A., Madai, V. I., Akay, E. M., and Aydin, O. U.

(2020). Brave-net: Fully automated arterial brain ves-

sel segmentation in patients with cerebrovascular dis-

ease. Frontiers in Artiﬁcial Intelligence, 3:552258.

Jae Jeong Hwang, Young Min Kim, K. H. R. (2018). Faces

recognition using haarcascade, lbph, hog and linear

svm object detector. In Proceedings of the Sixth Inter-

national Conference on Green and Human Informa-

tion Technology, pages 232–236.

Kancherla Deepika, Jyostna Devi Bodapati, R. K. S. (2019).

An efﬁcient automatic brain tumor classiﬁcation us-

ing lbp features and svm-based classiﬁer. Proceedings

of International Conference on Computational Intelli-

gence and Data Engineering, pages 163–170.

Kassel, R. (2021). U-net : le réseau de neurones de computer vision.

Mittal, A. (2024). Comprendre les modèles de diffusion : une plongée en profondeur dans l'IA générative.

Moore, R. and Lopes, J. (1999). Paper templates. In TEM-

PLATE’06, 1st International Conference on Template

Production. SCITEPRESS.

Rajaa, S., HARRABI, R. M. s., and Chaabane, S. B. (2021).

Facial expression recognition system based on svm

and hog techniques. In International Journal of Im-

age Processing (IJIP), pages 14–21.

Rosebrock, A. (2021). OpenCV Eigenfaces for face recognition. In PyImageSearch.


Shan, C. (2011). Learning local binary patterns for gen-

der classiﬁcation on real-world face images. Pattern

Recognition Letters, 32(10):1318–1325.

Shubhangi Patil, Y. M. P. (2022). Face expression recog-

nition using svm and knn classiﬁer with hog. In Un-

known.

Smith, J. (1998). The Book. The publishing company, Lon-

don, 2nd edition.

Swets, J. A. and Pickett, R. M. (1988). Measuring the accu-

racy of diagnostic systems. Science, 240(4857):1285–

1293.

Yin, Q., Tang, X., and Sun, J. (2011). An associate-predict

model for face recognition. In CVPR 2011, pages

497–504.
