Data-Driven Fairness Generalization for Deepfake Detection

Uzoamaka Ezeakunne (1), Chrisantus Eze (2) and Xiuwen Liu (1)
(1) Department of Computer Science, Florida State University, Tallahassee FL, U.S.A.
(2) Department of Computer Science, Oklahoma State University, Stillwater OK, U.S.A.
Keywords: Deepfake Detection, Image Manipulation Detection, Fairness, Generalization.
Abstract:
Despite the progress made in deepfake detection research, recent studies have shown that biases in the train-
ing data for these detectors can result in varying levels of performance across different demographic groups,
such as race and gender. These disparities can lead to certain groups being unfairly targeted or excluded.
Traditional methods often rely on fair loss functions to address these issues, but they underperform when
applied to unseen datasets; hence, fairness generalization remains a challenge. In this work, we propose a
data-driven framework for tackling the fairness generalization problem in deepfake detection by leveraging
synthetic datasets and model optimization. Our approach focuses on generating and utilizing synthetic data
to enhance fairness across diverse demographic groups. By creating a diverse set of synthetic samples that
represent various demographic groups, we ensure that our model is trained on a balanced and representative
dataset. This approach allows us to generalize fairness more effectively across different domains. We employ
a comprehensive strategy that leverages synthetic data, a loss sharpness-aware optimization pipeline, and a
multi-task learning framework to create a more equitable training environment, which helps maintain fairness
across both intra-dataset and cross-dataset evaluations. Extensive experiments on benchmark deepfake detec-
tion datasets demonstrate the efficacy of our approach, surpassing state-of-the-art approaches in preserving
fairness during cross-dataset evaluation. Our results highlight the potential of synthetic datasets in achieving
fairness generalization, providing a robust solution for the challenges faced in deepfake detection.
1 INTRODUCTION
Deepfake technology, which combines “deep learning” and “fake,” represents a significant advancement
in media manipulation capabilities. Leveraging deep
learning techniques, it enables highly convincing fa-
cial manipulations and replacements in digital me-
dia. While technologically impressive, this capabil-
ity poses serious societal risks, particularly in spread-
ing misinformation and eroding public trust. In re-
sponse, researchers have developed various deepfake
detection techniques that have shown promising ac-
curacy rates in identifying manipulated content (Xu
et al., 2023; Zhao et al., 2021a; Ezeakunne and Liu,
2023; Guo et al., 2022; Pu et al., 2022b; Wang et al.,
2022).
However, a critical challenge has emerged in
the form of fairness disparities across demographic
groups as described in (Trinh and Liu, 2021; Xu
et al., 2022; Nadimpalli and Rattani, 2022; Masood
et al., 2023). Studies have revealed that current detec-
tion methods perform inconsistently across different
demographics, particularly showing higher accuracy
rates for individuals with lighter skin tones compared
to those with darker skin tones (Trinh and Liu, 2021;
Hazirbas et al., 2021). This bias creates a concern-
ing vulnerability where malicious actors could poten-
tially target specific demographic groups with deep-
fakes that are more likely to evade detection.
While recent algorithmic approaches have shown
promise in improving detection fairness when train-
ing and testing data use similar forgery techniques
(Ju et al., 2024), the real-world application presents
a more complex challenge. Deepfake detectors, typ-
ically developed and trained on standard research
datasets, must ultimately operate in diverse real-world
environments where they encounter deepfakes cre-
ated using various, potentially unknown forgery tech-
niques. This generalization capability is crucial for
practical deployment, particularly on social media
platforms where manipulated content proliferates.
Figure 1: Fairness Generalization Comparison: A detector's accuracy can generalize well to unseen test sets A and B, maintaining consistent detection performance. However, while fairness metrics are preserved on test set B, they fail to generalize to test set A, highlighting the challenge of achieving consistent fairness across different datasets.
While current deepfake detection methods focus almost entirely on detection accuracy, it is also critical to consider how well these approaches generalize with respect to fairness. Figure 1 illustrates this: the left bars show a detector's fairness and detection performance on its training set. When the same detector is tested on an unseen dataset (test set A, middle bars), its detection performance, measured in terms of model accuracy, may generalize while its fairness fails to. When tested on another unseen dataset (test set B, right bars), it can retain both its fairness and its detection performance. In other words, the detector's detection performance generalizes to both datasets, but fairness generalization fails on test set A.
Our research addresses these challenges through
a comprehensive approach that combines data-centric
and algorithmic solutions. We propose a novel frame-
work that utilizes self-blended images (SBI) for syn-
thetic data generation to balance demographic repre-
sentation in the training data, addressing a key source
of fairness disparities. This is coupled with a multi-
task learning architecture that simultaneously opti-
mizes for both detection accuracy and demographic
fairness. The architecture employs an EfficientNet
backbone for feature extraction, with separate heads
for deepfake detection and demographic classifica-
tion.
To enhance generalization capabilities, we im-
plement Sharpness-Aware Minimization (SAM) op-
timization (Lin et al., 2024), which helps find more
robust solutions by flattening the loss landscape. Our
framework also incorporates a carefully designed loss
function that balances classification accuracy, demo-
graphic prediction, and fairness constraints. This ap-
proach represents a significant advancement in creat-
ing deepfake detection systems that are not only ac-
curate but also fair and generalizable across different
demographic groups and forgery techniques.
As demonstrated in our experimental results,
while traditional detectors may maintain detection ac-
curacy across different test sets, they often fail to
preserve fairness when encountering new data. Our
method addresses this limitation by specifically opti-
mizing for both performance metrics, ensuring con-
sistent fairness across different demographic groups
even when tested on previously unseen datasets.
Therefore, our contributions are as follows:
- We propose synthetic data balancing to balance training datasets across demographics and aid fair learning in deepfake detection.
- Our framework introduces a multi-task architecture with two classification heads: one for deepfake detection and another for learning the demographic features of the datasets. This enables the system not only to detect forgery but also to remain aware of demographic biases, making it more equitable across different groups.
- The multi-task learning approach, combined with sharpness-aware loss optimization and robust feature extraction, makes the system more effective at generalizing both fairness and detection performance to unseen datasets.
- We performed intra-dataset and cross-dataset evaluations to demonstrate the improved performance of this method.
2 RELATED WORK
2.1 Deepfake Detection
To train deepfake detection models that perform well both during training and in practice on unseen datasets, several studies (Li et al., 2020a; Zhao et al., 2021b;
Guan et al., 2022; Li, 2018) have introduced novel
methods for manually synthesizing a variety of face
forgeries similar to deepfakes. These methods help
deepfake detection models learn more generalized
representations of artifacts, improving their ability to
identify deepfakes across different scenarios.
The authors in (Chen et al., 2022) proposed en-
hancing the diversity of forgeries through adversar-
ial training to improve the robustness of deepfake de-
tectors in recognizing various forgeries. FaceCutout
(Das et al., 2021) uses facial landmarks and ran-
domly cuts out different parts of the face (such as
the mouth, eye, etc.) to improve the robustness of
deepfake detection. Face-Xray (Li et al., 2020a) on
the other hand, involves generating blended images
(BI) by combining two different faces using a global
transformation and then training a model to distin-
guish between real and blended faces. The authors
in (Shiohara and Yamasaki, 2022) expanded on this
concept by creating a synthetic training dataset with
self-blended images (SBI), which are created by ap-
plying a series of data augmentation techniques to
real images. SBI has demonstrated even better gen-
eralization to previously unseen deepfakes compared
to Face-Xray (Li et al., 2020a). After synthesizing
forged images (i.e., pseudo deepfakes) using special-
ized augmentation techniques, they trained a binary
classification model to perform the detection task.
2.2 Fairness in Deepfake Detection
Despite efforts to enhance the generalization of deep-
fake detection to unseen data, limited progress has
been made in mitigating biased performance when
testing on both the same (intra-domain) and different
(cross-domain) dataset distributions.
Recent studies have highlighted significant fair-
ness issues in deepfake detection, revealing biases in
both datasets and detection models (Masood et al.,
2023). The work done by (Trinh and Liu, 2021)
and (Hazirbas et al., 2021) discovered substantial er-
ror rate differences across demographic groups, while
(Pu et al., 2022a) found that the MesoInception-4
model by (Afchar et al., 2018) was biased against
both genders. The authors in (Xu et al., 2022) advo-
cated for diverse dataset annotations to address these
biases, and (Nadimpalli and Rattani, 2022) introduced
a gender-balanced dataset to reduce gender-based per-
formance bias. However, these techniques led to only
modest improvements and required extensive data an-
notation. Furthermore, (Ju et al., 2024) worked to im-
prove fairness within the same dataset (intra-domain),
but did not address fairness generalization between
different datasets (cross-domain).
2.3 Fairness Generalization in Deepfake
Detection
Very little work has been done to achieve fairness generalization in deepfake detection. The recent work
done in (Lin et al., 2024) proposed a novel feature-
based technique to achieve fairness generalization and
preservation. They combine disentanglement learn-
ing, fairness learning, and loss optimization to pre-
serve fairness in deepfake detection. In the disen-
tanglement learning module, they utilized a disen-
tanglement loss to expose demographic and domain-
agnostic forgery features, while the fairness learning module fused the two features from the disentanglement learning module to obtain predictions for the samples. Their framework, however, was trained on a dataset with imbalanced demographic groups, which introduced some bias into their model.
Our framework approaches and solves fairness
generalization using a data-driven approach. Our
research prioritizes fairness and eliminates bias due
to dataset imbalance by using synthetic datasets to
balance the dataset distribution across the different
groups. We also used a multi-task learning approach,
combined with fairness and loss sharpness-aware op-
timization and robust feature extraction to achieve
significant performance improvements on both intra-
dataset evaluation and cross-dataset evaluation.
3 PROPOSED METHOD
Our method aims to enhance both the generalization
and fairness of deepfake detection models through a
data-centric strategy. This section provides a detailed
overview of the problem formulation, followed by the
process of generating our synthetic dataset. We then
outline our approach for addressing dataset imbal-
ance and conclude with a discussion of our multi-task
learning framework, designed to simultaneously op-
timize for accuracy and fairness across demographic
groups.
3.1 Problem Setup
Given an input face image I, the goal is to train a deep-
fake detection model g that classifies whether I is a
deepfake (fake) or a real face. The model g is defined
as:
$$g : I \rightarrow \{0, 1\} \quad (1)$$
where g(I) indicates the predicted label, with
g(I) = 1 denoting a deepfake and g(I) = 0 denoting a
real face.
Detection Accuracy: The primary goal is to maximize the detection accuracy. Given a set of face images $\{I_i\}_{i=1}^{N}$ with true labels $\{y_i\}_{i=1}^{N}$, the overall accuracy of the model $g$ is:

$$\mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(g(I_i) = y_i\right) \quad (2)$$

where $\mathbb{1}(\cdot)$ is the indicator function that equals 1 if the prediction matches the true label.
Fairness Constraint: The detector should perform fairly across different demographic groups. Let $D$ represent the demographic group associated with an image, and $D_k$ be the set of images from the demographic group $k$. Define $\mathrm{Accuracy}_k$ as the accuracy of the detector on images from demographic group $k$:

$$\mathrm{Accuracy}_k = \frac{1}{|D_k|} \sum_{i \in D_k} \mathbb{1}\left(g(I_i) = y_i\right) \quad (3)$$

The fairness of the detector is assessed by minimizing the maximum disparity in accuracy across different demographic groups:

$$\mathrm{Fairness} = \max_{k,l} \left| \mathrm{Accuracy}_k - \mathrm{Accuracy}_l \right| \quad (4)$$

where $k$ and $l$ denote different demographic groups. The objective is to minimize this fairness disparity.
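To make these definitions concrete, the following minimal sketch (our own illustration, not part of the original implementation) computes the per-group accuracy of Eq. (3) and the maximum disparity of Eq. (4) from arrays of predictions, labels, and group ids; all names are hypothetical.

```python
import numpy as np

def fairness_gap(preds: np.ndarray, labels: np.ndarray, groups: np.ndarray) -> float:
    """Maximum pairwise accuracy disparity across demographic groups (Eq. 4)."""
    accuracies = []
    for g in np.unique(groups):
        mask = groups == g
        accuracies.append((preds[mask] == labels[mask]).mean())  # Accuracy_k, Eq. (3)
    return float(max(accuracies) - min(accuracies))              # max_{k,l} |Acc_k - Acc_l|

# Toy example: group 0 is classified perfectly, group 1 only once out of three times.
preds  = np.array([1, 0, 1, 1, 0, 0])
labels = np.array([1, 0, 1, 0, 1, 0])
groups = np.array([0, 0, 0, 1, 1, 1])
print(fairness_gap(preds, labels, groups))  # ~0.667
```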
3.2 Synthetic Data Balancing
To enhance fairness in deepfake detection, we utilize
a technique based on self-blended images (SBI), as
proposed in (Shiohara and Yamasaki, 2022). This
technique involves generating synthetic images to
balance the demographic distribution of the dataset,
thereby improving fairness across various demo-
graphic groups. The dataset samples are shown in
Figure 2. Each real sample (upper row) has a cor-
responding fake sample (bottom row), so the dataset
is also balanced across real and fake classes. The pro-
cess of data balancing using a synthetic dataset is de-
tailed in the sections below.
Generation of Synthetic Images

Let $I = \{I_1, I_2, \ldots, I_n\}$ represent the set of real face images, where each $I_i$ belongs to a specific demographic group. The goal is to generate a set of self-blended images $S = \{S_1, S_2, \ldots, S_m\}$ using the following steps:

Base Image Selection: Select a diverse subset of real face images, $I_{base} \subseteq I$, ensuring coverage across demographic categories.

Augmentation Process: Apply a series of transformations to generate self-blended images. For a base image $I_i$, the synthetic image $S_j$ is generated using a blend function $B$ as follows:

$$S_j = B\left(I_i, T_k(I_i)\right) \quad (5)$$

where $T_k$ represents a transformation operation such as scaling, rotation, or color adjustment, and $B$ is the blending function that combines $I_i$ with $T_k(I_i)$.
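As a rough illustration of Eq. (5), the sketch below blends a face image with a color- and scale-jittered copy of itself; the specific transforms, mask shape, and parameter values are illustrative assumptions rather than the exact SBI pipeline of (Shiohara and Yamasaki, 2022).

```python
import cv2
import numpy as np

def self_blend(image: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Create a pseudo-fake by blending an image with a transformed copy of itself (Eq. 5)."""
    h, w = image.shape[:2]

    # T_k: a simple transformation chain (brightness/contrast jitter plus a slight rescale).
    transformed = cv2.convertScaleAbs(image, alpha=1.1, beta=5)
    transformed = cv2.resize(transformed, (int(w * 0.97), int(h * 0.97)))
    transformed = cv2.resize(transformed, (w, h))

    # B: blend the source and the transformed copy inside a soft elliptical face-like mask.
    mask = np.zeros((h, w), dtype=np.float32)
    cv2.ellipse(mask, (w // 2, h // 2), (w // 3, h // 2), 0, 0, 360, 1.0, -1)
    mask = cv2.GaussianBlur(mask, (31, 31), 0)[..., None]

    blended = mask * (alpha * transformed + (1 - alpha) * image) + (1 - mask) * image
    return np.clip(blended, 0, 255).astype(np.uint8)
```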
Dataset Balancing

To balance the dataset, we increase the representation of underrepresented demographic groups:

Demographic Representation: Define the demographic groups $D = \{D_1, D_2, \ldots, D_r\}$. Let $I_{D_k} \subseteq I$ be the set of images belonging to demographic group $D_k$. To balance the dataset, we create synthetic images $S_{D_k}$ for each group $D_k$:

$$B = I \cup S \quad (6)$$

where $B$ is the balanced dataset and $S = \bigcup_{k=1}^{r} S_{D_k}$.

This dataset $B$ is balanced across all demographic groups and also contains an equal number of real and fake samples.
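A minimal sketch of this balancing step is shown below, under the simplifying assumption that each demographic group is topped up with synthetic samples until all groups match the size of the largest one; the helper names are ours, not the authors'.

```python
import random
from collections import Counter

def balance_by_group(real_samples, make_synthetic, seed: int = 0):
    """Top up each demographic group D_k with synthetic (fake) samples so all groups
    reach the size of the largest group, yielding the balanced set B of Eq. (6).

    real_samples  : list of (image_path, group_id) pairs.
    make_synthetic: callable turning a real image path into a synthetic sample.
    """
    rng = random.Random(seed)
    counts = Counter(group for _, group in real_samples)
    target = max(counts.values())

    synthetic = []
    for group, count in counts.items():
        pool = [path for path, g in real_samples if g == group]
        # Oversample the underrepresented group's real images and turn them into fakes.
        for path in rng.choices(pool, k=target - count):
            synthetic.append((make_synthetic(path), group))
    return real_samples + synthetic
```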
3.3 Multi-Task Learning
We propose a dual-task learning framework to en-
hance the fairness of deepfake detection models by
leveraging demographic information. The architecture, shown in Figure 3, is designed to improve the generalization of deepfake detection across various
demographic groups, reducing bias in prediction ac-
curacy between these groups.
The input images are passed through the Effi-
cientNet encoder (Tan, 2019), which extracts high-
dimensional feature representations. These features
capture both low-level information (like textures and edges) and high-level semantic features (such as
facial structure and potential tampering artifacts). The
extracted features serve as the input for the real/fake
classification head and the demographic head.
Real/Fake Classification Head: This head con-
sists of a Multi-Layer Perceptron (MLP) that pro-
cesses the extracted features and predicts whether the
input image is real or fake. The output of this MLP is
a probability score indicating the likelihood that the
image is a deepfake or a genuine one. This branch
focuses on learning the subtle cues and artifacts asso-
ciated with the deepfake generation, such as inconsis-
tencies in lighting, facial details, or blurring artifacts
around the face.
Demographic Head: In parallel to the real/fake
classification, the features are passed through a sec-
ond MLP for demographic group classification. This
MLP is designed to categorize the input into one of
eight demographic groups, such as Black-Male (B-
M), White-Female (W-F), Asian-Male (A-M), etc.
The purpose of this head is to ensure that the model
learns demographic-specific features that are indepen-
dent of deepfake artifacts. These features are crucial
for enhancing the fairness of the model across diverse
demographic groups.
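One possible realization of this two-head design is sketched below; the use of the timm library and the specific EfficientNet variant and head sizes are our assumptions for illustration, since the paper does not prescribe an exact implementation.

```python
import timm
import torch
import torch.nn as nn

class FairDeepfakeDetector(nn.Module):
    """Shared EfficientNet encoder with a real/fake head and a demographic head."""

    def __init__(self, backbone: str = "efficientnet_b4", num_groups: int = 8):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of class logits.
        self.encoder = timm.create_model(backbone, pretrained=True, num_classes=0)
        feat_dim = self.encoder.num_features

        self.realfake_head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1)           # deepfake logit
        )
        self.demographic_head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, num_groups)  # 8 group logits
        )

    def forward(self, x: torch.Tensor):
        features = self.encoder(x)
        return self.realfake_head(features), self.demographic_head(features)
```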
Figure 2: Sample of Facial Images. The real images (top row) and their corresponding fake/synthetic images (bottom row).
Loss Function
We define the loss function to incorporate both the
standard classification loss and a fairness penalty.
Specifically, the total loss function L is composed of
three terms: the classification loss, the demographic
loss, and the fairness penalty.
Classification Loss: Let $p = [p_1, p_2, \ldots, p_N]$ be the predicted probabilities and $y = [y_1, y_2, \ldots, y_N]$ be the true binary labels. The binary cross-entropy loss $L_{real}$ is defined as:

$$L_{real} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \right] \quad (7)$$

where $N$ is the number of samples, $y_i$ is the true binary label (0 or 1) for the $i$-th sample, and $p_i$ is the predicted probability for the positive class.
Demographic Loss: Let $p_{dem,i} = [p_{dem,i,1}, \ldots, p_{dem,i,8}]$ represent the predicted probabilities for the $i$-th sample across the 8 demographic groups, and $y_{dem,i} = [y_{dem,i,1}, \ldots, y_{dem,i,8}]$ be the one-hot encoded true labels for those groups. The demographic loss $L_{dem}$ is:

$$L_{dem} = -\frac{1}{N_{dem}} \sum_{i=1}^{N_{dem}} \sum_{c=1}^{8} y_{dem,i,c} \log(p_{dem,i,c}) \quad (8)$$

where $N_{dem}$ is the number of samples in the demographic component, and the summation runs over the 8 demographic groups.
Fairness Penalty: To quantify fairness, we compute the variance of the accuracy rates across different demographic groups. Let $\mathrm{Accuracy}_k$ denote the accuracy for demographic group $k$. The variance of accuracies $\mathrm{Var}_{acc}$ is:

$$\mathrm{Var}_{acc} = \frac{1}{K} \sum_{k=1}^{K} \left( \mathrm{Accuracy}_k - \overline{\mathrm{Accuracy}} \right)^2 \quad (9)$$

where $K$ is the number of demographic groups, $\mathrm{Accuracy}_k$ is the accuracy for group $k$, and $\overline{\mathrm{Accuracy}}$ is the mean accuracy across all groups.
Total Loss: The total loss function $L$ is a combination of the classification loss, the demographic loss, and the fairness penalty. The total loss is defined as:

$$L = L_{real} + \lambda \cdot \mathrm{Var}_{acc} + L_{dem} \quad (10)$$

where $\lambda$ is a hyperparameter controlling the weight of the fairness penalty.
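A minimal sketch of how the three terms of Eq. (10) could be assembled in PyTorch is given below; since the per-group accuracy of Eq. (9) is not differentiable, the sketch substitutes the mean probability assigned to the correct class as a soft stand-in, which is our assumption rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def total_loss(rf_logits, dem_logits, rf_labels, dem_labels, lam: float = 20.0):
    """Sketch of Eq. (10): L = L_real + lambda * Var_acc + L_dem.

    rf_logits : (N,) real/fake logits;   rf_labels : (N,) float {0, 1} labels.
    dem_logits: (N, 8) demographic logits; dem_labels: (N,) long group ids.
    """
    l_real = F.binary_cross_entropy_with_logits(rf_logits, rf_labels)   # Eq. (7)
    l_dem = F.cross_entropy(dem_logits, dem_labels)                     # Eq. (8)

    # Eq. (9): variance of per-group accuracy. Here the mean probability assigned to
    # the correct real/fake class serves as a differentiable stand-in for accuracy.
    probs = torch.sigmoid(rf_logits)
    soft_correct = probs * rf_labels + (1 - probs) * (1 - rf_labels)
    group_accs = [soft_correct[dem_labels == g].mean() for g in dem_labels.unique()]
    var_acc = torch.stack(group_accs).var(unbiased=False)

    return l_real + lam * var_acc + l_dem
```

The default `lam=20.0` mirrors the fairness penalty weight λ reported in the implementation details.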
Optimization
We employ the Sharpness-Aware Minimization
(SAM) optimizer (Foret et al., 2020) as our optimizer.
SAM is designed to improve the generalization per-
formance of deep neural networks by encouraging the
model to find parameter spaces that are not only good
at minimizing the training loss but also robust to small
perturbations in the weight space. This results in a
flatter and smoother loss landscape, which has been
shown to correlate with better generalization.
The SAM optimizer modifies the traditional
gradient-based update by introducing a two-step pro-
cess that seeks to find parameters that minimize both
the loss and its sensitivity to perturbations. Let L(w)
denote the total loss function of the model, where w
represents the network’s parameters. SAM performs
the following steps:
Perturbation Step: SAM first finds a perturbation $\varepsilon$ that maximizes the loss function around the current parameters. The perturbation $\varepsilon$ is chosen to make the loss worse within a neighborhood of the current parameter values:

$$\varepsilon(w) = \arg\max_{\|\varepsilon\| \leq \rho} L(w + \varepsilon) \quad (11)$$

where $\rho$ controls the size of the neighborhood in which the optimizer searches for the sharpest directions.

Parameter Update Step: Once the worst-case perturbation $\varepsilon$ is found, the SAM optimizer updates the parameters by minimizing the loss function at $w + \varepsilon$:

$$w \leftarrow w - \eta \nabla L(w + \varepsilon) \quad (12)$$

where $\eta$ is the learning rate.
This two-step process leads to solutions in flat-
ter regions of the loss surface, improving the model’s
generalization and robustness.
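The two-step procedure can be sketched as follows; this is a simplified illustration of SAM (Foret et al., 2020) written around a generic loss closure, not the authors' training code.

```python
import torch

def sam_step(params, loss_fn, base_opt, rho: float = 0.05):
    """One Sharpness-Aware Minimization step for a list of parameter tensors.

    loss_fn : closure returning the scalar loss L(w) for the current parameters.
    base_opt: underlying optimizer (e.g., SGD) that applies the final update.
    """
    # Step 1 (Eq. 11): ascend to the (approximate) worst-case perturbation with ||eps|| <= rho.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]

    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                      # move to w + eps

    # Step 2 (Eq. 12): compute the gradient at w + eps, restore w, then descend.
    base_opt.zero_grad()
    loss_fn().backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                      # back to w
    base_opt.step()                        # w <- w - eta * grad L(w + eps)
```

In practice, `loss_fn` would evaluate the total loss of Eq. (10) on a mini-batch and `base_opt` would be the underlying SGD optimizer configured with the hyperparameters listed in the implementation details.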
Figure 3: An overview of our proposed method. We utilize EfficientNet (Tan, 2019) to extract deep features from input images
for the feature extraction module. For the classification module, two heads are used: the real/fake head predicts whether the
input is real or fake, and the demographic head predicts the demographic group. The demographic classification head outputs
probabilities for one of eight demographic categories based on both gender and ethnicity. These categories include Black-Male
(B-M), Black-Female (B-F), White-Male (W-M), White-Female (W-F), Asian-Male (A-M), Asian-Female (A-F), Other-Male
(O-M), and Other-Female (O-F). We use SAM (Sharpness-Aware Minimization) for the optimization module to flatten the
loss landscape and enhance fairness generalization across demographic groups.
4 EXPERIMENTS
4.1 Experimental Settings
Dataset
To evaluate the fairness generalization capability of
our proposed approach, we train our model on the
widely used FaceForensics++ (FF++) dataset (Rossler
et al., 2019). For testing, we use FF++, Deepfake De-
tection Challenge (DFDC) (Dolhansky et al., 2020),
and Celeb-DF (Li et al., 2020b). We used the real im-
ages from these datasets and created synthetic images
as the fake images using the method in Section 3.2.
Since the original datasets do not include demo-
graphic information, we follow the approach in (Ju
et al., 2024) for data processing and annotation. The
datasets are categorized by race and gender, form-
ing the following intersection groups: Black-Male (B-
M), Black-Female (B-F), White-Male (W-M), White-
Female (W-F), Asian-Male (A-M), Asian-Female (A-
F), Other-Male (O-M), and Other-Female (O-F). The
Celeb-DF (Li et al., 2020b) does not contain the Asian
sub-group.
Evaluation Metrics
For evaluating our approach, we use several perfor-
mance metrics, including the Area Under the Curve
(AUC), Accuracy, and True Positive Rate (TPR).
These metrics allow us to benchmark our method
against previous works. To assess the degree of fair-
ness of the model, we analyze the accuracy gaps
across different races and genders.
Baseline Methods

For intra-dataset evaluation, we compare our approach against recent works in deepfake detection fairness, specifically DAW-FDD (Ju et al., 2024) and the method of Lin et al. (2024). For cross-dataset evaluation, our method is compared to the fairness generalization method proposed by Lin et al. (2024). To the best of our knowledge, Lin et al. (2024) is the prior work addressing the fairness generalization problem. We also used Xception (Chollet, 2017) as a baseline for comparison during both intra-dataset and cross-dataset evaluation. The Xception network (Chollet, 2017) was implemented using transfer learning to perform the deepfake detection task.

Figure 4: Intra-dataset evaluation across different methods. Methods are trained and tested on the same distribution as the training set (FF++ (Rossler et al., 2019)): performance comparison across gender. See Table 1 for exact numerical values.
Implementation Details
All experiments are conducted using PyTorch. For
our method, we set the batch size to 16 and train the
model for 100 epochs. To incorporate fairness, we use
a fairness penalty weight λ of 20 (See Eq. 10). For
optimization, we employ the SAM optimizer (Foret
et al., 2020). The learning rate is set to 5e-4, momentum is configured at 0.9, and weight decay is set to 5 × 10⁻³.
4.2 Experimental Results
In this section, we discuss the results of our evalua-
tions and present our findings. We start by discussing
the results of our intra-dataset evaluations and then,
the results of the cross-dataset evaluations.
Intra-Dataset Evaluation Results
We perform intra-dataset evaluations to measure the
model’s performance on the same dataset distribution,
assessing its ability to fit the training data. As shown
in Table 1, our method achieves comparable perfor-
mance to baseline approaches in deepfake detection.
However, it demonstrates a significant improvement
over baseline methods in fairness preservation across
demographic groups. Figure 4 presents the evaluation
results for the gender demographic group, while Fig-
ure 5 illustrates the results for the racial demographic
group. As evidenced in these figures, the disparity between the Male and Female subgroups is substantially reduced by our approach, with a 0.12% accuracy difference between the subgroups, compared to the 1.58%, 2.90%, and 3.87% differences between the subgroups for the baseline approaches. Additionally, the disparities among racial groups are significantly lower with our approach, with a 0.71% difference (between the Asian and Black subgroups, considering the minimum and maximum accuracies), while the baselines had differences of 5.28%, 1.64%, and 1.52%. These results highlight the effectiveness and superiority of our approach in achieving fairness generalization.

Figure 5: Intra-dataset evaluation across different methods. Methods are trained and tested on the same distribution as the training set (FF++ (Rossler et al., 2019)): performance comparison across race. See Table 1 for exact numerical values.
Cross-Dataset Evaluation Results
We conducted cross-dataset evaluations to assess
our model’s performance on out-of-distribution data,
evaluating its ability to generalize to unseen datasets.
As recorded in Table 2 and visualized in Figures 6, 7, 8, and 9, our approach achieves comparable performance to baseline methods in deepfake detection while significantly outperforming them in fairness generalization across demographic groups.
Figures 6 and 7 visualize the results from Table 2 where the model was trained on FF++ and tested on DFDC, while Figures 8 and 9 visualize the results from Table 2 for training on FF++ and testing on Celeb-DF. These figures demonstrate that our approach exhibits minimal performance disparities across different gender and racial groups compared to the baselines.
As shown in Table 2, on the DFDC dataset, the Xception baseline has the lowest accuracy disparity across genders: the disparity between males and females is 1.85% for Xception, compared to 4.5% for Lin et al. (2024) and 8.03% for our method. Across race on the DFDC dataset, our method has the lowest accuracy disparity at 11.15% (between the Asian and White subgroups), with the baselines recording accuracy disparities of 24.53% and 14.39%.
As shown in Table 2, on the Celeb-DF dataset, our method has a minimal disparity in accuracy across both gender and race. Across gender, we obtained a 15.47% disparity in accuracy, while the baselines recorded wider disparities of 40.66% and 25.47%. Across race, our method recorded a maximum accuracy disparity of 11.54% (between the White and Black subgroups), while the baselines recorded wider disparities of 26.41% and 14.39%.
These results highlight the effectiveness of our approach in achieving fairness generalization in cross-dataset scenarios, demonstrating improved consistency across both racial and gender demographics compared to baseline methods.

Table 1: Intra-dataset evaluation compares different deepfake detection approaches by training and testing on the same dataset's distribution. Results are computed on each dataset's test set, with the best performance shown in bold.

FF++ (Rossler et al., 2019)
                          Xception         DAW-FDD            Lin et al.          Ours
                          (Chollet, 2017)  (Ju et al., 2024)  (Lin et al., 2024)
Gender  Male    TPR (%)   98.62            93.48              92.13               91.91
                AUC (%)   90.23            96.91              97.56               97.71
                ACC (%)   87.03            92.65              92.07               92.39
        Female  TPR (%)   98.85            97.15              98.02               91.89
                AUC (%)   93.48            98.05              98.94               97.72
                ACC (%)   88.61            95.55              95.94               92.27
Race    Black   TPR (%)   99.59            96.36              97.17               91.27
                AUC (%)   95.85            97.92              98.22               97.57
                ACC (%)   89.00            94.67              94.66               92.04
        White   TPR (%)   98.71            95.47              94.81               91.88
                AUC (%)   92.57            97.54              98.21               97.70
                ACC (%)   88.84            94.29              94.09               92.28
        Asian   TPR (%)   98.73            94.40              96.44               92.71
                AUC (%)   89.40            96.39              97.54               97.91
                ACC (%)   83.72            93.04              93.46               92.75
        Other   TPR (%)   98.51            95.77              95.83               91.73
                AUC (%)   89.87            98.23              98.58               97.66
                ACC (%)   84.94            94.68              94.98               92.25

Figure 6: Cross-dataset evaluation across different methods. Methods are trained on FF++ (Rossler et al., 2019) and tested on DFDC (Dolhansky et al., 2020). Performance comparison is across gender. See Table 2 for exact numerical values.
Figure 7: Cross-dataset evaluation across different methods. Methods are trained on FF++ (Rossler et al., 2019) and tested on DFDC (Dolhansky et al., 2020). Performance comparison is across race. See Table 2 for exact numerical values.

5 LIMITATIONS

The limitations of our method are:
- Dependence on Demographically Annotated Datasets: The effectiveness of our method relies on the availability of detailed demographic annotations. Such datasets are often difficult to obtain and may not always be comprehensive enough, potentially impacting the fairness assessment.
- Trade-Off Between Fairness and Detection Performance: Improving fairness across demographic groups can result in a trade-off with overall detection performance. Enhancements aimed at reducing demographic disparities may lead to a decrease in the model's overall accuracy in detecting deepfakes.
These limitations highlight the need for further
research to address the challenges of dataset depen-
dency and the balance between fairness and detection
performance.
Table 2: Cross-dataset evaluation across different methods for deepfake detection. The methods were trained on FF++ (Rossler et al., 2019) and tested on DFDC (Dolhansky et al., 2020) and Celeb-DF (Li et al., 2020b).

                          DFDC (Dolhansky et al., 2020)                Celeb-DF (Li et al., 2020b)
                          Xception         Lin et al.          Ours    Xception         Lin et al.          Ours
                          (Chollet, 2017)  (Lin et al., 2024)          (Chollet, 2017)  (Lin et al., 2024)
Gender  Male    TPR (%)   85.11            37.81               35.49   98.68            76.89               78.38
                AUC (%)   59.97            60.08               63.30   36.97            66.93               75.44
                ACC (%)   43.75            63.11               66.98   43.29            56.16               66.41
        Female  TPR (%)   77.59            44.36               40.57   99.24            90.19               86.67
                AUC (%)   50.70            60.06               59.12   53.26            75.83               82.09
                ACC (%)   45.60            58.61               58.98   83.95            81.63               81.88
Race    Black   TPR (%)   76.28            45.14               57.58   98.68            76.89               78.38
                AUC (%)   50.40            59.81               68.83   47.63            67.03               72.53
                ACC (%)   39.23            58.66               65.54   63.99            64.10               69.61
        White   TPR (%)   80.82            38.18               32.98   99.24            90.18               86.69
                AUC (%)   56.45            60.14               60.18   53.58            77.02               82.20
                ACC (%)   48.79            59.80               60.24   81.38            80.43               81.15
        Asian   TPR (%)   82.50            42.50               19.16   -                -                   -
                AUC (%)   54.75            51.78               43.89   -                -                   -
                ACC (%)   22.38            57.04               71.39   -                -                   -
        Others  TPR (%)   87.68            55.93               36.18   100              96.66               73.33
                AUC (%)   60.39            72.52               65.38   28.31            64.53               84.97
                ACC (%)   46.91            71.43               65.82   17.22            32.78               70.00
Figure 8: Cross-dataset evaluation across different methods. Methods are trained on FF++ (Rossler et al., 2019) and tested on Celeb-DF (Li et al., 2020b). Performance comparison is across gender. See Table 2 for exact numerical values.
6 CONCLUSION
We have presented a novel approach to address the
critical challenge of fairness generalization in deep-
fake detection systems. Our method combines three
key innovations: a data-centric strategy using syn-
thetic image generation to balance demographic rep-
resentation, a multi-task learning architecture that si-
multaneously optimizes detection accuracy and de-
mographic fairness, and sharpness-aware optimiza-
tion to enhance generalization capabilities. Our com-
prehensive evaluations demonstrate the effectiveness
of this approach. In intra-dataset testing, our method achieved comparable detection performance to existing approaches while significantly reducing demographic disparities, showing only a 0.12% accuracy difference between gender groups compared to up to 3.87% in baseline methods, and a 0.71% difference across racial groups compared to up to 5.28% in baselines. More importantly, in challenging cross-dataset scenarios, our approach demonstrated superior fairness generalization. When tested on the Celeb-DF dataset, our method reduced gender-based accuracy disparities to 15.47% compared to up to 40.66% in baselines, while maintaining strong overall detection performance. These results suggest that our integrated approach of balanced synthetic data, demographic-aware learning, and robust optimization provides a promising direction for developing deepfake detection systems that are both accurate and demographically fair. Future work could explore extending these techniques to other domains where algorithmic fairness and generalization are crucial concerns.

Figure 9: Cross-dataset evaluation across different methods. Methods are trained on FF++ (Rossler et al., 2019) and tested on Celeb-DF (Li et al., 2020b). Performance comparison is across races. See Table 2 for exact numerical values.
REFERENCES
Afchar, D., Nozick, V., Yamagishi, J., and Echizen, I.
(2018). Mesonet: a compact facial video forgery de-
tection network. In 2018 IEEE international work-
shop on information forensics and security (WIFS),
pages 1–7. IEEE.
Chen, L., Zhang, Y., Song, Y., Liu, L., and Wang, J. (2022).
Self-supervised learning of adversarial example: To-
wards good generalizations for deepfake detection.
Chollet, F. (2017). Xception: Deep learning with depthwise
separable convolutions. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 1251–1258.
Das, S., Seferbekov, S., Datta, A., Islam, M. S., and Amin,
M. R. (2021). Towards solving the deepfake prob-
lem: An analysis on improving deepfake detection us-
ing dynamic face augmentation.
Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R.,
Wang, M., and Ferrer, C. C. (2020). The deepfake
detection challenge (dfdc) dataset. arXiv preprint
arXiv:2006.07397.
Ezeakunne, U. and Liu, X. (2023). Facial deepfake detec-
tion using gaussian processes.
Foret, P., Kleiner, A., Mobahi, H., and Neyshabur, B.
(2020). Sharpness-aware minimization for effi-
ciently improving generalization. arXiv preprint
arXiv:2010.01412.
Guan, J., Zhou, H., Gong, M., Ding, E., Wang, J., and Zhao,
Y. (2022). Detecting deepfake by creating spatio-
temporal regularity disruption.
Guo, H., Hu, S., Wang, X., Chang, M.-C., and Lyu, S.
(2022). Robust attentive deep neural network for de-
tecting gan-generated faces.
Hazirbas, C., Bitton, J., Dolhansky, B., Pan, J., Gordo, A.,
and Ferrer, C. C. (2021). Towards measuring fairness
in ai: the casual conversations dataset.
Ju, Y., Hu, S., Jia, S., Chen, G. H., and Lyu, S. (2024).
Improving fairness in deepfake detection.
Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., and
Guo, B. (2020a). Face x-ray for more general face
forgery detection.
Li, Y. (2018). Exposing deepfake videos by detecting face warping artifacts.
Li, Y., Yang, X., Sun, P., Qi, H., and Lyu, S. (2020b).
Celeb-df: A large-scale challenging dataset for deep-
fake forensics. In Proceedings of the IEEE/CVF con-
ference on computer vision and pattern recognition,
pages 3207–3216.
Lin, L., He, X., Ju, Y., Wang, X., Ding, F., and Hu, S.
(2024). Preserving fairness generalization in deepfake
detection.
Masood, M., Nawaz, M., Malik, K. M., Javed, A., Irtaza,
A., and Malik, H. (2023). Deepfakes generation and
detection: State-of-the-art, open challenges, counter-
measures, and way forward.
Nadimpalli, A. V. and Rattani, A. (2022). Gbdf: gender
balanced deepfake dataset towards fair deepfake de-
tection.
Pu, M., Kuan, M. Y., Lim, N. T., Chong, C. Y., and Lim,
M. K. (2022a). Fairness evaluation in deepfake detec-
tion models using metamorphic testing.
Pu, W., Hu, J., Wang, X., Li, Y., Hu, S., Zhu, B., Song,
R., Song, Q., Wu, X., and Lyu, S. (2022b). Learning a
deep dual-level network for robust deepfake detection.
Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies,
J., and Nießner, M. (2019). Faceforensics++: Learn-
ing to detect manipulated facial images. In Proceed-
ings of the IEEE/CVF international conference on
computer vision, pages 1–11.
Shiohara, K. and Yamasaki, T. (2022). Detecting deepfakes
with self-blended images.
Tan, M. (2019). Efficientnet: Rethinking model scaling
for convolutional neural networks. arXiv preprint
arXiv:1905.11946.
Trinh, L. and Liu, Y. (2021). An examination of fairness of
ai models for deepfake detection.
Wang, J., Wu, Z., Ouyang, W., Han, X., Chen, J., Jiang,
Y.-G., and Li, S.-N. (2022). M2tr: Multi-modal multi-
scale transformers for deepfake detection.
Xu, Y., Raja, K., Verdoliva, L., and Pedersen, M. (2023).
Learning pairwise interaction for generalizable deep-
fake detection.
Xu, Y., Terhörst, P., Raja, K., and Pedersen, M. (2022). A comprehensive analysis of ai biases in deepfake detection with massively annotated databases.
Zhao, H., Zhou, W., Chen, D., Wei, T., Zhang, W., and Yu,
N. (2021a). Multi-attentional deepfake detection.
Zhao, T., Xu, X., Xu, M., Ding, H., Xiong, Y., and Xia,
W. (2021b). Learning self-consistency for deepfake
detection.