
3.6 Libraries and Toolkits Used
The research harnessed the capabilities of various Python libraries and toolkits, including but not limited to OpenCV for extracting video frames, ImageDataGenerator for data preparation and augmentation, and MediaPipe for face detection and landmark localization. Additionally, Matplotlib, NumPy, Pandas, and scikit-learn were utilized for data visualization, data manipulation, model evaluation, and metrics calculation, respectively, ensuring a comprehensive and robust methodological approach.
4 RESULTS
The evaluation of our proposed method's performance involves several aspects, including the application of face cut-out augmentations to the FaceForensics++ (FF++) and Celeb-DF datasets, comparative analysis across different training settings, and benchmarking against state-of-the-art deepfake detection techniques. A comparative analysis of results was conducted under three distinct settings: Baseline (original faces without any augmentation), Cut-out 1 (four cut-outs placed strategically on the chin, mouth, jawline, and forehead regions), and Cut-out 2 (four cut-outs placed strategically on the left eye, right eye, both eyes, and nose regions).
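A minimal sketch of how such cut-outs can be applied is given below, assuming fixed normalized boxes; in the actual method the regions are placed using MediaPipe landmarks, so all coordinates here are illustrative assumptions.

import numpy as np

# Illustrative normalized (x, y, width, height) boxes for the two settings.
# Real placements are landmark-derived; these values are for demonstration.
CUTOUT_1 = {  # chin, mouth, jawline, forehead
    "forehead": (0.25, 0.05, 0.50, 0.15),
    "mouth":    (0.35, 0.60, 0.30, 0.12),
    "chin":     (0.35, 0.78, 0.30, 0.12),
    "jawline":  (0.10, 0.70, 0.20, 0.20),
}
CUTOUT_2 = {  # left eye, right eye, both eyes, nose
    "left_eye":  (0.20, 0.28, 0.20, 0.10),
    "right_eye": (0.60, 0.28, 0.20, 0.10),
    "both_eyes": (0.20, 0.28, 0.60, 0.10),
    "nose":      (0.40, 0.40, 0.20, 0.18),
}

def apply_cutouts(face, boxes):
    """Return a copy of the face image with the given regions blacked out."""
    out = face.copy()
    h, w = out.shape[:2]
    for x, y, bw, bh in boxes.values():
        x0, y0 = int(x * w), int(y * h)
        out[y0:y0 + int(bh * h), x0:x0 + int(bw * w)] = 0
    return out

For example, apply_cutouts(face, CUTOUT_2) would produce a Cut-out 2 training sample from a cropped face image.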
• Phase One: Cut-out Technique Evaluation with Each Dataset. During this phase, three image groups were created from each dataset: Baseline, Cut-out 1, and Cut-out 2. These groups were then used to train the selected deep convolutional models, EfficientNet-B7 and XceptionNet; a minimal training sketch is given at the end of this item. The results demonstrated that models trained with the Cut-out 2 group significantly outperformed those in the Baseline and Cut-out 1 groups (Figure 3). Interestingly, the Cut-out 1 group occasionally underperformed the Baseline group, as observed in training with the EfficientNet-B7 model (Table 1).
The results indicated substantial improvements in the performance of the EfficientNet and Xception models when trained with the Cut-out 2 group, with accuracy gains ranging from 1.23% to 17.7% over the Baseline group. These findings highlight the effectiveness of the Cut-out 2 dataset in training more robust detection models. This can be attributed to the enforced learning within the Cut-out 2 dataset, which emphasizes the distinguishing facial regions that are critical in differentiating fake from genuine faces.
Figure 3: Test results in phase 1.
On the Celeb-DF dataset, the Cut-out 2 group achieved log-loss results 43.18% better than the Cut-out 1 group when using the EfficientNet-B7 model, with the Xception model producing closely matching results. Models trained with face Cut-out 2 augmentations consistently demonstrated superior performance. These findings suggest that training with Cut-out 2 images led the models to prioritize the exposed facial regions, such as the forehead, cheeks, and chin. Consequently, it can be inferred that regions of the face outside the central features (eyes and nose) provide more significant information for discerning differences between authentic and synthetic faces.
This aligns with a study by Huang et al. (Huang et al., 2012), which found that even when the central facial features were occluded, models could still identify a majority of facial expressions by relying on external facial features for cues. Similarly, our study suggests that features beyond the central region of the face contain crucial information for detecting disparities between similar faces. In cases where faces are very similar, as with deepfakes, focusing on facial features beyond the central region can lead to more accurate detection of differences.
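As referenced above, the following is a minimal sketch of how the Phase One training could be configured in Keras, pairing ImageDataGenerator with an EfficientNet-B7 backbone; the directory layout, input size, and classifier head are assumptions for illustration, not the exact configuration used in this work.

import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB7
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical directory with "real/" and "fake/" subfolders of face crops.
# Keras EfficientNet models rescale inputs internally, so no rescale here.
datagen = ImageDataGenerator(validation_split=0.2)
train = datagen.flow_from_directory(
    "faces_cutout2/", target_size=(224, 224),
    class_mode="binary", subset="training")
val = datagen.flow_from_directory(
    "faces_cutout2/", target_size=(224, 224),
    class_mode="binary", subset="validation")

# ImageNet-pretrained backbone with a binary real/fake head.
backbone = EfficientNetB7(include_top=False, pooling="avg", weights="imagenet")
model = tf.keras.Sequential([backbone,
                             tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
model.fit(train, validation_data=val, epochs=20)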
• Phase Two: Evaluation of Performance with a Combined Dataset.
In this phase, the datasets from Phase One were combined to increase the overall volume of training data. This increase aimed to improve the models' generalization and performance on previously unseen data while exposing them to more diverse examples. Table 2 presents the results of the second phase, which evaluated the performance of the three groups (Baseline, Cut-out 1, and Cut-out 2) on the combined dataset of face images. The models were trained for 20 epochs, and their performance was assessed using the AUC, ACC (accuracy), and log-loss metrics.
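For illustration, these three metrics can be computed with scikit-learn (one of the toolkits listed in Section 3.6) as follows; the labels and scores below are placeholder values, not results from the experiments.

import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, log_loss

# Placeholder ground-truth labels (1 = fake, 0 = real) and model scores.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_prob = np.array([0.10, 0.85, 0.70, 0.30, 0.95, 0.20])

auc = roc_auc_score(y_true, y_prob)          # ranking quality of scores
acc = accuracy_score(y_true, y_prob >= 0.5)  # accuracy at a 0.5 threshold
loss = log_loss(y_true, y_prob)              # penalizes confident errors
print(f"AUC={auc:.2f}  ACC={acc:.2f}  log-loss={loss:.2f}")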
EfficientNet-B7 with Cut-out 2 achieved the best
performance, with an AUC of 0.89, ACC of 0.91, and
log-loss of 0.45. The Xception model with Cut-out 2