Behavior-Based Deepfake Detection:
Leveraging Cognitive Response to Visual Face Perception
Hendrik Graupner (1,2,a), Mohammad Yeghaneh Abkenar (2,3,b), Lisa Schwetlick (4), Ralf Engbert (5) and Christoph Meinel (1)

1 Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
2 Innovations Department, Bundesdruckerei GmbH, Berlin, Germany
3 Department of Linguistics, University of Potsdam, Potsdam, Germany
4 Psychophysics Lab, Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland
5 Department of Psychology, University of Potsdam, Potsdam, Germany

a https://orcid.org/0000-0002-5305-1624
b https://orcid.org/0009-0006-5423-5376
Keywords:
Internet Security, Deepfake Detection, Biometrics, Eye Tracking.
Abstract:
Face presentation attacks are a growing issue in an increasingly digitally interconnected world. One of
the most recent developments is deepfake impersonation attacks in live video streams. Behavioral biometric
analysis is a crucial part of a comprehensive solution to this pressing issue. This paper proposes the application
of biological responses to visual self-recognition as a dynamic biometric trait. Self-recognition is a cognitive
process that can be leveraged as in-brain identity validation. A pre-trained model classifies eye-tracking
data to determine the face in the user's current visual focus. One eminent use case is the protection
of online video conferences. This paper provides the architecture of a prototypical implementation based on
an open-source video conferencing platform. Our interdisciplinary research aims to contribute to a
holistic solution that protects our modern communication systems and restores trust in digitization.
1 INTRODUCTION
Today’s society is heavily based on digital systems,
ranging from communication to financial services.
In this environment, identity theft is a growing
security issue. Sophisticated impersonation, e.g.,
based on silicone masks, represents a significant chal-
lenge (Hernandez-Ortega et al., 2023). More re-
cently, the additional threat of video and audio deep-
fakes drastically increased the attack potential in on-
line communication systems (Europol, 2023; CNN,
2024). Deepfakes are a face manipulation technique
used to change the appearance of a person in a digital
video. Without the aid of modern face presentation
attack (FPA) detection systems, people can no longer
rely on their perception alone.
We claim that a comprehensive solution must in-
volve biometrics. Physiological features and human
behavior are a rich source of information in a person’s
video. Including this information in deepfake detec-
tion schemes makes it significantly more difficult to
produce undetectable fakes. Biometric recognition is
already an integral part of securing identities in mod-
ern computer systems. Our research aims to combine
existing knowledge and investigate novel approaches
from the field.
This paper utilizes an experimental in-brain iden-
tity validation mechanism and aims to employ it as a
dynamic biometric trait. The approach based on vi-
sual self-recognition leverages modern eye-tracking
analysis. Due to its success in neuromarketing (Iloka
and Anukwe, 2020), eye tracking is an advancing
technology of the present day (Grand View Research,
2022). This paper introduces its application to detecting
identity forgery through deepfake attacks in online video-
based systems, i.e., video conferencing and video
identification.
Section 2 provides a brief overview of related
work and concepts on which our approach is based.
Former work on the utilization of self-recognition as
a biometric trait is summarized in Section 3. Sec-
tion 4 elaborates on the application to the deepfake sce-
nario, outlining a detection algorithm and introducing
the two use cases of deepfake attacks in video con-
ferencing and video identification. Implications and
limitations are discussed in Section 5 and conclusions
are drawn in Section 6.
This paper aims to contribute the first application
of visual self-recognition in a real-world scenario and
a concrete scheme to detect video deepfakes in real
time.
2 BACKGROUND
2.1 Related Work
A 2022 systematic literature review by Rana et
al. (Rana et al., 2022) groups deepfake detection
into four categories: “deep learning-based techniques,
classical machine learning-based methods, statistical
techniques, and blockchain-based techniques”. Since
deep learning techniques, such as convolutional neural
networks, outperform classical machine learning and
statistics in this task, essentially only two categories of
approaches remain relevant in practice.
The authors conclude that deep-learning-based ap-
proaches are most common in detecting deepfakes
and introduce their work DeepfakeStack (Rana and
Sung, 2020) which combines multiple detection algo-
rithms to achieve detection accuracy of up to 99.65 %.
Machine-learning-based detection does not general-
ize well beyond the deepfake algorithm it was trained
for. Content credentials, e.g., by Hasan et al. (Hasan
and Salah, 2019), use blockchains to “trace and track
the provenance and history of digital content to its
original source”.
Recently, researchers have begun to explore the
dimension of biometrics as an alternative approach to
address the issue from a different perspective. E.g.,
Agarwal et al. (Agarwal et al., 2020) and Cozzolino
et al. (Cozzolino et al., 2021) make use of deep neu-
ral networks (DNNs) that learn biometric features like
facial muscle movements. In 2022, a method that cov-
ers static analysis of multiple biometric features in the
face (e.g., visible blood flow in veins) was published
by Intel (Intel Corporation, 2022).
2.2 Eye Tracking
The “eye-mind” assumption, proposed by Just and
Carpenter (Just and Carpenter, 1980), suggests that
the brain processes the information corresponding
to the fixation of the eyes. At the same time, in-
terest, mental load, arousal, and implicit processing
elicit changes in pupil size, according to Hess and
Polt (Hess and Polt, 1960). Therefore, the cognitive
reaction to visual stimuli can be measured through
eye tracking to some extent. This work focuses on
the time courses of pupil size and microsaccade rates.
Microsaccades represent the fastest component of fix-
ational eye movements generated involuntarily during
visual fixation (Martinez-Conde et al., 2004).
Eye tracking is an emerging technology as indi-
cated by industry reports, e.g. (Grand View Research,
2022). The growing popularity in recent years can be
attributed to its success in neuromarketing (Iloka and
Anukwe, 2020). We expect that cameras capable of
tracking pupils will be present in many end-user de-
vices in the foreseeable future. In 2024, they are al-
ready integrated into certain models of smartphones,
gaming laptops, and virtual reality (VR) headsets.
2.3 Video Deepfakes
Deepfake is a recent face manipulation technique
used for changing the appearance of a person in a dig-
ital image. It can be used to alter certain attributes,
facial expressions, or swap entire faces. The latter is
known as identity swap which “[. . . ] consists of re-
placing the face of one person in a video with the face
of another person” (Tolosana et al., 2020).
The same technology used for static deepfake can
be used in live streams. In practice, the attacker in-
tercepts their own webcam’s video stream with a live
deepfake application, e.g., DeepFaceLive (Github,
2024a). This software uses a pre-trained model to
swap the face in real-time. A second software, e.g.,
OBS Studio (OBS Project, 2024), is required to serve
the manipulated video stream as a virtual webcam
device to the video conferencing application. The financial re-
sources required for this setup are relatively low.
Based on our experience, the hardware for acceptable
performance can be obtained for around $3,000.
2.4 Video-Ident
Video-Ident is the name of a German video identifi-
cation method. In Germany and Austria, it is com-
monly used for remote know-your-customer proce-
dures, e.g., when opening bank accounts. It is im-
plemented by many providers, e.g., POSTIDENT by
Deutsche Post (Deutsche Post, 2024).
Video identification builds on a video conferenc-
ing application to manually verify a client’s iden-
tity. A human agent will request the client to enable
their webcam and show their national identity (ID)
card. The card’s authenticity is verified by observa-
tion of the hologram security features when the card
is moved by the holder. The trained agent visually
compares the facial features of the ID card and the
client to match the claimed and actual identities.
Table 1: Basic cross-validated model performance metrics (mean ± standard deviation) of four different classifiers. The
best-performing model (third row) is a pre-trained generic classifier utilizing pupil size and microsaccade rate features.

Model                                                        AUROC          F1-Score       EER
Generic, single trials                                       0.61 ± 0.03    0.59 ± 0.03    0.41 ± 0.03
Generic, aggregated trials (pupil size)                      0.74 ± 0.08    0.63 ± 0.06    0.34 ± 0.12
Generic, aggregated trials (pupil size, microsaccade rate)   0.82 ± 0.07    0.76 ± 0.03    0.23 ± 0.07
Individual, single trials                                    0.62 ± 0.15    n/a            0.41 ± 0.12
3 SELF-RECOGNITION
BIOMETRICS
Self-recognition is the cognitive process of recog-
nizing our own face among others. This effect can
be measured using eye tracking (see Section 2.2)
as shown in a former feasibility analysis (Graupner
et al., 2023). Exemplary models were trained on a
public data set (Schwetlick et al., 2023a) collected
in an empirical study (Schwetlick et al., 2023b) of
116 volunteers in 2022. These models are binary clas-
sifiers differentiating between self and non-self-face
perception.
Table 1 shows the results of the four different
models that were trained and evaluated in our earlier
work (Graupner et al., 2023). The best-performing
model (Table 1, third row) is a generic classifier on aggregated tri-
als using pupil size and microsaccade rate features. In
a cross-validated test, it achieves a mean area under
the receiver operating characteristic curve (AUROC)
of 0.82. A more detailed
description of the data, feature and model engineer-
ing of the underlying machine learning model can be
found in our original work.
These results can be leveraged to implement vi-
sual self-recognition as a biometric trait. A promis-
ing approach is its application as an identity validation
mechanism. When a person perceives a face image or
video on a screen, the generic classifier can be used to
detect the observed face class of self or non-self. This
classification result facilitates the decision of whether
the face displayed on the screen corresponds to the
individual positioned in front of the screen. I.e., the
users self-verify their identities to the system. Based
on this important piece of information, an identity val-
idation system may detect identity spoofing attacks in
certain scenarios.
Other potential applications include continuous
authentication based on individually trained models.
The system observes the user’s face perception on var-
ious occasions and generates long-term identity con-
fidence based on the face recognition results. Due to
the high-variance nature of biometric data, fast one-
time authentication may not be viable in a real-world
scenario.
3.1 Classification Model
The underlying model is a generic binary classifier of
the classes self and non-self. It distinguishes between
the perception of the observed user’s face and another
person’s face. It is trained on a publicly available,
high-quality data set on eye-tracking data of face per-
ception (Schwetlick et al., 2023a). The data set is im-
balanced at a ratio of 1:10; it was therefore balanced
by random undersampling.
The algorithm found to be most effective is logis-
tic regression (hyperparameters: solver = lbfgs; max.
iter. = 1000; reg. = L2; conv. tol. = 0.0001). The
best feature set extracted from the raw eye-tracking
data consists of the global maximum of the pupil size,
the second local minimum of the microsaccade rate,
and the standard deviation of the microsaccade rate.
As shown in Table 1 (third row), the model performance
is AUROC = 0.82, F1 = 0.76, and EER = 0.23.
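As a minimal illustration, the following Python sketch shows how such a classifier could be assembled. The feature-extraction and undersampling helpers are our illustrative assumptions; only the logistic-regression hyperparameters are those reported above, and the actual pipeline is described in our original work (Graupner et al., 2023).

import numpy as np
from scipy.signal import find_peaks
from sklearn.linear_model import LogisticRegression

def extract_features(pupil_size: np.ndarray, ms_rate: np.ndarray) -> np.ndarray:
    # Feature vector named in the text: global pupil-size maximum,
    # second local minimum of the microsaccade rate, and standard
    # deviation of the microsaccade rate.
    minima, _ = find_peaks(-ms_rate)  # local minima of the rate curve
    second_min = ms_rate[minima[1]] if len(minima) > 1 else ms_rate.min()
    return np.array([pupil_size.max(), second_min, ms_rate.std()])

def undersample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    # Random undersampling of the majority class to balance the
    # roughly 1:10 class ratio mentioned above.
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    n = min(len(pos), len(neg))
    keep = np.concatenate([rng.choice(pos, n, replace=False),
                           rng.choice(neg, n, replace=False)])
    return X[keep], y[keep]

# Hyperparameters as reported: lbfgs solver, max. 1000 iterations,
# L2 regularization, convergence tolerance 0.0001.
clf = LogisticRegression(solver="lbfgs", max_iter=1000, penalty="l2", tol=1e-4)
# Given a per-trial feature matrix X and labels y (1 = self, 0 = non-self):
# clf.fit(*undersample(X, y))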
3.2 Application to Real-World Problems
First real-world applications may be aimed at scenar-
ios in which face observation does not require addi-
tional effort by the users. This approach makes it rel-
atively uncomplicated to collect data on a larger scale.
An obvious scenario is video conferencing. During
a video conference, participants often transmit video
streams of their faces.
Those real-time face videos are observed by other
participants and the transmitting users themselves.
Hence, self-recognition can be conducted completely
passively without the necessity of additional user in-
teraction.
The technology requires an eye-tracking device to
be present at the site of users observed by the system.
In our study setup, we use relatively low-cost desktop
eye trackers, e.g., the GP3 HD by Gazepoint (Gaze-
point, 2024). This hardware requirement may cur-
rently limit the application of self-recognition tech-
nology to special scenarios that require exceptional
security. However, in the future, it is very likely
that an increasing number of devices already provide
built-in functionality to capture eye-tracking data. In
2024, infrared (IR) cameras are already integrated
into certain models of smartphones, gaming laptops,
and VR headsets.
Another important aspect is the possibility of ex-
pansion to include additional modalities and technolo-
gies. Due to the high-variance nature of behavioral
biometrics, short-term measurements may only be uti-
lized with a low confidence level. Other solutions may
be able to deliver fast results with high confidence
but have other limitations. For example, in deep-
fake detection, a combined approach of deep learning
and behavioral biometrics may be more powerful than
each of the techniques alone. Deep learning classifiers
can produce quick results given video images but are
highly limited to the deepfake algorithms they were
trained for. Behavioral biometrics need multiple ob-
servations and consequently more time, but they do
not rely on artifacts of specific deepfake algorithms.
A joint solution has the potential to combine the best
of both worlds.
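To make the combination concrete, the following minimal sketch shows one possible score-level fusion; the confidence weighting is purely an illustrative assumption, not part of our prototype.

def fused_fpa_score(dnn_score: float, bio_score: float,
                    n_bio_observations: int, k: float = 5.0) -> float:
    # Both scores lie in [0, 1]; higher means more suspicious. The
    # behavioral score gains weight as observations accumulate, since
    # its confidence grows with repeated measurements.
    w = n_bio_observations / (n_bio_observations + k)
    return (1.0 - w) * dnn_score + w * bio_score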
3.3 User Data and Privacy
Considerations
This paper does not provide a legally binding privacy
assessment. However, we believe that modern security
technologies need to consider user privacy by design.
The approach presented in this paper processes
biometric user data from which medical information
could potentially be deduced. Due to the highly sensi-
tive nature of such data, it is of critical importance that
the system is limited to the intended task of self-recog-
nition detection. Consequently, the processes of inter-
est in this solution are local data collection, data trans-
mission to remote services, and remote data process-
ing. Following the paradigm of data minimization,
only strictly necessary data is recorded and transmit-
ted at each step. Under no circumstances does the sys-
tem process identifying information about the users
(e.g., names); this prevents data misuse through the
consolidation of medical analysis with other personal
information.
3.3.1 Local Data Collection
The biometric data is locally collected by an eye-
tracking device. Dedicated software preprocesses
the raw video information recorded by the eye
tracker’s IR cameras. Hence, the client application
only receives the time-series data of pupil size, points
of gaze, and blink indicators for the left and right eye,
respectively. Based on the gaze data, the client appli-
cation can recognize which user video stream in the
video conference is being fixated (see the sketch at
the end of this subsection). The information gathered
is the time of video stream fixation and the binary la-
bel of the observed user (self, non-self). Until this
point in the process, no data leaves the local context
of the end-user computer.
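The following is a minimal geometric sketch of this fixation-to-stream mapping; tile geometry and names are assumptions about the client application, not its actual implementation.

from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height) in pixels

def fixated_stream(gaze: Tuple[float, float],
                   tiles: Dict[str, Rect]) -> Optional[str]:
    # Return the label of the video tile containing the point of gaze,
    # e.g., "self" for the user's own stream, or None if no tile is hit.
    x, y = gaze
    for label, (left, top, w, h) in tiles.items():
        if left <= x < left + w and top <= y < top + h:
            return label
    return None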
3.3.2 Data Transmission
The aforementioned time-series data is transmitted to
a remote identity validation service that hosts the self-
recognition classifier. Data transmission happens im-
mediately after the fixation of a user’s video stream
plus an observation period of 3,000 ms. Each pack-
age transmitted contains the label of the observed video
stream (self, non-self), pupil size, points of gaze,
blink indicators, and the user session ID, which is the
minimum information required for classification (a
hedged sketch of such a payload follows below).
Transmission is secured by transport encryption based
on the web standard Transport Layer Security (TLS)
by default.
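A sketch of such a payload is given below; field names and types are our assumptions for illustration and not the actual wire format of the service.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ObservationPacket:
    session_id: str                         # pseudonymous user session ID
    stream_label: str                       # "self" or "non-self"
    pupil_size: List[Tuple[float, float]]   # (left, right) per sample over 3,000 ms
    gaze: List[Tuple[float, float]]         # points of gaze per sample
    blink: List[Tuple[bool, bool]]          # blink indicators (left, right)
    # Deliberately no name or other identifying user information.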
3.3.3 Remote Data Processing
A reliable result can only be produced in a controlled
environment. Hence, model hosting and inference
have to be executed by a secure remote service.
The aforementioned data transmitted to the service
does not contain any user or context information. The
labeled eye-tracking data is fed to a generic classifier
and the result is interpreted in the context of the se-
curity task. Only the binary result label (suspicious,
unsuspicious) and the user session ID are then deliv-
ered to an arbitrary monitoring system to display the
result or, in a future productive environment, execute
subsequent measures (e.g., alerting of operational se-
curity personnel). No input or output data is stored by
the service.
4 DEEPFAKE ATTACK
DETECTION BASED ON
SELF-RECOGNITION
A recent development in FPAs is image and video
deepfakes, as introduced in Section 2.3. Our novel ap-
proach aims to introduce a sophisticated tool to de-
tect potential face manipulation in video-based sys-
tems. Biometric detection offers a way to escape the
co-evolutionary dynamic of machine-learning-based
deepfake detection by leveraging features and charac-
teristics of humans in addition to digital processing.
Self-recognition biometrics in particular add a currently
unexploited source of information about the relation-
ship between a face in a live video stream and the in-
dividuals interacting with it.
The basic principle is to capture a user’s involun-
tary reaction to the video they are streaming, similar
to how one might look at themselves in the mirror.
Figure 1: Algorithm used for face presentation attack detec-
tion. When a user visually focuses on a face in a video stream,
the validation system assesses the user’s biometric reaction.
Based on this reaction, the validation system detects a
potential discrepancy between the user’s claimed and
actual identity. The system classifies the eye-tracking
data based on a generic model as described in Sec-
tion 3.1.
Figure 1 illustrates the algorithm used to de-
tect FPAs in our prototype. Each iteration is trig-
gered when the user visually focuses on a face on the
screen (A). If the user observes their own webcam’s
video stream, confidence in an ongoing FPA can be
built by classification of the perceived face (B). In the
case of a non-self-face classification, the indicator for
FPA detection is raised (C). On reaching a predefined
threshold, the system will issue a notification about
a potential FPA (D). If the perception is classified as
self-face instead, the expected and actual results are
equal, hence, the FPA indicator is lowered accord-
ingly (E). If the user views someone else’s face, the
system uses the classification (F) result to raise or
lower confidence in its classification validity. This
mechanism deals with the high natural variance and
differences in biological reactions between individu-
als. If the user perceives the face as non-self, the ex-
pected and actual classifications are equal, resulting
in increased confidence (G); otherwise, confidence is
lowered (H).
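A compact sketch of this loop is given below; the letters reference the steps in Figure 1, while the threshold, step size, and all names are illustrative assumptions rather than our prototype's actual parameters.

from dataclasses import dataclass

STEP, THRESHOLD = 1.0, 3.0  # assumed tuning parameters

@dataclass
class DetectionState:
    fpa_indicator: float = 0.0  # accumulated evidence of an ongoing FPA
    confidence: float = 0.0     # validity of the classifier for this user

def on_fixation(state: DetectionState, observing_own_stream: bool,
                classified_as_self: bool) -> None:
    # One fixation event (A); classified_as_self is the output of the
    # generic classifier of Section 3.1 (B/F).
    if observing_own_stream:
        if not classified_as_self:
            state.fpa_indicator += STEP                 # (C) raise indicator
            if state.fpa_indicator >= THRESHOLD:
                print("potential FPA detected")         # (D) notify
        else:
            state.fpa_indicator = max(0.0, state.fpa_indicator - STEP)  # (E)
    else:
        if not classified_as_self:
            state.confidence += STEP                    # (G) expected result
        else:
            state.confidence -= STEP                    # (H) unexpected result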
This detection algorithm can be leveraged in sce-
narios where users view their own video streams.
Plausible everyday scenarios include online video
conferencing and video identification systems. Social
engineering attacks backed by impersonation in those
contexts yield a high damage potential.
4.1 Deepfake Attacks in Video
Conferencing
Video conferencing has become an integral part of
people’s daily lives. In 2022, 76 % of medium and
93 % of large enterprises conducted online meet-
ings (Eurostat, 2023). This development incentivizes
attackers to take advantage of the vulnerabilities of such
systems, and thus introduces a new attack surface. For
example, access from insecure devices is common,
and non-authenticated guest users are often allowed.
In December 2021, a globally operating gang success-
fully impersonated the chief executive officer (CEO)
of a company to steal an equivalent of $41 mil-
lion from French companies by using deepfake tech-
nology (Europol, 2023). More recently, a multina-
tional company lost an equivalent of $25 million in
Hong Kong, tricked by the impersonation of their
chief financial officer (CFO) in an online video meet-
ing (CNN, 2024).
This deepfake detection approach can be applied
to online video conferencing. It aims to protect video
conferences with an additional layer of security by de-
tecting FPAs of otherwise unsuspicious users. This
method requires users to operate an unconcealed device
that streams their face video and eye-tracking data.
The basic attack setup is shown in Figure 2. The
attacker intercepts their webcam’s data stream and re-
places the original video with a face-swapped video
(see Section 2.3). The impersonated user is known
to the other conference participants. The conference
system is not capable of distinguishing the virtual web-
cam device from its original source. A remote iden-
tity validation service collects the data from local eye-
tracking devices and the video conference system,
including webcam video streams of all participants.
Based on the classification of this data, the service val-
idates the users’ identities and potentially notifies an-
other service upon detection of a potential FPA, ac-
cording to the algorithm elaborated above.
In 2023, we implemented a first experimental pro-
totype of this scenario. An initial small data set of
twelve volunteers showed promising results. Data
collection of a representative data set is ongoing at
the time of this publication.
Our prototype extends the open-source video con-
ferencing platform OpenVidu (Github, 2024b). Fig-
ure 3 provides an overview of the components of
the system. Plug-ins for the client and server com-
ponents are required to collect information from the
video conference, e.g., content in the visual focus of
the user. The client application also passes the eye-
tracking data to the identity validation service. The
latter hosts a pre-trained model and analyzes data dur-
ing ongoing video calls. Results may be transmitted
Figure 2: The fundamental attack setup of a deepfake attack in online video conferencing. The attacker intercepts the original
video stream of their webcam and presents a virtual device to the video conference, which streams their face-swapped video.
to the server plug-in or an external service. Before in-
ference is possible, training is required and conducted
using the standalone component training client. On
the client side, the video conferencing application
runs in the local browser. Aside from the additional
hardware requirement of the eye tracker, the user ex-
perience remains seamless.
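As an illustration of the client-to-service interface, the sketch below forwards one observation window to the identity validation service; the endpoint URL and response shape are hypothetical, and transport encryption follows from the https URL as described in Section 3.3.2.

import json
import urllib.request

VALIDATION_URL = "https://validation.example.org/observe"  # hypothetical endpoint

def send_observation(packet: dict) -> dict:
    # POST one observation window; the assumed response is, e.g.,
    # {"session_id": ..., "label": "suspicious" | "unsuspicious"}.
    req = urllib.request.Request(
        VALIDATION_URL,
        data=json.dumps(packet).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)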
4.2 Deepfake Attacks in Video-Ident
Like the video conferencing scenario, video identifi-
cation systems, such as Video-Ident (see Section 2.4),
are vulnerable to deepfake-based FPAs. In 2022, the
activist group Chaos Computer Club (CCC) hacked
six different Video-Ident solutions by presenting a vir-
tual German national ID card even without the use of
deepfake technology (Chaos Computer Club, 2022).
Figure 4 illustrates how an attack on Video-Ident
works. A client joins an unauthenticated session with
a human agent. Authentication is manually performed
by matching ID card identity to the face of the person
in the video stream. The attacker manipulates their
webcam’s raw data as in the conferencing scenario but
additionally swaps a real ID card for a fake ID card
matching the face-swapped identity.
Figure 3: The conferencing system is a client-server archi-
tecture communicating with an identity validation service.
Four software components (bold) were implemented to add
self-recognition-based FPA detection.
This procedure can not only impersonate an exist-
ing person but also invent a non-existing individual.
The additional requirement of the client to transmit
their eye-tracking data allows the remote identity val-
idation service to verify the client’s perception of the
faces displayed on their screen. A potential FPA can
be detected, and the agent may be alerted in real time
during the live session.
5 DISCUSSION
Section 3 summarizes an earlier published demon-
stration of the feasibility of self-recognition as a bio-
metric trait. Leveraging those concepts, in Section 4,
we introduced an FPA detection scheme based on dy-
namic biometric analysis, which detects deepfake at-
tacks. This method may be a new building block of
a multimodal solution to protect users from identity
theft in online video-based applications.
A multimodal solution includes state-of-the-art
deep learning techniques, static and dynamic biomet-
ric analysis, and content credentials. Existing litera-
ture (see Section 2.1) shows that the widespread deep-
fake detection methods based on deep learning are
specifically trained for known deepfake algorithms
and can achieve very high accuracies on them. On
the other hand, biometric approaches like ours are
technology-agnostic but involve natural variance. In
practice, both can complement each other, and thus,
should be combined. A holistic solution can be fur-
ther enhanced by including static biometric analysis.
Content credentials provide additional protection of
the integrity and authenticity of static content, e.g., im-
ages, created by trustworthy devices.
Figure 4: The fundamental attack setup of a deepfake attack in Video-Ident. The attacker intercepts the original video stream
of their webcam and presents a virtual device to the Video-Ident system, including a swap of their face and ID card.
5.1 Limitations
Eye-tracking data has a high natural variance and sig-
nificant differences in biological reactions between
individuals exist. In addition, the method is lim-
ited to eye-trackable individuals who show no sig-
nificant anomaly in their eye-related reaction to self-
perception. This may exclude blind people and those
with certain cognitive impairments. To mitigate those
limitations, our goal is to combine heterogeneous in-
formation sources into a multimodal solution.
Evasion of self-recognition-based deepfake detec-
tion may be possible. For example, attackers could over-
lay their local video stream with the original webcam
feed and hence be undetectable by our identity valida-
tion service. This could be prevented by adding visual
artifacts to the video stream on the server side that cause
detectable reactions in eye movement. This and other
potential bypasses are subject to ongoing research.
5.2 Ethical Consideration
It has been shown that classification pipelines perform
differently across gender, race, skin color, and other
characteristics (Buolamwini and Gebru, 2018). That
is why we recognize the importance of ethical consid-
erations and debiasing in our technology.
To mitigate these concerns, we strive to implement
more inclusive data collection procedures. By ensur-
ing diverse representation in our data, we aim to ad-
dress biases related to race, gender, and other identi-
ties. This involves carefully tracing how these iden-
tities are understood, represented and reflected within
the systems and their application contexts.
Additionally, we acknowledge that pre-release tri-
als, independent auditing, and ongoing monitoring are
essential to test for bias, discrimination, and other
harms in such systems.
6 CONCLUSION
This paper describes our approach to using dynamic
biometrics for live FPA detection during the online
video-based use case of video conferencing (see Sec-
tion 4.1). We illustrate this exemplarily using deep-
fake attacks. Moreover, we provide insights into our
video conferencing prototype implementation.
Future work to enhance self-recognition-based
identification includes the collection of more high-
quality data containing deepfake stimuli, e.g., face-
swapped or morphed self-face images. Model perfor-
mance may be improved by investigating more so-
phisticated features, especially based on microsac-
cades. Furthermore, the exploration of more dynamic
biometric features may be an interesting field for de-
veloping smarter FPA detection frameworks. We en-
courage the community to apply our approach and
similar research to upcoming use cases. For example,
verification of live avatars in VR scenarios may be in-
teresting. Mobile use cases, e.g., smartphone contin-
uous authentication, are plausible due to the existing
availability of eye tracking in many devices.
We aim to build a comprehensive FPA detection
framework that includes not only dynamic and static
biometric features but also deep-learning-based and
content-credential-based protection.
The detection and prevention of online FPAs
strengthens the linkage between physical and online
identities. It prevents identity fraud and its personal
and societal harm. The increasing number of success-
ful frauds exploiting deepfake technology undermines
the trust in digital systems such as video conferencing
and video identification. This trust must be restored
by appropriate countermeasures to not endanger the
benefits of today’s technological advances and the po-
tential of tomorrow’s digitization.
ACKNOWLEDGEMENTS
This research was partially supported by the Ger-
man Federal Ministry for Economic Affairs and
Climate Action (BMWK) under the funded project
SENSIBLE-KI (grant number 01MT21005B).
DISCLOSURE OF INTERESTS
This research was funded by Bundesdruckerei GmbH.
At the time of publication, authors H. Graupner and
M. Y. Abkenar are employed by Bundesdruckerei.
REFERENCES
Agarwal, S., Farid, H., El-Gaaly, T., and Lim, S.-N. (2020).
Detecting deep-fake videos from appearance and be-
havior. In 2020 IEEE international workshop on in-
formation forensics and security, pages 1–6. IEEE.
Buolamwini, J. and Gebru, T. (2018). Gender shades: In-
tersectional accuracy disparities in commercial gender
classification. In Conference on fairness, accountabil-
ity and transparency, pages 77–91. PMLR.
Chaos Computer Club (2022). Chaos Computer Club hacks
Video-Ident. https://www.ccc.de/en/updates/2022/c
haos-computer-club-hackt-video-ident. last accessed
2024/06/27.
CNN (2024). Finance worker pays out $25 million after
video call with deepfake ’chief financial officer’.
https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk/index.html. last accessed
2024/06/27.
Cozzolino, D., Rössler, A., Thies, J., Nießner, M., and Ver-
doliva, L. (2021). Id-reveal: Identity-aware deepfake
video detection. In Proceedings of the IEEE/CVF
international conference on computer vision, pages
15108–15117.
Deutsche Post (2024). How does the identity check by video
chat work? https://www.deutschepost.de/en/p/posti
dent/privatkunden/identifikation-per-videochat.html.
last accessed 2024/06/27.
Europol (2023). Franco-israeli gang behind eur 38 million
ceo fraud busted. https://www.europol.europa.eu/me
dia-press/newsroom/news/franco-israeli-gang-beh
ind-eur-38-million-ceo-fraud-busted. last accessed
2024/06/27.
Eurostat (2023). Online meetings and remote access to en-
terprise resources - statistics.
https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Online_meetings_and_remote_access_to_enterprise_resources_-_statistics. last accessed 2024/06/27.
Gazepoint (2024). GP3 HD Eye Tracker 150Hz.
https://www.gazept.com/product/gp3hd. last accessed
2024/06/27.
Github (2024a). DeepFaceLive. https://github.com/iperov/
DeepFaceLive. last accessed 2024/06/27.
Github (2024b). OpenVidu. https://github.com/OpenVidu/
openvidu. last accessed 2024/06/27.
Grand View Research (2022). Eye tracking market size,
share & trends analysis report. https://www.grandvie
wresearch.com/industry-analysis/eye-tracking-marke
t. last accessed 2024/06/27.
Graupner, H., Schwetlick, L., Engbert, R., and Meinel, C.
(2023). Unconventional biometrics: Exploring the
feasibility of a cognitive trait based on visual self-
recognition. In 2023 IEEE International Joint Con-
ference on Biometrics, pages 1–10. IEEE.
Hasan, H. R. and Salah, K. (2019). Combating deepfake
videos using blockchain and smart contracts. IEEE
Access, 7:41596–41606.
Hernandez-Ortega, J., Fierrez, J., Morales, A., and Galbally,
J. (2023). Introduction to presentation attack detection
in face biometrics and recent advances. Handbook of
Biometric Anti-Spoofing: Presentation Attack Detec-
tion and Vulnerability Assessment, pages 203–230.
Hess, E. H. and Polt, J. M. (1960). Pupil size as re-
lated to interest value of visual stimuli. Science,
132(3423):349–350.
Iloka, B. C. and Anukwe, G. I. (2020). Review of eye-
tracking: A neuromarketing technique. Neuroscience
Research Notes, 3(4):29–34.
Intel Corporation (2022). Intel introduces real-time deep-
fake detector. https://www.intel.com/content/www/us
/en/newsroom/news/intel-introduces-real-time-deepf
ake-detector.html. last accessed 2024/06/27.
Just, M. A. and Carpenter, P. A. (1980). A theory of reading:
from eye fixations to comprehension. Psychological
review, 87(4):329.
Martinez-Conde, S., Macknik, S. L., and Hubel, D. H.
(2004). The role of fixational eye movements in visual
perception. Nature reviews neuroscience, 5(3):229–
240.
OBS Project (2024). OBS Studio. https://obsproject.com.
last accessed 2024/06/27.
Rana, M. S., Nobi, M. N., Murali, B., and Sung, A. H.
(2022). Deepfake detection: A systematic literature
review. IEEE access, 10:25494–25513.
Rana, M. S. and Sung, A. H. (2020). Deepfakestack:
A deep ensemble-based learning technique for deep-
fake detection. In 2020 7th IEEE International Con-
ference on Cyber Security and Cloud Computing
(CSCloud)/2020 6th IEEE International Conference
on Edge Computing and Scalable Cloud (EdgeCom),
pages 70–75. IEEE.
Schwetlick, L., Engbert, R., and Graupner, H. (2023a). Data
Set: Face- and Self-Recognition Effects on Pupil Size
and Microsaccade Rate. Open Science Framework.
Schwetlick, L., Graupner, H., Dimigen, O., and Engbert,
R. (2023b). Self-recognition generates characteristic
responses in pupil dynamics and microsaccade rate.
arXiv.
Tolosana, R., Vera-Rodriguez, R., Fierrez, J., Morales, A.,
and Ortega-Garcia, J. (2020). Deepfakes and beyond:
A survey of face manipulation and fake detection. In-
formation Fusion, 64:131–148.