FRCol: Face Recognition Based Speaker Video Colorization

Rory Ward; John Breslin

doi:10.5220/0013306800003912

2025

Abstract

Automatic video colorization has recently gained attention for its ability to adapt old movies for the modern entertainment industry. A significant challenge remains, however: limiting unnatural color hallucination. Generative artificial intelligence often produces erroneous results, which in colorization manifest as unnatural colors. In this work, we propose grounding an automatic video colorization system in relevant exemplars, which we retrieve from a face database using facial recognition technology. The retrieved exemplar guides a latent-diffusion-based speaker video colorizer. We dub our system FRCol. We focus on speakers because humans have evolved to pay particular attention to certain aspects of colorization, human faces among them. FRCol improves on the previous state of the art (SOTA), DeOldify, by an average of 13% on the standard metrics of PSNR, SSIM, FID, and FVD on the Grid and Lombard Grid datasets. Our user study corroborates these results: FRCol was preferred to contemporary colorizers 81% of the time.
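
The exemplar-retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how faces are embedded or compared, so everything below is an assumption made for illustration: faces are taken to be already encoded as fixed-length embedding vectors, matching uses cosine similarity, and the names `cosine_similarity`, `retrieve_exemplar`, and `face_db` are hypothetical rather than part of FRCol.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_exemplar(
    query: List[float], database: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the ID and similarity of the database face closest to the query.

    Assumption: `database` maps an exemplar ID to a precomputed face embedding.
    """
    if not database:
        raise ValueError("face database is empty")
    scores = {key: cosine_similarity(query, emb) for key, emb in database.items()}
    best_id = max(scores, key=scores.get)
    return best_id, scores[best_id]


# Toy data: hypothetical 3-dimensional embeddings, not real face features.
face_db = {
    "speaker_a": [0.9, 0.1, 0.0],
    "speaker_b": [0.0, 0.8, 0.6],
}
query = [0.1, 0.7, 0.7]  # embedding of a face from a grayscale frame (assumed)
best_id, score = retrieve_exemplar(query, face_db)
```

In a pipeline of this kind, the color image associated with `best_id` would then be passed to the colorizer as the guiding exemplar; how FRCol conditions its latent-diffusion model on that exemplar is not covered by this sketch.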

Paper Citation

in Harvard Style

Ward R. and Breslin J. (2025). FRCol: Face Recognition Based Speaker Video Colorization. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 717-728. DOI: 10.5220/0013306800003912

in Bibtex Style

@conference{visapp25,
author={Rory Ward and John Breslin},
title={FRCol: Face Recognition Based Speaker Video Colorization},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP},
year={2025},
pages={717-728},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013306800003912},
isbn={978-989-758-728-3},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP
TI - FRCol: Face Recognition Based Speaker Video Colorization
SN - 978-989-758-728-3
AU - Ward R.
AU - Breslin J.
PY - 2025
SP - 717
EP - 728
DO - 10.5220/0013306800003912
PB - SciTePress