Authors:
Kotaro Koseki; Yuichi Sei; Yasuyuki Tahara and Akihiko Ohsuga
Affiliation:
The University of Electro-Communications, Tokyo, Japan
Keyword(s):
Multimodal Learning, Deep Learning, Machine Learning, CNN, VAE.
Abstract:
The task of “face generation from voice” could significantly change the way voice calls are made. Voice calls create a psychological gap compared with face-to-face communication because the other party’s face is not visible. Generating a face from voice can alleviate this gap and contribute to more efficient communication. Multimodal learning is a machine learning approach that uses data of different modalities (e.g., voice and face images) and is being studied as a way to combine various types of information such as text, images, and voice, as in Google’s Imagen (Saharia et al., 2022). In this study, we perform multimodal learning of speech and face images using a convolutional neural network (CNN) speech encoder and a variational autoencoder (VAE) for face images, building models that represent speech and face images of different modalities in the same latent space. Focusing on the emotional information in speech, we also built a model that generates face images reflecting the speaker’s emotions and attributes from input speech. As a result, we were able to generate face images that reflect coarse emotions and attributes, although the degree to which an emotion is reflected varies by emotion type.
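Below is a minimal sketch, assuming a PyTorch implementation, of the kind of architecture the abstract describes: a CNN speech encoder that maps speech features to a latent distribution, and a VAE-style decoder that renders a face image from the same latent space. All layer sizes, the 128-dimensional latent space, the log-mel spectrogram input, and the 64x64 output resolution are hypothetical choices for illustration, not details taken from the paper.

```python
# Sketch only: a CNN speech encoder and a face-image VAE decoder sharing one
# latent space, so a face can be decoded from encoded speech (assumed PyTorch;
# layer sizes and input/output shapes are hypothetical).
import torch
import torch.nn as nn


class SpeechEncoder(nn.Module):
    """CNN mapping a log-mel spectrogram (B, 1, 64, T) to latent mean/log-variance."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over time/frequency -> fixed-size vector
        )
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, spec: torch.Tensor):
        h = self.conv(spec).flatten(1)
        return self.mu(h), self.logvar(h)


class FaceDecoder(nn.Module):
    """VAE decoder mapping a latent vector to a 64x64 RGB face image."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z: torch.Tensor):
        return self.deconv(self.fc(z).view(-1, 128, 8, 8))


def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)


if __name__ == "__main__":
    enc, dec = SpeechEncoder(), FaceDecoder()
    spec = torch.randn(2, 1, 64, 100)        # dummy batch of spectrograms
    mu, logvar = enc(spec)
    face = dec(reparameterize(mu, logvar))   # (2, 3, 64, 64) generated faces
    print(face.shape)
```

In such a setup, training would also need a face-image encoder and losses (e.g., reconstruction, KL divergence, and a term aligning speech and face latents) to make the two modalities share the latent space; those are omitted here for brevity.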