no code implementations • 23 Aug 2023 • Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro
We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF, to enable the synthesis of variations in identities, poses, and facial motions of 3D face meshes.
1 code implementation • ICCV 2023 • Jeongsoo Choi, Joanna Hong, Yong Man Ro
In doing so, rich speaker embedding information can be produced solely from the input visual information, so no extra audio information is necessary at inference time.
1 code implementation • CVPR 2023 • Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro
Thus, we first show that previous AVSR models are in fact not robust to corruption of the multimodal input streams, i.e., the audio and visual inputs, compared to uni-modal models.
3 code implementations • 17 Feb 2023 • Minsu Kim, Joanna Hong, Yong Man Ro
To this end, we design a multi-task learning scheme that guides the model with multimodal supervision, i.e., text and audio, to complement the insufficient word representations of the acoustic feature reconstruction loss.
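The multi-task objective described above can be sketched as a weighted sum of the reconstruction loss and the auxiliary text and audio supervision terms. This is only an illustration of the general scheme; the function name, weights, and term structure are assumptions, not the paper's actual loss.

```python
def multitask_loss(recon_loss, text_loss, audio_loss, w_text=1.0, w_audio=1.0):
    """Combine the acoustic-feature reconstruction loss with auxiliary
    text and audio supervision terms (weights here are illustrative)."""
    return recon_loss + w_text * text_loss + w_audio * audio_loss

total = multitask_loss(1.0, 2.0, 3.0)  # 1.0 + 1.0*2.0 + 1.0*3.0 = 6.0
```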
no code implementations • 2 Nov 2022 • Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro
It stores lip motion features from sequential ground-truth images in the value memory and aligns them with the corresponding audio features, so that they can be retrieved with an audio query at inference time.
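The audio-addressed retrieval described above can be sketched as dot-product attention over a key-value memory: stored audio features act as keys and the aligned lip-motion features as values. A minimal numpy illustration follows; all names, shapes, and the scoring function are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def retrieve_lip_features(audio_query, audio_keys, lip_values):
    """Address the value memory with an audio query.

    audio_query: (d,)      audio feature at inference time
    audio_keys:  (M, d)    stored audio features (keys)
    lip_values:  (M, d_v)  aligned lip-motion features (values)
    """
    # Scaled dot-product scores against every stored key.
    scores = softmax(audio_keys @ audio_query / np.sqrt(audio_keys.shape[1]))
    # Weighted sum of stored lip features: the retrieved representation.
    return scores @ lip_values

rng = np.random.default_rng(0)
keys = rng.standard_normal((8, 16))    # M=8 memory slots, d=16
values = rng.standard_normal((8, 32))  # aligned lip features, d_v=32
query = keys[3]                        # a query near a stored key
out = retrieve_lip_features(query, keys, values)  # shape (32,)
```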
1 code implementation • 13 Jul 2022 • Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro
The enhanced audio features are fused with the visual features and fed into an encoder-decoder model, composed of a Conformer encoder and a Transformer decoder, for speech recognition.
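The fusion step described above can be sketched as per-frame concatenation of the two feature streams followed by a linear projection to the encoder dimension. This is a minimal late-fusion sketch under assumed shapes; the actual fusion mechanism and dimensions in the paper may differ.

```python
import numpy as np

def fuse_av(audio_feats, visual_feats, w):
    """Concatenate per-frame audio and visual features and project them
    to the encoder width (simple concatenation-fusion sketch).

    audio_feats:  (T, d_a)  enhanced audio features
    visual_feats: (T, d_v)  visual features, frame-aligned with audio
    w:            (d_a + d_v, d_model)  projection matrix
    """
    fused = np.concatenate([audio_feats, visual_feats], axis=-1)  # (T, d_a + d_v)
    return fused @ w  # (T, d_model): input to the Conformer encoder

rng = np.random.default_rng(1)
audio = rng.standard_normal((50, 256))    # T=50 frames, d_a=256
visual = rng.standard_normal((50, 512))   # T=50 frames, d_v=512
proj = rng.standard_normal((256 + 512, 256)) * 0.01
fused = fuse_av(audio, visual, proj)      # shape (50, 256)
```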
no code implementations • 15 Jun 2022 • Joanna Hong, Minsu Kim, Yong Man Ro
Thus, the proposed framework has the advantage of synthesizing speech with the correct content even from a silent talking-face video of an unseen subject.
1 code implementation • NeurIPS 2021 • Minsu Kim, Joanna Hong, Yong Man Ro
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
1 code implementation • ICCV 2021 • Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro
By learning the interrelationship through the associative bridge, the proposed bridging framework can obtain the target-modal representations inside the memory network from the source-modal input alone, providing rich information for its downstream tasks.
Ranked #3 on Lipreading on CAS-VSR-W1k (LRW-1000)
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021 • Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro
Our key contributions are: (1) proposing the Visual Voice memory that brings rich information of audio that complements the visual features, thus producing high-quality speech from silent video, and (2) enabling multi-speaker and unseen speaker training by memorizing auditory features and the corresponding visual features.
no code implementations • 16 Jul 2020 • Joanna Hong, Jung Uk Kim, Sangmin Lee, Yong Man Ro
Recent advances in facial expression synthesis have shown promising results using diverse expression representations including facial action units.