Search Results for author: Jeong Hun Yeo

Found 8 papers, 2 papers with code

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

1 code implementation • 23 Feb 2024 • Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements.

Ranked #4 on Lipreading on LRS3-TED (using extra training data)

Lipreading • Lip Reading • +3

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

no code implementations • 18 Jan 2024 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro

By using visual speech units as the inputs to our system, we pre-train the model to predict the corresponding text outputs on massive multilingual data constructed by merging several VSR databases.

Sentence • Speech Recognition • +1
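This paper has no code release; purely as an illustration of the discrete-unit idea mentioned in the excerpt above, the sketch below quantizes pre-extracted visual speech features into discrete visual speech units with k-means, a common way such unit inventories are built. The feature dimension, number of clusters, and dummy data are assumptions for illustration, not details from the paper.

    # Hedged sketch: map continuous per-frame visual features to discrete
    # "visual speech units" via k-means; all sizes are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Stand-in for features from a pre-trained visual front-end.
    visual_features = rng.normal(size=(5000, 768)).astype(np.float32)

    # Learn a codebook (unit inventory) over the feature space.
    kmeans = KMeans(n_clusters=200, n_init=10, random_state=0).fit(visual_features)

    # A new 120-frame utterance becomes a sequence of discrete unit ids,
    # which a sequence model can then map to text.
    utterance = rng.normal(size=(120, 768)).astype(np.float32)
    visual_speech_units = kmeans.predict(utterance)  # shape (120,), ids in [0, 200)
    print(visual_speech_units[:10])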

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

no code implementations • 15 Sep 2023 • Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

To this end, we start by importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp.

Image Comprehension • Language Modelling • +1
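No code accompanies this entry; the snippet below is only a rough sketch of the "import knowledge from a pre-trained vision-language model" idea, here reusing a CLIP vision encoder from Hugging Face transformers as the image front-end. The checkpoint name and the single-layer placeholder decoder are assumptions for illustration, not the authors' Im2Sp architecture.

    # Hedged sketch: initialize an image-to-speech model's front-end from a
    # pre-trained vision-language encoder; the decoder head is a placeholder.
    import torch
    import torch.nn as nn
    from transformers import CLIPVisionModel

    class Im2SpSketch(nn.Module):
        def __init__(self, num_speech_tokens=1000):
            super().__init__()
            # Import image-comprehension knowledge from a pre-trained VLM encoder.
            self.vision = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
            hidden = self.vision.config.hidden_size
            # Placeholder head mapping visual tokens to discrete speech tokens.
            self.decoder = nn.Linear(hidden, num_speech_tokens)

        def forward(self, pixel_values):
            feats = self.vision(pixel_values=pixel_values).last_hidden_state
            return self.decoder(feats)  # (batch, num_patches + 1, num_speech_tokens)

    model = Im2SpSketch()
    logits = model(torch.randn(1, 3, 224, 224))
    print(logits.shape)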

Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

no code implementations • 15 Sep 2023 • Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro

Unlike previous methods that improved VSR performance for a target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for these languages without human intervention.

Language Identification • Speech Recognition • +1
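This paper also has no code release; since its title refers to automatic labels from Whisper, the sketch below shows how unlabeled clips could be transcribed with the open-source whisper package to produce such labels, including a detected language tag. The file names and model size are placeholders.

    # Hedged sketch: create automatic transcriptions ("labels") for unlabeled
    # audio-visual clips with the whisper package; paths and model size are placeholders.
    import whisper

    model = whisper.load_model("large-v2")

    unlabeled_clips = ["clip_0001.mp4", "clip_0002.mp4"]  # placeholder file names
    for clip in unlabeled_clips:
        result = model.transcribe(clip)  # language is auto-detected
        print(clip, result["language"], result["text"])
        # The detected language and transcript can serve as automatic labels for
        # training a visual speech recognizer on the video frames of the same clip.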

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

no code implementations • ICCV 2023 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro

To mitigate this challenge, we learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units.

Lip Reading

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

1 code implementation • The AAAI Conference on Artificial Intelligence (AAAI) 2022 • Minsu Kim, Jeong Hun Yeo, Yong Man Ro

With the multi-head key memories, MVM extracts possible candidate audio features from the memory, which allows the lip reading model to consider which pronunciations the input lip movement could represent.

Lip Reading
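The released implementation is the authoritative reference; as a very rough, generic sketch of the key-value memory lookup described in the excerpt above, the snippet below lets a visual query attend over several learned key memories and retrieve candidate audio-like value vectors. The memory size, feature dimension, and number of heads are assumptions, not values from the paper.

    # Hedged, generic sketch of a multi-head key/value memory lookup:
    # a visual feature addresses several key memories and retrieves candidate
    # "audio" value vectors; all sizes are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadMemorySketch(nn.Module):
        def __init__(self, dim=512, slots=112, heads=4):
            super().__init__()
            self.keys = nn.Parameter(torch.randn(heads, slots, dim))  # per-head key memories
            self.values = nn.Parameter(torch.randn(slots, dim))       # shared value memory

        def forward(self, visual_query):                # (batch, dim)
            attn = torch.einsum("bd,hsd->bhs", visual_query, self.keys)
            attn = F.softmax(attn, dim=-1)              # addressing weights per head
            # Each head retrieves its own candidate audio feature from the value memory.
            return torch.einsum("bhs,sd->bhd", attn, self.values)     # (batch, heads, dim)

    memory = MultiHeadMemorySketch()
    candidates = memory(torch.randn(2, 512))
    print(candidates.shape)  # torch.Size([2, 4, 512])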
