Discriminative Multi-modality Speech Recognition

CVPR 2020 · Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang

Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments, where the performance of the audio-only modality deteriorates significantly. Once the visual modality is added, ASR becomes multi-modality speech recognition (MSR)...

Task | Dataset | Model | Metric name | Metric value | Global rank
Lipreading | Lip Reading in the Wild | 3D Conv + P3D-ResNet50 + TCN | Top-1 Accuracy | 84.80 | #3

Methods used in the Paper