1 code implementation • 20 Feb 2024 • José-M. Acosta-Triana, David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
To promote research on low-resource languages for audio-visual speech technologies, we present AnnoTheia, a semi-automatic annotation toolkit that detects when a person speaks on the scene and provides the corresponding transcription.
no code implementations • 20 Feb 2024 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR).
1 code implementation • 5 Jan 2024 • David Gimeno-Gómez, Ana-Maria Bucur, Adrian Cosma, Carlos-David Martínez-Hinarejos, Paolo Rosso
Depression, a prominent contributor to global disability, affects a substantial portion of the population.
no code implementations • 21 Nov 2023 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
Different studies have shown the importance of visual cues throughout the speech perception process.
no code implementations • 21 Nov 2023 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
In this paper, we analyze different visual speech features to identify which of them best captures the nature of lip movements in natural Spanish and, in this way, to address the automatic visual speech recognition task.
1 code implementation • LREC 2022 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
Speech is considered a multi-modal process in which hearing and vision are two fundamental pillars.