Search Results for author: David Gimeno-Gómez

Found 6 papers, 3 papers with code

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

1 code implementation • 20 Feb 2024 • José-M. Acosta-Triana, David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

In order to promote research on low-resource languages for audio-visual speech technologies, we present AnnoTheia, a semi-automatic annotation toolkit that detects when a person speaks on the scene and the corresponding transcription.

Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

no code implementations • 20 Feb 2024 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR).

Decoder speech-recognition +1

Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

1 code implementation • 5 Jan 2024 • David Gimeno-Gómez, Ana-Maria Bucur, Adrian Cosma, Carlos-David Martínez-Hinarejos, Paolo Rosso

Depression, a prominent contributor to global disability, affects a substantial portion of the population.

Depression Detection

Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish

no code implementations • 21 Nov 2023 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Different studies have shown the importance of visual cues throughout the speech perception process.

speech-recognition Visual Speech Recognition

Analysis of Visual Features for Continuous Lipreading in Spanish

no code implementations • 21 Nov 2023 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

In this paper, we analyze different visual speech features with the aim of identifying which of them best captures the nature of lip movements for natural Spanish and, in this way, addressing the automatic visual speech recognition task.

Lipreading speech-recognition +1

LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

1 code implementation • LREC 2022 • David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Speech is considered a multi-modal process in which hearing and vision are two fundamental pillars.

Automatic Speech Recognition speech-recognition +1
