1 code implementation • 13 Mar 2024 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk
Lastly, we show that the proposed recipe can be applied to other distillation methodologies, such as the recent DPWavLM.
no code implementations • 25 Sep 2023 • Arthur Pimentel, Heitor Guimarães, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk
Recent advances in self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by their predecessors.
no code implementations • 22 Sep 2023 • Heitor R. Guimarães, Arthur Pimentel, Anderson Avila, Tiago H. Falk
Keyword spotting (KWS) refers to the task of identifying a set of predefined words in audio streams.
no code implementations • 23 May 2023 • Vamsikrishna Chemudupati, Marzieh Tahaei, Heitor Guimaraes, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago Falk
Large self-supervised pre-trained speech models have achieved remarkable success across various speech-processing tasks.
Automatic Speech Recognition (ASR)
no code implementations • 9 May 2023 • Heitor Guimarães, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Tiago H. Falk
These representations are then used as input to downstream models for tasks such as keyword spotting and emotion recognition.
no code implementations • 18 Feb 2023 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk
The proposed layer-wise distillation recipe is evaluated on top of three well-established universal representations and on three downstream tasks.
no code implementations • 12 Nov 2022 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk
Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition.