Search Results for author: Frederico Santos de Oliveira

Found 6 papers, 6 papers with code

Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites

1 code implementation • 24 Oct 2023 • Augusto Seben da Rosa, Frederico Santos de Oliveira, Anderson da Silva Soares, Arnaldo Candido Junior

Computer vision in general has seen several advances, such as training optimizations and new architectures (pure attention, efficient blocks, vision-language models, and generative models, among others).

CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

2 code implementations • 14 Oct 2021 • Arnaldo Candido Junior, Edresson Casanova, Anderson Soares, Frederico Santos de Oliveira, Lucas Oliveira, Ricardo Corso Fernandes Junior, Daniel Peixoto Pinto da Silva, Fernando Gorgulho Fayet, Bruno Baldissera Carlotto, Lucas Rafael Stefanel Gris, Sandra Maria Aluísio

with 290.77 hours, a publicly available dataset for ASR in BP containing validated pairs (audio-transcription).

Automatic Speech Recognition (ASR)

Brazilian Portuguese Speech Recognition Using Wav2vec 2.0

1 code implementation • 23 Jul 2021 • Lucas Rafael Stefanel Gris, Edresson Casanova, Frederico Santos de Oliveira, Anderson da Silva Soares, Arnaldo Candido Junior

In this sense, this work presents the development of a public Automatic Speech Recognition (ASR) system using only openly available audio data, obtained by fine-tuning the Wav2vec 2.0 XLSR-53 model, pre-trained on many languages, on BP data.

Automatic Speech Recognition (ASR)

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

2 code implementations • 2 Apr 2021 • Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frederico Santos de Oliveira, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Maria Aluisio, Moacir Antonelli Ponti

In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training.
