Search Results for author: Rafael Valle

Found 19 papers, 9 papers with code

Audio Dialogues: Dialogues dataset for audio and music understanding

no code implementations • 11 Apr 2024 • Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Existing datasets for audio understanding primarily focus on single-turn interactions (i.e., audio captioning, audio question answering) for describing audio in natural language, thus limiting understanding audio via interactive dialogue.

Audio Captioning, Audio Question Answering, +3 more

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

no code implementations • 2 Feb 2024 • Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.

Few-Shot Learning, In-Context Learning, +2 more

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

no code implementations • 24 Jan 2024 • Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets.

Voice Cloning

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

no code implementations • 14 Oct 2023 • Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning and speaker verification models.

Self-Supervised Learning, Speaker Verification, +2 more

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting

1 code implementation • NeurIPS 2023 • Sungwon Kim, Kevin J. Shih, Rohan Badlani, João Felipe Santos, Evelina Bakhturina, Mikyas T. Desta, Rafael Valle, Sungroh Yoon, Bryan Catanzaro

P-Flow comprises a speech-prompted text encoder for speaker adaptation and a flow matching generative decoder for high-quality and fast speech synthesis.

Speech Synthesis

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

no code implementations • 14 Mar 2023 • Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system.

Disentanglement, Speech Synthesis

Multilingual Multiaccented Multispeaker TTS with RADTTS

no code implementations • 24 Jan 2023 • Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice.

Speech Synthesis

SPACE: Speech-driven Portrait Animation with Controllable Expression

no code implementations • ICCV 2023 • Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Rafael Valle, Ming-Yu Liu

It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator.

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

1 code implementation • 3 Mar 2022 • Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability as pitch-conditioned deterministic models such as FastPitch and FastSpeech2.

Speech Synthesis, Text-To-Speech Synthesis

One TTS Alignment To Rule Them All

3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łańcucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis

1 code implementation • ICML Workshop INNF 2021 • Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Łańcucki, Wei Ping, Bryan Catanzaro

This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows.

Speech Synthesis, Text-To-Speech Synthesis

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

3 code implementations • ICLR 2021 • Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric, using extra training data)

Speech Synthesis, Style Transfer, +1 more

Neural ODEs for Image Segmentation with Level Sets

no code implementations • 25 Dec 2019 • Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method.

Image Segmentation, Object Detection, +4 more

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

4 code implementations • 26 Oct 2019 • Rafael Valle, Jason Li, Ryan Prenger, Bryan Catanzaro

Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data.

Style Transfer

WaveGlow: A Flow-based Generative Network for Speech Synthesis

2 code implementations • 31 Oct 2018 • Ryan Prenger, Rafael Valle, Bryan Catanzaro

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

Ranked #8 on Speech Synthesis on LibriTTS

Audio Synthesis, Regression, +1 more

TequilaGAN: How to easily identify GAN samples

no code implementations • ICLR 2019 • Rafael Valle, Wilson Cai, Anish Doshi

In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework.

Generative Adversarial Network

Attacking Speaker Recognition With Deep Generative Models

no code implementations • 8 Jan 2018 • Wilson Cai, Anish Doshi, Rafael Valle

In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems.

Speaker Recognition

Character-Based Handwritten Text Transcription with Attention Networks

1 code implementation • 11 Dec 2017 • Jason Poulos, Rafael Valle

When the sequence alignment is one-to-one, softmax attention is able to learn a more precise alignment at each step of the decoding, whereas the alignment generated by sigmoid attention is much less precise.

Handwritten Text Recognition, HTR

Missing Data Imputation for Supervised Learning

1 code implementation • 28 Oct 2016 • Jason Poulos, Rafael Valle

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information.

Ranked #1 on Imputation on Adult

General Classification, Imputation
