Search Results for author: Rafael Valle

Found 19 papers, 9 papers with code

Audio Dialogues: Dialogues dataset for audio and music understanding

no code implementations • 11 Apr 2024 • Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Existing datasets for audio understanding primarily focus on single-turn interactions (i.e., audio captioning, audio question answering) for describing audio in natural language, thus limiting understanding audio via interactive dialogue.

Audio captioning, Audio Question Answering

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

no code implementations • 2 Feb 2024 • Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.

Few-Shot Learning, In-Context Learning

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

no code implementations • 24 Jan 2024 • Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets.

Voice Cloning

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

no code implementations • 14 Oct 2023 • Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning and speaker verification models.

Self-Supervised Learning, Speaker Verification

Multilingual Multiaccented Multispeaker TTS with RADTTS

no code implementations • 24 Jan 2023 • Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice.

Speech Synthesis

SPACE: Speech-driven Portrait Animation with Controllable Expression

no code implementations • ICCV 2023 • Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Rafael Valle, Ming-Yu Liu

It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator.

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

1 code implementation • 3 Mar 2022 • Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2.

Speech Synthesis, Text-To-Speech Synthesis

One TTS Alignment To Rule Them All

3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łańcucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

3 code implementations • ICLR 2021 • Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric, using extra training data)

Speech Synthesis, Style Transfer

Neural ODEs for Image Segmentation with Level Sets

no code implementations • 25 Dec 2019 • Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method.

Image Segmentation, object-detection

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

4 code implementations • 26 Oct 2019 • Rafael Valle, Jason Li, Ryan Prenger, Bryan Catanzaro

Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data.

Style Transfer

WaveGlow: A Flow-based Generative Network for Speech Synthesis

2 code implementations • 31 Oct 2018 • Ryan Prenger, Rafael Valle, Bryan Catanzaro

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

Audio Synthesis, regression

TequilaGAN: How to easily identify GAN samples

no code implementations • ICLR 2019 • Rafael Valle, Wilson Cai, Anish Doshi

In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework.

Generative Adversarial Network

Attacking Speaker Recognition With Deep Generative Models

no code implementations • 8 Jan 2018 • Wilson Cai, Anish Doshi, Rafael Valle

In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems.

Speaker Recognition

Character-Based Handwritten Text Transcription with Attention Networks

1 code implementation • 11 Dec 2017 • Jason Poulos, Rafael Valle

When the sequence alignment is one-to-one, softmax attention is able to learn a more precise alignment at each step of the decoding, whereas the alignment generated by sigmoid attention is much less precise.

Handwritten Text Recognition, HTR

Missing Data Imputation for Supervised Learning

1 code implementation • 28 Oct 2016 • Jason Poulos, Rafael Valle

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information.

General Classification, Imputation
