Search Results for author: Rohan Badlani

Found 13 papers, 5 papers with code

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

no code implementations • 2 Feb 2024 • Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.

Few-Shot Learning • In-Context Learning • +2

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

no code implementations • 24 Jan 2024 • Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets.

Voice Cloning

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting

1 code implementation • NeurIPS 2023 • Sungwon Kim, Kevin J. Shih, Rohan Badlani, João Felipe Santos, Evelina Bakhturina, Mikyas T. Desta, Rafael Valle, Sungroh Yoon, Bryan Catanzaro

P-Flow comprises a speech-prompted text encoder for speaker adaptation and a flow matching generative decoder for high-quality and fast speech synthesis.

Speech Synthesis

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

no code implementations • 14 Mar 2023 • Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system.

Disentanglement • Speech Synthesis

Multilingual Multiaccented Multispeaker TTS with RADTTS

no code implementations • 24 Jan 2023 • Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice.

Speech Synthesis

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

1 code implementation • 3 Mar 2022 • Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2.

Speech Synthesis • Text-To-Speech Synthesis

One TTS Alignment To Rule Them All

3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis

1 code implementation • ICML Workshop INNF 2021 • Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows.

Speech Synthesis • Text-To-Speech Synthesis

Relation Extraction with Contextualized Relation Embedding (CRE)

1 code implementation • EMNLP (DeeLIO) 2020 • Xiaoyu Chen, Rohan Badlani

This paper proposes an architecture for the relation extraction task that integrates semantic information with knowledge base modeling in a novel manner.

Entity Embeddings • Relation • +1