Search Results for author: Rohan Badlani

Found 13 papers, 5 papers with code

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

no code implementations 2 Feb 2024 Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.

Few-Shot Learning, In-Context Learning

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

no code implementations 24 Jan 2024 Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets.

Voice Cloning

Multilingual Multiaccented Multispeaker TTS with RADTTS

no code implementations 24 Jan 2023 Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice.

Speech Synthesis

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

1 code implementation 3 Mar 2022 Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2.

Speech Synthesis, Text-To-Speech Synthesis

One TTS Alignment To Rule Them All

3 code implementations 23 Aug 2021 Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

Relation Extraction with Contextualized Relation Embedding (CRE)

1 code implementation EMNLP (DeeLIO) 2020 Xiaoyu Chen, Rohan Badlani

This paper proposes an architecture for the relation extraction task that integrates semantic information with knowledge base modeling in a novel manner.

Entity Embeddings, Relation

Framework for evaluation of sound event detection in web videos

no code implementations 2 Nov 2017 Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection, Sound Event Detection

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations 20 Sep 2016 Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection
