no code implementations • 6 Feb 2024 • Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani
In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume.
no code implementations • 31 Jan 2024 • Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, Ashesh D. Mehta, Nima Mesgarani
We also compare the feature extraction pathways of the LLMs to each other and identify new ways in which high-performing models have converged toward similar hierarchical processing mechanisms.
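One common way to compare feature extraction pathways across models is to measure layer-to-layer representational similarity on a shared stimulus set. The sketch below is a minimal illustration of that idea using linear CKA on randomly generated activations; the metric, shapes, and model names are assumptions for illustration, not the paper's actual analysis pipeline.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two activation matrices of shape
    (n_samples, n_features); higher means more similar representations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    return hsic / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))

# Hypothetical layer-wise activations from two language models on the same stimuli.
rng = np.random.default_rng(0)
model_a_layers = [rng.standard_normal((200, 768)) for _ in range(12)]
model_b_layers = [rng.standard_normal((200, 1024)) for _ in range(24)]

# Similarity matrix: how each layer of model A aligns with each layer of model B.
sim = np.array([[linear_cka(a, b) for b in model_b_layers] for a in model_a_layers])
print(sim.shape)  # (12, 24)
```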
no code implementations • 27 Sep 2023 • Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani
In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode the 'what' and 'where' of spatial audio.
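As a rough illustration of the SimCLR-style training objective such a framework builds on, the sketch below computes an NT-Xent contrastive loss between embeddings of two augmented views of the same recordings. The encoder, augmentations, and embedding sizes are placeholders; the actual multi-channel augmentations and projection head of MC-SimCLR are not reproduced here.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR NT-Xent loss: each embedding in z1 should match its counterpart
    in z2 against all other samples in the batch (and vice versa)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                     # (2B, D)
    sim = z @ z.t() / temperature                      # cosine similarities
    sim.fill_diagonal_(float('-inf'))                  # exclude self-pairs
    batch = z1.size(0)
    targets = torch.cat([torch.arange(batch) + batch,  # z1[i] matches z2[i]
                         torch.arange(batch)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Hypothetical embeddings of two augmented "views" of the same
# multi-channel recordings (e.g., different crops or channel perturbations).
z_view1 = torch.randn(16, 128)
z_view2 = torch.randn(16, 128)
print(float(nt_xent_loss(z_view1, z_view2)))
```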
no code implementations • 18 Sep 2023 • Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani
Subjective evaluations on LJSpeech show that our model significantly outperforms both iSTFTNet and HiFi-GAN, achieving ground-truth-level performance.
no code implementations • 18 Jul 2023 • Yinghao Aaron Li, Cong Han, Nima Mesgarani
In recent years, large-scale pre-trained speech language models (SLMs) have demonstrated remarkable advancements in various generative speech modeling applications, such as text-to-speech synthesis, voice conversion, and speech enhancement.
1 code implementation • NeurIPS 2023 • Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis.
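One distinctive ingredient named here is adversarial training against a large speech language model. The sketch below shows how a small discriminator head on top of a frozen SLM-style feature extractor could provide such a signal, using LSGAN-style losses; both networks are toy stand-ins, and the style diffusion component of StyleTTS 2 is not shown.

```python
import torch
import torch.nn as nn

# Stand-in for a frozen SLM feature extractor (e.g., a WavLM-like model):
# maps a waveform batch (B, T) to frame-level features (B, T', D).
class FrozenSLM(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Conv1d(1, dim, kernel_size=320, stride=160)
    def forward(self, wav):
        with torch.no_grad():
            return self.conv(wav.unsqueeze(1)).transpose(1, 2)

# Small trainable discriminator head operating in SLM feature space.
class SLMDiscriminator(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
    def forward(self, feats):
        return self.net(feats).mean(dim=1)  # one realness score per utterance

slm, disc = FrozenSLM(), SLMDiscriminator()
real_wav, fake_wav = torch.randn(4, 16000), torch.randn(4, 16000)  # placeholder audio

# LSGAN-style objectives: the discriminator separates real from synthesized
# speech in SLM feature space; the TTS generator is trained to fool it.
d_loss = ((disc(slm(real_wav)) - 1) ** 2).mean() + (disc(slm(fake_wav)) ** 2).mean()
g_loss = ((disc(slm(fake_wav)) - 1) ** 2).mean()
print(float(d_loss), float(g_loss))
```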
no code implementations • 29 May 2023 • Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani
Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time.
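To make the incremental-learning setting concrete, the sketch below trains a toy classifier on successive tasks that introduce new sound classes while rehearsing examples stored in a small replay buffer, one common way to reduce forgetting. The replay strategy, model, and data are illustrative assumptions, not the method proposed in the paper.

```python
import random
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 20))  # 20 total classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
replay_buffer = []  # (feature, label) pairs kept from earlier tasks

def train_task(task_data, replay_size=32):
    """Train on a new sound-class task while rehearsing stored old examples."""
    for features, labels in task_data:
        # remember a few new examples for future rehearsal
        for x, y in zip(features, labels):
            replay_buffer.append((x.detach(), int(y)))
        # mix in examples from previously learned classes
        rehearsal = random.sample(replay_buffer, min(replay_size, len(replay_buffer)))
        batch_x = torch.cat([features, torch.stack([x for x, _ in rehearsal])])
        batch_y = torch.cat([labels, torch.tensor([y for _, y in rehearsal])])
        optimizer.zero_grad()
        loss_fn(model(batch_x), batch_y).backward()
        optimizer.step()

# Hypothetical stream of two tasks, each introducing new sound classes.
task1 = [(torch.randn(8, 64), torch.randint(0, 10, (8,))) for _ in range(5)]
task2 = [(torch.randn(8, 64), torch.randint(10, 20, (8,))) for _ in range(5)]
train_task(task1)
train_task(task2)
```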
no code implementations • 11 Feb 2023 • Cong Han, Vishal Choudhari, Yinghao Aaron Li, Nima Mesgarani
Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment.
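A widely used decision rule in AAD is to reconstruct the attended speech envelope from neural recordings and correlate it with the envelopes of the candidate talkers. The sketch below illustrates that selection-and-amplification step with synthetic signals; it is a generic stimulus-reconstruction baseline, not the decoder developed in this work.

```python
import numpy as np

def attended_talker(decoded_envelope, talker_envelopes):
    """Pick the talker whose envelope correlates best with the envelope
    reconstructed from neural (e.g., EEG/iEEG) recordings."""
    corrs = [np.corrcoef(decoded_envelope, env)[0, 1] for env in talker_envelopes]
    return int(np.argmax(corrs)), corrs

# Hypothetical envelopes (e.g., 64 Hz, 10 s) of two separated talkers, plus an
# envelope decoded from the listener's neural data while attending talker A.
rng = np.random.default_rng(1)
talker_a, talker_b = rng.random(640), rng.random(640)
decoded = 0.7 * talker_a + 0.3 * rng.random(640)

idx, corrs = attended_talker(decoded, [talker_a, talker_b])
print(idx, [round(c, 2) for c in corrs])  # expected: 0 (talker A)

# The attended talker can then be amplified relative to the others, e.g.:
gains = np.where(np.arange(2) == idx, 1.0, 0.3)  # boost attended, attenuate rest
```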
2 code implementations • 20 Jan 2023 • Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani
Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns.
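One way such language-model knowledge can reach a TTS system is as extra conditioning for the prosody predictor. The sketch below concatenates phoneme embeddings with aligned hidden states from a frozen pre-trained encoder before predicting duration and pitch; the module names, dimensions, and BERT-style encoder are placeholders rather than the actual components used in the paper.

```python
import torch
import torch.nn as nn

class ProsodyPredictor(nn.Module):
    """Predicts per-phoneme duration and pitch from phoneme embeddings
    concatenated with contextual features from a pre-trained LM."""
    def __init__(self, n_phonemes=100, phoneme_dim=128, lm_dim=768):
        super().__init__()
        self.phoneme_emb = nn.Embedding(n_phonemes, phoneme_dim)
        self.rnn = nn.GRU(phoneme_dim + lm_dim, 256, batch_first=True, bidirectional=True)
        self.duration_head = nn.Linear(512, 1)
        self.pitch_head = nn.Linear(512, 1)

    def forward(self, phoneme_ids, lm_features):
        x = torch.cat([self.phoneme_emb(phoneme_ids), lm_features], dim=-1)
        h, _ = self.rnn(x)
        return self.duration_head(h).squeeze(-1), self.pitch_head(h).squeeze(-1)

# Hypothetical inputs: 12 phonemes per utterance, with aligned hidden states
# from a frozen phoneme-level pre-trained LM (a PL-BERT-style stand-in).
phonemes = torch.randint(0, 100, (2, 12))
lm_feats = torch.randn(2, 12, 768)
durations, pitch = ProsodyPredictor()(phonemes, lm_feats)
print(durations.shape, pitch.shape)  # torch.Size([2, 12]) each
```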
1 code implementation • 29 Dec 2022 • Yinghao Aaron Li, Cong Han, Nima Mesgarani
Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models.
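The transfer-learning idea can be pictured as reusing a style encoder trained inside a TTS model as a frozen extractor of speaker/style information, while a separate pathway handles linguistic content. The sketch below illustrates that split with toy modules; the architectures and the way the encoders are combined are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

# Placeholder for a style encoder pre-trained as part of a style-based TTS model.
# In the transfer-learning setting it would be loaded from a checkpoint and frozen.
pretrained_style_encoder = nn.Sequential(
    nn.Conv1d(80, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(128, 64),
)
for p in pretrained_style_encoder.parameters():
    p.requires_grad = False  # keep the learned style space fixed

# Trainable content pathway: should capture linguistic content, not speaker style.
content_encoder = nn.Sequential(nn.Conv1d(80, 128, kernel_size=5, padding=2), nn.ReLU())

mel = torch.randn(4, 80, 200)           # batch of mel-spectrograms
style = pretrained_style_encoder(mel)   # (4, 64) utterance-level style/speaker vector
content = content_encoder(mel)          # (4, 128, 200) frame-level content features
print(style.shape, content.shape)
```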
1 code implementation • 30 May 2022 • Yinghao Aaron Li, Cong Han, Nima Mesgarani
Text-to-Speech (TTS) has recently seen great progress in synthesizing high-quality speech owing to the rapid development of parallel TTS systems, but producing speech with naturalistic prosodic variations, speaking styles, and emotional tones remains challenging.
2 code implementations • 21 Jul 2021 • Yinghao Aaron Li, Ali Zare, Nima Mesgarani
We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.
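To give a feel for the StarGAN v2-style conversion step, the sketch below passes source mel features through a generator conditioned on a style vector produced by a per-speaker style encoder from a target-speaker reference. The networks are toy placeholders and the adversarial, cycle, and other training losses are omitted; this is an illustration of the data flow, not the released StarGANv2-VC implementation.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Maps a reference utterance to a style vector for its speaker (domain)."""
    def __init__(self, n_speakers=10, style_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv1d(80, 128, 5, padding=2), nn.ReLU(),
                                      nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.heads = nn.ModuleList([nn.Linear(128, style_dim) for _ in range(n_speakers)])

    def forward(self, mel, speaker_ids):
        h = self.backbone(mel)
        return torch.stack([self.heads[s](h[i]) for i, s in enumerate(speaker_ids.tolist())])

class Generator(nn.Module):
    """Converts source mel features toward the target style vector."""
    def __init__(self, style_dim=64):
        super().__init__()
        self.proj = nn.Linear(style_dim, 80)
        self.net = nn.Conv1d(80, 80, 5, padding=2)

    def forward(self, mel, style):
        return self.net(mel + self.proj(style).unsqueeze(-1))  # style-conditioned transform

gen, style_enc = Generator(), StyleEncoder()
source_mel = torch.randn(2, 80, 120)   # utterances from source speakers
target_ref = torch.randn(2, 80, 120)   # reference utterances from target speakers
target_ids = torch.tensor([3, 7])      # target speaker (domain) indices
converted = gen(source_mel, style_enc(target_ref, target_ids))
print(converted.shape)  # torch.Size([2, 80, 120])
```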