Search Results for author: Cong Han

Found 21 papers, 10 papers with code

Improving Conversational Recommendation Systems’ Quality with Context-Aware Item Meta-Information

no code implementations • Findings (NAACL) 2022 • Bowen Yang, Cong Han, Yu Li, Lei Zuo, Zhou Yu

In this paper, we propose a simple yet effective architecture comprising a pre-trained language model (PLM) and an item metadata encoder to integrate the recommendation and the dialog generation better.

Knowledge Graphs, Language Modelling, +2 more

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

1 code implementation • 27 Mar 2024 • Xilin Jiang, Cong Han, Nima Mesgarani

In this work, we replace transformers with Mamba, a selective state space model, for speech separation.

Speech Separation

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

no code implementations • 6 Feb 2024 • Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume.

Language Modelling, Large Language Model

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

no code implementations • 27 Sep 2023 • Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode the 'what' and 'where' of spatial audio.

Contrastive Learning, Data Augmentation

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

no code implementations • 18 Sep 2023 • Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Subjective evaluations on LJSpeech show that our model significantly outperforms both iSTFTNet and HiFi-GAN, achieving ground-truth-level performance.

Speech Synthesis

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

no code implementations • 18 Jul 2023 • Yinghao Aaron Li, Cong Han, Nima Mesgarani

In recent years, large-scale pre-trained speech language models (SLMs) have demonstrated remarkable advancements in various generative speech modeling applications, such as text-to-speech synthesis, voice conversion, and speech enhancement.

Generative Adversarial Network, Language Modelling, +4 more

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

1 code implementation • NeurIPS 2023 • Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani

In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis.

Speech Synthesis

4,220 stars

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

1 code implementation • ICCV 2023 • Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma

Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model.

Classification, Language Modelling, +3 more

Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network

no code implementations • 13 Mar 2023 • Cong Han, Nima Mesgarani

Binaural speech separation in real-world scenarios often involves moving speakers.

Online Clustering, Speaker Separation, +1 more

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation

no code implementations • 11 Feb 2023 • Cong Han, Vishal Choudhari, Yinghao Aaron Li, Nima Mesgarani

Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment.

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

2 code implementations • 20 Jan 2023 • Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns.

4,220 stars

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

1 code implementation • 29 Dec 2022 • Yinghao Aaron Li, Cong Han, Nima Mesgarani

Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models.

Data Augmentation, Transfer Learning, +1 more

148 stars

Extensible Proxy for Efficient NAS

1 code implementation • 17 Oct 2022 • Yuhong Li, Jiajie Li, Cong Han, Pan Li, JinJun Xiong, Deming Chen

Efficient proxies are not extensible to multi-modality downstream tasks.

Neural Architecture Search

StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis

1 code implementation • 30 May 2022 • Yinghao Aaron Li, Cong Han, Nima Mesgarani

Text-to-Speech (TTS) has recently seen great progress in synthesizing high-quality speech owing to the rapid development of parallel TTS systems, but producing speech with naturalistic prosodic variations, speaking styles and emotional tones remains challenging.

Data Augmentation, Self-Supervised Learning, +2 more

359 stars

Multi-Channel Speech Denoising for Machine Ears

no code implementations • 17 Feb 2022 • Cong Han, E. Merve Kaya, Kyle Hoefer, Malcolm Slaney, Simon Carlile

This work describes a speech denoising system for machine ears that aims to improve speech intelligibility and the overall listening experience in noisy environments.

Denoising, Speech Denoising

Improving Conversational Recommendation Systems' Quality with Context-Aware Item Meta Information

1 code implementation • 15 Dec 2021 • Bowen Yang, Cong Han, Yu Li, Lei Zuo, Zhou Yu

The encoder learns to map item metadata to embeddings that can reflect the semantic information in the dialog context.

Language Modelling, Recommendation Systems, +1 more

Dual-Path Modeling for Long Recording Speech Separation in Meetings

no code implementations • 23 Feb 2021 • Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian

A transformer-based dual-path system is proposed, which integrates transformer layers for global modeling.

Speech Separation

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording

no code implementations • 17 Dec 2020 • Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen

Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years.

Clustering, Speech Separation

Group Communication with Context Codec for Lightweight Source Separation

1 code implementation • 14 Dec 2020 • Yi Luo, Cong Han, Nima Mesgarani

A context codec module, containing a context encoder and a context decoder, is designed as a learnable downsampling and upsampling module to decrease the length of a sequential feature processed by the separation module.

Decoder, Speech Enhancement, +1 more

Incentive Mechanism Design for ROI-constrained Auto-bidding

no code implementations • 4 Dec 2020 • Bin Li, Xiao Yang, Daren Sun, Zhi Ji, Zhen Jiang, Cong Han, Dong Hao

Auto-bidding plays an important role in online advertising and has become a crucial tool for advertisers and advertising platforms to meet their performance objectives and optimize the efficiency of ad delivery.

Computer Science and Game Theory

FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing

1 code implementation • 29 Sep 2019 • Yi Luo, Enea Ceolini, Cong Han, Shih-Chii Liu, Nima Mesgarani

Beamforming has been extensively investigated for multi-channel audio processing tasks.

Speech Enhancement, Speech Recognition, +1 more

238 stars
