Search Results for author: Yiling Huang

Found 12 papers, 6 papers with code

PNeSM: Arbitrary 3D Scene Stylization via Prompt-Based Neural Style Mapping

no code implementations • 13 Mar 2024 • Jiafu Chen, Wei Xing, Jiakai Sun, Tianyi Chu, Yiling Huang, Boyan Ji, Lei Zhao, Huaizhong Lin, Haibo Chen, Zhizhong Wang

3D scene stylization refers to transforming the appearance of a 3D scene to match a given style image, ensuring that images rendered from different viewpoints exhibit the same style as the given style image, while maintaining the 3D consistency of the stylized scene.

Disentanglement

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

2 code implementations • 7 Jan 2024 • Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system.

Automatic Speech Recognition (ASR) +3

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

1 code implementation • 11 Dec 2023 • Zhanjie Zhang, Quanwei Zhang, Guangyuan Li, Wei Xing, Lei Zhao, Jiakai Sun, Zehua Lan, Junsheng Luan, Yiling Huang, Huaizhong Lin

To address the above issues, we propose ArtBank, a novel artistic style transfer framework, to generate highly realistic stylized images while preserving the content structure of the content images.

Style Transfer

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

no code implementations • 15 Sep 2023 • Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang

Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words.

Automatic Speech Recognition (ASR) +4

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

no code implementations • 14 Sep 2023 • Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang

We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages.

Change Detection

Selective inference using randomized group lasso estimators for general models

no code implementations • 24 Jun 2023 • Yiling Huang, Sarah Pirenne, Snigdha Panigrahi, Gerda Claeskens

Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions.

Nutrition

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

no code implementations • 11 Nov 2022 • Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno

Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy.

Change Detection

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

1 code implementation • 25 Oct 2022 • Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.

Clustering, speaker-diarization +1

Parameter-Free Attentive Scoring for Speaker Verification

1 code implementation • 10 Mar 2022 • Jason Pelecanos, Quan Wang, Yiling Huang, Ignacio Lopez Moreno

This paper presents a novel study of parameter-free attentive scoring for speaker verification.

Speaker Verification

Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech

no code implementations • 24 Nov 2020 • Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang

In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data.

Data Augmentation, Speaker Recognition +2
