Search Results for author: Yiling Huang

Found 12 papers, 6 papers with code

PNeSM: Arbitrary 3D Scene Stylization via Prompt-Based Neural Style Mapping

no code implementations • 13 Mar 2024 • Jiafu Chen, Wei Xing, Jiakai Sun, Tianyi Chu, Yiling Huang, Boyan Ji, Lei Zhao, Huaizhong Lin, Haibo Chen, Zhizhong Wang

3D scene stylization refers to transforming the appearance of a 3D scene to match a given style image, ensuring that images rendered from different viewpoints exhibit the same style as the given style image, while maintaining the 3D consistency of the stylized scene.

Disentanglement

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

2 code implementations • 7 Jan 2024 • Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLMs) to post-process the outputs from a speaker diarization system.

Automatic Speech Recognition (ASR) +3

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

1 code implementation • 11 Dec 2023 • Zhanjie Zhang, Quanwei Zhang, Guangyuan Li, Wei Xing, Lei Zhao, Jiakai Sun, Zehua Lan, Junsheng Luan, Yiling Huang, Huaizhong Lin

To address the above issues, we propose ArtBank, a novel artistic style transfer framework, to generate highly realistic stylized images while preserving the content structure of the content images.

Style Transfer

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

no code implementations • 15 Sep 2023 • Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang

Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words.

Automatic Speech Recognition (ASR) +4

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

no code implementations • 14 Sep 2023 • Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang

We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages.

Change Detection

Selective inference using randomized group lasso estimators for general models

no code implementations • 24 Jun 2023 • Yiling Huang, Sarah Pirenne, Snigdha Panigrahi, Gerda Claeskens

Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions.

Nutrition

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

no code implementations • 11 Nov 2022 • Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno

Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy.

Change Detection

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

1 code implementation • 25 Oct 2022 • Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.

Clustering speaker-diarization +1

Parameter-Free Attentive Scoring for Speaker Verification

1 code implementation • 10 Mar 2022 • Jason Pelecanos, Quan Wang, Yiling Huang, Ignacio Lopez Moreno

This paper presents a novel study of parameter-free attentive scoring for speaker verification.

Speaker Verification

Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

1 code implementation • 24 Feb 2022 • Quan Wang, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno

In this paper, we introduce a novel language identification system based on conformer layers.

Domain Adaptation Language Identification

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

1 code implementation • 23 Sep 2021 • Wei Xia, Han Lu, Quan Wang, Anshuman Tripathi, Yiling Huang, Ignacio Lopez Moreno, Hasim Sak

In this paper, we present a novel speaker diarization system for streaming on-device applications.

Clustering speaker-diarization +1

Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech

no code implementations • 24 Nov 2020 • Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang

In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data.

Data Augmentation Speaker Recognition +2
