Search Results for author: Guanlong Zhao

Found 10 papers, 4 papers with code

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

2 code implementations · 7 Jan 2024 · Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

no code implementations · 15 Sep 2023 · Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang

Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

no code implementations · 14 Sep 2023 · Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang

We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages.

Change Detection

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

no code implementations · 11 Nov 2022 · Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno

Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy.

Change Detection

Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

no code implementations · 11 Nov 2022 · Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, Angelo Scorza Scarpati, Liam Fowl, Quan Wang

In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task.

Keyword Spotting

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

1 code implementation · 25 Oct 2022 · Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.

Clustering speaker-diarization +1

Improved Techniques for Learning to Dehaze and Beyond: A Collective Study

1 code implementation · 30 Jun 2018 · Yu Liu, Guanlong Zhao, Boyuan Gong, Yang Li, Ritu Raj, Niraj Goel, Satya Kesav, Sandeep Gottimukkala, Zhangyang Wang, Wenqi Ren, DaCheng Tao

Here we explore two related but important tasks based on the recently released REalistic Single Image DEhazing (RESIDE) benchmark dataset: (i) single image dehazing as a low-level image restoration problem; and (ii) high-level visual understanding (e.g., object detection) of hazy images.

Image Dehazing Image Restoration +4

PAD-Net: A Perception-Aided Single Image Dehazing Network

1 code implementation · 8 May 2018 · Yu Liu, Guanlong Zhao

In this work, we investigate the possibility of replacing the $\ell_2$ loss with perceptually derived loss functions (SSIM, MS-SSIM, etc.)

Image Dehazing MS-SSIM +2
