Search Results for author: BinBin Zhang

Found 16 papers, 9 papers with code

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations7 Jan 2024 He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

The GUA-Speech System Description for CNVSRC Challenge 2023

no code implementations12 Dec 2023 Shengqiang Li, Chao Lei, Baozhong Ma, BinBin Zhang, Fuping Pan

This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.

Language Modelling speech-recognition +1

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

no code implementations7 Oct 2023 Kaixun Huang, Ao Zhang, BinBin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

no code implementations18 May 2023 Xingchen Song, Di wu, BinBin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu

In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective \textbf{training-free} methods to decrease the Token Display Time (TDT) of streaming ASR models \textbf{without any accuracy loss}.

TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

1 code implementation1 Nov 2022 Xingchen Song, Di wu, Zhiyong Wu, BinBin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu

In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models.

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

1 code implementation30 Oct 2022 Jie Wang, Menglong Xu, Jingyong Hou, BinBin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices.

Keyword Spotting

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

3 code implementations29 Mar 2022 BinBin Zhang, Di wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model.

Language Modelling speech-recognition +1

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

1 code implementation7 Oct 2021 BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di wu, Zhendong Peng

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total.

Label Error Detection Optical Character Recognition +4

Interpretable Compositional Convolutional Neural Networks

1 code implementation9 Jul 2021 Wen Shen, Zhihua Wei, Shikun Huang, BinBin Zhang, Jiaqi Fan, Ping Zhao, Quanshi Zhang

The reasonable definition of semantic interpretability presents the core challenge in explainable AI.

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

no code implementations10 Jun 2021 Di wu, BinBin Zhang, Chao Yang, Zhendong Peng, Wenjing Xia, Xiaoyu Chen, Xin Lei

On the experiment of AISHELL-1, we achieve a 4. 63\% character error rate (CER) with a non-streaming setup and 5. 05\% with a streaming setup with 320ms latency by U2++.

Data Augmentation speech-recognition +1

WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

4 code implementations2 Feb 2021 Zhuoyuan Yao, Di wu, Xiong Wang, BinBin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei

In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

speech-recognition Speech Recognition

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

5 code implementations10 Dec 2020 BinBin Zhang, Di wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Sentence speech-recognition +1

Verifiability and Predictability: Interpreting Utilities of Network Architectures for Point Cloud Processing

1 code implementation CVPR 2021 Wen Shen, Zhihua Wei, Shikun Huang, BinBin Zhang, Panyue Chen, Ping Zhao, Quanshi Zhang

In this paper, we diagnose deep neural networks for 3D point cloud processing to explore utilities of different intermediate-layer network architectures.

Adversarial Robustness

3D-Rotation-Equivariant Quaternion Neural Networks

1 code implementation ECCV 2020 Wen Shen, BinBin Zhang, Shikun Huang, Zhihua Wei, Quanshi Zhang

This paper proposes a set of rules to revise various neural networks for 3D point cloud processing to rotation-equivariant quaternion neural networks (REQNNs).

Cannot find the paper you are looking for? You can Submit a new open access paper.