Search Results for author: BinBin Zhang

Found 16 papers, 9 papers with code

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

no code implementations • 25 Apr 2024 • Xingchen Song, Di wu, BinBin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

Scale has opened new frontiers in natural language processing, but at a high cost.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

The GUA-Speech System Description for CNVSRC Challenge 2023

no code implementations • 12 Dec 2023 • Shengqiang Li, Chao Lei, Baozhong Ma, BinBin Zhang, Fuping Pan

This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.

Language Modelling speech-recognition +1

Paper
Add Code

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

no code implementations • 7 Oct 2023 • Kaixun Huang, Ao Zhang, BinBin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

no code implementations • 18 May 2023 • Xingchen Song, Di wu, BinBin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu

In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective \textbf{training-free} methods to decrease the Token Display Time (TDT) of streaming ASR models \textbf{without any accuracy loss}.

Paper
Add Code

TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

1 code implementation • 1 Nov 2022 • Xingchen Song, Di wu, Zhiyong Wu, BinBin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu

In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models.

3,696

Paper
Code

FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition

no code implementations • 31 Oct 2022 • Xingchen Song, Di wu, BinBin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu

Therefore, we name it FusionFormer.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

1 code implementation • 30 Oct 2022 • Jie Wang, Menglong Xu, Jingyong Hou, BinBin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices.

Keyword Spotting

381

Paper
Code

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

3 code implementations • 29 Mar 2022 • BinBin Zhang, Di wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model.

Language Modelling speech-recognition +1

3,696

Paper
Code

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

1 code implementation • 7 Oct 2021 • BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di wu, Zhendong Peng

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total.

Ranked #5 on Speech Recognition on WenetSpeech

Label Error Detection Optical Character Recognition +4

455

Paper
Code

Interpretable Compositional Convolutional Neural Networks

1 code implementation • 9 Jul 2021 • Wen Shen, Zhihua Wei, Shikun Huang, BinBin Zhang, Jiaqi Fan, Ping Zhao, Quanshi Zhang

The reasonable definition of semantic interpretability presents the core challenge in explainable AI.

Paper
Code

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

no code implementations • 10 Jun 2021 • Di wu, BinBin Zhang, Chao Yang, Zhendong Peng, Wenjing Xia, Xiaoyu Chen, Xin Lei

On the experiment of AISHELL-1, we achieve a 4. 63\% character error rate (CER) with a non-streaming setup and 5. 05\% with a streaming setup with 320ms latency by U2++.

Data Augmentation speech-recognition +1

Paper
Add Code

WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

4 code implementations • 2 Feb 2021 • Zhuoyuan Yao, Di wu, Xiong Wang, BinBin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei

In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

speech-recognition Speech Recognition

10,154

Paper
Code

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

5 code implementations • 10 Dec 2020 • BinBin Zhang, Di wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Ranked #6 on Speech Recognition on AISHELL-1

Sentence speech-recognition +1

10,154

Paper
Code

Verifiability and Predictability: Interpreting Utilities of Network Architectures for Point Cloud Processing

1 code implementation • CVPR 2021 • Wen Shen, Zhihua Wei, Shikun Huang, BinBin Zhang, Panyue Chen, Ping Zhao, Quanshi Zhang

In this paper, we diagnose deep neural networks for 3D point cloud processing to explore utilities of different intermediate-layer network architectures.

Adversarial Robustness

Paper
Code

3D-Rotation-Equivariant Quaternion Neural Networks

1 code implementation • ECCV 2020 • Wen Shen, BinBin Zhang, Shikun Huang, Zhihua Wei, Quanshi Zhang

This paper proposes a set of rules to revise various neural networks for 3D point cloud processing to rotation-equivariant quaternion neural networks (REQNNs).

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.