no code implementations • 5 Jan 2024 • Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang
Far-field speech recognition is a challenging task that conventionally relies on signal-processing beamforming to combat noise and interference.
1 code implementation • 18 Aug 2023 • Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai
In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM.
1 code implementation • 27 Feb 2023 • Jianbo Ma, Siqi Pan, Deepak Chandran, Andrea Fanelli, Richard Cartwright
The SA is our proposal for an efficient streaming SSRL implementation, while the LLSA solves the latency build-up problem of other streaming attention architectures, such as masked acausal attention (MAA), guaranteeing a latency equal to that of a single layer even when multiple layers are stacked.
Automatic Speech Recognition (ASR)
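The latency build-up attributed to masked acausal attention above can be seen with a small reachability calculation. The sketch below assumes a simple masked acausal scheme (full left context plus a fixed lookahead of `r` future frames per layer) and composes the per-layer masks to find which input frames an output frame depends on. Under that assumption, stacking `L` layers yields `L * r` frames of lookahead, which is the build-up LLSA is described as avoiding. This does not implement SA or LLSA, whose internals the snippet does not describe. All function names are hypothetical.

```python
import numpy as np


def masked_acausal_mask(num_frames: int, lookahead: int) -> np.ndarray:
    """Assumed MAA-style mask: M[t, s] is True if query frame t may attend
    to key frame s (full left context, `lookahead` future frames)."""
    t = np.arange(num_frames)[:, None]
    s = np.arange(num_frames)[None, :]
    return s <= t + lookahead


def stacked_dependency(mask: np.ndarray, num_layers: int) -> np.ndarray:
    """D[t, s] is True if the output of the last layer at frame t depends
    on input frame s after stacking `num_layers` identical layers."""
    dep = np.eye(mask.shape[0], dtype=bool)  # zero layers: each frame sees itself
    for _ in range(num_layers):
        # Frame t sees layer input u when mask[t, u]; u in turn already
        # depends on the inputs recorded in dep[u]. Compose the two.
        dep = (mask.astype(int) @ dep.astype(int)) > 0
    return dep


def effective_lookahead(dep: np.ndarray, frame: int) -> int:
    """Number of future input frames that output `frame` depends on."""
    return int(np.nonzero(dep[frame])[0].max() - frame)


if __name__ == "__main__":
    mask = masked_acausal_mask(num_frames=64, lookahead=2)
    for layers in range(1, 5):
        dep = stacked_dependency(mask, layers)
        print(layers, effective_lookahead(dep, frame=10))
```

With a per-layer lookahead of 2 frames, the printed lookahead grows to 2, 4, 6 and 8 frames for 1 to 4 layers, so the algorithmic latency scales with depth. A design that keeps latency at one layer's worth has to prevent this accumulation, for example by not letting lookahead frames propagate through the stack.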