no code implementations • EMNLP 2021 • Xiaobao Guo, Adams Kong, Huan Zhou, Xianfeng Wang, Min Wang
Specifically, to improve unimodal representations, a unimodal refinement module is designed to refine modality-specific learning by iteratively updating the distribution with transformer-based attention layers.
no code implementations • 6 May 2024 • Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie
Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed multi-accent scenarios, making it a prominent solution.
Automatic Speech Recognition (ASR)
no code implementations • 8 Apr 2024 • He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie
Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.
1 code implementation • 4 Feb 2024 • Huan Zhou, Feng Xue, Yucong Li, Shi Gong, Yiqun Li, Yu Zhou
The spatial detail branch is designed first; it extracts low-level feature representations of the road using the first stage of ResNet-18.
no code implementations • 9 Mar 2023 • Kai Liu, Ziqing Du, Xucheng Wan, Huan Zhou
To mitigate the pressing SC issue, we reformulate the training objective and propose two novel loss schemes that exploit a reconstruction-improvement metric defined at the small-chunk level and leverage the distribution information associated with that metric.
no code implementations • 16 Jan 2023 • Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou
As a practical alternative to speech separation, target speaker extraction (TSE) aims to extract the speech of the desired speaker using an additional speaker cue extracted from that speaker.
1 code implementation • 7 Jan 2023 • Gangwei Xu, Huan Zhou, Xin Yang
In this paper, we propose CGI-Stereo, a novel neural network architecture that can concurrently achieve real-time performance, competitive accuracy, and strong generalization ability.
no code implementations • 24 Sep 2022 • Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou
To validate the effectiveness of our proposed model, extensive experiments are conducted on the DNS2020 dataset.
no code implementations • 24 Sep 2022 • Ziqing Du, Kai Liu, Xucheng Wan, Huan Zhou
Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios.
no code implementations • 15 Mar 2022 • Hong Liu, Wen-Dong Xu, Zi-Hao Shang, Xiang-Dong Wang, Hai-Yan Zhou, Ke-Wen Ma, Huan Zhou, Jia-Lin Qi, Jia-Rui Jiang, Li-Lan Tan, Hui-Min Zeng, Hui-Juan Cai, Kuan-Song Wang, Yue-Liang Qian
A weakly supervised learning framework based on discriminative patch selection and multi-instance learning was proposed for breast cancer molecular subtype prediction from H&E WSIs.
no code implementations • 16 Dec 2020 • Naweiluo Zhou, Yiannis Georgiou, Li Zhong, Huan Zhou, Marcin Pospieszny
Containerisation demonstrates its efficiency in application deployment in cloud computing.
Distributed, Parallel, and Cluster Computing