no code implementations • IWSLT (EMNLP) 2018 • Nguyen Bach, Hongjie Chen, Kai Fan, Cheung-Chi Leung, Bo Li, Chongjia Ni, Rong Tong, Pei Zhang, Boxing Chen, Bin Ma, Fei Huang
This work describes the En→De Alibaba speech translation system developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2018.
no code implementations • 2 Nov 2023 • Tianyi Wang, Mengxiao Huang, Harry Cheng, Bin Ma, Yinglong Wang
Falsification and source tracing are accomplished by justifying the consistency between the content-matched identity perceptual watermark and the recovered robust watermark from the image.
1 code implementation • 22 Sep 2023 • Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma
Dual-path is a popular architecture for speech separation models (e. g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships.
Ranked #5 on Speech Separation on WSJ0-2mix
1 code implementation • 14 Aug 2023 • Yan Zhu, Junbao Zhuo, Bin Ma, Jiajia Geng, Xiaoming Wei, Xiaolin Wei, Shuhui Wang
We propose a model called OTI for ZSVR by employing orthogonal temporal interpolation and the matching loss based on VLMs.
Ranked #1 on Zero-Shot Action Recognition on UCF101
1 code implementation • 20 May 2023 • Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma
In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling.
no code implementations • 8 Mar 2023 • Jinwei Wang, Hao Wu, Haihua Wang, Jiawei Zhang, Xiangyang Luo, Bin Ma
Therefore, we propose a novel adversarial defense mechanism, which is referred to as immune defense and is the example-based pre-defense.
no code implementations • 7 Mar 2023 • Jinjie Ni, Yukun Ma, Wen Wang, Qian Chen, Dianwen Ng, Han Lei, Trung Hieu Nguyen, Chong Zhang, Bin Ma, Erik Cambria
Learning on a massive amount of speech corpus leads to the recent success of many self-supervised speech models.
1 code implementation • 23 Feb 2023 • Shengkui Zhao, Bin Ma
To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.
Ranked #2 on Speech Separation on WHAMR!
no code implementations • 25 Oct 2022 • Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li
To achieve this, we propose a novel EVC framework, Mixed-EVC, which only leverages discrete emotion training labels.
no code implementations • 7 Oct 2022 • Lei Wang, Rong Tong, Cheung Chi Leung, Sunil Sivadas, Chongjia Ni, Bin Ma
This paper provides an overall introduction of our Automatic Speech Recognition (ASR) systems for Southeast Asian languages.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 5 Sep 2022 • Bin Ma, Yueli Hu, Xianxian Lv, Kai Li
Video denoising for raw image has always been the difficulty of camera image processing.
no code implementations • 1 Aug 2022 • Jia Qi Yip, Dianwen Ng, Bin Ma, Konstantin Pervushin, Eng Siong Chng
Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development.
no code implementations • 2 Jun 2022 • Mingyuan Cheng, Xinru Liao, Quan Liu, Bin Ma, Jian Xu, Bo Zheng
Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies.
no code implementations • 1 Nov 2021 • Zongtao Liu, Bin Ma, Quan Liu, Jian Xu, Bo Zheng
When speaking of sponsored search, bid keyword recommendation is the fundamental service.
1 code implementation • EMNLP 2021 • Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma
For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which to the best of our knowledge, has never been explored in ASR.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 2 Oct 2021 • Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma
We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement.
no code implementations • 3 Feb 2021 • Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma
Cross-lingual voice conversion (VC) is an important and challenging problem due to significant mismatches of the phonetic set and the speech prosody of different languages.
1 code implementation • 3 Feb 2021 • Shengkui Zhao, Trung Hieu Nguyen, Bin Ma
In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features.
Ranked #1 on Speech Enhancement on DNS Challenge
1 code implementation • 16 Oct 2020 • Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma
With these data, three neural TTS models -- Tacotron2, Transformer and FastSpeech are applied for building bilingual and code-switched TTS.
no code implementations • 7 Oct 2020 • Xu Yang, Zhaohui Shang, Keliang Hu, Yi Hu, Bin Ma, Yongjiang Wang, Zihuang Cao, Michael C. B. Ashley, Wei Wang
Dome A in Antarctica has many characteristics that make it an excellent site for astronomical observations, from the optical to the terahertz.
Instrumentation and Methods for Astrophysics
no code implementations • 21 May 2020 • Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma
To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture.
no code implementations • 25 Nov 2019 • Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li
To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 8 Apr 2019 • Yerbolat Khassanov, Hai-Hua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma
The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 2 Jan 2019 • Bin Ma, Shubao Liu, Yingxuan Zhi, Qi Song
Building on these, we demonstrate that image features can be learned in self-supervision by first training an optical flow estimator with synthetic flow data, and then learning image features from the estimated flows in real motion data.
no code implementations • 10 Jun 2018 • Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li
We also find that it is important to have sufficient speech segment pairs to train the deep CNN for effective acoustic word embeddings.
no code implementations • 5 Feb 2016 • Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier
This article describes the systems jointly submitted by Institute for Infocomm (I$^2$R), the Laboratoire d'Informatique de l'Universit\'e du Maine (LIUM), Nanyang Technology University (NTU) and the University of Eastern Finland (UEF) for 2015 NIST Language Recognition Evaluation (LRE).
no code implementations • MediaEval 2015 Workshop 2015 • Jingyong Hou, Van Tung Pham, Cheung-Chi Leung, Lei Wang, HaiHua Xu, Hang Lv, Lei Xie, Zhonghua Fu, Chongjia Ni, Xiong Xiao, Hongjie Chen, Shaofei Zhang, Sining Sun, Yougen Yuan, Pengcheng Li, Tin Lay Nwe, Sunil Sivadas, Bin Ma, Eng Siong Chng, Haizhou Li
This paper describes the system developed by the NNI team for the Query-by-Example Search on Speech Task (QUESST) in the MediaEval 2015 evaluation.
Ranked #9 on Keyword Spotting on QUESST
no code implementations • 16 Oct 2014 • Peng Yang, HaiHua Xu, Xiong Xiao, Lei Xie, Cheung-Chi Leung, Hongjie Chen, JIA YU, Hang Lv, Lei Wang, Su Jun Leow, Bin Ma, Eng Siong Chng, Haizhou Li
For both symbolic and DTW search, partial sequence matching is performed to reduce missing rate, especially for query type 2 and 3.
Ranked #6 on Keyword Spotting on QUESST
no code implementations • 20 Nov 2001 • Ming Li, Xin Chen, Xin Li, Bin Ma, Paul Vitanyi
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied.