Search Results for author: Bin Ma

Found 30 papers, 7 papers with code

Alibaba Speech Translation Systems for IWSLT 2018

no code implementations • IWSLT (EMNLP) 2018 • Nguyen Bach, Hongjie Chen, Kai Fan, Cheung-Chi Leung, Bo Li, Chongjia Ni, Rong Tong, Pei Zhang, Boxing Chen, Bin Ma, Fei Huang

This work describes the En→De Alibaba speech translation system developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2018.

Sentence Translation

Robust Identity Perceptual Watermark Against Deepfake Face Swapping

no code implementations • 2 Nov 2023 • Tianyi Wang, Mengxiao Huang, Harry Cheng, Bin Ma, Yinglong Wang

Falsification and source tracing are accomplished by justifying the consistency between the content-matched identity perceptual watermark and the recovered robust watermark from the image.

Face Swapping

SPGM: Prioritizing Local Features for enhanced speech separation performance

1 code implementation • 22 Sep 2023 • Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

Dual-path is a popular architecture for speech separation models (e.g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships.

Ranked #5 on Speech Separation on WSJ0-2mix

Speech Separation

Orthogonal Temporal Interpolation for Zero-Shot Video Recognition

1 code implementation • 14 Aug 2023 • Yan Zhu, Junbao Zhuo, Bin Ma, Jiajia Geng, Xiaoming Wei, Xiaolin Wei, Shuhui Wang

We propose a model called OTI for ZSVR by employing orthogonal temporal interpolation and the matching loss based on VLMs.

Ranked #1 on Zero-Shot Action Recognition on UCF101

Video Recognition • Zero-Shot Action Recognition • +2

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

1 code implementation • 20 May 2023 • Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling.

Speaker Verification

Immune Defense: A Novel Adversarial Defense Mechanism for Preventing the Generation of Adversarial Examples

no code implementations • 8 Mar 2023 • Jinwei Wang, Hao Wu, Haihua Wang, Jiawei Zhang, Xiangyang Luo, Bin Ma

Therefore, we propose a novel adversarial defense mechanism, referred to as immune defense, which is an example-based pre-defense.

Adversarial Defense

Adaptive Knowledge Distillation between Text and Speech Pre-trained Models

no code implementations • 7 Mar 2023 • Jinjie Ni, Yukun Ma, Wen Wang, Qian Chen, Dianwen Ng, Han Lei, Trung Hieu Nguyen, Chong Zhang, Bin Ma, Erik Cambria

Learning from massive speech corpora has led to the recent success of many self-supervised speech models.

Knowledge Distillation • Spoken Language Understanding

MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

1 code implementation • 23 Feb 2023 • Shengkui Zhao, Bin Ma

To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.

Ranked #2 on Speech Separation on WHAMR!

Speech Separation

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion

no code implementations • 25 Oct 2022 • Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li

To achieve this, we propose a novel EVC framework, Mixed-EVC, which only leverages discrete emotion training labels.

Attribute • Voice Conversion

Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages

no code implementations • 7 Oct 2022 • Lei Wang, Rong Tong, Cheung Chi Leung, Sunil Sivadas, Chongjia Ni, Bin Ma

This paper provides an overview of our Automatic Speech Recognition (ASR) systems for Southeast Asian languages.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

A Multi-scale Video Denoising Algorithm for Raw Image

no code implementations • 5 Sep 2022 • Bin Ma, Yueli Hu, Xianxian Lv, Kai Li

Video denoising for raw images has long been a difficult problem in camera image processing.

Image Denoising • Motion Estimation • +1

Amino Acid Classification in 2D NMR Spectra via Acoustic Signal Embeddings

no code implementations • 1 Aug 2022 • Jia Qi Yip, Dianwen Ng, Bin Ma, Konstantin Pervushin, Eng Siong Chng

Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development.

Speaker Verification

Learning Disentangled Representations for Counterfactual Regression via Mutual Information Minimization

no code implementations • 2 Jun 2022 • Mingyuan Cheng, Xinru Liao, Quan Liu, Bin Ma, Jian Xu, Bo Zheng

Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies.

Causal Inference • counterfactual • +3

Heterogeneous Graph Neural Networks for Large-Scale Bid Keyword Matching

no code implementations • 1 Nov 2021 • Zongtao Liu, Bin Ma, Quan Liu, Jian Xu, Bo Zheng

In sponsored search, bid keyword recommendation is a fundamental service.

Marketing • text similarity

A Unified Speaker Adaptation Approach for ASR

1 code implementation • EMNLP 2021 • Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma

For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which, to the best of our knowledge, has never been explored in ASR.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

no code implementations • 2 Oct 2021 • Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma

We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement.

Acoustic echo cancellation • Speech Enhancement

Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

no code implementations • 3 Feb 2021 • Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma

Cross-lingual voice conversion (VC) is an important and challenging problem due to significant mismatches of the phonetic set and the speech prosody of different languages.

Voice Conversion

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

1 code implementation • 3 Feb 2021 • Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features.

Ranked #1 on Speech Enhancement on DNS Challenge

Speech Enhancement

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

1 code implementation • 16 Oct 2020 • Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma

With these data, three neural TTS models (Tacotron2, Transformer and FastSpeech) are applied to build bilingual and code-switched TTS.

Speech Synthesis • Voice Conversion

Cloud Cover and Aurora Contamination at Dome A in 2017 from KLCAM

no code implementations • 7 Oct 2020 • Xu Yang, Zhaohui Shang, Keliang Hu, Yi Hu, Bin Ma, Yongjiang Wang, Zihuang Cao, Michael C. B. Ashley, Wei Wang

Dome A in Antarctica has many characteristics that make it an excellent site for astronomical observations, from the optical to the terahertz.

Instrumentation and Methods for Astrophysics

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

no code implementations • 21 May 2020 • Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma

To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture.

Cross-Lingual Transfer • Language Modelling • +1

Independent language modeling architecture for end-to-end ASR

no code implementations • 25 Nov 2019 • Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li

To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

no code implementations • 8 Apr 2019 • Yerbolat Khassanov, Hai-Hua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma

The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Flow Based Self-supervised Pixel Embedding for Image Segmentation

no code implementations • 2 Jan 2019 • Bin Ma, Shubao Liu, Yingxuan Zhi, Qi Song

Building on these, we demonstrate that image features can be learned in self-supervision by first training an optical flow estimator with synthetic flow data, and then learning image features from the estimated flows in real motion data.

Image Segmentation • Optical Flow Estimation • +2

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

no code implementations • 10 Jun 2018 • Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li

We also find that it is important to have sufficient speech segment pairs to train the deep CNN for effective acoustic word embeddings.

Dynamic Time Warping • Word Embeddings

Fantastic 4 system for NIST 2015 Language Recognition Evaluation

no code implementations • 5 Feb 2016 • Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

This article describes the systems jointly submitted by the Institute for Infocomm Research (I²R), the Laboratoire d'Informatique de l'Université du Maine (LIUM), Nanyang Technological University (NTU) and the University of Eastern Finland (UEF) for the 2015 NIST Language Recognition Evaluation (LRE).

regression