Search Results for author: Min Tang

Found 20 papers, 5 papers with code

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

no code implementations • 12 Feb 2024 • Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression.

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

no code implementations • 16 Jan 2024 • Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics.

Automatic Speech Recognition Benchmarking +4

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations • 14 Aug 2023 • Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling Multi-Task Learning +2

CTSN: Predicting Cloth Deformation for Skeleton-based Characters with a Two-stream Skinning Network

no code implementations • 30 May 2023 • Yudi Li, Min Tang, Yun Yang, Ruofeng Tong, Shuangcai Yang, Yao Li, Bailin An, Qilong Kou

We present a novel learning method to predict the cloth deformation for skeleton-based characters with a two-stream network.

Real-Time Audio-Visual End-to-End Speech Enhancement

no code implementations • 13 Mar 2023 • Zirun Zhu, Hemin Yang, Min Tang, ZiYi Yang, Sefik Emre Eskimez, Huaming Wang

In this paper, we propose a low-latency real-time audio-visual end-to-end enhancement (AV-E3Net) model based on the recently proposed end-to-end enhancement network (E3Net).

Speech Enhancement Task 2

StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping

1 code implementation • CVPR 2023 • Diqiong Jiang, Dan Song, Ruofeng Tong, Min Tang

StyleIPSB gives us a novel tool for high-fidelity face swapping, and we propose a three-stage framework for face swapping with StyleIPSB.

Attribute Face Swapping

Exploring WavLM on Speech Enhancement

no code implementations • 18 Nov 2022 • Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

There has been a surge of interest in self-supervised learning approaches for end-to-end speech encoding in recent years, as they have achieved great success.

Self-Supervised Learning Speech Enhancement +2

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation

no code implementations • 4 Nov 2022 • Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Parnamaa, Huaming Wang

Personalized speech enhancement (PSE) is a real-time SE approach utilizing a speaker embedding of a target person to remove background noise, reverberation, and interfering voices.

Acoustic echo cancellation Multi-Task Learning +1

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks

no code implementations • 13 Dec 2021 • Yudi Li, Min Tang, Yun Yang, Zi Huang, Ruofeng Tong, Shuangcai Yang, Yao Li, Dinesh Manocha

We present a novel mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction.

Sphere Face Model: A 3D Morphable Model with Hypersphere Manifold Latent Space

no code implementations • 4 Dec 2021 • Diqiong Jiang, Yiwei Jin, FangLue Zhang, Zhe Zhu, Yun Zhang, Ruofeng Tong, Min Tang

However, the shape parameters of traditional 3DMMs satisfy the multivariate Gaussian distribution while the identity embeddings satisfy the hypersphere distribution, and this conflict makes it challenging for face reconstruction models to preserve the faithfulness and the shape consistency simultaneously.

Face Model Face Reconstruction

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement

no code implementations • 5 Jun 2021 • Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

Performance analysis is also carried out by changing the ASR model, the data used for the ASR-step, and the schedule of the two update steps.

Automatic Speech Recognition (ASR) +2

Reconstructing Recognizable 3D Face Shapes based on 3D Morphable Models

no code implementations • 8 Apr 2021 • Diqiong Jiang, Yiwei Jin, FangLue Zhang, Yukun Yai, Risheng Deng, Ruofeng Tong, Min Tang

We compare our method with existing methods in terms of the reconstruction error, visual distinguishability, and face recognition accuracy of the shape parameters.

Face Recognition

Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images

1 code implementation • 9 Feb 2021 • Jiaxuan Li, Peiyao Jin, Jianfeng Zhu, Haidong Zou, Xun Xu, Min Tang, Minwen Zhou, Yu Gan, Jiangnan He, Yuye Ling, Yikai Su

An accurate and automated tissue segmentation algorithm for retinal optical coherence tomography (OCT) images is crucial for the diagnosis of glaucoma.

Medical Image Segmentation Retinal OCT Layer Segmentation +1

Hierarchical Optimization Time Integration for CFL-rate MPM Stepping

1 code implementation • 18 Nov 2019 • Xinlei Wang, Minchen Li, Yu Fang, Xinxin Zhang, Ming Gao, Min Tang, Danny M. Kaufman, Chenfanfu Jiang

We propose Hierarchical Optimization Time Integration (HOT) for efficient implicit time-stepping of the Material Point Method (MPM) irrespective of simulated materials and conditions.

Graphics

Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation

1 code implementation • 14 Aug 2018 • Chengyang Li, Dan Song, Ruofeng Tong, Min Tang

To narrow this gap, we propose a network fusion architecture, which consists of a multispectral proposal network to generate pedestrian proposals, and a subsequent multispectral classification network to distinguish pedestrian instances from hard negatives.

Autonomous Driving Pedestrian Detection +1

Illumination-aware Faster R-CNN for Robust Multispectral Pedestrian Detection

no code implementations • 14 Mar 2018 • Chengyang Li, Dan Song, Ruofeng Tong, Min Tang

Multispectral images of color-thermal pairs have been shown to be more effective than a single color channel for pedestrian detection, especially under challenging illumination conditions.

Pedestrian Detection