Search Results for author: Min Tang

Found 20 papers, 5 papers with code

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

no code implementations 12 Feb 2024 Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression.

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

no code implementations 16 Jan 2024 Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics.

Automatic Speech Recognition Benchmarking +4

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

no code implementations 14 Aug 2023 Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling Multi-Task Learning +2

CTSN: Predicting Cloth Deformation for Skeleton-based Characters with a Two-stream Skinning Network

no code implementations 30 May 2023 Yudi Li, Min Tang, Yun Yang, Ruofeng Tong, Shuangcai Yang, Yao Li, Bailin An, Qilong Kou

We present a novel learning method to predict the cloth deformation for skeleton-based characters with a two-stream network.

Real-Time Audio-Visual End-to-End Speech Enhancement

no code implementations 13 Mar 2023 Zirun Zhu, Hemin Yang, Min Tang, ZiYi Yang, Sefik Emre Eskimez, Huaming Wang

In this paper, we propose a low-latency real-time audio-visual end-to-end enhancement (AV-E3Net) model based on the recently proposed end-to-end enhancement network (E3Net).

Speech Enhancement Task 2

StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping

1 code implementation CVPR 2023 Diqiong Jiang, Dan Song, Ruofeng Tong, Min Tang

StyleIPSB gives us a novel tool for high-fidelity face swapping, and we propose a three-stage framework for face swapping with StyleIPSB.

Attribute Face Swapping

Exploring WavLM on Speech Enhancement

no code implementations 18 Nov 2022 Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

There has been a surge of interest in self-supervised learning approaches for end-to-end speech encoding in recent years, as they have achieved great success.

Self-Supervised Learning Speech Enhancement +2

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation

no code implementations 4 Nov 2022 Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Parnamaa, Huaming Wang

Personalized speech enhancement (PSE) is a real-time SE approach utilizing a speaker embedding of a target person to remove background noise, reverberation, and interfering voices.

Acoustic echo cancellation Multi-Task Learning +1

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks

no code implementations 13 Dec 2021 Yudi Li, Min Tang, Yun Yang, Zi Huang, Ruofeng Tong, Shuangcai Yang, Yao Li, Dinesh Manocha

We present a novel mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction.

Sphere Face Model: A 3D Morphable Model with Hypersphere Manifold Latent Space

no code implementations 4 Dec 2021 Diqiong Jiang, Yiwei Jin, FangLue Zhang, Zhe Zhu, Yun Zhang, Ruofeng Tong, Min Tang

However, the shape parameters of traditional 3DMMs satisfy the multivariate Gaussian distribution while the identity embeddings satisfy the hypersphere distribution, and this conflict makes it challenging for face reconstruction models to preserve the faithfulness and the shape consistency simultaneously.

Face Model Face Reconstruction

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations 12 Oct 2021 Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Reconstructing Recognizable 3D Face Shapes based on 3D Morphable Models

no code implementations 8 Apr 2021 Diqiong Jiang, Yiwei Jin, FangLue Zhang, Yukun Yai, Risheng Deng, Ruofeng Tong, Min Tang

We compare our method with existing methods in terms of the reconstruction error, visual distinguishability, and face recognition accuracy of the shape parameters.

Face Recognition

Hierarchical Optimization Time Integration for CFL-rate MPM Stepping

1 code implementation 18 Nov 2019 Xinlei Wang, Minchen Li, Yu Fang, Xinxin Zhang, Ming Gao, Min Tang, Danny M. Kaufman, Chenfanfu Jiang

We propose Hierarchical Optimization Time Integration (HOT) for efficient implicit time-stepping of the Material Point Method (MPM) irrespective of simulated materials and conditions.

Graphics

Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation

1 code implementation 14 Aug 2018 Chengyang Li, Dan Song, Ruofeng Tong, Min Tang

To narrow this gap, we propose a network fusion architecture, which consists of a multispectral proposal network to generate pedestrian proposals, and a subsequent multispectral classification network to distinguish pedestrian instances from hard negatives.

Autonomous Driving Pedestrian Detection +1

Illumination-aware Faster R-CNN for Robust Multispectral Pedestrian Detection

no code implementations 14 Mar 2018 Chengyang Li, Dan Song, Ruofeng Tong, Min Tang

Multispectral images of color-thermal pairs have been shown to be more effective than a single color channel for pedestrian detection, especially under challenging illumination conditions.

Pedestrian Detection

End-to-end detection-segmentation network with ROI convolution

1 code implementation 8 Jan 2018 Zichen Zhang, Min Tang, Dana Cobzas, Dornoosh Zonoobi, Martin Jagersand, Jacob L. Jaremko

We propose an end-to-end neural network that improves the segmentation accuracy of fully convolutional networks by incorporating a localization unit.

Object Localization Segmentation

A deep level set method for image segmentation

no code implementations 17 May 2017 Min Tang, Sepehr Valipour, Zichen Vincent Zhang, Dana Cobzas, Martin Jagersand

This paper proposes a novel image segmentation approach that integrates fully convolutional networks (FCNs) with a level set model.

Image Segmentation Semantic Segmentation
