1 code implementation • 2 Apr 2024 • Tanvir Mahmud, Yapeng Tian, Diana Marculescu
Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video.
no code implementations • 27 Mar 2024 • Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian
Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments.
no code implementations • 8 Mar 2024 • Shentong Mo, Jing Shi, Yapeng Tian
Extensive evaluations on the AudioCaps and T2AV-Bench demonstrate that our T2AV sets a new standard for video-aligned TTA generation in ensuring visual alignment and temporal consistency.
1 code implementation • 27 Feb 2024 • Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.
no code implementations • 27 Feb 2024 • Nguyen Nguyen, Yapeng Tian, Chenliang Xu
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
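The replacement of one-hot targets with corpus-derived embeddings can be sketched minimally as follows. Everything here is a hypothetical toy (the vocabulary, the random embedding table, and the function names), not the paper's model; the point is that dense linguistic targets place similar words near each other, unlike one-hot vectors:

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table; stand-ins for
# dense vectors learned from a large text corpus.
rng = np.random.default_rng(0)
vocab = ["cat", "car", "cart"]
emb = rng.normal(size=(len(vocab), 8))              # (V, d) table
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # unit-normalize rows

def one_hot_target(idx, vocab_size):
    """Traditional target: a sparse indicator vector."""
    t = np.zeros(vocab_size)
    t[idx] = 1.0
    return t

def embedding_target(idx):
    """Linguistic target: the token's dense embedding."""
    return emb[idx]

def decode(pred_vec):
    """Map a predicted embedding back to the nearest vocabulary token."""
    sims = emb @ pred_vec
    return vocab[int(np.argmax(sims))]

print(decode(embedding_target(1)))  # prints "car"
```

With embedding targets, an auto-regressive decoder regresses toward a dense vector and decodes by nearest neighbor, so a slightly noisy prediction can still recover the right token.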
no code implementations • 21 Dec 2023 • Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Xiaohu Guo, Jiashi Feng
The generation of emotional talking faces from a single portrait image remains a significant challenge.
no code implementations • 31 Oct 2023 • Yuxin Ye, Wenming Yang, Yapeng Tian
LAVSS is inspired by the correlation between spatial audio and visual location.
no code implementations • 18 Oct 2023 • Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu
The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view.
no code implementations • 27 Sep 2023 • Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment.
1 code implementation • 19 Sep 2023 • Chengyan Wang, Jun Lyu, Shuo Wang, Chen Qin, Kunyuan Guo, Xinyu Zhang, Xiaotong Yu, Yan Li, Fanwen Wang, Jianhua Jin, Zhang Shi, Ziqiang Xu, Yapeng Tian, Sha Hua, Zhensen Chen, Meng Liu, Mengting Sun, Xutong Kuang, Kang Wang, Haoran Wang, Hao Li, Yinghua Chu, Guang Yang, Wenjia Bai, Xiahai Zhuang, He Wang, Jing Qin, Xiaobo Qu
However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images.
1 code implementation • ICCV 2023 • Shentong Mo, Weiguo Pian, Yapeng Tian
Our CIGN leverages learnable audio-visual class tokens and audio-visual grouping to continually aggregate class-aware features.
no code implementations • 30 Aug 2023 • Sen Fang, Chunyu Sui, Xuedong Zhang, Yapeng Tian
The field of Sign Language Production (SLP) has lacked a large-scale, deep-learning-based pre-trained model for continuous American Sign Language (ASL) production over the past decade.
no code implementations • 26 Aug 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations.
1 code implementation • ICCV 2023 • Weiguo Pian, Shentong Mo, Yunhui Guo, Yapeng Tian
We demonstrate that joint audio-visual modeling can improve class-incremental learning, but current methods fail to preserve semantic similarity between audio and visual features as the number of incremental steps grows.
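One generic way to preserve cross-modal semantic similarity across incremental steps is a distillation-style penalty on the audio-visual similarity matrix. The sketch below is illustrative only (all names are hypothetical, and this is not the paper's actual loss):

```python
import numpy as np

def cosine_sim(a, v):
    """Pairwise cosine similarity between audio and visual features."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return a @ v.T

def similarity_preservation_loss(a_new, v_new, a_old, v_old):
    """Penalize drift of the audio-visual similarity structure between
    the old model's features and the incrementally trained ones."""
    drift = cosine_sim(a_new, v_new) - cosine_sim(a_old, v_old)
    return float(np.mean(drift ** 2))

rng = np.random.default_rng(2)
a_old = rng.normal(size=(5, 16))
v_old = rng.normal(size=(5, 16))
# Identical features drift nothing; the loss is exactly zero.
loss_same = similarity_preservation_loss(a_old, v_old, a_old, v_old)
```

Keeping the similarity matrix stable, rather than the raw features, lets new classes reshape the embedding space while the old audio-visual correspondences survive.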
no code implementations • 31 Jul 2023 • Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner.
1 code implementation • 5 Jul 2023 • Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian
Because acquisition is limited by imaging systems, reconstructing Magnetic Resonance Imaging (MRI) images from partial measurements is essential to medical imaging research.
no code implementations • 31 May 2023 • Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.
1 code implementation • 24 May 2023 • Yichen Chi, Junhao Gu, Jiamiao Zhang, Wenming Yang, Yapeng Tian
We explicitly tackle motion blurs in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) in the VSR framework.
no code implementations • 22 May 2023 • Shentong Mo, Jing Shi, Yapeng Tian
In this work, we propose DiffAVA, a novel and personalized text-to-sound generation approach with visual alignment based on latent diffusion models. DiffAVA fine-tunes lightweight visual-text alignment modules with frozen modality-specific encoders to update visually aligned text embeddings as the generation condition.
no code implementations • 3 May 2023 • Shentong Mo, Yapeng Tian
In this work, we propose AV-SAM, a simple yet effective audio-visual localization and segmentation framework based on the Segment Anything Model that can generate sounding object masks corresponding to the audio.
1 code implementation • CVPR 2023 • Shentong Mo, Yapeng Tian
Sound source localization is a typical and challenging task that predicts the location of sound sources in a video.
1 code implementation • CVPR 2023 • Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; and 2) out-of-view sound components can be created while wearers shift their attention.
1 code implementation • ICCV 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc van Gool
Diffusion models (DMs) have achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network.
1 code implementation • 30 Nov 2022 • Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network.
2 code implementations • 2 Oct 2022 • Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks.
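Two of the components named above, sign binarization and the residual connection, can be illustrated with a minimal NumPy sketch (a toy 1-D case; the XNOR-style per-filter scaling shown is one common choice, not necessarily the paper's):

```python
import numpy as np

def binarize(w):
    """Sign binarization with a per-filter scaling factor (XNOR-style)."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def conv1d(x, w):
    """'Valid' 1-D convolution, enough to illustrate the idea."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def binary_residual_block(x, w):
    """Binary conv + residual connection: the full-precision identity
    path carries information the 1-bit weights would otherwise lose."""
    y = conv1d(np.pad(x, 1), binarize(w))   # pad to keep length
    return x + y                            # residual connection

x = np.array([0.5, -1.0, 2.0, 0.0])
w = np.array([0.3, -0.7, 0.2])
out = binary_residual_block(x, w)
```

The residual path is what makes binarization tolerable for restoration tasks: fine-grained pixel information flows around the heavily quantized convolution instead of through it.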
no code implementations • 20 Aug 2022 • Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
A comprehensive survey that systematically organizes and analyzes studies of the audio-visual field is therefore needed.
1 code implementation • 28 Jul 2022 • Bin Xia, Yapeng Tian, Yulun Zhang, Yucheng Hang, Wenming Yang, Qingmin Liao
Most CNN-based super-resolution (SR) methods assume that the degradation is known (e.g., bicubic).
1 code implementation • CVPR 2023 • Bin Xia, Jingwen He, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Luc van Gool
In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.
1 code implementation • CVPR 2022 • Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin
However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures.
1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Ranked #5 on Audio-visual Question Answering on MUSIC-AVQA
no code implementations • 15 Mar 2022 • Xiaoyu Xiang, Yapeng Tian, Vijay Rengarajan, Lucas Young, Bo Zhu, Rakesh Ranjan
Consequently, the inverse task of upscaling a low-resolution, low frame-rate video in space and time becomes a challenging ill-posed problem due to information loss and aliasing artifacts.
1 code implementation • 14 Mar 2022 • Hai Wang, Xiaoyu Xiang, Yapeng Tian, Wenming Yang, Qingmin Liao
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts in dynamic video frames are adaptively captured and aggregated to enhance SR reconstruction.
1 code implementation • 12 Jan 2022 • Bin Xia, Yapeng Tian, Yucheng Hang, Wenming Yang, Qingmin Liao, Jie zhou
To improve matching efficiency, we design a novel Embedded PatchMatch scheme with random samples propagation, which involves end-to-end training with asymptotically linear computational cost in the input size.
1 code implementation • 11 Jan 2022 • Bin Xia, Yucheng Hang, Yapeng Tian, Wenming Yang, Qingmin Liao, Jie zhou
To demonstrate the effectiveness of ENLCA, we build an architecture called Efficient Non-Local Contrastive Network (ENLCN) by adding a few of our modules to a simple backbone.
no code implementations • 10 Nov 2021 • Sizhe Li, Yapeng Tian, Chenliang Xu
Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects.
1 code implementation • 15 Apr 2021 • Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu
A naïve method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
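That two-stage decomposition can be sketched with toy stand-ins for the VFI and VSR networks (blending and nearest-neighbor upscaling here are placeholders, not the actual models):

```python
import numpy as np

def vfi(frame_a, frame_b):
    """Toy frame interpolation: blend adjacent LR frames."""
    return 0.5 * (frame_a + frame_b)

def vsr(frame, scale=2):
    """Toy super-resolution: nearest-neighbor upscaling."""
    return np.kron(frame, np.ones((scale, scale)))

def two_stage_stvsr(lr_frames, scale=2):
    """Naive decomposition: interpolate in time, then upscale in space.
    The two stages cannot share intermediate features, which is exactly
    the inefficiency one-stage space-time methods aim to remove."""
    dense = []
    for a, b in zip(lr_frames, lr_frames[1:]):
        dense += [a, vfi(a, b)]
    dense.append(lr_frames[-1])
    return [vsr(f, scale) for f in dense]

lr = [np.zeros((4, 4)), np.ones((4, 4))]
hr = two_stage_stvsr(lr)   # 3 frames, each upscaled to 8x8
```

Running the two stages back-to-back doubles the frame count and then upscales every frame independently; a joint space-time model can instead exploit the shared temporal context across both operations.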
1 code implementation • CVPR 2021 • Yapeng Tian, Chenliang Xu
In this paper, we propose to conduct a systematic study of machines' multisensory perception under attacks.
1 code implementation • CVPR 2021 • Yapeng Tian, Di Hu, Chenliang Xu
There are rich synchronized audio and visual events in our daily life.
no code implementations • ICCV 2021 • Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang
In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.
1 code implementation • ECCV 2020 • Yapeng Tian, Dingzeyu Li, Chenliang Xu
In this paper, we introduce a new problem, named audio-visual video parsing, which aims to parse a video into temporal event segments and label them as either audible, visible, or both.
1 code implementation • CVPR 2020 • Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).
3 code implementations • CVPR 2020 • Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu
Rather than synthesizing missing LR video frames as VFI networks do, we first temporally interpolate LR frame features for the missing LR video frames, capturing local temporal contexts with the proposed feature temporal interpolation network.
Ranked #4 on Video Frame Interpolation on Vid4 - 4x upscaling
1 code implementation • 21 Dec 2019 • Yapeng Tian, Chenliang Xu, Dingzeyu Li
We are interested in applying deep networks in the absence of a training dataset.
1 code implementation • 9 Sep 2019 • Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao
In this paper, we develop a concise but efficient network architecture called linear compressing based skip-connecting network (LCSCNet) for image super-resolution.
Ranked #14 on Image Super-Resolution on Set14 - 3x upscaling
1 code implementation • ICCV 2019 • Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang
Deep learning methods have achieved great progress in image restoration with respect to specific metrics (e.g., PSNR, SSIM).
3 code implementations • 25 Dec 2018 • Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
We fully exploit the hierarchical features from all the convolutional layers.
no code implementations • 7 Dec 2018 • Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu
To achieve this, we propose a multimodal convolutional neural network-based audio-visual video captioning framework and introduce a modality-aware module for exploring modality selection during sentence generation.
2 code implementations • 7 Dec 2018 • Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).
1 code implementation • 9 Aug 2018 • Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue
Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions.
2 code implementations • ECCV 2018 • Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.
16 code implementations • CVPR 2018 • Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers.
Ranked #3 on Color Image Denoising on CBSD68 sigma50
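The "hierarchical features from all the convolutional layers" idea can be sketched minimally: keep every intermediate feature map and fuse them all with a 1x1 convolution, instead of using only the last layer's output. This toy stack (1x1 channel-mixing layers standing in for real convolutions) is illustrative, not the actual RDN:

```python
import numpy as np

def conv_layer(x, w):
    """Stand-in for one convolutional layer: a channel-mixing matmul
    applied at every spatial position (a 1x1 conv), followed by ReLU."""
    return np.maximum(0.0, np.einsum('oc,chw->ohw', w, x))

def hierarchical_fusion(x, weights, fuse_w):
    """Run a stack of layers, keep every intermediate feature map, then
    fuse them all with a 1x1 conv -- using features from all layers
    rather than only the final one."""
    feats = []
    h = x
    for w in weights:
        h = conv_layer(h, w)
        feats.append(h)
    stacked = np.concatenate(feats, axis=0)      # concat along channels
    return np.einsum('oc,chw->ohw', fuse_w, stacked)

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 8, 8))                   # C x H x W input
weights = [rng.normal(size=(3, 3)) for _ in range(4)]
fused = hierarchical_fusion(x, weights, rng.normal(size=(3, 12)))
```

Because each layer's features capture a different level of detail, concatenating all of them before fusion preserves shallow, texture-level information that would otherwise be washed out by deeper layers.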