Search Results for author: Yapeng Tian

Found 51 papers, 33 papers with code

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

1 code implementation 2 Apr 2024 Tanvir Mahmud, Yapeng Tian, Diana Marculescu

Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video.

Robust Active Speaker Detection in Noisy Environments

no code implementations 27 Mar 2024 Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian

Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments.

Speech Separation

Text-to-Audio Generation Synchronized with Videos

no code implementations 8 Mar 2024 Shentong Mo, Jing Shi, Yapeng Tian

Extensive evaluations on the AudioCaps and T2AV-Bench demonstrate that our T2AV sets a new standard for video-aligned TTA generation in ensuring visual alignment and temporal consistency.

AudioCaps Audio Generation +1

OSCaR: Object State Captioning and State Change Representation

1 code implementation 27 Feb 2024 Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu

To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.

Change Detection Object

Efficiently Leveraging Linguistic Priors for Scene Text Spotting

no code implementations 27 Feb 2024 Nguyen Nguyen, Yapeng Tian, Chenliang Xu

This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.

Scene Text Recognition Text Detection +1
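As a toy illustration of swapping one-hot targets for corpus-derived soft labels, the sketch below builds a soft target distribution from embedding similarity. The tiny vocabulary and embedding table are hypothetical stand-ins, not the paper's actual prior, which comes from a large text corpus:

```python
import numpy as np

# Toy character embeddings standing in for priors learned from a text corpus
# (hypothetical values chosen so that "a" and "b" are similar).
vocab = ["a", "b", "c", "d"]
emb = np.array([
    [1.0, 0.0],
    [0.9, 0.1],   # "b" is close to "a" in this toy space
    [0.0, 1.0],
    [0.1, 0.9],
])

def soft_target(char, temperature=0.5):
    """Soft label distribution from embedding similarity instead of one-hot."""
    idx = vocab.index(char)
    sims = emb @ emb[idx] / (np.linalg.norm(emb, axis=1) * np.linalg.norm(emb[idx]))
    logits = sims / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

p = soft_target("a")
# mass concentrates on "a" but related characters keep nonzero probability
assert p.argmax() == 0 and p[1] > p[2]
```

Unlike a one-hot target, this distribution carries graded information about which characters are linguistically interchangeable.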

LAVSS: Location-Guided Audio-Visual Spatial Audio Separation

no code implementations 31 Oct 2023 Yuxin Ye, Wenming Yang, Yapeng Tian

LAVSS is inspired by the correlation between spatial audio and visual location.

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

no code implementations 18 Oct 2023 Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu

The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view.

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

no code implementations 27 Sep 2023 Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment.

Room Impulse Response (RIR)

Class-Incremental Grouping Network for Continual Audio-Visual Learning

1 code implementation ICCV 2023 Shentong Mo, Weiguo Pian, Yapeng Tian

Our CIGN leverages learnable audio-visual class tokens and audio-visual grouping to continually aggregate class-aware features.

audio-visual learning Class Incremental Learning +2

SignDiff: Learning Diffusion Models for American Sign Language Production

no code implementations 30 Aug 2023 Sen Fang, Chunyu Sui, Xuedong Zhang, Yapeng Tian

For the past decade, the field of Sign Language Production (SLP) has lacked a large-scale, deep-learning-based pre-trained model for continuous American Sign Language (ASL) production.

Pose Estimation Sign Language Production +1

DiffI2I: Efficient Diffusion Model for Image-to-Image Translation

no code implementations 26 Aug 2023 Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool

Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations.

Denoising Image-to-Image Translation +2

Audio-Visual Class-Incremental Learning

1 code implementation ICCV 2023 Weiguo Pian, Shentong Mo, Yunhui Guo, Yapeng Tian

We demonstrate that joint audio-visual modeling can improve class-incremental learning, but current methods fail to preserve semantic similarity between audio and visual features as the number of incremental steps grows.

Class Incremental Learning Incremental Learning +3

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

no code implementations 31 Jul 2023 Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu

We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner.

Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI

1 code implementation 5 Jul 2023 Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian

Because imaging systems can only acquire partial measurements, reconstructing Magnetic Resonance Imaging (MRI) images from such measurements is essential to medical imaging research.

Super-Resolution

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

no code implementations 31 May 2023 Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo

In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.

counterfactual Counterfactual Inference +2
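One common counterfactual-debiasing recipe, which may differ from this paper's exact formulation, subtracts the answer distribution the model would produce from the question alone (the language prior) from the full multimodal prediction:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits over 3 candidate answers for one VQA example.
logits_full = np.array([2.0, 1.0, 0.5])       # question + image
logits_lang_only = np.array([2.5, 0.0, 0.0])  # question only (language prior)

# Counterfactual inference: remove the effect the model would produce with
# no visual evidence, keeping only the vision-grounded part of the logits.
debiased = softmax(logits_full - logits_lang_only)

assert debiased.argmax() == 1  # the prior favoured answer 0; removing it flips the choice
```

The language-only branch plays the role of the counterfactual world in which the image is never observed; subtracting it isolates the causal contribution of vision.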

EgoVSR: Towards High-Quality Egocentric Video Super-Resolution

1 code implementation 24 May 2023 Yichen Chi, Junhao Gu, Jiamiao Zhang, Wenming Yang, Yapeng Tian

We explicitly tackle motion blurs in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) in the VSR framework.

Video Super-Resolution

DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment

no code implementations 22 May 2023 Shentong Mo, Jing Shi, Yapeng Tian

In this work, we propose DiffAVA, a novel and personalized text-to-audio generation approach with visual alignment based on latent diffusion models. DiffAVA can simply fine-tune lightweight visual-text alignment modules, with frozen modality-specific encoders, to update the visual-aligned text embeddings used as the condition.

AudioCaps Audio Generation +1
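The frozen-encoder / lightweight-adapter idea can be sketched as follows. All dimensions and weight names here are invented for illustration; DiffAVA's real modules are latent-diffusion components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen modality-specific encoder (stand-in for a pretrained model;
# the dimensions are made up for illustration).
W_frozen = rng.normal(size=(512, 768))

def encode(x):
    return np.tanh(W_frozen @ x)  # 768-d input -> 512-d feature, never updated

# Lightweight visual-text alignment module: the only parameters fine-tuned.
W_align = rng.normal(scale=0.01, size=(256, 512))

x = rng.normal(size=768)          # e.g. a text-token embedding
aligned = W_align @ encode(x)     # visual-aligned text embedding (the condition)

# Fine-tuning touches far fewer parameters than the frozen backbone holds.
assert W_align.size < W_frozen.size
```

Keeping the backbone frozen is what makes the fine-tuning "lightweight": only the small alignment matrix would receive gradients.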

AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation

no code implementations 3 May 2023 Shentong Mo, Yapeng Tian

In this work, we propose a simple yet effective audio-visual localization and segmentation framework based on the Segment Anything Model, namely AV-SAM, that can generate sounding object masks corresponding to the audio.

Object Localization Segmentation +1

Audio-Visual Grouping Network for Sound Localization from Mixtures

1 code implementation CVPR 2023 Shentong Mo, Yapeng Tian

Sound source localization is a typical and challenging task that predicts the location of sound sources in a video.

Object Localization

Egocentric Audio-Visual Object Localization

1 code implementation CVPR 2023 Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; and 2) out-of-view sound components can be created while wearers shift their attention.

Object Object Localization

DiffIR: Efficient Diffusion Model for Image Restoration

1 code implementation ICCV 2023 Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc van Gool

Diffusion models (DMs) have achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network.

Denoising Image Generation +1

Basic Binary Convolution Unit for Binarized Image Restoration Network

2 code implementations 2 Oct 2022 Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool

In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks.

Binarization Image Restoration +1

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

no code implementations 20 Aug 2022 Yake Wei, Di Hu, Yapeng Tian, Xuelong Li

A comprehensive survey that systematically organizes and analyzes studies of the audio-visual field is therefore needed.

audio-visual learning Scene Understanding

Structured Sparsity Learning for Efficient Video Super-Resolution

1 code implementation CVPR 2023 Bin Xia, Jingwen He, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Luc van Gool

In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.

Video Super-Resolution
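A minimal sketch of the structured-pruning idea, assuming the simplest possible importance criterion (per-channel L1 norm). The paper's actual schemes for residual blocks, recurrent networks, and upsampling networks are more involved:

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_channels(W, keep_ratio=0.5):
    """Structured sparsity in miniature: rank the output channels of a
    weight matrix by L1 norm and drop the weakest ones entirely, so the
    pruned layer stays dense and hardware-friendly."""
    norms = np.abs(W).sum(axis=1)                 # per-channel importance
    k = max(1, int(len(norms) * keep_ratio))
    keep = np.sort(np.argsort(norms)[-k:])        # indices of strongest channels
    return W[keep], keep

W = rng.normal(size=(8, 16))                      # a layer with 8 output channels
W_pruned, kept = prune_channels(W)
assert W_pruned.shape == (4, 16)                  # half the channels survive
```

Removing whole channels (rather than individual weights) shrinks the actual tensor shapes, which is why structured sparsity yields real speedups without sparse kernels.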

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

1 code implementation CVPR 2022 Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin

However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures.

Super-Resolution

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

1 code implementation CVPR 2022 Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.

audio-visual learning Audio-visual Question Answering +4

Learning Spatio-Temporal Downsampling for Effective Video Upscaling

no code implementations 15 Mar 2022 Xiaoyu Xiang, Yapeng Tian, Vijay Rengarajan, Lucas Young, Bo Zhu, Rakesh Ranjan

Consequently, the inverse task of upscaling a low-resolution, low frame-rate video in space and time becomes a challenging ill-posed problem due to information loss and aliasing artifacts.

Quantization

STDAN: Deformable Attention Network for Space-Time Video Super-Resolution

1 code implementation 14 Mar 2022 Hai Wang, Xiaoyu Xiang, Yapeng Tian, Wenming Yang, Qingmin Liao

Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts in dynamic video frames are adaptively captured and aggregated to enhance SR reconstruction.

Space-time Video Super-resolution Video Super-Resolution

Coarse-to-Fine Embedded PatchMatch and Multi-Scale Dynamic Aggregation for Reference-based Super-Resolution

1 code implementation 12 Jan 2022 Bin Xia, Yapeng Tian, Yucheng Hang, Wenming Yang, Qingmin Liao, Jie Zhou

To improve matching efficiency, we design a novel Embedded PatchMatch scheme with random samples propagation, which involves end-to-end training with computational cost that is asymptotically linear in the input size.

Reference-based Super-Resolution
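The random-samples-propagation idea behind PatchMatch can be sketched in one dimension. This is a toy analogue, not the paper's embedded scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_cost(a, b, i, j, p=3):
    """L2 cost between length-p patches a[i:i+p] and b[j:j+p]."""
    return float(np.sum((a[i:i+p] - b[j:j+p]) ** 2))

def patchmatch_1d(target, ref, nnf, p=3, iters=4, samples=2):
    """Tiny 1-D PatchMatch: alternate propagation (reuse the neighbour's
    match, shifted by one) with a few random samples per position."""
    nnf = nnf.copy()
    m = len(ref) - p + 1
    for _ in range(iters):
        for i in range(len(nnf)):
            best = patch_cost(target, ref, i, nnf[i], p)
            candidates = list(rng.integers(0, m, size=samples))   # random search
            if i > 0:
                candidates.append(min(nnf[i - 1] + 1, m - 1))     # propagation
            for j in candidates:
                c = patch_cost(target, ref, i, j, p)
                if c < best:
                    nnf[i], best = j, c
    return nnf

p = 3
ref = np.arange(20, dtype=float)
target = ref[5:15].copy()                 # target is a crop of ref
n, m = len(target) - p + 1, len(ref) - p + 1
init = rng.integers(0, m, size=n)         # random initialization
final = patchmatch_1d(target, ref, init, p=p)

total = lambda f: sum(patch_cost(target, ref, i, f[i]) for i in range(len(f)))
assert total(final) <= total(init)        # matches only ever improve
```

Each position's match is updated only when a candidate strictly lowers its cost, so the total matching cost is monotonically non-increasing while the per-iteration work stays linear in the number of positions.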

Efficient Non-Local Contrastive Attention for Image Super-Resolution

1 code implementation 11 Jan 2022 Bin Xia, Yucheng Hang, Yapeng Tian, Wenming Yang, Qingmin Liao, Jie Zhou

To demonstrate the effectiveness of ENLCA, we build an architecture called Efficient Non-Local Contrastive Network (ENLCN) by adding a few of our modules in a simple backbone.

Contrastive Learning Feature Correlation +1

Space-Time Memory Network for Sounding Object Localization in Videos

no code implementations 10 Nov 2021 Sizhe Li, Yapeng Tian, Chenliang Xu

Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects.

Object Localization

Video Matting via Consistency-Regularized Graph Neural Networks

no code implementations ICCV 2021 Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang

In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.

Image Matting Optical Flow Estimation +1

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing

1 code implementation ECCV 2020 Yapeng Tian, Dingzeyu Li, Chenliang Xu

In this paper, we introduce a new problem, named audio-visual video parsing, which aims to parse a video into temporal event segments and label them as either audible, visible, or both.

Multiple Instance Learning
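A minimal sketch of the MIL-style pooling that makes weak video-level labels usable; all probability values below are made up:

```python
import numpy as np

# Hypothetical segment-level probabilities that one event is audible / visible
# in each of 5 temporal segments of a video.
p_audio = np.array([0.9, 0.8, 0.1, 0.1, 0.2])
p_visual = np.array([0.1, 0.7, 0.9, 0.2, 0.1])

# MIL-style pooling: a weak video-level label only says "this event occurs
# somewhere", so the video-level score is driven by the most confident segment.
video_score_audio = p_audio.max()
video_score_visual = p_visual.max()

# At inference, each segment is labelled audible, visible, or both.
segment_labels = [(a > 0.5, v > 0.5) for a, v in zip(p_audio, p_visual)]
assert segment_labels[1] == (True, True)   # segment 1: both audible and visible
```

Training supervises only the pooled video-level scores, yet thresholding the segment-level probabilities recovers the per-segment audible/visible/both parse.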

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution

1 code implementation CVPR 2020 Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu

Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).

Optical Flow Estimation Video Super-Resolution
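The align-then-fuse recipe behind VSR can be sketched on 1-D signals, with a brute-force integer shift standing in for the learned temporally-deformable alignment:

```python
import numpy as np

def align_1d(ref, sup, max_shift=3):
    """Align a supporting signal to the reference by the integer shift
    with the smallest L2 distance (a crude stand-in for learned
    deformable alignment)."""
    best_shift, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        err = np.sum((np.roll(sup, s) - ref) ** 2)
        if err < best_err:
            best_shift, best_err = s, err
    return np.roll(sup, best_shift)

# Reference "frame" and two temporally shifted supporting "frames" (1-D toys).
ref = np.sin(np.linspace(0, 2 * np.pi, 32))
sup1, sup2 = np.roll(ref, 2), np.roll(ref, -1)

# Align supporting frames to the reference, then fuse by averaging.
fused = np.mean([ref, align_1d(ref, sup1), align_1d(ref, sup2)], axis=0)
assert np.allclose(fused, ref)   # aligned supports reinforce the reference
```

Without the alignment step, averaging the shifted supports would blur the signal; aligning first is what lets the extra frames contribute detail instead of artifacts.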

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

3 code implementations CVPR 2020 Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

Rather than synthesizing missing LR video frames as VFI networks do, we first temporally interpolate LR frame features of the missing LR video frames, capturing local temporal contexts with the proposed feature temporal interpolation network.

Space-time Video Super-resolution Video Frame Interpolation +1
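Feature temporal interpolation, reduced to its simplest form: blend the feature maps of neighbouring frames instead of synthesizing missing LR pixels. A linear blend stands in here for the learned interpolation network:

```python
import numpy as np

def interpolate_features(f_prev, f_next, t=0.5):
    """Synthesize the missing intermediate frame's *features* by blending
    neighbouring frame features (a linear stand-in for the learned
    feature temporal interpolation network)."""
    return (1 - t) * f_prev + t * f_next

# Toy 4x4 single-channel feature maps for two consecutive LR frames.
f0 = np.zeros((4, 4))
f2 = np.ones((4, 4))
f1 = interpolate_features(f0, f2)   # features of a frame that was never captured

assert np.allclose(f1, 0.5)
```

Working in feature space lets one network jointly reason about the interpolated and captured frames before the final upscaling, which is the core of the one-stage design.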

Deep Audio Prior

1 code implementation 21 Dec 2019 Yapeng Tian, Chenliang Xu, Dingzeyu Li

We are interested in applying deep networks in the absence of a training dataset.

blind source separation Texture Synthesis

LCSCNet: Linear Compressing Based Skip-Connecting Network for Image Super-Resolution

1 code implementation 9 Sep 2019 Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao

In this paper, we develop a concise but efficient network architecture called linear compressing based skip-connecting network (LCSCNet) for image super-resolution.

Image Super-Resolution

CFSNet: Toward a Controllable Feature Space for Image Restoration

1 code implementation ICCV 2019 Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang

Deep learning methods have achieved great progress in image restoration on specific metrics (e.g., PSNR, SSIM).

Image Restoration Image Super-Resolution +1

TDAN: Temporally Deformable Alignment Network for Video Super-Resolution

2 code implementations 7 Dec 2018 Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu

Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).

Optical Flow Estimation Video Super-Resolution

An Attempt towards Interpretable Audio-Visual Video Captioning

no code implementations 7 Dec 2018 Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu

To achieve this, we propose a multimodal convolutional neural network-based audio-visual video captioning framework and introduce a modality-aware module for exploring modality selection during sentence generation.

Audio captioning Audio-Visual Video Captioning +3

Deep Learning for Single Image Super-Resolution: A Brief Review

1 code implementation 9 Aug 2018 Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue

Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions.

Efficient Neural Network Image Super-Resolution

Residual Dense Network for Image Super-Resolution

16 code implementations CVPR 2018 Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu

In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers.

Color Image Denoising Image Super-Resolution
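The dense-connectivity idea, where every layer consumes the features of all preceding layers, can be sketched with random placeholder weights (the real RDN uses convolutions and local feature fusion):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_block(x, n_layers=3, growth=4):
    """Toy residual-dense idea: every layer sees the concatenation of the
    input and all preceding layers' outputs, so hierarchical features from
    every layer are exploited (weights are random placeholders)."""
    feats = [x]
    for _ in range(n_layers):
        inp = np.concatenate(feats)               # dense connectivity
        W = rng.normal(scale=0.1, size=(growth, inp.size))
        feats.append(np.maximum(W @ inp, 0))      # conv stand-in + ReLU
    return np.concatenate(feats)                  # input to local feature fusion

x = rng.normal(size=8)
out = dense_block(x)
assert out.shape == (8 + 3 * 4,)  # original 8 features plus 3 layers x 4 each
```

Because every layer's output survives into the final concatenation, shallow and deep features are both available to the fusion stage rather than being overwritten layer by layer.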
