1 code implementation • 2 Apr 2024 • Tanvir Mahmud, Yapeng Tian, Diana Marculescu
Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video.
no code implementations • 27 Mar 2024 • Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian
Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments.
no code implementations • 8 Mar 2024 • Shentong Mo, Jing Shi, Yapeng Tian
Extensive evaluations on the AudioCaps and T2AV-Bench demonstrate that our T2AV sets a new standard for video-aligned TTA generation in ensuring visual alignment and temporal consistency.
1 code implementation • 27 Feb 2024 • Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.
no code implementations • 27 Feb 2024 • Nguyen Nguyen, Yapeng Tian, Chenliang Xu
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
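The replacement of one-hot targets with corpus-derived embeddings can be sketched minimally as follows. Everything here is a hypothetical toy (the vocabulary, the random embedding table, and the function names), not the paper's model; the point is that dense linguistic targets place similar words near each other, unlike one-hot vectors:

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table; stand-ins for
# dense vectors learned from a large text corpus.
rng = np.random.default_rng(0)
vocab = ["cat", "car", "cart"]
emb = rng.normal(size=(len(vocab), 8))              # (V, d) table
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # unit-normalize rows

def one_hot_target(idx, vocab_size):
    """Traditional target: a sparse indicator vector."""
    t = np.zeros(vocab_size)
    t[idx] = 1.0
    return t

def embedding_target(idx):
    """Linguistic target: the token's dense embedding."""
    return emb[idx]

def decode(pred_vec):
    """Map a predicted embedding back to the nearest vocabulary token."""
    sims = emb @ pred_vec
    return vocab[int(np.argmax(sims))]

print(decode(embedding_target(1)))  # prints "car"
```

With embedding targets, an auto-regressive decoder regresses toward a dense vector and decodes by nearest neighbor, so a slightly noisy prediction can still recover the right token.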
no code implementations • 21 Dec 2023 • Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Xiaohu Guo, Jiashi Feng
The generation of emotional talking faces from a single portrait image remains a significant challenge.
no code implementations • 31 Oct 2023 • Yuxin Ye, Wenming Yang, Yapeng Tian
LAVSS is inspired by the correlation between spatial audio and visual location.
no code implementations • 18 Oct 2023 • Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu
The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view.
no code implementations • 27 Sep 2023 • Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment.
1 code implementation • 19 Sep 2023 • Chengyan Wang, Jun Lyu, Shuo Wang, Chen Qin, Kunyuan Guo, Xinyu Zhang, Xiaotong Yu, Yan Li, Fanwen Wang, Jianhua Jin, Zhang Shi, Ziqiang Xu, Yapeng Tian, Sha Hua, Zhensen Chen, Meng Liu, Mengting Sun, Xutong Kuang, Kang Wang, Haoran Wang, Hao Li, Yinghua Chu, Guang Yang, Wenjia Bai, Xiahai Zhuang, He Wang, Jing Qin, Xiaobo Qu
However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images.
1 code implementation • ICCV 2023 • Shentong Mo, Weiguo Pian, Yapeng Tian
Our CIGN leverages learnable audio-visual class tokens and audio-visual grouping to continually aggregate class-aware features.
no code implementations • 30 Aug 2023 • Sen Fang, Chunyu Sui, Xuedong Zhang, Yapeng Tian
The field of Sign Language Production (SLP) has lacked a large-scale, deep-learning-based pre-trained model for continuous American Sign Language (ASL) production over the past decade.
no code implementations • 26 Aug 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations.
1 code implementation • ICCV 2023 • Weiguo Pian, Shentong Mo, Yunhui Guo, Yapeng Tian
We demonstrate that joint audio-visual modeling can improve class-incremental learning, but current methods fail to preserve semantic similarity between audio and visual features as the number of incremental steps grows.
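One generic way to preserve cross-modal semantic similarity across incremental steps is a distillation-style penalty on the audio-visual similarity matrix. The sketch below is illustrative only (all names are hypothetical, and this is not the paper's actual loss):

```python
import numpy as np

def cosine_sim(a, v):
    """Pairwise cosine similarity between audio and visual features."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return a @ v.T

def similarity_preservation_loss(a_new, v_new, a_old, v_old):
    """Penalize drift of the audio-visual similarity structure between
    the old model's features and the incrementally trained ones."""
    drift = cosine_sim(a_new, v_new) - cosine_sim(a_old, v_old)
    return float(np.mean(drift ** 2))

rng = np.random.default_rng(2)
a_old = rng.normal(size=(5, 16))
v_old = rng.normal(size=(5, 16))
# Identical features drift nothing; the loss is exactly zero.
loss_same = similarity_preservation_loss(a_old, v_old, a_old, v_old)
```

Keeping the similarity matrix stable, rather than the raw features, lets new classes reshape the embedding space while the old audio-visual correspondences survive.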
no code implementations • 31 Jul 2023 • Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
We propose DAVIS, a Diffusion model-based Audio-VIsual Separation framework that solves the audio-visual sound source separation task in a generative manner.
1 code implementation • 5 Jul 2023 • Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian
Because acquisition is limited by imaging systems, reconstructing Magnetic Resonance Imaging (MRI) images from partial measurements is essential to medical imaging research.
no code implementations • 31 May 2023 • Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.
1 code implementation • 24 May 2023 • Yichen Chi, Junhao Gu, Jiamiao Zhang, Wenming Yang, Yapeng Tian
We explicitly tackle motion blurs in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) in the VSR framework.
no code implementations • 22 May 2023 • Shentong Mo, Jing Shi, Yapeng Tian
In this work, we propose DiffAVA, a novel and personalized text-to-sound generation approach with visual alignment based on latent diffusion models. DiffAVA fine-tunes lightweight visual-text alignment modules with frozen modality-specific encoders to update visually aligned text embeddings as the generation condition.
no code implementations • 3 May 2023 • Shentong Mo, Yapeng Tian
In this work, we propose AV-SAM, a simple yet effective audio-visual localization and segmentation framework based on the Segment Anything Model that can generate sounding object masks corresponding to the audio.
1 code implementation • CVPR 2023 • Shentong Mo, Yapeng Tian
Sound source localization is a typical and challenging task that predicts the location of sound sources in a video.
1 code implementation • CVPR 2023 • Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; and 2) out-of-view sound components can be created while wearers shift their attention.
1 code implementation • ICCV 2023 • Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc van Gool
Diffusion models (DMs) have achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network.
1 code implementation • 30 Nov 2022 • Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network.
2 code implementations • 2 Oct 2022 • Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool
In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks.
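Two of the components named above, sign binarization and the residual connection, can be illustrated with a minimal NumPy sketch (a toy 1-D case; the XNOR-style per-filter scaling shown is one common choice, not necessarily the paper's):

```python
import numpy as np

def binarize(w):
    """Sign binarization with a per-filter scaling factor (XNOR-style)."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def conv1d(x, w):
    """'Valid' 1-D convolution, enough to illustrate the idea."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def binary_residual_block(x, w):
    """Binary conv + residual connection: the full-precision identity
    path carries information the 1-bit weights would otherwise lose."""
    y = conv1d(np.pad(x, 1), binarize(w))   # pad to keep length
    return x + y                            # residual connection

x = np.array([0.5, -1.0, 2.0, 0.0])
w = np.array([0.3, -0.7, 0.2])
out = binary_residual_block(x, w)
```

The residual path is what makes binarization tolerable for restoration tasks: fine-grained pixel information flows around the heavily quantized convolution instead of through it.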
no code implementations • 20 Aug 2022 • Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
A comprehensive survey that systematically organizes and analyzes studies of the audio-visual field is therefore needed.
1 code implementation • 28 Jul 2022 • Bin Xia, Yapeng Tian, Yulun Zhang, Yucheng Hang, Wenming Yang, Qingmin Liao
Most CNN-based super-resolution (SR) methods assume that the degradation is known (e.g., bicubic).
1 code implementation • CVPR 2023 • Bin Xia, Jingwen He, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Luc van Gool
In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.
1 code implementation • CVPR 2022 • Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin
However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures.
1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Ranked #5 on Audio-visual Question Answering on MUSIC-AVQA
no code implementations • 15 Mar 2022 • Xiaoyu Xiang, Yapeng Tian, Vijay Rengarajan, Lucas Young, Bo Zhu, Rakesh Ranjan
Consequently, the inverse task of upscaling a low-resolution, low frame-rate video in space and time becomes a challenging ill-posed problem due to information loss and aliasing artifacts.
1 code implementation • 14 Mar 2022 • Hai Wang, Xiaoyu Xiang, Yapeng Tian, Wenming Yang, Qingmin Liao
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts in dynamic video frames are adaptively captured and aggregated to enhance SR reconstruction.
1 code implementation • 12 Jan 2022 • Bin Xia, Yapeng Tian, Yucheng Hang, Wenming Yang, Qingmin Liao, Jie zhou
To improve matching efficiency, we design a novel Embedded PatchMatch scheme with random samples propagation, which involves end-to-end training with asymptotically linear computational cost in the input size.
1 code implementation • 11 Jan 2022 • Bin Xia, Yucheng Hang, Yapeng Tian, Wenming Yang, Qingmin Liao, Jie zhou
To demonstrate the effectiveness of ENLCA, we build an architecture called Efficient Non-Local Contrastive Network (ENLCN) by adding a few of our modules to a simple backbone.
no code implementations • 10 Nov 2021 • Sizhe Li, Yapeng Tian, Chenliang Xu
Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects.
1 code implementation • 15 Apr 2021 • Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu
A naïve method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
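That two-stage decomposition can be sketched with toy stand-ins for the VFI and VSR networks (blending and nearest-neighbor upscaling here are placeholders, not the actual models):

```python
import numpy as np

def vfi(frame_a, frame_b):
    """Toy frame interpolation: blend adjacent LR frames."""
    return 0.5 * (frame_a + frame_b)

def vsr(frame, scale=2):
    """Toy super-resolution: nearest-neighbor upscaling."""
    return np.kron(frame, np.ones((scale, scale)))

def two_stage_stvsr(lr_frames, scale=2):
    """Naive decomposition: interpolate in time, then upscale in space.
    The two stages cannot share intermediate features, which is exactly
    the inefficiency one-stage space-time methods aim to remove."""
    dense = []
    for a, b in zip(lr_frames, lr_frames[1:]):
        dense += [a, vfi(a, b)]
    dense.append(lr_frames[-1])
    return [vsr(f, scale) for f in dense]

lr = [np.zeros((4, 4)), np.ones((4, 4))]
hr = two_stage_stvsr(lr)   # 3 frames, each upscaled to 8x8
```

Running the two stages back-to-back doubles the frame count and then upscales every frame independently; a joint space-time model can instead exploit the shared temporal context across both operations.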
1 code implementation • CVPR 2021 • Yapeng Tian, Chenliang Xu
In this paper, we propose to conduct a systematic study of machines' multisensory perception under attacks.
1 code implementation • CVPR 2021 • Yapeng Tian, Di Hu, Chenliang Xu
There are rich synchronized audio and visual events in our daily life.
no code implementations • ICCV 2021 • Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang
In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.
1 code implementation • ECCV 2020 • Yapeng Tian, Dingzeyu Li, Chenliang Xu
In this paper, we introduce a new problem, named audio-visual video parsing, which aims to parse a video into temporal event segments and label them as either audible, visible, or both.
1 code implementation • CVPR 2020 • Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).
3 code implementations • CVPR 2020 • Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu
Rather than synthesizing missing LR video frames as VFI networks do, we first temporally interpolate LR frame features for the missing LR video frames, capturing local temporal contexts with the proposed feature temporal interpolation network.
Ranked #4 on Video Frame Interpolation on Vid4 - 4x upscaling
1 code implementation • 21 Dec 2019 • Yapeng Tian, Chenliang Xu, Dingzeyu Li
We are interested in applying deep networks in the absence of a training dataset.
1 code implementation • 9 Sep 2019 • Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao
In this paper, we develop a concise but efficient network architecture called linear compressing based skip-connecting network (LCSCNet) for image super-resolution.
Ranked #14 on Image Super-Resolution on Set14 - 3x upscaling
1 code implementation • ICCV 2019 • Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang
Deep learning methods have achieved great progress in image restoration with respect to specific metrics (e.g., PSNR, SSIM).
3 code implementations • 25 Dec 2018 • Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
We fully exploit the hierarchical features from all the convolutional layers.
no code implementations • 7 Dec 2018 • Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu
To achieve this, we propose a multimodal convolutional neural network-based audio-visual video captioning framework and introduce a modality-aware module for exploring modality selection during sentence generation.
2 code implementations • 7 Dec 2018 • Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).
1 code implementation • 9 Aug 2018 • Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue
Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions.
2 code implementations • ECCV 2018 • Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.
16 code implementations • CVPR 2018 • Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers.
Ranked #3 on Color Image Denoising on CBSD68 sigma50
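The "hierarchical features from all the convolutional layers" idea can be sketched minimally: keep every intermediate feature map and fuse them all with a 1x1 convolution, instead of using only the last layer's output. This toy stack (1x1 channel-mixing layers standing in for real convolutions) is illustrative, not the actual RDN:

```python
import numpy as np

def conv_layer(x, w):
    """Stand-in for one convolutional layer: a channel-mixing matmul
    applied at every spatial position (a 1x1 conv), followed by ReLU."""
    return np.maximum(0.0, np.einsum('oc,chw->ohw', w, x))

def hierarchical_fusion(x, weights, fuse_w):
    """Run a stack of layers, keep every intermediate feature map, then
    fuse them all with a 1x1 conv -- using features from all layers
    rather than only the final one."""
    feats = []
    h = x
    for w in weights:
        h = conv_layer(h, w)
        feats.append(h)
    stacked = np.concatenate(feats, axis=0)      # concat along channels
    return np.einsum('oc,chw->ohw', fuse_w, stacked)

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 8, 8))                   # C x H x W input
weights = [rng.normal(size=(3, 3)) for _ in range(4)]
fused = hierarchical_fusion(x, weights, rng.normal(size=(3, 12)))
```

Because each layer's features capture a different level of detail, concatenating all of them before fusion preserves shallow, texture-level information that would otherwise be washed out by deeper layers.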