1 code implementation • 18 Mar 2024 • Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success in adapting vision transformers (ViTs) by improving parameter efficiency.
1 code implementation • 13 Mar 2024 • Liang Chen, Yong Zhang, Yibing Song, Zhen Zhang, Lingqiao Liu
By d-separation, we observe that the causal feature can be further characterized by being independent of the domain conditioned on the object, and we propose the following two strategies as complements for the basic framework.
no code implementations • 12 Dec 2023 • Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen
The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.
1 code implementation • 26 Nov 2023 • Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo
The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value.
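The token-and-proxy attention described above can be illustrated with a minimal NumPy sketch. All names here are hypothetical, and the real model operates on learned multi-head projections; this only shows queries attending over the concatenation of tokens and group proxies, with the resulting map re-combining both in Value.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(tokens, proxies):
    """Attend over the mixture of tokens and group proxies, then use the
    attention map to re-combine tokens and groups in Value (sketch)."""
    mixture = np.concatenate([tokens, proxies], axis=0)            # (N+G, d)
    attn = softmax(tokens @ mixture.T / np.sqrt(tokens.shape[1]))  # (N, N+G)
    return attn @ mixture                                          # (N, d)

tokens = np.random.randn(16, 32)   # N image tokens
proxies = np.random.randn(4, 32)   # G group proxies
out = grouped_attention(tokens, proxies)
print(out.shape)  # (16, 32)
```

Each output token is thus a convex combination of both individual tokens and group-level summaries.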
1 code implementation • 8 Oct 2023 • Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song
In order to encompass common detection expressions, we employ an emerging vision-language model (VLM) and a large language model (LLM) to generate instructions guided by text prompts and object bounding boxes, as the generalization ability of foundation models makes them effective at producing human-like expressions (e.g., describing object property, category, and relationship).
no code implementations • 25 Sep 2023 • Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu
This work aims to improve unsupervised audio-visual pre-training.
1 code implementation • ICCV 2023 • Liang Chen, Yong Zhang, Yibing Song, Anton Van Den Hengel, Lingqiao Liu
Specifically, we propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix.
1 code implementation • CVPR 2023 • Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li
Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to reason over long-form scene knowledge.
1 code implementation • ICCV 2023 • Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li
Parameter-Efficient Tuning (PET) has gained attention for reducing the number of trainable parameters while maintaining performance and saving hardware resources, but few studies investigate dense prediction tasks or the interaction between modalities.
Ranked #2 on Referring Expression Segmentation on RefCOCO
no code implementations • 12 Jun 2023 • Shiming Chen, Wenjin Hou, Ziming Hong, Xiaohan Ding, Yibing Song, Xinge You, Tongliang Liu, Kun Zhang
After alignment, synthesized sample features from unseen classes are closer to the real sample features, enabling DSP to improve existing generative ZSL methods by 8.5%, 8.0%, and 9.7% on the standard CUB, SUN, and AWA2 datasets. This significant performance improvement indicates that evolving the semantic prototype explores a virgin field in ZSL.
2 code implementations • ICCV 2023 • Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, LiMin Wang
Our EVAD consists of two specialized designs for video action detection.
1 code implementation • CVPR 2023 • Liang Chen, Yong Zhang, Yibing Song, Ying Shan, Lingqiao Liu
Generally, a TTT strategy hinges its performance on two main factors: selecting an appropriate auxiliary TTT task for updating and identifying reliable parameters to update during the test phase.
no code implementations • 30 Mar 2023 • Chongjian Ge, Jiangliu Wang, Zhan Tong, Shoufa Chen, Yibing Song, Ping Luo
We evaluate our soft neighbor contrastive learning method (SNCLR) on standard visual recognition benchmarks, including image classification, object detection, and instance segmentation.
no code implementations • 28 Mar 2023 • Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, LiMin Wang
Existing studies model each actor and scene relation to improve action recognition.
1 code implementation • 22 Feb 2023 • Hongyu Liu, Xintong Han, ChengBin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen
In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively.
no code implementations • ICCV 2023 • Changfeng Yu, Shiming Chen, Yi Chang, Yibing Song, Luxin Yan
To solve this dilemma, we propose a physical alignment and controllable generation network (PCGNet) for diverse and realistic rain generation.
2 code implementations • 6 Dec 2022 • Wenbo Li, Xin Yu, Kun Zhou, Yibing Song, Zhe Lin, Jiaya Jia
To achieve high-quality results with low computational cost, we present a novel pixel spread model (PSM) that iteratively employs decoupled probabilistic modeling, combining the optimization efficiency of GANs with the prediction tractability of probabilistic models.
1 code implementation • CVPR 2023 • Hongyu Liu, Yibing Song, Qifeng Chen
In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.
3 code implementations • ICCV 2023 • Shoufa Chen, Peize Sun, Yibing Song, Ping Luo
We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes.
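The denoising view of detection can be sketched with a toy loop: start from random "noisy" boxes and repeatedly refine them toward object boxes. The `denoise_step` below is a dummy stand-in for the learned detection head (the real model conditions on image features and a diffusion timestep), and `targets` are hypothetical ground-truth boxes; only the noisy-boxes-to-object-boxes iteration is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical ground-truth boxes in normalized (x1, y1, x2, y2) form
targets = np.array([[0.2, 0.2, 0.4, 0.4],
                    [0.6, 0.6, 0.9, 0.9]])

def denoise_step(boxes, t):
    """Stand-in for the detection head at timestep t: nudge each noisy
    box halfway toward its nearest target box."""
    dists = ((boxes[:, None] - targets[None]) ** 2).sum(-1)  # (B, T)
    nearest = targets[np.argmin(dists, axis=1)]
    return boxes + 0.5 * (nearest - boxes)

boxes = rng.normal(0.5, 0.3, size=(8, 4))  # start from random noisy boxes
for t in range(6):
    boxes = denoise_step(boxes, t)
```

After a few steps the boxes contract onto the target boxes, which is the behavior the learned denoiser is trained to reproduce from image evidence.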
1 code implementation • 14 Oct 2022 • Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang
Based on the visual latent space of StyleGAN and the text embedding space of CLIP, studies focus on how to map these two latent spaces for text-driven attribute manipulations.
2 code implementations • 26 May 2022 • Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo
To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently.
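The adapter idea can be sketched as a lightweight parallel branch added to a frozen block: down-project, apply a non-linearity, up-project, scale, and add to the frozen output. This is a minimal NumPy sketch of an AdaptFormer-style design, with hypothetical sizes and a zero-initialized up-projection so tuning starts from the pre-trained behavior; the paper's exact placement and initialization may differ.

```python
import numpy as np

class BottleneckAdapter:
    """Parallel bottleneck branch added to a frozen block's output
    (AdaptFormer-style sketch; only this branch would be trained)."""
    def __init__(self, dim, bottleneck=8, scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0, 0.02, (dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))  # zero-init: identity at start
        self.scale = scale

    def __call__(self, x, frozen_out):
        h = np.maximum(x @ self.down, 0.0)        # ReLU bottleneck
        return frozen_out + self.scale * (h @ self.up)

x = np.random.randn(4, 64)
adapter = BottleneckAdapter(64)
out = adapter(x, frozen_out=x)  # zero-init up-projection: output == frozen path
```

Because only the tiny `down`/`up` matrices are tuned, the backbone weights stay shared across tasks.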
4 code implementations • 23 Mar 2022 • Zhan Tong, Yibing Song, Jue Wang, LiMin Wang
Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.
Ranked #5 on Self-Supervised Action Recognition on HMDB51
1 code implementation • CVPR 2022 • Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, Jue Wang
Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations.
1 code implementation • 16 Feb 2022 • Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie
Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images.
Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)
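A common way to realize this token-budget trade-off is to keep the tokens most attended by the [CLS] token and fuse the rest into a single token. The sketch below is one such scheme in NumPy with made-up sizes; the paper's exact selection criterion may differ.

```python
import numpy as np

def prune_tokens(tokens, cls_attn, keep=8):
    """Keep the top-`keep` tokens by [CLS] attention; fuse the remaining
    inattentive tokens into one attention-weighted token (sketch)."""
    order = np.argsort(cls_attn)[::-1]
    kept, dropped = order[:keep], order[keep:]
    w = cls_attn[dropped] / (cls_attn[dropped].sum() + 1e-8)
    fused = (w[:, None] * tokens[dropped]).sum(0, keepdims=True)
    return np.concatenate([tokens[kept], fused], axis=0)

tokens = np.random.randn(16, 32)
cls_attn = np.random.rand(16)
out = prune_tokens(tokens, cls_attn, keep=8)
print(out.shape)  # (9, 32)
```

The saved computation can then be spent on more input tokens from higher-resolution images at the same overall cost.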
2 code implementations • 28 Jan 2022 • Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu
In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.
no code implementations • 13 Jan 2022 • Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
1 code implementation • 16 Dec 2021 • Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, Ling Shao
Analogously, VAT uses a similar feature augmentation encoder to refine the visual features, which are further applied in a visual→attribute decoder to learn visual-based attribute features.
1 code implementation • NeurIPS 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo
Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.
1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao
To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information.
1 code implementation • CVPR 2021 • Jie An, Siyu Huang, Yibing Song, Dejing Dou, Wei Liu, Jiebo Luo
The forward inference projects input images into deep features, while the backward inference remaps deep features back to input images in a lossless and unbiased way.
1 code implementation • CVPR 2021 • Shuai Jia, Yibing Song, Chao Ma, Xiaokang Yang
Recently, adversarial attack has been applied to visual object tracking to evaluate the robustness of deep trackers.
1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao, Bing Jiang, Wei Liu
While existing methods combine an input image and these low-level controls for CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content.
1 code implementation • CVPR 2021 • Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo
To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.
1 code implementation • CVPR 2021 • Tian Pan, Yibing Song, Tianyu Yang, Wenhao Jiang, Wei Liu
By empowering the temporal robustness of the encoder and modeling the temporal decay of the keys, our VideoMoCo improves MoCo temporally based on contrastive learning.
Ranked #76 on Action Recognition on HMDB-51
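The temporal decay of keys can be sketched on top of standard MoCo-style contrastive logits: negatives from the queue are down-weighted according to how long ago they were enqueued. This NumPy sketch uses hypothetical names and sizes; the paper's exact weighting may differ.

```python
import numpy as np

def moco_logits(q, pos_key, queue, ages, tau=0.07, gamma=0.99):
    """Contrastive logits where older queue keys are attenuated by
    gamma**age, modeling the temporal decay of negatives (sketch)."""
    decay = gamma ** ages                          # older keys matter less
    l_pos = (q * pos_key).sum(-1, keepdims=True)   # (B, 1)
    l_neg = (q @ queue.T) * decay                  # (B, K)
    return np.concatenate([l_pos, l_neg], axis=-1) / tau

q = np.random.randn(2, 16)          # query features
k = np.random.randn(2, 16)          # positive key features
queue = np.random.randn(32, 16)     # queued negative keys
ages = np.arange(32)                # how many steps ago each key was added
logits = moco_logits(q, k, queue, ages)
print(logits.shape)  # (2, 33)
```

The logits would then feed a cross-entropy loss with the positive at index 0, as in MoCo.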
1 code implementation • 9 Mar 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng
However, a threat to these systems arises: adversarial attacks can exploit the vulnerability of CNNs.
2 code implementations • CVPR 2021 • Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo
A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing, where the try-on images produced by a parser-based method are used as supervision to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.
Ranked #1 on Virtual Try-on on MPV
no code implementations • ICLR 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng
We further analyze the KL-divergence of the proposed loss function and find that the loss stabilization term makes the perturbations updated towards a fixed objective spot while deviating from the ground truth.
1 code implementation • ECCV 2020 • Yinglong Wang, Yibing Song, Chao Ma, Bing Zeng
Single image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmosphere light.
1 code implementation • 22 Jul 2020 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Wei Liu, Houqiang Li
Advances in visual tracking have continuously been driven by deep learning models.
2 code implementations • ECCV 2020 • Shuai Jia, Chao Ma, Yibing Song, Xiaokang Yang
On one hand, we add the temporal perturbations into the original video sequences as adversarial examples to greatly degrade the tracking performance.
1 code implementation • ECCV 2020 • Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, Chao Yang
We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively.
1 code implementation • 25 Oct 2019 • Yajing Chen, Fanzi Wu, Zeyu Wang, Yibing Song, Yonggen Ling, Linchao Bao
The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage.
1 code implementation • 23 Jul 2019 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li
In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.
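A minimal reading of such a loss is a distance between the student's and teacher's response maps, so the compressed student keeps the teacher's representation. The sketch below uses a mean-squared form as an assumption; the paper's exact fidelity loss may be formulated differently.

```python
import numpy as np

def fidelity_loss(student_resp, teacher_resp):
    """Penalize the student for deviating from the teacher's response map
    (mean-squared sketch of a distillation fidelity term)."""
    return float(((student_resp - teacher_resp) ** 2).mean())

t = np.random.rand(17, 17)              # a teacher response map
assert fidelity_loss(t, t) == 0.0       # identical responses incur no loss
```

Minimizing this term alongside the tracking loss transfers the teacher's representation capability to the smaller student.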
1 code implementation • CVPR 2019 • Fanzi Wu, Linchao Bao, Yajing Chen, Yonggen Ling, Yibing Song, Songnan Li, King Ngi Ngan, Wei Liu
The main ingredient of the view alignment loss is a differentiable dense optical flow estimator that can backpropagate the alignment errors between an input view and a synthetic rendering from another input view, which is projected to the target view through the 3D shape to be inferred.
1 code implementation • CVPR 2019 • Ning Wang, Yibing Song, Chao Ma, Wengang Zhou, Wei Liu, Houqiang Li
We propose an unsupervised visual tracking method in this paper.
no code implementations • 22 Nov 2018 • Yibing Song, Jiawei Zhang, Lijun Gong, Shengfeng He, Linchao Bao, Jinshan Pan, Qingxiong Yang, Ming-Hsuan Yang
We first propose a facial component guided deep convolutional neural network (CNN) to restore a coarse face image, denoted as the base image, where the facial components are automatically generated from the input face image.
no code implementations • NeurIPS 2018 • Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang
Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data.
no code implementations • 27 Sep 2018 • Wenxi Liu, Yibing Song, Dengsheng Chen, Shengfeng He, Yuanlong Yu, Tao Yan, Gerhard P. Hancke, Rynson W. H. Lau
In addition, we also propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance.
no code implementations • ECCV 2018 • Jianbo Jiao, Ying Cao, Yibing Song, Rynson Lau
Monocular depth estimation benefits greatly from learning based techniques.
1 code implementation • CVPR 2018 • Jiawei Zhang, Jinshan Pan, Jimmy Ren, Yibing Song, Linchao Bao, Rynson W. H. Lau, Ming-Hsuan Yang
The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).
Ranked #10 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)
no code implementations • CVPR 2018 • Xin Yang, Ke Xu, Yibing Song, Qiang Zhang, Xiaopeng Wei, Rynson Lau
Given an input LDR image, we first reconstruct the missing details in the HDR domain.
no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang
To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.
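The feature-dropout idea can be sketched directly: pick a mask from a pool and multiply it into the feature map, zeroing some spatial positions to simulate appearance changes of positive samples. In the paper the masks come from a generative network trained adversarially; here they are random binary masks purely for illustration.

```python
import numpy as np

def mask_dropout(features, masks):
    """Apply one randomly chosen spatial mask to dropout input features,
    simulating appearance variations (sketch; real masks are generated
    adversarially rather than sampled at random)."""
    rng = np.random.default_rng(0)
    m = masks[rng.integers(len(masks))]      # pick a mask from the pool
    return features * m[None, :, :]          # zero out masked positions

feats = np.random.rand(8, 5, 5)              # C x H x W feature map
masks = np.random.rand(4, 5, 5) > 0.3        # a pool of binary dropout masks
out = mask_dropout(feats, masks)
print(out.shape)  # (8, 5, 5)
```

Training the classifier on such masked features discourages it from over-fitting to any single discriminative region.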
no code implementations • 28 Aug 2017 • Yibing Song, Linchao Bao, Shengfeng He, Qingxiong Yang, Ming-Hsuan Yang
We address the problem of transferring the style of a headshot photo to face images.
no code implementations • 1 Aug 2017 • Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang
Exemplar-based face sketch synthesis methods usually face the challenging problem that input photos are captured under lighting conditions different from those of the training photos.
no code implementations • ICCV 2017 • Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang
Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training.
no code implementations • 1 Aug 2017 • Yibing Song, Jiawei Zhang, Shengfeng He, Linchao Bao, Qingxiong Yang
We propose a two-stage method for face hallucination.