1 code implementation • 18 Mar 2024 • Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success in adapting vision transformers (ViTs) by improving parameter efficiency.
1 code implementation • 13 Mar 2024 • Liang Chen, Yong Zhang, Yibing Song, Zhen Zhang, Lingqiao Liu
By d-separation, we observe that the causal feature can be further characterized by being independent of the domain conditioned on the object, and we propose the following two strategies as complements for the basic framework.
no code implementations • 12 Dec 2023 • Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen
The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.
1 code implementation • 26 Nov 2023 • Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo
The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value.
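The token-and-proxy attention described above can be illustrated with a minimal NumPy sketch. All names here are hypothetical, and the real model operates on learned multi-head projections; this only shows queries attending over the concatenation of tokens and group proxies, with the resulting map re-combining both in Value.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(tokens, proxies):
    """Attend over the mixture of tokens and group proxies, then use the
    attention map to re-combine tokens and groups in Value (sketch)."""
    mixture = np.concatenate([tokens, proxies], axis=0)            # (N+G, d)
    attn = softmax(tokens @ mixture.T / np.sqrt(tokens.shape[1]))  # (N, N+G)
    return attn @ mixture                                          # (N, d)

tokens = np.random.randn(16, 32)   # N image tokens
proxies = np.random.randn(4, 32)   # G group proxies
out = grouped_attention(tokens, proxies)
print(out.shape)  # (16, 32)
```

Each output token is thus a convex combination of both individual tokens and group-level summaries.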
1 code implementation • 8 Oct 2023 • Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song
In order to encompass common detection expressions, we employ an emerging vision-language model (VLM) and a large language model (LLM) to generate instructions guided by text prompts and object bounding boxes, as the generalization ability of foundation models makes them effective at producing human-like expressions (e.g., describing object property, category, and relationship).
no code implementations • 25 Sep 2023 • Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu
This work aims to improve unsupervised audio-visual pre-training.
1 code implementation • ICCV 2023 • Liang Chen, Yong Zhang, Yibing Song, Anton Van Den Hengel, Lingqiao Liu
Specifically, we propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix.
1 code implementation • CVPR 2023 • Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li
Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to reason over long-form scene knowledge.
1 code implementation • ICCV 2023 • Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li
Parameter-Efficient Tuning (PET) has gained attention for reducing the number of trainable parameters while maintaining performance and saving hardware resources, but few studies investigate dense prediction tasks or the interaction between modalities.
Ranked #2 on Referring Expression Segmentation on RefCOCO
no code implementations • 12 Jun 2023 • Shiming Chen, Wenjin Hou, Ziming Hong, Xiaohan Ding, Yibing Song, Xinge You, Tongliang Liu, Kun Zhang
After alignment, synthesized sample features from unseen classes are closer to the real sample features, enabling DSP to improve existing generative ZSL methods by 8.5%, 8.0%, and 9.7% on the standard CUB, SUN, and AWA2 datasets. This significant performance improvement indicates that evolving the semantic prototype explores a virgin field in ZSL.
2 code implementations • ICCV 2023 • Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, LiMin Wang
Our EVAD consists of two specialized designs for video action detection.
1 code implementation • CVPR 2023 • Liang Chen, Yong Zhang, Yibing Song, Ying Shan, Lingqiao Liu
Generally, a TTT strategy hinges its performance on two main factors: selecting an appropriate auxiliary TTT task for updating and identifying reliable parameters to update during the test phase.
no code implementations • 30 Mar 2023 • Chongjian Ge, Jiangliu Wang, Zhan Tong, Shoufa Chen, Yibing Song, Ping Luo
We evaluate our soft neighbor contrastive learning method (SNCLR) on standard visual recognition benchmarks, including image classification, object detection, and instance segmentation.
no code implementations • 28 Mar 2023 • Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, LiMin Wang
Existing studies model each actor and scene relation to improve action recognition.
1 code implementation • 22 Feb 2023 • Hongyu Liu, Xintong Han, ChengBin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen
In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively.
no code implementations • ICCV 2023 • Changfeng Yu, Shiming Chen, Yi Chang, Yibing Song, Luxin Yan
To solve this dilemma, we propose a physical alignment and controllable generation network (PCGNet) for diverse and realistic rain generation.
2 code implementations • 6 Dec 2022 • Wenbo Li, Xin Yu, Kun Zhou, Yibing Song, Zhe Lin, Jiaya Jia
To achieve high-quality results with low computational cost, we present a novel pixel spread model (PSM) that iteratively employs decoupled probabilistic modeling, combining the optimization efficiency of GANs with the prediction tractability of probabilistic models.
1 code implementation • CVPR 2023 • Hongyu Liu, Yibing Song, Qifeng Chen
In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.
3 code implementations • ICCV 2023 • Shoufa Chen, Peize Sun, Yibing Song, Ping Luo
We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes.
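The denoising view of detection can be sketched with a toy loop: start from random "noisy" boxes and repeatedly refine them toward object boxes. The `denoise_step` below is a dummy stand-in for the learned detection head (the real model conditions on image features and a diffusion timestep), and `targets` are hypothetical ground-truth boxes; only the noisy-boxes-to-object-boxes iteration is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical ground-truth boxes in normalized (x1, y1, x2, y2) form
targets = np.array([[0.2, 0.2, 0.4, 0.4],
                    [0.6, 0.6, 0.9, 0.9]])

def denoise_step(boxes, t):
    """Stand-in for the detection head at timestep t: nudge each noisy
    box halfway toward its nearest target box."""
    dists = ((boxes[:, None] - targets[None]) ** 2).sum(-1)  # (B, T)
    nearest = targets[np.argmin(dists, axis=1)]
    return boxes + 0.5 * (nearest - boxes)

boxes = rng.normal(0.5, 0.3, size=(8, 4))  # start from random noisy boxes
for t in range(6):
    boxes = denoise_step(boxes, t)
```

After a few steps the boxes contract onto the target boxes, which is the behavior the learned denoiser is trained to reproduce from image evidence.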
1 code implementation • 14 Oct 2022 • Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang
Based on the visual latent space of StyleGAN and the text embedding space of CLIP, studies focus on how to map these two latent spaces for text-driven attribute manipulations.
2 code implementations • 26 May 2022 • Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo
To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently.
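The adapter idea can be sketched as a lightweight parallel branch added to a frozen block: down-project, apply a non-linearity, up-project, scale, and add to the frozen output. This is a minimal NumPy sketch of an AdaptFormer-style design, with hypothetical sizes and a zero-initialized up-projection so tuning starts from the pre-trained behavior; the paper's exact placement and initialization may differ.

```python
import numpy as np

class BottleneckAdapter:
    """Parallel bottleneck branch added to a frozen block's output
    (AdaptFormer-style sketch; only this branch would be trained)."""
    def __init__(self, dim, bottleneck=8, scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0, 0.02, (dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))  # zero-init: identity at start
        self.scale = scale

    def __call__(self, x, frozen_out):
        h = np.maximum(x @ self.down, 0.0)        # ReLU bottleneck
        return frozen_out + self.scale * (h @ self.up)

x = np.random.randn(4, 64)
adapter = BottleneckAdapter(64)
out = adapter(x, frozen_out=x)  # zero-init up-projection: output == frozen path
```

Because only the tiny `down`/`up` matrices are tuned, the backbone weights stay shared across tasks.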
4 code implementations • 23 Mar 2022 • Zhan Tong, Yibing Song, Jue Wang, LiMin Wang
Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.
Ranked #5 on Self-Supervised Action Recognition on HMDB51
1 code implementation • CVPR 2022 • Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, Jue Wang
Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations.
1 code implementation • 16 Feb 2022 • Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie
Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images.
Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)
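A common way to realize this token-budget trade-off is to keep the tokens most attended by the [CLS] token and fuse the rest into a single token. The sketch below is one such scheme in NumPy with made-up sizes; the paper's exact selection criterion may differ.

```python
import numpy as np

def prune_tokens(tokens, cls_attn, keep=8):
    """Keep the top-`keep` tokens by [CLS] attention; fuse the remaining
    inattentive tokens into one attention-weighted token (sketch)."""
    order = np.argsort(cls_attn)[::-1]
    kept, dropped = order[:keep], order[keep:]
    w = cls_attn[dropped] / (cls_attn[dropped].sum() + 1e-8)
    fused = (w[:, None] * tokens[dropped]).sum(0, keepdims=True)
    return np.concatenate([tokens[kept], fused], axis=0)

tokens = np.random.randn(16, 32)
cls_attn = np.random.rand(16)
out = prune_tokens(tokens, cls_attn, keep=8)
print(out.shape)  # (9, 32)
```

The saved computation can then be spent on more input tokens from higher-resolution images at the same overall cost.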
2 code implementations • 28 Jan 2022 • Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu
In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.
no code implementations • 13 Jan 2022 • Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
1 code implementation • 16 Dec 2021 • Shiming Chen, Ziming Hong, Wenjin Hou, Guo-Sen Xie, Yibing Song, Jian Zhao, Xinge You, Shuicheng Yan, Ling Shao
Analogously, VAT uses a similar feature augmentation encoder to refine the visual features, which are further applied in a visual→attribute decoder to learn visual-based attribute features.
1 code implementation • NeurIPS 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo
Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.
1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao
To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information.
1 code implementation • CVPR 2021 • Jie An, Siyu Huang, Yibing Song, Dejing Dou, Wei Liu, Jiebo Luo
The forward inference projects input images into deep features, while the backward inference remaps deep features back to input images in a lossless and unbiased way.
1 code implementation • CVPR 2021 • Shuai Jia, Yibing Song, Chao Ma, Xiaokang Yang
Recently, adversarial attack has been applied to visual object tracking to evaluate the robustness of deep trackers.
1 code implementation • CVPR 2021 • Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao, Bing Jiang, Wei Liu
While existing methods combine an input image and these low-level controls for CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content.
1 code implementation • CVPR 2021 • Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo
To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.
1 code implementation • CVPR 2021 • Tian Pan, Yibing Song, Tianyu Yang, Wenhao Jiang, Wei Liu
By empowering the temporal robustness of the encoder and modeling the temporal decay of the keys, our VideoMoCo improves MoCo temporally based on contrastive learning.
Ranked #76 on Action Recognition on HMDB-51
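The temporal decay of keys can be sketched on top of standard MoCo-style contrastive logits: negatives from the queue are down-weighted according to how long ago they were enqueued. This NumPy sketch uses hypothetical names and sizes; the paper's exact weighting may differ.

```python
import numpy as np

def moco_logits(q, pos_key, queue, ages, tau=0.07, gamma=0.99):
    """Contrastive logits where older queue keys are attenuated by
    gamma**age, modeling the temporal decay of negatives (sketch)."""
    decay = gamma ** ages                          # older keys matter less
    l_pos = (q * pos_key).sum(-1, keepdims=True)   # (B, 1)
    l_neg = (q @ queue.T) * decay                  # (B, K)
    return np.concatenate([l_pos, l_neg], axis=-1) / tau

q = np.random.randn(2, 16)          # query features
k = np.random.randn(2, 16)          # positive key features
queue = np.random.randn(32, 16)     # queued negative keys
ages = np.arange(32)                # how many steps ago each key was added
logits = moco_logits(q, k, queue, ages)
print(logits.shape)  # (2, 33)
```

The logits would then feed a cross-entropy loss with the positive at index 0, as in MoCo.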
1 code implementation • 9 Mar 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng
However, a threat to these systems arises: adversarial attacks can exploit the vulnerability of CNNs.
2 code implementations • CVPR 2021 • Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo
A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing, where the try-on images produced by a parser-based method are used as supervision to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.
Ranked #1 on Virtual Try-on on MPV
no code implementations • ICLR 2021 • Gege Qi, Lijun Gong, Yibing Song, Kai Ma, Yefeng Zheng
We further analyze the KL-divergence of the proposed loss function and find that the loss stabilization term makes the perturbations updated towards a fixed objective spot while deviating from the ground truth.
1 code implementation • ECCV 2020 • Yinglong Wang, Yibing Song, Chao Ma, Bing Zeng
Single image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmosphere light.
1 code implementation • 22 Jul 2020 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Wei Liu, Houqiang Li
Advances in visual tracking have continuously been driven by deep learning models.
2 code implementations • ECCV 2020 • Shuai Jia, Chao Ma, Yibing Song, Xiaokang Yang
On one hand, we add the temporal perturbations into the original video sequences as adversarial examples to greatly degrade the tracking performance.
1 code implementation • ECCV 2020 • Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, Chao Yang
We use CNN features from the deep and shallow layers of the encoder to represent structures and textures of an input image, respectively.
1 code implementation • 25 Oct 2019 • Yajing Chen, Fanzi Wu, Zeyu Wang, Yibing Song, Yonggen Ling, Linchao Bao
The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage.
1 code implementation • 23 Jul 2019 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li
In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.
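A minimal reading of such a loss is a distance between the student's and teacher's response maps, so the compressed student keeps the teacher's representation. The sketch below uses a mean-squared form as an assumption; the paper's exact fidelity loss may be formulated differently.

```python
import numpy as np

def fidelity_loss(student_resp, teacher_resp):
    """Penalize the student for deviating from the teacher's response map
    (mean-squared sketch of a distillation fidelity term)."""
    return float(((student_resp - teacher_resp) ** 2).mean())

t = np.random.rand(17, 17)              # a teacher response map
assert fidelity_loss(t, t) == 0.0       # identical responses incur no loss
```

Minimizing this term alongside the tracking loss transfers the teacher's representation capability to the smaller student.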
1 code implementation • CVPR 2019 • Fanzi Wu, Linchao Bao, Yajing Chen, Yonggen Ling, Yibing Song, Songnan Li, King Ngi Ngan, Wei Liu
The main ingredient of the view alignment loss is a differentiable dense optical flow estimator that can backpropagate the alignment errors between an input view and a synthetic rendering from another input view, which is projected to the target view through the 3D shape to be inferred.
1 code implementation • CVPR 2019 • Ning Wang, Yibing Song, Chao Ma, Wengang Zhou, Wei Liu, Houqiang Li
We propose an unsupervised visual tracking method in this paper.
no code implementations • 22 Nov 2018 • Yibing Song, Jiawei Zhang, Lijun Gong, Shengfeng He, Linchao Bao, Jinshan Pan, Qingxiong Yang, Ming-Hsuan Yang
We first propose a facial component guided deep convolutional neural network (CNN) to restore a coarse face image, denoted as the base image, where the facial components are automatically generated from the input face image.
no code implementations • NeurIPS 2018 • Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang
Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data.
no code implementations • 27 Sep 2018 • Wenxi Liu, Yibing Song, Dengsheng Chen, Shengfeng He, Yuanlong Yu, Tao Yan, Gerhard P. Hancke, Rynson W. H. Lau
In addition, we also propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance.
no code implementations • ECCV 2018 • Jianbo Jiao, Ying Cao, Yibing Song, Rynson Lau
Monocular depth estimation benefits greatly from learning based techniques.
1 code implementation • CVPR 2018 • Jiawei Zhang, Jinshan Pan, Jimmy Ren, Yibing Song, Linchao Bao, Rynson W. H. Lau, Ming-Hsuan Yang
The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).
Ranked #10 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)
no code implementations • CVPR 2018 • Xin Yang, Ke Xu, Yibing Song, Qiang Zhang, Xiaopeng Wei, Rynson Lau
Given an input LDR image, we first reconstruct the missing details in the HDR domain.
no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang
To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.
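The feature-dropout idea can be sketched directly: pick a mask from a pool and multiply it into the feature map, zeroing some spatial positions to simulate appearance changes of positive samples. In the paper the masks come from a generative network trained adversarially; here they are random binary masks purely for illustration.

```python
import numpy as np

def mask_dropout(features, masks):
    """Apply one randomly chosen spatial mask to dropout input features,
    simulating appearance variations (sketch; real masks are generated
    adversarially rather than sampled at random)."""
    rng = np.random.default_rng(0)
    m = masks[rng.integers(len(masks))]      # pick a mask from the pool
    return features * m[None, :, :]          # zero out masked positions

feats = np.random.rand(8, 5, 5)              # C x H x W feature map
masks = np.random.rand(4, 5, 5) > 0.3        # a pool of binary dropout masks
out = mask_dropout(feats, masks)
print(out.shape)  # (8, 5, 5)
```

Training the classifier on such masked features discourages it from over-fitting to any single discriminative region.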
no code implementations • 28 Aug 2017 • Yibing Song, Linchao Bao, Shengfeng He, Qingxiong Yang, Ming-Hsuan Yang
We address the problem of transferring the style of a headshot photo to face images.
no code implementations • 1 Aug 2017 • Yibing Song, Jiawei Zhang, Linchao Bao, Qingxiong Yang
Exemplar-based face sketch synthesis methods usually face the challenging problem that input photos are captured under lighting conditions different from those of the training photos.
no code implementations • ICCV 2017 • Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang
Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training.
no code implementations • 1 Aug 2017 • Yibing Song, Jiawei Zhang, Shengfeng He, Linchao Bao, Qingxiong Yang
We propose a two-stage method for face hallucination.