Search Results for author: Zicheng Liu

Found 127 papers, 61 papers with code

LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory

no code implementations • 17 Apr 2024 • Zicheng Liu, Li Wang, Siyuan Li, Zedong Wang, Haitao Lin, Stan Z. Li

Transformer models have been successful in various sequence processing tasks, but the self-attention mechanism's computational cost limits its practicality for long sequences.

Computational Efficiency Language Modelling +1


A robust assessment for invariant representations

no code implementations • 7 Apr 2024 • Wenlu Tang, Zicheng Liu

The performance of machine learning models can be impacted by changes in data over time.

Data Augmentation


Advances of Deep Learning in Protein Science: A Comprehensive Survey

no code implementations • 8 Mar 2024 • Bozhen Hu, Cheng Tan, Lirong Wu, Jiangbin Zheng, Jun Xia, Zhangyang Gao, Zicheng Liu, Fandi Wu, Guijun Zhang, Stan Z. Li

Protein representation learning plays a crucial role in understanding the structure and function of proteins, which are essential biomolecules involved in various biological processes.

Drug Discovery Protein Function Prediction +2


Switch EMA: A Free Lunch for Better Flatness and Sharpness

2 code implementations • 14 Feb 2024 • Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li

Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalizations without extra cost in deep neural network (DNN) optimization.

Attribute Image Classification +7

567

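For readers unfamiliar with the weight averaging mentioned in the Switch EMA summary above, the following is a minimal sketch of a plain exponential moving average of model weights. It is not the Switch EMA method itself; the function name `ema_update` and the `decay` parameter are illustrative assumptions, and weights are represented as plain lists of floats.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights.

    Hypothetical helper for illustration only; weights are flat lists of floats.
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


# Toy usage: keep a smoothed copy of the weights across three training steps.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
```

A larger `decay` makes the averaged copy change more slowly relative to the raw weights.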

PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction

1 code implementation • 13 Feb 2024 • Lirong Wu, Yufei Huang, Cheng Tan, Zhangyang Gao, Bozhen Hu, Haitao Lin, Zicheng Liu, Stan Z. Li

Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.

Drug Discovery


StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

no code implementations • 30 Jan 2024 • Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes.

Vector Graphics


A Unified Gaussian Process for Branching and Nested Hyperparameter Optimization

no code implementations • 19 Jan 2024 • Jiazhao Zhang, Ying Hung, Chung-Ching Lin, Zicheng Liu

To capture the conditional dependence between branching and nested parameters, a unified Bayesian optimization framework is proposed.

Bayesian Optimization Hyperparameter Optimization


Bring Metric Functions into Diffusion Models

no code implementations • 4 Jan 2024 • Jie An, Zhengyuan Yang, JianFeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo

The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function.

Denoising


Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

1 code implementation • 31 Dec 2023 • Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li

As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data.

Representation Learning Self-Supervised Learning

234


Segment and Caption Anything

1 code implementation • 1 Dec 2023 • Xiaoke Huang, JianFeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.

Caption Generation object-detection +2

148


MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

no code implementations • 29 Nov 2023 • Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, JianFeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We present MM-Narrator, a novel system leveraging GPT-4 with multimodal in-context learning for the generation of audio descriptions (AD).

In-Context Learning Text Generation


GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

2 code implementations • 13 Nov 2023 • An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, JianFeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang

We first benchmark MM-Navigator on our collected iOS screen dataset.

Action Localization

106


MM-VID: Advancing Video Understanding with GPT-4V(ision)

no code implementations • 30 Oct 2023 • Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Video Understanding


On the Hidden Waves of Image

no code implementations • 19 Oct 2023 • Yinpeng Chen, Dongdong Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

We term this phenomenon hidden waves, as it reveals that, although the speeds of the set of wave equations and autoregressive coefficient matrices are latent, they are both learnable and shared across images.


Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

no code implementations • 12 Oct 2023 • Zhengyuan Yang, JianFeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation.


OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

no code implementations • 11 Oct 2023 • Jie An, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo

We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.

Question Answering Text Generation


SemiReward: A General Reward Model for Semi-supervised Learning

1 code implementation • 4 Oct 2023 • Siyuan Li, Weiyang Jin, Zedong Wang, Fang Wu, Zicheng Liu, Cheng Tan, Stan Z. Li

The main challenge is how to distinguish high-quality pseudo labels against the confirmation bias.

Ranked #1 on Semi-Supervised Image Classification on CIFAR-100, 400 Labels

Few-Shot Image Classification Pseudo Label +4


Completing Visual Objects via Bridging Generation and Segmentation

no code implementations • 1 Oct 2023 • Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation Object +1


The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

1 code implementation • 29 Sep 2023 • Zhengyuan Yang, Linjie Li, Kevin Lin, JianFeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models.

182


ORES: Open-vocabulary Responsible Visual Synthesis

1 code implementation • 26 Aug 2023 • Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content.

Image Generation Language Modelling


MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

1 code implementation • 4 Aug 2023 • Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.

Math Zero-Shot Visual Question Answering

174


Does Full Waveform Inversion Benefit from Big Data?

no code implementations • 28 Jul 2023 • Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin

This paper investigates the impact of big data on deep learning models for full waveform inversion (FWI).


Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models

no code implementations • 27 Jul 2023 • Xin Yuan, Linjie Li, JianFeng Wang, Zhengyuan Yang, Kevin Lin, Zicheng Liu, Lijuan Wang

In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.

Denoising


DisCo: Disentangled Control for Realistic Human Dance Generation

1 code implementation • 30 Jun 2023 • Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources.

Attribute

901


OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning

2 code implementations • NeurIPS 2023 • Cheng Tan, Siyuan Li, Zhangyang Gao, Wenfei Guan, Zedong Wang, Zicheng Liu, Lirong Wu, Stan Z. Li

Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner.

Weather Forecasting

566


RefineVIS: Video Instance Segmentation with Temporal Attention Refinement

no code implementations • 7 Jun 2023 • Andre Abrantes, Jiang Wang, Peng Chu, Quanzeng You, Zicheng Liu

We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context.

Ranked #3 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)

Contrastive Learning Denoising +4


PaintSeg: Training-free Segmentation via Painting

1 code implementation • 30 May 2023 • Xiang Li, Chung-Ching Lin, Yinpeng Chen, Zicheng Liu, Jinglu Wang, Bhiksha Raj

The paper introduces PaintSeg, a new unsupervised method for segmenting objects without any training.

Referring Image Matting (Prompt-based) Segmentation +1


Image as First-Order Norm+Linear Autoregression: Unveiling Mathematical Invariance

no code implementations • 25 May 2023 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

This paper introduces a novel mathematical property applicable to diverse images, referred to as FINOLA (First-Order Norm+Linear Autoregressive).

Image Classification Image Reconstruction +3


Conformal Inference for Invariant Risk Minimization

no code implementations • 22 May 2023 • Wenlu Tang, Zicheng Liu

The application of machine learning models can be significantly impeded by the occurrence of distributional shifts, as the assumption of homogeneity between the population of training and testing samples in machine learning and statistics may not be feasible in practical situations.


Neural Voting Field for Camera-Space 3D Hand Pose Estimation

no code implementations • CVPR 2023 • Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu

We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.

Ranked #4 on 3D Hand Pose Estimation on HO-3D

3D Hand Pose Estimation regression


Simplifying Full Waveform Inversion via Domain-Independent Self-Supervised Learning

no code implementations • 27 Apr 2023 • Yinan Feng, Yinpeng Chen, Peng Jin, Shihang Feng, Zicheng Liu, Youzuo Lin

Geophysics has witnessed success in applying deep learning to one of its core problems: full waveform inversion (FWI) to predict subsurface velocity maps from seismic data.

Geophysics Image-to-Image Translation +1


Adaptive Human Matting for Dynamic Videos

1 code implementation • CVPR 2023 • Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The most recent efforts in video matting have focused on eliminating trimap dependency since trimap annotations are expensive and trimap-based methods are less adaptable for real-time applications.

Image Matting Video Matting


Binary Latent Diffusion

no code implementations • CVPR 2023 • Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu

In this paper, we show that a binary latent space can be explored for compact yet expressive image representations.

Image Generation Quantization +1


Towards Reasonable Budget Allocation in Untargeted Graph Structure Attacks via Gradient Debias

1 code implementation • 29 Mar 2023 • Zihan Liu, Yun Luo, Lirong Wu, Zicheng Liu, Stan Z. Li

It has become cognitive inertia to employ cross-entropy loss function in classification related tasks.


Equivariant Similarity for Vision-Language Foundation Models

1 code implementation • ICCV 2023 • Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

Unlike the existing image-text similarity objective which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to the semantic changes.

Ranked #7 on Visual Reasoning on Winoground

Retrieval Text Retrieval +2

119


NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

no code implementations • 22 Mar 2023 • Shengming Yin, Chenfei Wu, Huan Yang, JianFeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation.

Video Generation


MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

1 code implementation • 20 Mar 2023 • Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.

Ranked #22 on Visual Question Answering on MM-Vet

Multimodal Reasoning Visual Question Answering

904


Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

1 code implementation • 27 Feb 2023 • Ziyu Jiang, Yinpeng Chen, Mengchen Liu, Dongdong Chen, Xiyang Dai, Lu Yuan, Zicheng Liu, Zhangyang Wang

This motivates us to shift the paradigm from combining loss at the end, to choosing the proper learning method per network layer.

Contrastive Learning Few-Shot Learning


Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

no code implementations • 21 Feb 2023 • Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, JianFeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

3D photography renders a static image into a video with appealing 3D visual effects.

Ranked #1 on Image Outpainting on MSCOCO

Image Outpainting Monocular Depth Estimation


Energy-Inspired Self-Supervised Pretraining for Vision Models

no code implementations • 2 Feb 2023 • Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu

In the proposed framework, we model energy estimation and data restoration as the forward and backward passes of a single network without any auxiliary components, e.g., an extra decoder.

Colorization Denoising +2


RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design

1 code implementation • 25 Jan 2023 • Cheng Tan, Yijie Zhang, Zhangyang Gao, Bozhen Hu, Siyuan Li, Zicheng Liu, Stan Z. Li

We crafted a large, well-curated benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure.

Contrastive Learning Protein Design +2


Federated Learning for Inference at Anytime and Anywhere

no code implementations • 8 Dec 2022 • Zicheng Liu, Da Li, Javier Fernandez-Marques, Stefanos Laskaridis, Yan Gao, Łukasz Dudziak, Stan Z. Li, Shell Xu Hu, Timothy Hospedales

Federated learning has been predominantly concerned with collaborative training of deep networks from scratch, and especially the many challenges that arise, such as communication cost, robustness to heterogeneous data, and support for diverse device capabilities.

Federated Learning


GRiT: A Generative Region-to-text Transformer for Object Understanding

1 code implementation • 1 Dec 2022 • Jialian Wu, JianFeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang

Specifically, GRiT consists of a visual encoder to extract image features, a foreground object extractor to localize objects, and a text decoder to generate open-set object descriptions.

Ranked #2 on Dense Captioning on Visual Genome

Dense Captioning Descriptive +3

271


MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

no code implementations • 24 Nov 2022 • Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang

Traditional methods of reconstructing 3D human pose and mesh from single images rely on paired image-mesh datasets, which can be difficult and expensive to obtain.

Ranked #14 on 3D Human Pose Estimation on 3DPW

3D Human Pose Estimation Hand Pose Estimation


ReCo: Region-Controlled Text-to-Image Generation

no code implementations • CVPR 2023 • Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19.28% and +17.21% more accurate in generating images with correct object count and spatial relationship than the T2I model.

Ranked #2 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis Position


Self-Supervised Learning based on Heat Equation

no code implementations • 23 Nov 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

When transferring to object detection with a frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP, respectively.

Image Classification object-detection +2


Exploring Discrete Diffusion Models for Image Captioning

1 code implementation • 21 Nov 2022 • Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.

Image Captioning Image Generation


MogaNet: Multi-order Gated Aggregation Network

6 code implementations • 7 Nov 2022 • Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, ZhiYuan Chen, Jiangbin Zheng, Stan Z. Li

Notably, MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and 17M parameters, respectively.

Ranked #1 on Pose Estimation on COCO val2017

3D Human Pose Estimation Image Classification +6

567


Vision-Language Pre-training: Basics, Recent Advances, and Future Trends

1 code implementation • 17 Oct 2022 • Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao

This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years.

Few-Shot Learning Image Captioning +11

994


Teaching Yourself: Graph Self-Distillation on Neighborhood for Node Classification

no code implementations • 5 Oct 2022 • Lirong Wu, Jun Xia, Haitao Lin, Zhangyang Gao, Zicheng Liu, Guojiang Zhao, Stan Z. Li

Despite their great academic success, Multi-Layer Perceptrons (MLPs) remain the primary workhorse for practical industrial applications.

Classification Node Classification


Automated Graph Self-supervised Learning via Multi-teacher Knowledge Distillation

no code implementations • 5 Oct 2022 • Lirong Wu, Yufei Huang, Haitao Lin, Zicheng Liu, Tianyu Fan, Stan Z. Li

Self-supervised learning on graphs has recently achieved remarkable success in graph representation learning.

Graph Representation Learning Knowledge Distillation +1


OpenMixup: A Comprehensive Mixup Benchmark for Visual Classification

1 code implementation • 11 Sep 2022 • Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Weiyang Jin, Stan Z. Li

Data mixing, or mixup, is a data-dependent augmentation technique that has greatly enhanced the generalizability of modern deep neural networks.

Benchmarking Classification +3

567

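As a rough illustration of the data mixing described in the OpenMixup summary above, the sketch below shows the basic mixup operation: a convex combination of two samples and of their one-hot labels. It is a generic sketch, not code from the OpenMixup benchmark; the names `mixup` and `sample_lambda` and the list-based data layout are assumptions made for this example.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing ratio from Beta(alpha, alpha), a common choice for mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with the same ratio lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Toy usage: mix a class-0 sample with a class-1 sample at a fixed ratio.
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

In practice `lam` would be drawn per batch, for example with `sample_lambda()`, instead of being fixed.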

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling

1 code implementation • CVPR 2023 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Masked visual modeling (MVM) has been recently proven effective for visual pre-training.

Ranked #1 on Video Question Answering on LSMDC-MC

Fill Mask Optical Flow Estimation +10


Are Gradients on Graph Structure Reliable in Gray-box Attacks?

1 code implementation • 7 Aug 2022 • Zihan Liu, Yun Luo, Lirong Wu, Siyuan Li, Zicheng Liu, Stan Z. Li

These errors arise from rough gradient usage due to the discreteness of the graph structure and from the unreliability in the meta-gradient on the graph structure.

Computational Efficiency


NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

1 code implementation • 20 Jul 2022 • Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.

Ranked #1 on Image Outpainting on LHQC

Image Outpainting Text-to-Image Generation +1

2,795


Should All Proposals be Treated Equally in Object Detection?

1 code implementation • 7 Jul 2022 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Pei Yu, Jing Yin, Lu Yuan, Zicheng Liu, Nuno Vasconcelos

We formulate this as a learning problem where the goal is to assign operators to proposals, in the detection head, so that the total computational cost is constrained and the precision is maximized.

Object Object Detection


Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

1 code implementation • NeurIPS 2022 • Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, JianFeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang

Vision-language (VL) pre-training has recently received considerable attention.

Ranked #1 on Phrase Grounding on Flickr30k Entities Dev

Described Object Detection Image Captioning +5

124


Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention

no code implementations • 14 Jun 2022 • Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu

We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context.

Instance Segmentation Object +3


LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

1 code implementation • CVPR 2023 • Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.

Language Modelling Masked Language Modeling +6


GIT: A Generative Image-to-text Transformer for Vision and Language

2 code implementations • 27 May 2022 • JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Ranked #1 on Image Captioning on nocaps-XD near-domain

Image Captioning Image Classification +7

124,593


Cross-modal Representation Learning for Zero-shot Action Recognition

no code implementations • CVPR 2022 • Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The model design provides a natural mechanism for visual and semantic representations to be learned in a shared knowledge space, whereby it encourages the learned visual embedding to be discriminative and more semantically consistent.

Ranked #3 on Zero-Shot Action Recognition on ActivityNet

Action Recognition Representation Learning +1


An Intriguing Property of Geophysics Inversion

no code implementations • 28 Apr 2022 • Yinan Feng, Yinpeng Chen, Shihang Feng, Peng Jin, Zicheng Liu, Youzuo Lin

In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels.

Geophysics


ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

8 code implementations • 19 Apr 2022 • Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.

Ranked #1 on Object Detection on ELEVATER

Fairness Few-Shot Image Classification +4

1,947


MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images

no code implementations • 31 Mar 2022 • Xiangjun Gao, Jiaolong Yang, Jongyoo Kim, Sida Peng, Zicheng Liu, Xin Tong

For this task, we propose a simple yet effective method to train a generalizable NeRF with multiview images as conditional input.

Novel View Synthesis


Deep Frequency Filtering for Domain Generalization

no code implementations • CVPR 2023 • Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen

Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge.

Domain Generalization Retrieval


Harnessing Hard Mixed Samples with Decoupled Regularizer

1 code implementation • NeurIPS 2023 • Zicheng Liu, Siyuan Li, Ge Wang, Cheng Tan, Lirong Wu, Stan Z. Li

However, we found that the extra optimizing step may be redundant because label-mismatched mixed samples are informative hard mixed samples for deep models to localize discriminative features.

Data Augmentation

567


The Overlooked Classifier in Human-Object Interaction Recognition

no code implementations • 10 Mar 2022 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu

Human-Object Interaction (HOI) recognition is challenging due to two factors: (1) significant imbalance across classes and (2) requiring multiple labels per image.

Classification Human-Object Interaction Detection +4


SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

no code implementations • 25 Jan 2022 • Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations.

Question Answering Visual Question Answering


The Overlooked Classifier in Human-Object Interaction Recognition

no code implementations • arXiv 2021 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu

Human-Object Interaction (HOI) recognition is challenging due to two factors: (1) significant imbalance across classes and (2) requiring multiple labels per image.

Ranked #1 on Human-Object Interaction Detection on HICO

Classification Human-Object Interaction Detection +4


Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation

no code implementations • CVPR 2022 • Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha

In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.

Domain Adaptive Person Re-Identification Knowledge Distillation +4


Improving Vision Transformers for Incremental Learning

no code implementations • 12 Dec 2021 • Pei Yu, Yinpeng Chen, Ying Jin, Zicheng Liu

This paper proposes a working recipe of using Vision Transformer (ViT) in class incremental learning.

Class Incremental Learning Incremental Learning


Injecting Semantic Concepts into End-to-End Image Captioning

1 code implementation • CVPR 2022 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed as ViTCAP, in which grid representations are used without extracting the regional features.

Caption Generation Image Captioning


MLP Architectures for Vision-and-Language Modeling: An Empirical Study

1 code implementation • 8 Dec 2021 • Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?

Language Modelling Visual Question Answering (VQA)


MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark

no code implementations • 30 Nov 2021 • Xiaotian Han, Quanzeng You, Chunyu Wang, Zhizheng Zhang, Peng Chu, Houdong Hu, Jiang Wang, Zicheng Liu

This dataset provides a more reliable benchmark of multi-camera, multi-object tracking systems in cluttered and crowded environments.

Ranked #2 on Object Tracking on MMPTRACK

Multi-Object Tracking Multiple People Tracking +1

Paper
Add Code

Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup

1 code implementation • 30 Nov 2021 • Siyuan Li, Zicheng Liu, Zedong Wang, Di wu, Zihan Liu, Stan Z. Li

Accordingly, we propose $\eta$-balanced mixup loss for complementary learning of the two sub-objectives.

Ranked #7 on Image Classification on Places205

Data Augmentation Image Classification +2

567

Paper
Code

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation • CVPR 2022 • Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e.g., video question answering).

Caption Generation Question Answering +3

225

Paper
Code

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation • 24 Nov 2021 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Ranked #20 on Zero-Shot Video Retrieval on DiDeMo

Question Answering Retrieval +5

136

Paper
Code

Scaling Up Vision-Language Pre-training for Image Captioning

no code implementations • CVPR 2022 • Xiaowei Hu, Zhe Gan, JianFeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.

Ranked #3 on Image Captioning on nocaps-XD entire (using extra training data)

Attribute Image Captioning

Paper
Add Code

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

1 code implementation • 23 Nov 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

On grounded captioning, UniTAB presents a simpler solution with a single output head, and significantly outperforms state of the art in both grounding and captioning evaluations.

Image Captioning Language Modelling +5

Paper
Code

Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

369

Paper
Code

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

no code implementations • 19 Nov 2021 • JianFeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question), for vision-language (VL) representation learning.

Image Captioning Image-text matching +9

Paper
Add Code

Physics-guided Loss Functions Improve Deep Learning Performance in Inverse Scattering

no code implementations • 13 Nov 2021 • Zicheng Liu, Mayank Roy, Dilip K. Prasad, Krishna Agarwal

Solving electromagnetic inverse scattering problems (ISPs) is challenging due to the intrinsic nonlinearity, ill-posedness, and expensive computational cost.

Paper
Add Code

An Empirical Study of Training End-to-End Vision-and-Language Transformers

2 code implementations • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng

Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.

Ranked #20 on Cross-Modal Retrieval on COCO 2014 (using extra training data)

Cross-Modal Retrieval Visual Question Answering (VQA) +1

350

Paper
Code

GenURL: A General Framework for Unsupervised Representation Learning

1 code implementation • 27 Oct 2021 • Siyuan Li, Zicheng Liu, Zelin Zang, Di wu, ZhiYuan Chen, Stan Z. Li

For example, dimension reduction methods such as t-SNE and UMAP optimize pair-wise data relationships by preserving the global geometric structure, while self-supervised learning methods such as SimCLR and BYOL focus on mining the local statistics of instances under specific augmentations.

Contrastive Learning Dimensionality Reduction +4

567

Paper
Code

Unsupervised Learning of Full-Waveform Inversion: Connecting CNN and Partial Differential Equation in a Loop

no code implementations • ICLR 2022 • Peng Jin, Xitong Zhang, Yinpeng Chen, Sharon Xiaolei Huang, Zicheng Liu, Youzuo Lin

In particular, we use finite difference to approximate the forward modeling of PDE as a differentiable operator (from velocity map to seismic data) and model its inversion by CNN (from seismic data to velocity map).

Geophysics

Paper
Add Code

Improving Discriminative Visual Representation Learning via Automatic Mixup

no code implementations • 29 Sep 2021 • Siyuan Li, Zicheng Liu, Di wu, Stan Z. Li

In this paper, we decompose mixup into two sub-tasks of mixup generation and classification and formulate it for discriminative representations as class- and instance-level mixup.

Data Augmentation Representation Learning

Paper
Add Code

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

1 code implementation • 10 Sep 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.

Ranked #20 on Visual Question Answering (VQA) on OK-VQA (using extra training data)

Image Captioning Question Answering +2

Paper
Code

Mobile-Former: Bridging MobileNet and Transformer

4 code implementations • CVPR 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, Zicheng Liu

This structure leverages the advantages of MobileNet at local processing and transformer at global interaction.

object-detection Object Detection

1,183

Paper
Code

MicroNet: Improving Image Recognition with Extremely Low FLOPs

1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g., 5M FLOPs on ImageNet classification).

328

Paper
Code

OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning

no code implementations • 8 Aug 2021 • Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu

We introduce the task of open-vocabulary visual instance search (OVIS).

Instance Search Representation Learning

Paper
Add Code

Is Object Detection Necessary for Human-Object Interaction Recognition?

no code implementations • arXiv 2021 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Zicheng Liu, Jenq-Neng Hwang

This paper revisits human-object interaction (HOI) recognition at image level without using supervision of object location and human pose.

Human-Object Interaction Detection Object +2

Paper
Add Code

Probabilistic Model Distillation for Semantic Correspondence

1 code implementation • CVPR 2021 • Xin Li, Deng-Ping Fan, Fan Yang, Ao Luo, Hong Cheng, Zicheng Liu

We address this problem with the use of a novel Probabilistic Model Distillation (PMD) approach which transfers knowledge learned by a probabilistic teacher model on synthetic data to a static student model with the use of unlabeled real image pairs.

Representation Learning Semantic correspondence

Paper
Code

A data-based comparative review and AI-driven symbolic model for longitudinal dispersion coefficient in natural streams

no code implementations • 17 Jun 2021 • Yifeng Zhao, Zicheng Liu, Pei Zhang, S. A. Galindo-Torres, Stan Z. Li

Whereas implicit ML-driven methods are black-boxes in nature, explicit ML-driven methods have more potential in prediction of LDC.

regression Symbolic Regression

Paper
Add Code

End-to-End Semi-Supervised Object Detection with Soft Teacher

8 code implementations • ICCV 2021 • Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu

This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.

Ranked #6 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Instance Segmentation object-detection +4

885

Paper
Code

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

1 code implementation • 8 Jun 2021 • Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu

Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task.

Multi-Task Learning Question Answering +5

Paper
Code

Playing Lottery Tickets with Vision and Language

no code implementations • 23 Apr 2021 • Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu

However, we can find "relaxed" winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy.

Question Answering Referring Expression +6

Paper
Add Code

Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.

Image Captioning Knowledge Distillation +2

Paper
Add Code

Mesh Graphormer

1 code implementation • ICCV 2021 • Kevin Lin, Lijuan Wang, Zicheng Liu

We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.

Ranked #1 on 3D Hand Pose Estimation on FreiHAND

3D Hand Pose Estimation 3D Human Pose Estimation

356

Paper
Code

TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

no code implementations • 1 Apr 2021 • Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu

TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs.

Ranked #2 on Multi-Object Tracking on 2DMOT15 (using extra training data)

Multi-Object Tracking Multiple Object Tracking +2

Paper
Add Code

Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification

no code implementations • 25 Mar 2021 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen

Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.

Disentanglement Domain Adaptive Person Re-Identification +2

Paper
Add Code

AutoMix: Unveiling the Power of Mixup for Stronger Classifiers

2 code implementations • 24 Mar 2021 • Zicheng Liu, Siyuan Li, Di wu, Zihan Liu, ZhiYuan Chen, Lirong Wu, Stan Z. Li

Specifically, AutoMix reformulates the mixup classification into two sub-tasks (i.e., mixed sample generation and mixup classification) with corresponding sub-networks and solves them in a bi-level optimization framework.

Ranked #8 on Image Classification on Places205

Classification Data Augmentation +3

567

Paper
Code

Revisiting Dynamic Convolution via Matrix Decomposition

1 code implementation • ICLR 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos

It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.

Dimensionality Reduction

129

Paper
Code

Stronger NAS with Weaker Predictors

1 code implementation • NeurIPS 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.

Neural Architecture Search

Paper
Code

SEED: Self-supervised Distillation For Visual Representation

1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu

This paper is concerned with self-supervised learning for small models.

Knowledge Distillation Self-Supervised Learning +1

Paper
Code

Weak NAS Predictor Is All You Need

no code implementations • 1 Jan 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

Rather than expecting a single strong predictor to model the whole space, we seek a progressive line of weak predictors that can connect a path to the best architecture, thus greatly simplifying the learning task of each predictor.

Neural Architecture Search

Paper
Add Code

3D Human motion anticipation and classification

no code implementations • 31 Dec 2020 • Emad Barsoum, John Kender, Zicheng Liu

Our model learns to predict multiple future sequences of human poses from the same input sequence.

Action Recognition Classification +6

Paper
Add Code

End-to-End Human Pose and Mesh Reconstruction with Transformers

1 code implementation • CVPR 2021 • Kevin Lin, Lijuan Wang, Zicheng Liu

We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image.

Ranked #4 on 3D Hand Pose Estimation on FreiHAND

3D Absolute Human Pose Estimation 3D Hand Pose Estimation

585

Paper
Code

MiniVLM: A Smaller and Faster Vision-Language Model

no code implementations • 13 Dec 2020 • JianFeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by 95%, compared to a baseline model.

Language Modelling

Paper
Add Code

MicroNet: Towards Image Recognition with Extremely Low FLOPs

no code implementations • 24 Nov 2020 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

In this paper, we present MicroNet, which is an efficient convolutional neural network using extremely low computational cost (e.g., 6 MFLOPs on ImageNet classification).

Paper
Add Code

Semantic Change Detection with Asymmetric Siamese Networks

1 code implementation • 12 Oct 2020 • Kunping Yang, Gui-Song Xia, Zicheng Liu, Bo Du, Wen Yang, Marcello Pelillo, Liangpei Zhang

Given two multi-temporal aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries.

Change Detection Management

Paper
Code

Deep Clustering and Representation Learning that Preserves Geometric Structures

no code implementations • 28 Sep 2020 • Lirong Wu, Zicheng Liu, Zelin Zang, Jun Xia, Siyuan Li, Stan Z. Li

To overcome the problem that clustering-oriented losses may deteriorate the geometric structure of embeddings in the latent space, an isometric loss is proposed for preserving intra-manifold structure locally and a ranking loss for inter-manifold structure globally.

Clustering Deep Clustering +1

Paper
Add Code

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

no code implementations • 28 Sep 2020 • Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).

Ranked #3 on Image Captioning on nocaps-XD out-of-domain

Image Captioning Object +1

Paper
Add Code

Generalized Clustering and Multi-Manifold Learning with Geometric Structure Preservation

1 code implementation • 21 Sep 2020 • Lirong Wu, Zicheng Liu, Zelin Zang, Jun Xia, Siyuan Li, Stan Z. Li

Though manifold-based clustering has become a popular research topic, we observe that one important factor has been omitted by these works, namely that the defined clustering loss may corrupt the local and global structure of the latent space.

Clustering Deep Clustering +1

Paper
Code

Dynamic ReLU

2 code implementations • ECCV 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Rectified linear units (ReLU) are commonly used in deep neural networks.

204

Paper
Code

Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes

no code implementations • 28 Feb 2020 • Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun

Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.

Segmentation

Paper
Add Code

Dynamic Convolution: Attention over Convolution Kernels

5 code implementations • CVPR 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability.

Ranked #905 on Image Classification on ImageNet

Image Classification Keypoint Detection

10,805

Paper
Code

Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation

3 code implementations • 11 Jul 2019 • Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun

On the other hand, if part labels are also available in the real-images during training, our method outperforms the supervised state-of-the-art methods by a large margin.

Ranked #1 on Human Part Segmentation on PASCAL-Part (using extra training data)

Domain Adaptation Human Part Segmentation +3

270

Paper
Code

Large Scale Incremental Learning

4 code implementations • CVPR 2019 • Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Yun Fu

We believe this is because of the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes.

Ranked #2 on Incremental Learning on CIFAR-100 - 50 classes + 50 steps of 1 class

Class Incremental Learning Incremental Learning

1,659

Paper
Code

Rethinking Classification and Localization for Object Detection

2 code implementations • CVPR 2020 • Yue Wu, Yinpeng Chen, Lu Yuan, Zicheng Liu, Lijuan Wang, Hongzhi Li, Yun Fu

Two head structures (i.e., fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks.

Classification General Classification +3

27,716

Paper
Code

Incremental Classifier Learning with Generative Adversarial Networks

no code implementations • 2 Feb 2018 • Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Zhengyou Zhang, Yun Fu

To address these problems, we propose (a) a new loss function to combine the cross-entropy loss and distillation loss, (b) a simple way to estimate and remove the imbalance between the old and new classes, and (c) using Generative Adversarial Networks (GANs) to generate historical data and select representative exemplars during generation.

General Classification

Paper
Add Code

HP-GAN: Probabilistic 3D human motion prediction via GAN

3 code implementations • 27 Nov 2017 • Emad Barsoum, John Kender, Zicheng Liu

Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses.

Ranked #7 on Human Pose Forecasting on Human3.6M (APD metric)

Autonomous Vehicles Human motion prediction +5

Paper
Code

Reinforced Temporal Attention and Split-Rate Transfer for Depth-Based Person Re-Identification

no code implementations • ECCV 2018 • Nikolaos Karianakis, Zicheng Liu, Yinpeng Chen, Stefano Soatto

We address the problem of person re-identification from commodity depth sensors.

Person Re-Identification

Paper
Add Code

A Tube-and-Droplet-based Approach for Representing and Analyzing Motion Trajectories

no code implementations • 10 Sep 2016 • Weiyao Lin, Yang Zhou, Hongteng Xu, Junchi Yan, Mingliang Xu, Jianxin Wu, Zicheng Liu

Our approach first leverages the complete information from given trajectories to construct a thermal transfer field which provides a context-rich way to describe the global motion pattern in a scene.

3D Action Recognition Anomaly Detection +2

Paper
Add Code

Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

no code implementations • CVPR 2013 • Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen

Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels.

Image Segmentation Segmentation +2

Paper
Add Code

Semi-supervised Node Splitting for Random Forest Construction

no code implementations • CVPR 2013 • Xiao Liu, Mingli Song, DaCheng Tao, Zicheng Liu, Luming Zhang, Chun Chen, Jiajun Bu

Node splitting is an important issue in Random Forest, but robust splitting requires a large number of training samples.

Image Segmentation Object Categorization +1

Paper
Add Code

Tensor-Based Human Body Modeling

no code implementations • CVPR 2013 • Yinpeng Chen, Zicheng Liu, Zhengyou Zhang

In this paper, we present a novel approach to model the 3D human body with variations in both human shape and pose, by exploring a tensor decomposition technique.

3D Reconstruction Tensor Decomposition

Paper
Add Code

HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences

no code implementations • CVPR 2013 • Omar Oreifej, Zicheng Liu

In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates.

Activity Recognition

Paper
Add Code
