Search Results for author: David Junhao Zhang

Found 14 papers, 9 papers with code

DragAnything: Motion Control for Anything using Entity Representation

2 code implementations • 12 Mar 2024 • Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.

Object • Video Generation

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions

1 code implementation • 3 Jan 2024 • David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo

This work presents Moonshot, a new video generation model that conditions simultaneously on multimodal inputs of image and text.

Image Animation • Video Editing • +1
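
As an aside on what joint image-and-text conditioning can look like in practice, here is a minimal sketch of cross-attention over concatenated image and text embeddings. It is a generic illustration with assumed module names and dimensions, not Moonshot's actual architecture.

```python
# Minimal sketch (not Moonshot's architecture): condition a generator block
# on both image and text by concatenating the two embedding sequences and
# reading from them with cross-attention. All sizes are illustrative.
import torch
import torch.nn as nn

class MultimodalCrossAttentionBlock(nn.Module):
    def __init__(self, dim=320, ctx_dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, kdim=ctx_dim,
                                          vdim=ctx_dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens, text_ctx, image_ctx):
        # Fuse the two condition streams along the sequence axis,
        # then attend over the combined context.
        ctx = torch.cat([text_ctx, image_ctx], dim=1)
        out, _ = self.attn(self.norm(video_tokens), ctx, ctx)
        return video_tokens + out

block = MultimodalCrossAttentionBlock()
video = torch.randn(2, 16 * 64, 320)    # (batch, frames*patches, dim)
text = torch.randn(2, 77, 768)          # e.g. CLIP text embeddings
image = torch.randn(2, 257, 768)        # e.g. CLIP image patch embeddings
print(block(video, text, image).shape)  # torch.Size([2, 1024, 320])
```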

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

no code implementations • 4 Dec 2023 • YuChao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang

In contrast to previous methods that rely on dense correspondences, we introduce the VideoSwap framework that exploits semantic point correspondences, inspired by our observation that only a small number of semantic points are necessary to align the subject's motion trajectory and modify its shape.

Video Editing

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

1 code implementation • 12 Oct 2023 • Rui Zhao, YuChao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou

Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion.
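Adapting an existing text-to-video diffusion model to a new concept is commonly done by training only a small number of extra parameters. The sketch below shows a generic low-rank adaptation (LoRA) wrapper around a frozen linear layer; it illustrates that general fine-tuning setup only and is not MotionDirector's specific method (ranks, dimensions, and names are assumptions).

```python
# Generic LoRA sketch: keep the pretrained weights frozen and learn a
# small low-rank update on top of them. Not MotionDirector's method.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=4, alpha=4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained layer
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

layer = LoRALinear(nn.Linear(320, 320))
x = torch.randn(2, 77, 320)
print(layer(x).shape)  # torch.Size([2, 77, 320])
```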

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

1 code implementation • 27 Sep 2023 • David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, YuChao Gu, Difei Gao, Mike Zheng Shou

In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Text-to-Video Generation • Video Alignment • +1
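
To make the idea of marrying pixel-based and latent-based VDMs concrete, here is a hedged sketch of one way such a hybrid cascade could be staged: a pixel-space stage drafts low-resolution frames and a latent-space stage upsamples them. `PixelVDM` and `LatentVDM` are hypothetical placeholders, not Show-1's released components.

```python
# Hedged two-stage cascade sketch: pixel-space drafting at low resolution,
# latent-space upsampling to high resolution. Placeholder models only.
import torch
import torch.nn.functional as F

class PixelVDM:
    def sample(self, prompt, frames=16, size=64):
        # stand-in for pixel-space diffusion sampling at low resolution
        return torch.rand(frames, 3, size, size)

class LatentVDM:
    def upsample(self, video, scale=4):
        # stand-in for latent-space diffusion super-resolution
        return F.interpolate(video, scale_factor=scale, mode="bilinear")

def generate(prompt: str):
    low_res = PixelVDM().sample(prompt)       # (16, 3, 64, 64)
    high_res = LatentVDM().upsample(low_res)  # (16, 3, 256, 256)
    return high_res

print(generate("a corgi surfing a wave").shape)
```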

Dataset Condensation via Generative Model

no code implementations • 14 Sep 2023 • David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou

Dataset condensation aims to condense a large dataset with many training samples into a small set.

Dataset Condensation
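
For context, the classic formulation of dataset condensation learns a tiny synthetic set whose training signal mimics that of the real data. The toy sketch below uses the well-known gradient-matching objective purely as an illustration; the paper above instead condenses via a generative model, which this sketch does not reproduce. All data, model, and sizes are made up.

```python
# Toy gradient-matching sketch of dataset condensation: learn 8 synthetic
# samples whose gradients on a small model match the real-data gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

real_x, real_y = torch.randn(256, 10), torch.randint(0, 2, (256,))
syn_x = torch.randn(8, 10, requires_grad=True)   # learnable synthetic set
syn_y = torch.randint(0, 2, (8,))

model = nn.Linear(10, 2)
opt = torch.optim.Adam([syn_x], lr=0.1)

# gradients of the loss on the real data, treated as fixed targets here
g_real = torch.autograd.grad(F.cross_entropy(model(real_x), real_y),
                             model.parameters())

for step in range(100):
    g_syn = torch.autograd.grad(F.cross_entropy(model(syn_x), syn_y),
                                model.parameters(), create_graph=True)
    loss = sum(((a.detach() - b) ** 2).sum() for a, b in zip(g_real, g_syn))
    opt.zero_grad()
    loss.backward()
    opt.step()
```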

Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

no code implementations • 13 Aug 2023 • David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou

Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection and pose additional challenges due to data-privacy concerns.

Contrastive Learning • Image Classification • +2

Too Large; Data Reduction for Vision-Language Pre-Training

2 code implementations • ICCV 2023 • Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou

Specifically, TL;DR can compress the mainstream VLP datasets at a high ratio, e.g., reducing the well-cleaned CC3M dataset from 2.82M to 0.67M (~24%) and the noisy YFCC15M from 15M to 2.5M (~16.7%).
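
The quoted percentages are the fraction of each dataset kept after reduction; a quick arithmetic check:

```python
# The quoted reductions expressed as the fraction of each dataset kept.
cc3m_kept = 0.67e6 / 2.82e6   # ≈ 0.238 -> ~24% of CC3M retained
yfcc_kept = 2.5e6 / 15e6      # ≈ 0.167 -> ~16.7% of YFCC15M retained
print(f"{cc3m_kept:.1%}, {yfcc_kept:.1%}")  # 23.8%, 16.7%
```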

Making Vision Transformers Efficient from A Token Sparsification View

1 code implementation • CVPR 2023 • Shuning Chang, Pichao Wang, Ming Lin, Fan Wang, David Junhao Zhang, Rong Jin, Mike Zheng Shou

In this work, we propose a novel Semantic Token ViT (STViT) for efficient global and local vision transformers, which can also be adapted to serve as a backbone for downstream tasks.

Efficient ViTs • Instance Segmentation • +4
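
To illustrate token sparsification in general terms, the sketch below pools a long sequence of patch tokens into a small set of learned summary tokens before the remaining (expensive) transformer layers run. It is a generic attention-pooling example with assumed sizes, not STViT's exact semantic-token mechanism.

```python
# Generic token-reduction sketch: compress 196 patch tokens into 16
# "summary" tokens via learned-query attention. Not STViT's mechanism.
import torch
import torch.nn as nn

class TokenPooler(nn.Module):
    def __init__(self, dim=384, num_summary_tokens=16, heads=6):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_summary_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patch_tokens):
        q = self.queries.expand(patch_tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, patch_tokens, patch_tokens)
        return pooled  # far fewer tokens for the rest of the network

pooler = TokenPooler()
patches = torch.randn(2, 196, 384)   # 14x14 patch tokens
print(pooler(patches).shape)         # torch.Size([2, 16, 384])
```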

Label-Efficient Online Continual Object Detection in Streaming Video

1 code implementation • ICCV 2023 • Jay Zhangjie Wu, David Junhao Zhang, Wynne Hsu, Mengmi Zhang, Mike Zheng Shou

Remarkably, with only 25% annotated video frames, our method still outperforms the base CL learners, which are trained with 100% annotations on all video frames.

Continual Learning • Hippocampus • +3
