Search Results for author: Zuyan Liu

Found 6 papers, 5 papers with code

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

1 code implementation • 19 Mar 2024 • Zuyan Liu, Yuhao Dong, Yongming Rao, Jie Zhou, Jiwen Lu

In the realm of vision-language understanding, the proficiency of models in interpreting and reasoning over visual content has become a cornerstone for numerous applications.

Ranked #44 on Visual Question Answering on MM-Vet

Visual Instruction Following • Visual Question Answering

HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation

no code implementations • 29 Jul 2023 • Zuyan Liu, Gaojie Lin, Congyi Wang, Min Zheng, Feida Zhu

Our approach involves a unified and multi-granularity strategy that includes a pseudo keypoint alignment module in the teacher-student framework for learning pose-aware semantic class tokens.

Pose Estimation • Regression • +2

Unleashing Text-to-Image Diffusion Models for Visual Perception

2 code implementations • ICCV 2023 • Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

In this paper, we propose VPD (Visual Perception with a pre-trained Diffusion model), a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.

Ranked #7 on Referring Expression Segmentation on RefCoCo val

Denoising • Image Segmentation • +4

7,431

DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion

1 code implementation • CVPR 2023 • Wenliang Zhao, Yongming Rao, Weikang Shi, Zuyan Liu, Jie Zhou, Jiwen Lu

Unlike previous work that relies on carefully designed network architectures and loss functions to fuse the information from the source and target faces, we reformulate face swapping as a conditional inpainting task, performed by a powerful diffusion model guided by the desired face attributes (e.g., identity and landmarks).

Face Swapping

Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks

1 code implementation • 4 Jul 2022 • Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu

We extend our method to hierarchical models, including CNNs and hierarchical vision Transformers, as well as to more complex dense prediction tasks that require structured feature maps, by formulating a more generic dynamic spatial sparsification framework with progressive sparsification and asymmetric computation for different spatial locations.

534

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

1 code implementation • ICCV 2021 • Xumin Yu, Yongming Rao, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou

In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr, that adopts a transformer encoder-decoder architecture for point cloud completion.

Ranked #1 on Point Cloud Completion on ShapeNet (Chamfer Distance L2 metric)

Inductive Bias • Point Cloud Completion • +1

521
