1 code implementation • 19 Mar 2024 • Zuyan Liu, Yuhao Dong, Yongming Rao, Jie Zhou, Jiwen Lu
In the realm of vision-language understanding, the proficiency of models in interpreting and reasoning over visual content has become a cornerstone for numerous applications.
Ranked #44 on Visual Question Answering on MM-Vet
no code implementations • 29 Jul 2023 • Zuyan Liu, Gaojie Lin, Congyi Wang, Min Zheng, Feida Zhu
Our approach involves a unified and multi-granularity strategy that includes a pseudo keypoint alignment module in the teacher-student framework for learning pose-aware semantic class tokens.
2 code implementations • ICCV 2023 • Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu
In this paper, we propose VPD (Visual Perception with a pre-trained Diffusion model), a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
Ranked #7 on Referring Expression Segmentation on RefCOCO val
1 code implementation • CVPR 2023 • Wenliang Zhao, Yongming Rao, Weikang Shi, Zuyan Liu, Jie Zhou, Jiwen Lu
Unlike previous work that relies on carefully designed network architectures and loss functions to fuse the information from the source and target faces, we reformulate face swapping as a conditional inpainting task, performed by a powerful diffusion model guided by the desired face attributes (e.g., identity and landmarks).
1 code implementation • 4 Jul 2022 • Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers as well as more complex dense prediction tasks that require structured feature maps by formulating a more generic dynamic spatial sparsification framework with progressive sparsification and asymmetric computation for different spatial locations.
1 code implementation • ICCV 2021 • Xumin Yu, Yongming Rao, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou
In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr, that adopts a transformer encoder-decoder architecture for point cloud completion.
Ranked #1 on Point Cloud Completion on ShapeNet (Chamfer Distance L2 metric)