Search Results for author: Yuanze Lin

Found 8 papers, 5 papers with code

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

1 code implementation • 25 Mar 2024 • Yuanze Lin, Ronald Clark, Philip Torr

We present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions.

3D Generation • Text to 3D
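A minimal sketch of the cross-view consistency idea mentioned in the abstract, not DreamPolisher's actual code: the helper cross_view_consistency_loss and its inputs are assumptions, standing in for deep features of two renders of the same (e.g. Gaussian-splatting) scene from nearby cameras.

```python
# Illustrative sketch only, NOT the DreamPolisher implementation.
# Encourages feature maps of two rendered views to agree (cross-view consistency).
import torch
import torch.nn.functional as F

def cross_view_consistency_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """feat_a, feat_b: (B, C, H, W) features of two views, e.g. from a frozen encoder."""
    a = F.normalize(feat_a.flatten(2), dim=1)   # (B, C, H*W), unit-norm over channels
    b = F.normalize(feat_b.flatten(2), dim=1)
    # 1 - mean cosine similarity across all spatial locations
    return 1.0 - (a * b).sum(dim=1).mean()

if __name__ == "__main__":
    fa, fb = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(cross_view_consistency_loss(fa, fb).item())
```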

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training

no code implementations • ICCV 2023 • Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie

Coupling all these designs allows our method to enjoy both competitive performance on text-to-video retrieval and video question answering tasks, and a much lower pre-training cost, reduced by 1.9X or more.

Question Answering • Retrieval • +3

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

1 code implementation • 2 Jun 2022 • Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding window manner for retrieving knowledge, and the important relationship within/among object regions is neglected; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent.

Question Answering • Retrieval • +1
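An illustrative sketch of region-level retrieval in the spirit of the observation above, not the REVIVE implementation: regional_knowledge_scores and the toy knowledge bank are assumptions; only torchvision's roi_align is a real API. It pools regional (rather than whole-image) features and scores them against knowledge embeddings.

```python
# Sketch: pool region features with roi_align, score against a knowledge bank.
import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

def regional_knowledge_scores(feature_map, boxes, knowledge_bank):
    """feature_map: (1, C, H, W); boxes: (N, 4) in feature-map coords (x1, y1, x2, y2);
    knowledge_bank: (K, C) precomputed knowledge-entry embeddings (assumed)."""
    batch_idx = torch.zeros(boxes.size(0), 1)                  # all boxes from image 0
    rois = torch.cat([batch_idx, boxes], dim=1)                # (N, 5) as roi_align expects
    region_feats = roi_align(feature_map, rois, output_size=(7, 7))
    region_vecs = region_feats.mean(dim=(2, 3))                # (N, C) pooled region features
    sims = F.normalize(region_vecs, dim=1) @ F.normalize(knowledge_bank, dim=1).T
    return sims                                                # (N, K) region-to-knowledge scores

if __name__ == "__main__":
    fmap = torch.randn(1, 256, 32, 32)
    boxes = torch.tensor([[2.0, 2.0, 10.0, 12.0], [5.0, 8.0, 20.0, 24.0]])
    bank = torch.randn(100, 256)
    print(regional_knowledge_scores(fmap, boxes, bank).shape)  # torch.Size([2, 100])
```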

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

1 code implementation • CVPR 2022 • Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang

Our method leverages an off-the-shelf object detector to identify visual objects from unlabeled images, and then language queries for these objects are obtained in an unsupervised fashion with a pseudo-query generation module.

Language Modelling • Natural Language Queries • +1
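A hypothetical sketch of template-based pseudo-query generation, not the paper's pseudo-query module: the Detection structure, the spatial heuristic, and the query template are all assumptions made for illustration of turning detector outputs into language queries.

```python
# Sketch: compose pseudo language queries from detector outputs (label + spatial cue).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str                 # class name from an off-the-shelf detector
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    image_width: int

def spatial_word(det: Detection) -> str:
    """Coarse left/middle/right cue from the box centre (an assumed heuristic)."""
    cx = (det.box[0] + det.box[2]) / 2
    third = det.image_width / 3
    return "left" if cx < third else ("right" if cx > 2 * third else "middle")

def generate_pseudo_queries(dets: List[Detection]) -> List[str]:
    return [f"the {d.label} on the {spatial_word(d)}" for d in dets]

if __name__ == "__main__":
    dets = [Detection("dog", (10, 40, 120, 200), 640),
            Detection("person", (500, 30, 620, 400), 640)]
    print(generate_pseudo_queries(dets))  # ['the dog on the left', 'the person on the right']
```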

Cross-Stage Transformer for Video Learning

no code implementations • 29 Sep 2021 • Yuanze Lin, Xun Guo, Yan Lu

By inserting the proposed cross-stage mechanism in existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on the ViT structure, in which self-attentions and features are progressively aggregated from one block to the next.

Action Recognition • Temporal Action Localization
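One possible reading of cross-stage aggregation, sketched as a gated residual between consecutive transformer blocks; CrossStageBlock and its scalar gate are assumptions, not the paper's exact mechanism, but illustrate features being carried from one block to the next.

```python
# Sketch: each block mixes its own self-attention output with features
# carried over from the previous block via a learned gate.
import torch
import torch.nn as nn
from typing import Optional

class CrossStageBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))   # how much previous-stage signal to mix in

    def forward(self, x: torch.Tensor, prev: Optional[torch.Tensor] = None) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)
        out = x + attn_out
        if prev is not None:                        # progressively aggregate earlier-stage features
            out = out + torch.sigmoid(self.gate) * prev
        return out

if __name__ == "__main__":
    blocks = nn.ModuleList([CrossStageBlock(64) for _ in range(3)])
    x, prev = torch.randn(2, 49, 64), None
    for blk in blocks:
        x = blk(x, prev)
        prev = x                                    # carry this block's output to the next stage
    print(x.shape)  # torch.Size([2, 49, 64])
```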

Self-Supervised Video Representation Learning with Meta-Contrastive Network

no code implementations • ICCV 2021 • Yuanze Lin, Xun Guo, Yan Lu

Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch.

Contrastive Learning • Meta-Learning • +6
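A sketch of the contrastive branch only, as an InfoNCE-style loss between two augmented clips of the same video; the meta branch and MAML training stages are omitted, and clip_infonce is an assumed helper, not the authors' code.

```python
# Sketch: pull together embeddings of two clips from the same video,
# push apart clips from different videos (InfoNCE / NT-Xent style).
import torch
import torch.nn.functional as F

def clip_infonce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """z1, z2: (B, D) embeddings of two augmented views of the same B videos."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                # (B, B): positives on the diagonal
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(clip_infonce(z1, z2).item())
```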
