no code implementations • 24 Feb 2024 • Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang
Vision-and-Language Navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions.
no code implementations • 10 Nov 2023 • Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
Text-to-3D with diffusion models has achieved remarkable progress in recent years.
1 code implementation • 8 Nov 2023 • Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.
1 code implementation • ICCV 2023 • Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.
1 code implementation • 26 May 2023 • Gengze Zhou, Yicong Hong, Qi Wu
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling.
1 code implementation • 29 Mar 2023 • Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney, Stephen Gould
Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes.
Ranked #6 on Image Retrieval on Fashion IQ
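The composed-retrieval setup above can be illustrated with a minimal sketch: fuse a reference-image embedding with a modification-text embedding, then rank gallery images by cosine similarity. All embeddings here are random stand-ins, and the additive fusion is a placeholder for the learned combiner the paper describes; none of the names below come from the actual codebase.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_gallery = 16, 5

# Illustrative embeddings (real systems obtain these from trained encoders).
ref_img = rng.normal(size=d)      # embedding of the reference image
mod_text = rng.normal(size=d)     # embedding of the modification text
gallery = rng.normal(size=(n_gallery, d))  # candidate target images

# Placeholder fusion: a learned combiner would replace this simple sum.
query = ref_img + mod_text
query /= np.linalg.norm(query)
gallery_n = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)

# Rank gallery images by cosine similarity to the composed query.
scores = gallery_n @ query
ranking = np.argsort(-scores)  # indices of gallery images, best match first
```

The key design point is that the query is a joint function of both modalities, so the retrieved image must satisfy the reference image *as modified by* the text, not either input alone.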
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2023 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
To address these problems, we present a history-enhanced and order-aware pre-training with the complementing fine-tuning paradigm (HOP+) for VLN.
1 code implementation • 23 Jun 2022 • Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao
Our model consists of three modules: a candidate waypoint predictor (CWP), a history-enhanced planner, and a tryout controller.
1 code implementation • CVPR 2022 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
Pre-training has been adopted in a few recent works on Vision-and-Language Navigation (VLN).
Ranked #4 on Visual Navigation on R2R
1 code implementation • CVPR 2022 • Yicong Hong, Zun Wang, Qi Wu, Stephen Gould
To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments.
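The discrete-to-continuous bridge can be sketched as follows: once candidate waypoints are predicted as (relative heading, distance) pairs, they are converted into continuous 2D coordinates a low-level controller can navigate toward. This is a hedged illustration of the idea, not the paper's implementation; the function name and pose convention are assumptions.

```python
import math

def waypoints_to_coords(pose_xy, pose_heading, candidates):
    """Convert candidate waypoints to continuous coordinates.

    pose_xy: (x, y) of the agent; pose_heading: heading in radians.
    candidates: list of (relative_heading_rad, distance_m) pairs,
    as a waypoint predictor might emit for high-level actions.
    """
    x, y = pose_xy
    coords = []
    for rel_heading, dist in candidates:
        h = pose_heading + rel_heading
        coords.append((x + dist * math.cos(h), y + dist * math.sin(h)))
    return coords

# Agent at the origin facing along +x; one waypoint straight ahead,
# one 90 degrees to the left.
coords = waypoints_to_coords((0.0, 0.0), 0.0, [(0.0, 1.0), (math.pi / 2, 2.0)])
# coords[0] is approximately (1.0, 0.0); coords[1] is approximately (0.0, 2.0)
```

An agent trained to choose among discrete panoramic directions can then be reused in a continuous simulator by selecting a candidate and letting the controller drive to its coordinates.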
no code implementations • CVPR 2021 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper, we propose a recurrent BERT model that is time-aware for use in VLN.
1 code implementation • 15 Apr 2021 • Jiawei Liu, Jing Zhang, Yicong Hong, Nick Barnes
Within this pipeline, the class activation map (CAM) is obtained and further processed to serve as a pseudo label to train the semantic segmentation model in a fully-supervised manner.
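The CAM-to-pseudo-label step described above can be sketched in a few lines: weight the final convolutional feature channels by the classifier weights of a target class, normalize, and threshold into a binary mask. Shapes and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, num_classes = 8, 4, 4, 3

# Stand-ins: F is a last-conv-layer feature map, w holds the
# classifier weights of a trained image-level classifier.
F = rng.random((C, H, W))
w = rng.random((num_classes, C))

target_class = 1
# CAM: per-location weighted sum of channels by the class's weights.
cam = np.tensordot(w[target_class], F, axes=1)            # shape (H, W)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]

# Threshold the normalized CAM into a binary pseudo-label mask,
# which then supervises the segmentation model as if fully labeled.
pseudo_mask = (cam > 0.5).astype(np.uint8)
```

The threshold trades precision for recall in the pseudo labels, which is exactly where such weakly-supervised pipelines typically need further refinement.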
1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.
1 code implementation • 26 Nov 2020 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper, we propose a recurrent BERT model that is time-aware for use in VLN.
Ranked #7 on Visual Navigation on R2R
1 code implementation • NeurIPS 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
1 code implementation • EMNLP 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.