no code implementations • 24 Feb 2024 • Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang
Vision-and-Language Navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions.
no code implementations • 10 Nov 2023 • Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
Text-to-3D with diffusion models has achieved remarkable progress in recent years.
1 code implementation • 8 Nov 2023 • Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.
1 code implementation • ICCV 2023 • Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.
1 code implementation • 26 May 2023 • Gengze Zhou, Yicong Hong, Qi Wu
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling.
1 code implementation • 29 Mar 2023 • Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney, Stephen Gould
Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes.
Ranked #6 on Image Retrieval on Fashion IQ
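The composed-retrieval setup above can be illustrated with a minimal sketch: fuse a reference-image embedding with a modification-text embedding, then rank gallery images by cosine similarity. All embeddings here are random stand-ins, and the additive fusion is a placeholder for the learned combiner the paper describes; none of the names below come from the actual codebase.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_gallery = 16, 5

# Illustrative embeddings (real systems obtain these from trained encoders).
ref_img = rng.normal(size=d)      # embedding of the reference image
mod_text = rng.normal(size=d)     # embedding of the modification text
gallery = rng.normal(size=(n_gallery, d))  # candidate target images

# Placeholder fusion: a learned combiner would replace this simple sum.
query = ref_img + mod_text
query /= np.linalg.norm(query)
gallery_n = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)

# Rank gallery images by cosine similarity to the composed query.
scores = gallery_n @ query
ranking = np.argsort(-scores)  # indices of gallery images, best match first
```

The key design point is that the query is a joint function of both modalities, so the retrieved image must satisfy the reference image *as modified by* the text, not either input alone.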
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2023 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
To address these problems, we present a history-enhanced and order-aware pre-training with the complementing fine-tuning paradigm (HOP+) for VLN.
1 code implementation • 23 Jun 2022 • Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao
Our model consists of three modules: a candidate waypoint predictor (CWP), a history-enhanced planner, and a tryout controller.
1 code implementation • CVPR 2022 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
Pre-training has been adopted in a few recent works on Vision-and-Language Navigation (VLN).
Ranked #4 on Visual Navigation on R2R
1 code implementation • CVPR 2022 • Yicong Hong, Zun Wang, Qi Wu, Stephen Gould
To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments.
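The discrete-to-continuous bridge can be sketched as follows: once candidate waypoints are predicted as (relative heading, distance) pairs, they are converted into continuous 2D coordinates a low-level controller can navigate toward. This is a hedged illustration of the idea, not the paper's implementation; the function name and pose convention are assumptions.

```python
import math

def waypoints_to_coords(pose_xy, pose_heading, candidates):
    """Convert candidate waypoints to continuous coordinates.

    pose_xy: (x, y) of the agent; pose_heading: heading in radians.
    candidates: list of (relative_heading_rad, distance_m) pairs,
    as a waypoint predictor might emit for high-level actions.
    """
    x, y = pose_xy
    coords = []
    for rel_heading, dist in candidates:
        h = pose_heading + rel_heading
        coords.append((x + dist * math.cos(h), y + dist * math.sin(h)))
    return coords

# Agent at the origin facing along +x; one waypoint straight ahead,
# one 90 degrees to the left.
coords = waypoints_to_coords((0.0, 0.0), 0.0, [(0.0, 1.0), (math.pi / 2, 2.0)])
# coords[0] is approximately (1.0, 0.0); coords[1] is approximately (0.0, 2.0)
```

An agent trained to choose among discrete panoramic directions can then be reused in a continuous simulator by selecting a candidate and letting the controller drive to its coordinates.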
no code implementations • CVPR 2021 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper, we propose a recurrent BERT model that is time-aware for use in VLN.
1 code implementation • 15 Apr 2021 • Jiawei Liu, Jing Zhang, Yicong Hong, Nick Barnes
Within this pipeline, the class activation map (CAM) is obtained and further processed to serve as a pseudo label to train the semantic segmentation model in a fully-supervised manner.
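The CAM-to-pseudo-label step described above can be sketched in a few lines: weight the final convolutional feature channels by the classifier weights of a target class, normalize, and threshold into a binary mask. Shapes and the 0.5 threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, num_classes = 8, 4, 4, 3

# Stand-ins: F is a last-conv-layer feature map, w holds the
# classifier weights of a trained image-level classifier.
F = rng.random((C, H, W))
w = rng.random((num_classes, C))

target_class = 1
# CAM: per-location weighted sum of channels by the class's weights.
cam = np.tensordot(w[target_class], F, axes=1)            # shape (H, W)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]

# Threshold the normalized CAM into a binary pseudo-label mask,
# which then supervises the segmentation model as if fully labeled.
pseudo_mask = (cam > 0.5).astype(np.uint8)
```

The threshold trades precision for recall in the pseudo labels, which is exactly where such weakly-supervised pipelines typically need further refinement.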
1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.
1 code implementation • 26 Nov 2020 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper, we propose a recurrent BERT model that is time-aware for use in VLN.
Ranked #7 on Visual Navigation on R2R
1 code implementation • NeurIPS 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
1 code implementation • EMNLP 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.