no code implementations • 28 Apr 2024 • Zhiwei Huang, Yikang Zhang, Qijun Chen, Rui Fan
The cornerstone of our framework and toolbox is the cross-modal mask matching (C3M) algorithm, developed based on a state-of-the-art (SoTA) LVM and capable of generating sufficient and reliable matches.
1 code implementation • 16 Apr 2024 • Liuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen
In the pursuit of robust and generalizable environment perception and language understanding, the ubiquitous challenge of dataset bias continues to plague vision-and-language navigation (VLN) agents, hindering their performance in unseen environments.
no code implementations • 9 Apr 2024 • Chuang-Wei Liu, Qijun Chen, Rui Fan
We believe this new paradigm will pave the way for the next generation of stereo matching networks.
1 code implementation • 4 Apr 2024 • Jiahang Li, Peng Yun, Qijun Chen, Rui Fan
In this study, we take one step toward this new research area by exploring a feasible strategy to fully exploit VFM features for RGB-thermal scene parsing.
Ranked #1 on Thermal Image Segmentation on KP day-night
no code implementations • 13 Mar 2024 • Sicen Guo, Zhiyuan Wu, Qijun Chen, Ioannis Pitas, Rui Fan
We introduce the Learning to Infuse "X" (LIX) framework, with novel contributions in both logit distillation and feature distillation aspects.
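The logit-distillation side of such a framework typically minimizes the KL divergence between temperature-softened teacher and student distributions. As a hedged illustration (this is the generic Hinton-style objective, not the LIX framework's specific loss; the function names are my own):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 to keep gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as their softened distributions diverge.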
no code implementations • 6 Mar 2024 • Liuyi Wang, Zongtao He, Ronghao Dang, Huiyi Chen, Chengju Liu, Qijun Chen
Vision-and-Language Navigation (VLN) has gained significant research interest in recent years due to its potential applications in real-world scenarios.
no code implementations • 29 Feb 2024 • Yi Feng, Yu Ma, Qijun Chen, Ioannis Pitas, Rui Fan
Feature-fusion networks with duplex encoders have proven to be an effective technique to solve the freespace detection problem.
no code implementations • 24 Feb 2024 • Xiao Lin, Minghao Zhu, Ronghao Dang, Guangliang Zhou, Shaolong Shu, Feng Lin, Chengju Liu, Qijun Chen
Inspired by this motivation, we propose CLIPose, a novel 6D pose estimation framework that employs a pre-trained vision-language model to better learn object category information, fully leveraging the abundant semantic knowledge in the image and text modalities.
no code implementations • 21 Jan 2024 • Zhiyuan Wu, Yi Feng, Chuang-Wei Liu, Fisher Yu, Qijun Chen, Rui Fan
Hence, in this article, we introduce S$^3$M-Net, a novel joint learning framework developed to perform semantic segmentation and stereo matching simultaneously.
1 code implementation • 19 Dec 2023 • Wengang Guo, Jiayi Yang, Huilin Yin, Qijun Chen, Wei Ye
Experimental results demonstrate that our method, PICNN (standard CNNs combined with our proposed pathway), is more interpretable than standard CNNs while achieving comparable or higher discriminative power.
no code implementations • 13 Dec 2023 • Jingwei Yang, Bohuan Xue, Yi Feng, Deming Wang, Rui Fan, Qijun Chen
This article introduces three-filters-to-normal+ (3F2N+), an extension of our previous work three-filters-to-normal (3F2N), with a specific focus on incorporating discontinuity discrimination capability into surface normal estimators (SNEs).
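The core task such SNEs solve is recovering per-pixel surface normals from a depth map. As a hedged sketch of the general idea only (back-projecting pixels to 3D and crossing local tangent vectors; this is a generic baseline, not the 3F2N/3F2N+ filter formulation, and the function name is my own):

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel surface normals from a depth map.

    Back-projects every pixel to a 3-D point using the pinhole intrinsics
    (fx, fy, cx, cy), then takes the cross product of the horizontal and
    vertical tangent vectors obtained by central finite differences.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1)           # (h, w, 3) point cloud
    du = np.zeros_like(pts)
    dv = np.zeros_like(pts)
    du[:, 1:-1] = pts[:, 2:] - pts[:, :-2]           # horizontal tangent
    dv[1:-1, :] = pts[2:, :] - pts[:-2, :]           # vertical tangent
    n = np.cross(du, dv)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-9, None)             # unit normals (zeros at border)
```

For a fronto-parallel plane (constant depth), the recovered interior normals align with the optical axis, as expected.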
no code implementations • 25 Oct 2023 • Xiao Lin, Deming Wang, Guangliang Zhou, Chengju Liu, Qijun Chen
To improve robustness to occlusion, we adopt a Transformer to exchange global information, ensuring that each local feature contains global information.
1 code implementation • 8 Oct 2023 • Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song
To encompass common detection expressions, we employ an emerging vision-language model (VLM) and a large language model (LLM) to generate instructions guided by text prompts and object bounding boxes, as the generalization ability of foundation models is effective at producing human-like expressions (e.g., describing an object's properties, category, and relationships).
no code implementations • 19 Sep 2023 • Hongbo Zhao, Yikang Zhang, Qijun Chen, Rui Fan
Instead, we introduce four new evaluation metrics to quantify the robustness and accuracy of extrinsic parameter estimation, applicable to both single-pair and multi-pair cases.
no code implementations • 19 Sep 2023 • Jiahang Li, Yikang Zhang, Peng Yun, Guangliang Zhou, Qijun Chen, Rui Fan
Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset, which contains over 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes.
1 code implementation • 1 Sep 2023 • Minghao Zhu, Xiao Lin, Ronghao Dang, Chengju Liu, Qijun Chen
As the most essential property in a video, motion information is critical to a robust and generalized video representation.
no code implementations • 31 Aug 2023 • Chenbo Zhou, Shuai Su, Qijun Chen, Rui Fan
Accurate and robust correspondence matching is of utmost importance for various 3D computer vision tasks.
no code implementations • 29 Jul 2023 • Yi Feng, Ruge Zhang, Jiayuan Du, Qijun Chen, Rui Fan
Additionally, our proposed freespace optical flow model has a wide range of applications in automated driving, providing a geometric constraint for freespace detection, vehicle localization, and more.
1 code implementation • 14 Jun 2023 • Linfeng Yuan, Miaojing Shi, Zijie Yue, Qijun Chen
Referring video object segmentation (RVOS) aims to segment the target instance referred by a given text expression in a video clip.
Ranked #12 on Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation) (using extra training data)
no code implementations • 19 May 2023 • Liuyi Wang, Chengju Liu, Zongtao He, Shu Li, Qingqing Yan, Huiyi Chen, Qijun Chen
The experimental results demonstrate that PASTS outperforms all existing speaker models and successfully improves the performance of previous VLN models, achieving state-of-the-art performance on the standard Room-to-Room (R2R) dataset.
1 code implementation • 5 May 2023 • Liuyi Wang, Zongtao He, Jiagui Tang, Ronghao Dang, Naijia Wang, Chengju Liu, Qijun Chen
Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues.
1 code implementation • 24 Apr 2023 • Yi Feng, Bohuan Xue, Ming Liu, Qijun Chen, Rui Fan
Surface normals hold significant importance in visual environmental perception, serving as a source of rich geometric information.
no code implementations • 18 Apr 2023 • Sicen Guo, Jiahang Li, Yi Feng, Dacheng Zhou, Denghuang Zhang, Chen Chen, Shuai Su, Xingyi Zhu, Qijun Chen, Rui Fan
To foster advancements in this burgeoning field, we have launched an online open-source benchmark suite, referred to as UDTIRI.
1 code implementation • 2 Mar 2023 • Zongtao He, Liuyi Wang, Shu Li, Qingqing Yan, Chengju Liu, Qijun Chen
For a better performance in continuous VLN, we design a multi-level instruction understanding procedure and propose a novel model, Multi-Level Attention Network (MLANet).
no code implementations • 3 Feb 2023 • Ronghao Dang, Lu Chen, Liuyi Wang, Zongtao He, Chengju Liu, Qijun Chen
We propose a meta-ability decoupling (MAD) paradigm, which brings together various object navigation methods within a unified architecture, allowing them to mutually enhance each other and evolve together.
no code implementations • 21 Aug 2022 • Shuai Su, Zhongkai Zhao, Yixin Fei, Shuda Li, Qijun Chen, Rui Fan
The experimental results demonstrate the importance of group-equivariant algorithms for correspondence matching under various Sim(2) transformations.
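For concreteness, a Sim(2) transform composes a planar rotation, a uniform scale, and a translation; equivariant matching means correspondences are preserved under any such transform. A minimal sketch (the function name `sim2` is my own, not from the paper):

```python
import numpy as np

def sim2(points, theta, s, t):
    """Apply a Sim(2) transform -- rotation by `theta`, uniform scale `s`,
    translation `t` -- to an (N, 2) array of 2-D points."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * points @ R.T + t
```

Note that pairwise distances scale by exactly `s`, so a matcher equivariant to this group should produce the same correspondences before and after the transform.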
no code implementations • ICCV 2023 • Ronghao Dang, Liuyi Wang, Zongtao He, Shuai Su, Chengju Liu, Qijun Chen
After seeing the target, we remember its location and navigate to it.
1 code implementation • 2 Jun 2022 • Wei Ye, Hao Tian, Qijun Chen
To mitigate these two challenges, we propose a novel graph kernel, the Multi-scale Wasserstein Shortest-Path graph kernel (MWSP), at the heart of which is the multi-scale shortest-path node feature map, each element of which denotes the number of occurrences of a shortest path around a node.
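The building block of such a feature map is counting shortest paths around each node. As a hedged, simplified stand-in (a BFS histogram of shortest-path lengths from one node; the real MWSP feature map is richer, and the function name is my own):

```python
from collections import deque

def sp_length_histogram(adj, node, max_len):
    """BFS shortest-path distances from `node` in an unweighted graph
    given as an adjacency dict; returns a histogram counting how many
    nodes lie at each shortest-path distance 1..max_len."""
    dist = {node: 0}
    q = deque([node])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    hist = [0] * max_len
    for d in dist.values():
        if 1 <= d <= max_len:
            hist[d - 1] += 1
    return hist
```

Per-node histograms like this can then be compared across graphs (MWSP does so with a Wasserstein distance) to yield a graph similarity.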
no code implementations • 9 Apr 2022 • Ronghao Dang, Zhuofan Shi, Liuyi Wang, Zongtao He, Chengju Liu, Qijun Chen
Thus, in this paper, we propose a directed object attention (DOA) graph to guide the agent in explicitly learning the attention relationships between objects, thereby reducing the object attention bias.
no code implementations • 10 Mar 2022 • Yun Xiang, Qijun Chen, Zhongjin Su, Lu Zhang, Zuohui Chen, Guozhi Zhou, Zhuping Yao, Qi Xuan, Yuan Cheng
Cherry tomato (Solanum lycopersicum) is popular with consumers around the world due to its special flavor.
no code implementations • 17 Dec 2021 • Yiyue Zhao, Cailin Lei, Yu Shen, Yuchuan Du, Qijun Chen
To enhance the visual perception capability of human-vehicle cooperative driving, this paper proposes a cooperative visual perception model.
no code implementations • CVPR 2019 • Miaojing Shi, Zhaohui Yang, Chao Xu, Qijun Chen
Modern crowd counting methods employ deep neural networks to estimate crowd counts via crowd density regressions.
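The density-regression target behind such methods is built by placing a unit-mass Gaussian at every annotated head location, so that integrating the density map recovers the crowd count. A minimal sketch of this standard construction (not the paper's specific model; the function name is my own):

```python
import numpy as np

def density_map(points, shape, sigma=4.0):
    """Build a crowd-density ground-truth map: one unit-mass Gaussian per
    annotated head position (px, py). The map's integral equals the count,
    which is what a density-regression network is trained to predict."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, dtype=np.float64)
    for (px, py) in points:
        g = np.exp(-((xx - px) ** 2 + (yy - py) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()  # normalize so each person contributes mass exactly 1
    return dmap
```

At inference time, summing the predicted density map yields the estimated count, which is why the map integral must match the number of annotations in the ground truth.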
1 code implementation • 20 Feb 2018 • Xiaochuan Yin, Henglai Wei, Penghong Lin, Xiangwei Wang, Qijun Chen
Novel view synthesis aims to synthesize new images from different viewpoints of given images.
no code implementations • ICCV 2017 • Xiaochuan Yin, Xiangwei Wang, Xiaoguo Du, Qijun Chen
Normally, road plane and camera height are specified as reference to recover the scale.