no code implementations • 22 Mar 2024 • Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems.
no code implementations • 18 Mar 2024 • Yue Fan, Xiaojian Ma, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li
We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relations in lengthy videos.
1 code implementation • 3 Feb 2024 • Yi Xin, Siqi Luo, Haodi Zhou, Junlong Du, Xiaohong Liu, Yue Fan, Qing Li, Yuntao Du
Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks.
no code implementations • 29 Jan 2024 • Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang
Our evaluation shows that questions in the MultipanelVQA benchmark pose significant challenges to the state-of-the-art Large Vision Language Models (LVLMs) tested, even though humans can attain approximately 99% accuracy on these questions.
1 code implementation • ICCV 2023 • Yue Fan, Anna Kukleva, Dengxin Dai, Bernt Schiele
In experiments, SSB greatly improves both inlier classification and outlier detection performance, outperforming existing methods by a large margin.
1 code implementation • 5 Oct 2023 • Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang
In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed analysis of LLMs within the context of Pure Coordination Games, where participating agents need to cooperate for the most gain.
no code implementations • 29 May 2023 • Yue Fan, Ivan Skorokhodov, Oleg Voynov, Savva Ignatyev, Evgeny Burnaev, Peter Wonka, Yiqun Wang
We develop a method that recovers the surface, materials, and illumination of a scene from its posed multi-view images.
no code implementations • 23 May 2023 • Yue Fan, Jing Gu, Kaizhi Zheng, Xin Eric Wang
Intelligent navigation-helper agents are critical because they can guide users through unknown areas using environmental awareness and conversational ability, serving as potential accessibility tools for individuals with disabilities.
4 code implementations • 26 Jan 2023 • Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, Marios Savvides
The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model's generalization performance.
no code implementations • 20 Nov 2022 • Hao Chen, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Marios Savvides, Bhiksha Raj
While standard SSL assumes uniform data distribution, we consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
no code implementations • 28 Aug 2022 • Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang
Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc.
4 code implementations • 12 Aug 2022 • Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, RenJie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang
We further provide the pre-trained versions of the state-of-the-art neural models for CV tasks to make the cost affordable for further tuning.
2 code implementations • 24 May 2022 • Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, Xin Eric Wang
To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), to navigate a drone via natural language conversation.
4 code implementations • 15 May 2022 • Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie
Semi-supervised Learning (SSL) has witnessed great success owing to the impressive performances brought by various methods based on pseudo labeling and consistency regularization.
no code implementations • 10 Dec 2021 • Yue Fan, Anna Kukleva, Bernt Schiele
Generally, the aim is to train a model that is invariant to various data augmentations.
1 code implementation • CVPR 2022 • Yue Fan, Dengxin Dai, Anna Kukleva, Bernt Schiele
In this paper, we propose a novel co-learning framework (CoSSL) with decoupled representation learning and classifier learning for imbalanced SSL.
no code implementations • 29 Sep 2021 • Yue Fan, Xiuli Ma
Networks serve as efficient tools to describe close relationships among nodes.
no code implementations • 30 Aug 2020 • Yue Fan, Shilei Chu, Wei Zhang, Ran Song, Yibin Li
Extensive experiments are conducted to demonstrate the accuracy of the proposed imitation learning process as well as the reliability of the holistic system for autonomous drone navigation.
no code implementations • 5 Feb 2020 • Yue Fan, Yongqin Xian, Max Maria Losch, Bernt Schiele
In this paper, we push the envelope further and investigate the reliance on spatial information.
2 code implementations • 31 Oct 2019 • Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yunqi Cai, Dong Wang
These datasets tend to yield over-optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.
no code implementations • 19 Apr 2019 • Junshan Wang, Zhicong Lu, Guojie Song, Yue Fan, Lun Du, Wei Lin
Network embedding is a method to learn low-dimensional representation vectors for nodes in complex networks.
3 code implementations • 29 Nov 2018 • Haoran Wang, Yue Fan, Zexin Wang, Licheng Jiao, Bernt Schiele
We propose a novel architecture for Person Re-Identification, based on a novel parameter-free spatial attention layer introducing spatial relations among the feature map activations back to the model.
Ranked #20 on Person Re-Identification on DukeMTMC-reID
no code implementations • 19 Dec 2012 • Yue Fan, Louise Raphael, Mark Kon
Such feature vector regularization inherits a property from function denoising on ${\bf R}^n$, in that accuracy is non-monotonic in the denoising (regularization) parameter $\alpha$.