no code implementations • 19 Mar 2024 • Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, Lin Wang
Then, we propose a conceptual reasoning-based uncertainty estimation module, which simulates the recognition process to enrich the semantic representation.
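The snippet does not spell out the module's internals; as a rough, hypothetical illustration of one way to simulate a recognition step and score its uncertainty, one could compare a sample embedding against class text embeddings and take the softmax entropy (every name and the temperature below are assumptions, not the paper's API):

```python
import torch
import torch.nn.functional as F

def recognition_uncertainty(sample_emb, class_embs, temperature=0.07):
    """Hypothetical sketch: entropy of a simulated recognition step.

    sample_emb: (D,)   embedding of one sample.
    class_embs: (C, D) one embedding per candidate class.
    Returns (probs, entropy); higher entropy = higher uncertainty.
    """
    sample_emb = F.normalize(sample_emb, dim=-1)
    class_embs = F.normalize(class_embs, dim=-1)
    logits = class_embs @ sample_emb / temperature   # (C,) scaled cosine similarities
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    return probs, entropy
```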
no code implementations • 19 Mar 2024 • Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou, Lin Wang
To make this possible, we 1) construct a knowledge base of text embeddings with the help of LLMs and multi-modal LLMs; 2) adaptively build an LLM-augmented class-wise embedding center on top of the knowledge base and the encoded visual embeddings; and 3) align all the embeddings to the LLM-augmented embedding center via contrastive learning to achieve a unified and balanced representation space.
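A minimal sketch of steps 2) and 3), assuming the center mixes each class's LLM text embedding with the mean of that class's visual embeddings and that alignment uses an InfoNCE-style loss (both are assumptions; the snippet fixes neither):

```python
import torch
import torch.nn.functional as F

def class_centers(text_embs, visual_embs, labels, num_classes):
    """Hypothetical sketch: per-class centers mixing LLM text embeddings
    with the mean encoded visual embedding of each class.

    text_embs:   (C, D) one LLM/MLLM-derived text embedding per class.
    visual_embs: (N, D) encoded visual embeddings.
    labels:      (N,)   class index of each visual embedding.
    """
    sums = torch.zeros(num_classes, visual_embs.size(1))
    sums.index_add_(0, labels, visual_embs)                      # per-class sums
    counts = torch.bincount(labels, minlength=num_classes).clamp_min(1)
    visual_means = sums / counts.unsqueeze(1)
    return F.normalize(text_embs + visual_means, dim=-1)         # assumed mixing rule

def center_contrastive_loss(embs, labels, centers, temperature=0.07):
    """Pull each embedding toward its own class center, away from the rest."""
    logits = F.normalize(embs, dim=-1) @ centers.t() / temperature  # (N, C)
    return F.cross_entropy(logits, labels)
```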
no code implementations • 31 Jan 2024 • Yuanhuiyi Lyu, Xu Zheng, Lin Wang
1) Entity Fusion Branch extracts entity features from the multi-modal representations, powered by our specially constructed entity knowledge graph; 2) Attribute Fusion Branch adeptly preserves and processes the attributes.
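The snippet gives no architectural detail; purely as a hypothetical illustration of such a two-branch design, the entity branch below looks up embeddings from a toy knowledge-graph table while the attribute branch preserves attribute features through an MLP (all module names and the concatenation-based fusion are assumptions):

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Hypothetical sketch: an entity branch backed by a (toy) entity
    knowledge graph and an attribute branch that preserves attributes."""

    def __init__(self, num_entities, dim):
        super().__init__()
        self.entity_table = nn.Embedding(num_entities, dim)  # stand-in for a KG encoder
        self.attribute_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, entity_ids, attribute_feats):
        # entity_ids: (B,) KG node indices; attribute_feats: (B, D)
        e = self.entity_table(entity_ids)
        a = self.attribute_mlp(attribute_feats)
        return self.fuse(torch.cat([e, a], dim=-1))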
no code implementations • 17 Sep 2023 • Jiahang Cao, Xu Zheng, Yuanhuiyi Lyu, Jiaxu Wang, Renjing Xu, Lin Wang
The ability to detect objects in all lighting (i.e., normal-, over-, and under-exposed) conditions is crucial for real-world applications, such as self-driving. Traditional RGB-based detectors often fail under such varying lighting conditions. Therefore, recent works utilize novel event cameras to supplement or guide the RGB modality; however, these methods typically adopt asymmetric network structures that rely predominantly on the RGB modality, resulting in limited robustness for all-day detection.
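To make the asymmetry critique concrete, here is a hypothetical sketch of the symmetric alternative the snippet argues for: identical branches for RGB frames and an event representation, fused so that neither modality dominates (the layer sizes, the 5-bin event voxel grid, and the concatenation fusion are all assumptions):

```python
import torch
import torch.nn as nn

class SymmetricFusionBackbone(nn.Module):
    """Hypothetical sketch: symmetric RGB/event branches with balanced fusion."""

    def __init__(self, dim=64):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.rgb_branch = branch(3)    # RGB frames
        self.event_branch = branch(5)  # e.g., a 5-bin event voxel grid (assumed)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, rgb, events):
        f_rgb = self.rgb_branch(rgb)
        f_evt = self.event_branch(events)
        # Both modalities enter the fusion on equal footing
        return self.fuse(torch.cat([f_rgb, f_evt], dim=1))
```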
no code implementations • 6 Aug 2023 • Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, Lin Wang
Accordingly, we first introduce a novel event encoder that subtly models the temporal information from events while generating event prompts for modality bridging.
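A minimal sketch of what such an encoder might look like, assuming events arrive as a voxel grid whose time bins are mixed by a small temporal convolution before being projected into prompt tokens for a frozen image encoder (the representation, layer choices, and token count are all assumptions):

```python
import torch
import torch.nn as nn

class EventPromptEncoder(nn.Module):
    """Hypothetical sketch: encode an event voxel grid, model its temporal
    bins, and emit prompt tokens to prepend to an image encoder's tokens."""

    def __init__(self, time_bins=5, dim=256, num_prompts=4):
        super().__init__()
        self.spatial = nn.Conv2d(1, dim, kernel_size=4, stride=4)   # per-bin patches
        self.temporal = nn.Conv1d(dim, dim, kernel_size=time_bins)  # mix time bins
        self.to_prompts = nn.Linear(dim, num_prompts * dim)
        self.num_prompts, self.dim = num_prompts, dim

    def forward(self, voxels):
        # voxels: (B, T, H, W) event counts per time bin
        B, T, H, W = voxels.shape
        x = self.spatial(voxels.reshape(B * T, 1, H, W))        # (B*T, D, h, w)
        x = x.flatten(2).mean(-1).reshape(B, T, self.dim)       # (B, T, D)
        x = self.temporal(x.transpose(1, 2)).squeeze(-1)        # (B, D)
        return self.to_prompts(x).reshape(B, self.num_prompts, self.dim)
```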
no code implementations • 24 Mar 2023 • Haotian Bai, Yuanhuiyi Lyu, Lutao Jiang, Sijia Li, Haonan Lu, Xiaodong Lin, Lin Wang
To tackle the issue of 'guidance collapse' and enhance consistency, we propose a novel framework, dubbed CompoNeRF, by integrating an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.
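The snippet does not describe how the layout is used at render time; one plausible piece, sketched hypothetically below, is compositing per-object radiance fields so that each object's NeRF contributes only where a global query point falls inside that object's layout box (the axis-aligned boxes, the normalization, and the additive density composition are assumptions):

```python
import torch

def composite_density(query_pts, boxes, object_fields):
    """Hypothetical sketch of layout-based composition.

    query_pts:     (N, 3) points in the global scene frame.
    boxes:         list of (center (3,), half_size (3,)) axis-aligned boxes.
    object_fields: list of callables mapping local (M, 3) points -> (M,) density.
    """
    sigma = torch.zeros(query_pts.size(0))
    for (center, half), field in zip(boxes, object_fields):
        local = (query_pts - center) / half         # normalize into the unit box
        inside = (local.abs() <= 1.0).all(dim=-1)   # keep points within this box
        if inside.any():
            sigma[inside] += field(local[inside])   # objects compose additively
    return sigma
```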