1 code implementation • 10 Mar 2024 • Minjie Zhu, Yichen Zhu, Xin Liu, Ning Liu, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Zhicai Ou, Feifei Feng, Jian Tang
Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning.
Ranked #69 on Visual Question Answering on MM-Vet
no code implementations • 8 Jan 2024 • Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang
The language-conditioned robotic manipulation aims to transfer natural language instructions into executable actions, from simple pick-and-place to tasks requiring intent recognition and visual reasoning.
no code implementations • 5 Jan 2024 • Junjie Wen, Yichen Zhu, Minjie Zhu, Jinming Li, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang
Humans interpret scenes by recognizing both the identities and positions of objects in their observations.
no code implementations • 4 Aug 2023 • Haowen Wang, Zhipeng Fan, Zhen Zhao, Zhengping Che, Zhiyuan Xu, Dong Liu, Feifei Feng, Yakun Huang, XIUQUAN QIAO, Jian Tang
We introduce a pose regression module that shares the deformation features and template codes from the fields to estimate the accurate 6D pose of each object in the scene.
no code implementations • 6 Jun 2023 • Haowen Wang, Zhengping Che, Yufan Yang, Mingyuan Wang, Zhiyuan Xu, XIUQUAN QIAO, Mengshi Qi, Feifei Feng, Jian Tang
Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments.
no code implementations • 23 Mar 2023 • Mingze Wei, Yaomin Huang, Zhiyuan Xu, Ning Liu, Zhengping Che, Xinyu Zhang, Chaomin Shen, Feifei Feng, Chun Shan, Jian Tang
Our work significantly outperforms the state-of-the-art for three-finger robotic hands.
no code implementations • 23 Mar 2023 • Yaomin Huang, Ning Liu, Zhengping Che, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Guixu Zhang, Xinmei Liu, Feifei Feng, Jian Tang
CP$^3$ is elaborately designed to leverage the characteristics of point clouds and PNNs in order to enable 2D channel pruning methods for PNNs.
no code implementations • CVPR 2023 • Yaomin Huang, Ning Liu, Zhengping Che, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Guixu Zhang, Xinmei Liu, Feifei Feng, Jian Tang
Directly implementing the 2D CNN channel pruning methods to PNNs undermine the performance of PNNs because of the different representations of 2D images and 3D point clouds as well as the network architecture disparity.
1 code implementation • 24 Jul 2022 • Yaomin Huang, Xinmei Liu, Yichen Zhu, Zhiyuan Xu, Chaomin Shen, Zhengping Che, Guixu Zhang, Yaxin Peng, Feifei Feng, Jian Tang
Detecting 3D objects from point clouds is a practical yet challenging task that has attracted increasing attention recently.
no code implementations • CVPR 2022 • Haowen Wang, Mingyuan Wang, Zhengping Che, Zhiyuan Xu, XIUQUAN QIAO, Mengshi Qi, Feifei Feng, Jian Tang
In this paper, we design a novel two-branch end-to-end fusion network, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map.
no code implementations • 3 Dec 2021 • Yichen Zhu, Yuqin Zhu, Jie Du, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang
The TLA enables the ReViT to process the image with the minimum sufficient number of tokens during inference.
no code implementations • 1 Dec 2021 • Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang
Critically, there is no effort to understand 1) why training BatchNorm only can find the perform-well architectures with the reduced supernet-training time, and 2) what is the difference between the train-BN-only supernet and the standard-train supernet.