3 code implementations • 27 Feb 2024 • Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring universal 3D object understanding with 3D point clouds and language.
Ranked #1 on 3D Question Answering (3D-QA) on 3D MM-Vet
Tasks: 3D Object Captioning, 3D Point Cloud Linear Classification, +10
1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
This paper presents DreamLLM, a learning framework that achieves the first versatile Multimodal Large Language Models (MLLMs) empowered by the frequently overlooked synergy between multimodal comprehension and creation.
Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)
2 code implementations • NeurIPS 2023 • Zekun Qi, Muzhou Yu, Runpei Dong, Kaisheng Ma
VPP leverages the structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of the unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects.
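The coarse-to-fine idea behind this voxel-then-point pipeline can be sketched as two stages: a structured coarse stage that produces an occupancy grid, and a sparse refinement stage that expands occupied cells into points. The grid size, fill rate, upsampling factor, and function names below are hypothetical stand-ins, not VPP's actual implementation.

```python
import random

def coarse_voxel_stage(grid=4, fill=0.25, seed=0):
    """Stage 1 (sketch): produce a coarse occupancy grid, standing in
    for the Voxel Semantic Generator's structured output."""
    rng = random.Random(seed)
    return [(x, y, z)
            for x in range(grid) for y in range(grid) for z in range(grid)
            if rng.random() < fill]

def point_upsample_stage(voxels, grid=4, points_per_voxel=8, seed=1):
    """Stage 2 (sketch): refine each occupied voxel into jittered points,
    standing in for the sparse Point Upsampler."""
    rng = random.Random(seed)
    pts = []
    for (x, y, z) in voxels:
        for _ in range(points_per_voxel):
            # sample uniformly inside the voxel cell, normalized to [0, 1]^3
            pts.append(((x + rng.random()) / grid,
                        (y + rng.random()) / grid,
                        (z + rng.random()) / grid))
    return pts

voxels = coarse_voxel_stage()
points = point_upsample_stage(voxels)
print(len(voxels), len(points))  # each coarse voxel yields 8 fine points
```

Because the second stage only visits occupied voxels, the cost of refinement scales with the sparsity of the shape rather than with the full grid volume, which is the efficiency the sentence above alludes to.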
no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang
Based on precise referring instructions, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity, including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.
no code implementations • 28 Apr 2023 • Muzhou Yu, Sia Huat Tan, Kailu Wu, Runpei Dong, Linfeng Zhang, Kaisheng Ma
Knowledge distillation is an effective model compression method, but existing approaches have limitations: (1) feature-based distillation methods focus on distilling the feature map but fail to transfer the relations between data examples; (2) relational distillation methods are either limited to handcrafted functions for relation extraction, such as the L2 norm, or weak at modeling inter- and intra-class relations.
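The distinction drawn here can be illustrated with two toy losses: a feature-based loss matches teacher and student features element-wise, while a relational loss (shown in its classic pairwise-distance form, using an L2 relation function) matches the structure *among* examples instead. Function names and feature shapes are illustrative only, not any paper's implementation.

```python
def feature_loss(student, teacher):
    """Feature-based distillation (sketch): match features element-wise,
    ignoring how examples relate to one another."""
    return sum((s - t) ** 2
               for fs, ft in zip(student, teacher)
               for s, t in zip(fs, ft)) / len(student)

def relation_loss(student, teacher):
    """Relational distillation (sketch): match pairwise L2 distances
    between examples, transferring inter-example structure instead."""
    def dists(feats):
        return [sum((a - b) ** 2 for a, b in zip(feats[i], feats[j])) ** 0.5
                for i in range(len(feats)) for j in range(i + 1, len(feats))]
    ds, dt = dists(student), dists(teacher)
    return sum((a - b) ** 2 for a, b in zip(ds, dt)) / len(ds)

teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]  # teacher + constant offset

# A constant shift changes every feature value but preserves all pairwise
# relations, so the two losses disagree about whether anything changed:
print(feature_loss(shifted, teacher))   # 2.0
print(relation_loss(shifted, teacher))  # 0.0
```

This is exactly the gap the sentence describes: the feature loss penalizes a student whose inter-example structure is already correct, while the handcrafted L2 relation can in turn miss class-level structure that a learned relation model would capture.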
no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang
In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find that it already enjoys the merits of both sides, i.e., rich long-term information and an efficient fusion pipeline.
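A recurrent fusion strategy of this kind can be sketched as a single running state that each new frame's features are folded into, so arbitrarily long history is retained at constant cost. The exponential-decay merge and the flat feature lists below are illustrative stand-ins, not the paper's actual fusion operator.

```python
def recurrent_fuse(frames, decay=0.7):
    """Fold a stream of per-frame features (here flat lists) into one
    fused state: state = decay * state + (1 - decay) * frame.
    Memory stays O(1) in sequence length, unlike stacking all frames."""
    state = None
    for frame in frames:
        if state is None:
            state = list(frame)
        else:
            state = [decay * s + (1 - decay) * f
                     for s, f in zip(state, frame)]
    return state

stream = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(recurrent_fuse(stream))  # older frames decay but never vanish
```

The appeal is that long-term information accumulates in the state "for free", while per-frame compute stays identical to a short-window fusion pipeline.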
no code implementations • 8 Mar 2023 • Junbo Zhang, Runpei Dong, Kaisheng Ma
Training a 3D scene understanding model requires complicated human annotations, which are laborious to collect and result in a model only encoding close-set object semantics.
3 code implementations • 5 Feb 2023 • Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi
This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.
Ranked #1 on Zero-Shot Transfer 3D Point Cloud Classification on ModelNet10 (using extra training data)
Tasks: 3D Point Cloud Linear Classification, Few-Shot 3D Point Cloud Classification, +2
3 code implementations • 16 Dec 2022 • Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, Kaisheng Ma
The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages.
Ranked #5 on Few-Shot 3D Point Cloud Classification on ModelNet40 10-way (10-shot) (using extra training data)
Tasks: Few-Shot 3D Point Cloud Classification, Knowledge Distillation, +1
1 code implementation • 12 Jul 2022 • Linfeng Zhang, Xin Chen, Junbo Zhang, Runpei Dong, Kaisheng Ma
The success of deep learning is usually accompanied by the growth in neural network depth.
no code implementations • 25 May 2022 • Linfeng Zhang, Xin Chen, Runpei Dong, Kaisheng Ma
In this paper, we propose Region-aware Knowledge Distillation (ReKo) to compress image-to-image translation models.
1 code implementation • CVPR 2023 • Linfeng Zhang, Runpei Dong, Hung-Shuo Tai, Kaisheng Ma
The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality.
1 code implementation • 30 Dec 2021 • Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma
In addition, an efficient deployment flow for mobile CPUs is developed, achieving up to 7.46$\times$ inference acceleration on an octa-core ARM CPU.
1 code implementation • 3 Nov 2021 • Sia Huat Tan, Runpei Dong, Kaisheng Ma
Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet), which tackles the challenges of high computation cost and lack of robustness with a recurrent downsampled attention mechanism.
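The multi-glimpse idea can be sketched as a loop: downsample the input, take a small glimpse crop per recurrent step, and aggregate the per-glimpse predictions, rather than processing the full-resolution image in one pass. The crop sizes, glimpse locations, and toy scoring rule below are illustrative assumptions, not MGNet's architecture.

```python
def downsample(image, factor=2):
    """Keep every `factor`-th row and column (a cheap low-resolution view)."""
    return [row[::factor] for row in image[::factor]]

def glimpse(image, top, left, size):
    """Extract a small size x size crop at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def multi_glimpse_score(image, locations, size=2):
    """Score a few glimpses of a downsampled image one step at a time,
    then average, instead of one expensive full-resolution pass."""
    small = downsample(image)
    scores = []
    for top, left in locations:                        # one glimpse per step
        patch = glimpse(small, top, left, size)
        scores.append(sum(sum(row) for row in patch))  # toy per-glimpse score
    return sum(scores) / len(scores)

image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
print(multi_glimpse_score(image, locations=[(0, 0), (0, 0)]))  # 20.0
```

Each glimpse touches only a small crop of an already-downsampled input, which is where the computational savings come from; repeating and aggregating glimpses is what the robustness claim rests on.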