1 code implementation • 23 May 2024 • Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang
Modern LVLMs still struggle with fine-grained document understanding, such as OCR, translation, or captioning of user-specified regions of interest, tasks that require context from the entire page or even multiple pages.
1 code implementation • 15 Apr 2024 • Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang
To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information.
3 code implementations • 27 Feb 2024 • Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.
Ranked #1 on 3D Point Cloud Linear Classification on ModelNet40
no code implementations • 23 Jan 2024 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang
In Vary-toy, we introduce an improved vision vocabulary, allowing the model not only to retain all the features of Vary but also to exhibit greater generality.
Ranked #81 on Visual Question Answering on MM-Vet
1 code implementation • 11 Dec 2023 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang
Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.
Ranked #56 on Visual Question Answering on MM-Vet
1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation.
Ranked #2 on Visual Question Answering on MMBench (GPT-3.5 score metric)
no code implementations • 14 Aug 2023 • Xijun Wang, Xiaojie Chu, Chunrui Han, Xiangyu Zhang
This paper presents a module, Spatial Cross-scale Convolution (SCSC), which is verified to be effective in improving both CNNs and Transformers.
no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang
Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.
no code implementations • 18 Jul 2023 • Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang
Besides, GroupLane with a ResNet18 backbone still surpasses PersFormer by 4.9% F1 score, while its inference speed is nearly 7x faster and its FLOPs are only 13.3% of PersFormer's.
1 code implementation • 16 Jun 2023 • Dongming Wu, Fan Jia, Jiahao Chang, Zhuoling Li, Jianjian Sun, Chunrui Han, Shuailin Li, Yingfei Liu, Zheng Ge, Tiancai Wang
We present the 1st-place solution of OpenLane Topology in Autonomous Driving Challenge.
no code implementations • 25 May 2023 • Xijun Wang, Dongyang Liu, Meina Kan, Chunrui Han, Zhongqin Wu, Shiguang Shan
Distillation then begins in an online manner, and the teacher is only allowed to express solutions within the aforementioned subspace.
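A minimal sketch of the constraint described above, assuming (my reconstruction, not the authors' code) that the "subspace" is a span of orthonormal basis vectors and that the teacher's output is projected onto it before the student matches it, so the student only chases targets the subspace can express:

```python
# Hypothetical illustration: project the teacher's output vector onto a
# fixed subspace before using it as the distillation target.

def project(vec, basis):
    """Project vec onto the span of orthonormal basis vectors."""
    out = [0.0] * len(vec)
    for b in basis:
        # Coefficient of vec along this basis direction.
        coeff = sum(v * bi for v, bi in zip(vec, b))
        out = [o + coeff * bi for o, bi in zip(out, b)]
    return out

basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy 2-D subspace of R^3
teacher_out = [0.5, 0.3, 0.9]
target = project(teacher_out, basis)        # component outside the span is dropped
print(target)
```

The component of the teacher's output lying outside the subspace is discarded, which is the sense in which the teacher "is only allowed to express solutions within the subspace."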
no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang
In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon LSS-based methods and find that it already enjoys the merits of both sides, i.e., rich long-term information and an efficient fusion pipeline.
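The recurrent idea can be sketched in a few lines. This is an assumption-laden toy, not the paper's implementation: each new frame's (BEV) feature map is folded into a single running history feature, so memory stays constant no matter how long the sequence grows; the blend weight `alpha` is a hypothetical value.

```python
# Toy recurrent temporal fusion: fold each frame into a running history
# feature via exponential blending, keeping a single fused state.

def recurrent_fuse(history, current, alpha=0.7):
    """Blend the current frame's features into the long-term history.

    history, current: flat lists of floats (stand-ins for feature maps).
    alpha: weight on the incoming frame (illustrative choice).
    """
    if history is None:  # first frame: the history is just the frame itself
        return list(current)
    return [alpha * c + (1.0 - alpha) * h for h, c in zip(history, current)]

# Fuse a short sequence of per-frame features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = None
for f in frames:
    state = recurrent_fuse(state, f)
print(state)
```

The fusion cost per frame is constant, unlike stacking a window of past frames, which is the efficiency argument the entry alludes to.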
no code implementations • 27 Sep 2018 • Chunrui Han, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen
Specifically, we introduce a kernel generator as meta-learner to learn to construct feature embedding for query images.
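A stripped-down sketch of the kernel-generator idea, under stated assumptions (this is my reconstruction, not the authors' method): the "generator" builds a classification kernel from a handful of support embeddings of one class, then scores a query embedding against it, i.e., weights are generated per episode instead of trained as a fixed classifier.

```python
# Hypothetical kernel generator: average support embeddings into a kernel
# vector, then score queries by dot product with the generated kernel.

def generate_kernel(support_embeddings):
    """Build a kernel from a class's support embeddings (here: their mean)."""
    n = len(support_embeddings)
    dim = len(support_embeddings[0])
    return [sum(e[i] for e in support_embeddings) / n for i in range(dim)]

def score(kernel, query):
    """Response of the generated kernel on a query embedding."""
    return sum(k * q for k, q in zip(kernel, query))

support = [[1.0, 0.0], [0.8, 0.2]]  # toy embeddings for one class
kernel = generate_kernel(support)
print(score(kernel, [1.0, 0.0]))    # query close to the class scores high
```

Averaging is the simplest possible generator; the entry's meta-learner would replace it with a learned network, but the per-episode generate-then-apply structure is the same.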
no code implementations • ECCV 2018 • Chunrui Han, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen
In current face recognition approaches with convolutional neural network (CNN), a pair of faces to compare are independently fed into the CNN for feature extraction.