no code implementations • 15 Dec 2023 • Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang
Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to expressions that refer to multiple objects at once or to empty targets that are absent from the image.
2 code implementations • 14 Dec 2023 • Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang
Specifically, Agent Attention, denoted as the quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module.
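Concretely, the quadruple trades the single quadratic attention map for two lightweight ones: the agents attend to the keys and values, and the queries then attend to the agents. A minimal single-head sketch, hedged: in the paper the agent tokens are derived by pooling the queries, whereas here they are random placeholders, and scaling and normalization follow standard softmax attention:

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, a):
    """Single-head agent attention sketch.

    q, k, v: (n, d) query/key/value tokens; a: (m, d) agent tokens, m << n.
    The agents first aggregate information from all keys/values, then
    broadcast it back to the queries -- two small attention maps instead
    of one quadratic n x n map.
    """
    d = q.size(-1)
    # Agent aggregation: agents attend to keys/values, an (m x n) map.
    agent_v = F.softmax(a @ k.transpose(-1, -2) / d ** 0.5, dim=-1) @ v  # (m, d)
    # Agent broadcast: queries attend to the m agents, an (n x m) map.
    return F.softmax(q @ a.transpose(-1, -2) / d ** 0.5, dim=-1) @ agent_v  # (n, d)

# Toy usage: 196 tokens, 64 channels, 16 agent tokens.
q = k = v = torch.randn(196, 64)
a = torch.randn(16, 64)
print(agent_attention(q, k, v, a).shape)  # torch.Size([196, 64])
```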
1 code implementation • 29 Sep 2023 • Jiayun Li, Yuxiao Cheng, Yiwen Lu, Zhuofan Xia, Yilin Mo, Gao Huang
Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness.
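For reference, ReLU simply zeroes out negative activations, $\mathrm{ReLU}(x) = \max(0, x)$:

```python
import torch

# ReLU(x) = max(0, x): identity for positive inputs, zero otherwise,
# which keeps the network piecewise linear and cheap to compute.
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(torch.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000])
```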
1 code implementation • 4 Sep 2023 • Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang
On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irrelevant parts beyond the regions of interest (a minimal sketch of the sparse, data-dependent alternative follows this entry).
Ranked #4 on Object Detection on COCO 2017
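This entry continues the deformable attention line of work by the same authors (see also the CVPR 2022 entry below): instead of attending densely to every location, a small offset network shifts a set of reference points to data-dependent positions, and keys/values are sampled only there. A minimal sketch with illustrative shapes, using a random stand-in for the learned offset network:

```python
import torch
import torch.nn.functional as F

def deformable_attention(x, n_ref=49):
    """Sketch of data-dependent sparse attention over a (B, C, H, W) feature map.

    A uniform grid of n_ref reference points is shifted by (here random,
    in practice learned) offsets; keys/values are bilinearly sampled at
    the shifted locations, so each query attends to n_ref tokens, not H*W.
    """
    B, C, H, W = x.shape
    s = int(n_ref ** 0.5)
    # Uniform reference grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, s),
                            torch.linspace(-1, 1, s), indexing="ij")
    ref = torch.stack([xs, ys], dim=-1).reshape(1, s * s, 1, 2).expand(B, -1, -1, -1)
    # Stand-in for the offset network: small shifts per reference point.
    offsets = 0.1 * torch.tanh(torch.randn(B, s * s, 1, 2))
    # Bilinearly sample deformed keys/values at the shifted locations.
    kv = F.grid_sample(x, ref + offsets, align_corners=False)  # (B, C, s*s, 1)
    kv = kv.flatten(2).transpose(1, 2)                         # (B, s*s, C)
    q = x.flatten(2).transpose(1, 2)                           # (B, H*W, C)
    attn = F.softmax(q @ kv.transpose(1, 2) / C ** 0.5, dim=-1)
    return (attn @ kv).transpose(1, 2).reshape(B, C, H, W)

print(deformable_attention(torch.randn(2, 64, 14, 14)).shape)  # (2, 64, 14, 14)
```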
1 code implementation • CVPR 2023 • Xuran Pan, Tianzhu Ye, Zhuofan Xia, Shiji Song, Gao Huang
The self-attention mechanism has been a key factor in the recent progress of Vision Transformers (ViT), as it enables adaptive feature extraction from global contexts.
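For background, that adaptivity is a data-dependent weighted average over all tokens; a single-head sketch with the usual linear projections omitted for brevity:

```python
import torch
import torch.nn.functional as F

def self_attention(x):
    """Every token aggregates all others with weights computed from the data
    itself, giving a global receptive field in a single layer."""
    q = k = v = x                           # projections omitted for brevity
    w = F.softmax(q @ k.transpose(-1, -2) / x.size(-1) ** 0.5, dim=-1)
    return w @ v

x = torch.randn(196, 64)                    # e.g. 14x14 image tokens, 64 channels
print(self_attention(x).shape)              # torch.Size([196, 64])
```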
1 code implementation • ICCV 2023 • Yifan Pu, Yiru Wang, Zhuofan Xia, Yizeng Han, Yulin Wang, Weihao Gan, Zidong Wang, Shiji Song, Gao Huang
In our ARC module, convolution kernels rotate adaptively to extract features from objects with varying orientations across images, and an efficient conditional computation mechanism accommodates the large orientation variations of objects within a single image (a simplified sketch of the rotation idea follows this entry).
Ranked #3 on Object Detection In Aerial Images on DOTA (using extra training data)
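A simplified sketch of the rotation idea referenced above: a stand-in routing function predicts one angle from the input, the kernel is resampled under that rotation, and the rotated kernel is used in a standard convolution. The paper's learned routing network and multi-kernel conditional computation are omitted, and all shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def rotate_kernel(weight, theta):
    """Rotate a conv kernel (co, ci, k, k) by angle theta via bilinear resampling."""
    co, ci, k, _ = weight.shape
    cos, sin, zero = torch.cos(theta), torch.sin(theta), torch.zeros(())
    rot = torch.stack([torch.stack([cos, -sin, zero]),
                       torch.stack([sin, cos, zero])])        # 2x3 affine matrix
    grid = F.affine_grid(rot.expand(co * ci, -1, -1), (co * ci, 1, k, k),
                         align_corners=False)
    return F.grid_sample(weight.reshape(co * ci, 1, k, k), grid,
                         align_corners=False).reshape(co, ci, k, k)

def adaptive_rotated_conv(x, weight):
    """Predict a data-dependent angle, rotate the kernel, then convolve."""
    # Stand-in routing function: one angle in (-pi/2, pi/2) per forward pass.
    theta = torch.tanh(x.mean()) * 1.5708
    return F.conv2d(x, rotate_kernel(weight, theta), padding=weight.size(-1) // 2)

x, w = torch.randn(1, 16, 32, 32), torch.randn(32, 16, 3, 3)
print(adaptive_rotated_conv(x, w).shape)  # torch.Size([1, 32, 32, 32])
```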
2 code implementations • CVPR 2022 • Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang
On the one hand, using dense attention, e.g. in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts beyond the regions of interest.
Ranked #107 on Object Detection on COCO test-dev
1 code implementation • CVPR 2021 • Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang
In this paper, we propose Pointformer, a Transformer backbone designed to learn features effectively from 3D point clouds.
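As a minimal sketch of the core idea, each point can be treated as an attention token once its coordinates are embedded into feature space. This is only an illustration under that assumption; layer sizes are arbitrary, and the paper's actual Local/Global Transformer blocks are not reproduced here:

```python
import torch
import torch.nn as nn

class PointAttentionBlock(nn.Module):
    """Self-attention over a set of 3D points: coordinates are embedded,
    added to point features, and every point attends to every other."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.coord_embed = nn.Linear(3, dim)   # lift xyz into feature space
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates; feats: (B, N, dim) point features.
        tokens = self.norm(feats + self.coord_embed(xyz))
        out, _ = self.attn(tokens, tokens, tokens)
        return feats + out                     # residual connection

block = PointAttentionBlock()
out = block(torch.randn(2, 1024, 3), torch.randn(2, 1024, 64))
print(out.shape)  # torch.Size([2, 1024, 64])
```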