Search Results for author: Xuran Pan

Found 14 papers, 10 papers with code

GSVA: Generalized Segmentation via Multimodal Large Language Models

no code implementations 15 Dec 2023 Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image.

Generalized Referring Expression Segmentation, Referring Expression +1

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

1 code implementation 4 Sep 2023 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that are beyond the region of interests.

Image Classification, Instance Segmentation +2

FLatten Transformer: Vision Transformer using Focused Linear Attention

1 code implementation ICCV 2023 Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks.

Dynamic Perceiver for Efficient Visual Recognition

1 code implementation ICCV 2023 Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang

Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.

Action Recognition, Classification +4

Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention

1 code implementation CVPR 2023 Xuran Pan, Tianzhu Ye, Zhuofan Xia, Shiji Song, Gao Huang

Self-attention mechanism has been a key factor in the recent progress of Vision Transformer (ViT), which enables adaptive feature extraction from global contexts.

Feature Selection, Inductive Bias

Joint Representation Learning for Text and 3D Point Cloud

no code implementations 18 Jan 2023 Rui Huang, Xuran Pan, Henry Zheng, Haojun Jiang, Zhifeng Xie, Shiji Song, Gao Huang

During the pre-training stage, we establish the correspondence of images and point clouds based on the readily available RGB-D data and use contrastive learning to align the image and point cloud representations.

Contrastive Learning, Instance Segmentation +4

Contrastive Language-Image Pre-Training with Knowledge Graphs

no code implementations 17 Oct 2022 Xuran Pan, Tianzhu Ye, Dongchen Han, Shiji Song, Gao Huang

Recent years have witnessed the fast development of large-scale pre-training frameworks that can extract multi-modal representations in a unified form and achieve promising performances when transferred to downstream tasks.

Knowledge Graphs

ActiveNeRF: Learning where to See with Uncertainty Estimation

1 code implementation 18 Sep 2022 Xuran Pan, Zihang Lai, Shiji Song, Gao Huang

In this paper, we present a novel learning framework, ActiveNeRF, aiming to model a 3D scene with a constrained input budget.

Active Learning, Novel View Synthesis

Vision Transformer with Deformable Attention

2 code implementations CVPR 2022 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts which are beyond the region of interests.

Image Classification, Object Detection +1

On the Integration of Self-Attention and Convolution

2 code implementations CVPR 2022 Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang, Gao Huang

In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation.

Representation Learning

A Unified Framework for Convolution-based Graph Neural Networks

no code implementations 1 Jan 2021 Xuran Pan, Shiji Song, Gao Huang

In this paper, we take a step forward to establish a unified framework for convolution-based graph neural networks, by formulating the basic graph convolution operation as an optimization problem in the graph Fourier space.

3D Object Detection with Pointformer

1 code implementation CVPR 2021 Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang

In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.

3D Object Detection, Object +2

Regularizing Deep Networks with Semantic Data Augmentation

1 code implementation 21 Jul 2020 Yulin Wang, Gao Huang, Shiji Song, Xuran Pan, Yitong Xia, Cheng Wu

The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features, i.e., certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., changing the background or view angle of an object.

Data Augmentation

Implicit Semantic Data Augmentation for Deep Networks

1 code implementation NeurIPS 2019 Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Cheng Wu, Gao Huang

Our work is motivated by the intriguing property that deep networks are surprisingly good at linearizing features, such that certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., adding sunglasses or changing backgrounds.

Image Augmentation
