Search Results for author: Xiaolin Wei

Found 43 papers, 22 papers with code

SeisFusion: Constrained Diffusion Model with Input Guidance for 3D Seismic Data Interpolation and Reconstruction

1 code implementation • 18 Mar 2024 • Shuang Wang, Fei Deng, Peifan Jiang, Zishan Gong, Xiaolin Wei, Yuqing Wang

In response to this challenge, we propose a novel diffusion model reconstruction framework tailored for 3D seismic data.

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

1 code implementation • 28 Dec 2023 • Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen

We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices.

AutoML • Language Modelling

773

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

no code implementations • 2 Nov 2023 • Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption.

Object

Orthogonal Temporal Interpolation for Zero-Shot Video Recognition

1 code implementation • 14 Aug 2023 • Yan Zhu, Junbao Zhuo, Bin Ma, Jiajia Geng, Xiaoming Wei, Xiaolin Wei, Shuhui Wang

We propose a model called OTI for zero-shot video recognition (ZSVR) that employs orthogonal temporal interpolation and a matching loss based on vision-language models (VLMs).

Ranked #1 on Zero-Shot Action Recognition on UCF101

Video Recognition • Zero-Shot Action Recognition • +2

Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning

no code implementations • 30 Jun 2023 • Ganlong Zhao, Guanbin Li, Yipeng Qin, Jinjin Zhang, Zhenhua Chai, Xiaolin Wei, Liang Lin, Yizhou Yu

In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.

3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

no code implementations • 11 Jun 2023 • Jinming Su, Wangwang Yang, Junfeng Luo, Xiaolin Wei

In our solution, we regard the video panoptic segmentation task as a segmentation target querying task, represent both semantic and instance targets as a set of queries, and then combine these queries with video features extracted by neural networks to predict segmentation masks.

Instance Segmentation • Segmentation • +3

Towards Accurate Post-Training Quantization for Vision Transformer

no code implementations • 25 Mar 2023 • Yifu Ding, Haotong Qin, Qinghua Yan, Zhenhua Chai, Junjie Liu, Xiaolin Wei, Xianglong Liu

We find the main reasons lie in (1) the existing calibration metric is inaccurate in measuring the quantization influence for extremely low-bit representation, and (2) the existing quantization paradigm is unfriendly to the power-law distribution of Softmax.

Model Compression • Quantization

Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention

no code implementations • 24 Feb 2023 • Bin Liu, Xiaolin Wei, Bo Li, Junjie Cao, Yu-Kun Lai

In this paper, a novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.

Attribute • Face Model

3D Colored Shape Reconstruction from a Single RGB Image through Diffusion

no code implementations • 11 Feb 2023 • Bo Li, Xiaolin Wei, Fengwei Chen, Bin Liu

In the shape prediction module, the reference RGB image is first encoded into a high-level shape feature, which is then used as a condition to predict the reverse geometric noise in the diffusion model.

3D Reconstruction • 3D Shape Generation • +1

Bridging Search Region Interaction With Template for RGB-T Tracking

1 code implementation • CVPR 2023 • Tianrui Hui, Zizheng Xun, Fengguang Peng, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts.

Ranked #2 on RGB-T Tracking on RGBT210

RGB-T Tracking • Template Matching

Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond

1 code implementation • CVPR 2023 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

In this paper, we introduce a novel framework in the style of Generative Adversarial Networks, referred to as GAN-MAE, in which a generator produces the masked patches from the remaining visible patches and a discriminator predicts whether each patch was synthesized by the generator.

Representation Learning

Multiple Object Tracking Challenge Technical Report for Team MT_IoT

1 code implementation • 7 Dec 2022 • Feng Yan, Zhiheng Li, Weixin Luo, Zequn Jie, Fan Liang, Xiaolin Wei, Lin Ma

This is a brief technical report of our proposed method for Multiple-Object Tracking (MOT) Challenge in Complex Environments.

Ranked #8 on Multi-Object Tracking on DanceTrack (using extra training data)

Human Detection • Multi-Object Tracking • +2

Uncertainty-Aware Image Captioning

no code implementations • 30 Nov 2022 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

It is widely believed that the higher the uncertainty of a word in the caption, the more inter-correlated context information is required to determine it.

Caption Generation • Image Captioning • +1

Learning Point-Language Hierarchical Alignment for 3D Visual Grounding

1 code implementation • 22 Oct 2022 • Jiaming Chen, Weixin Luo, Ran Song, Xiaolin Wei, Lin Ma, Wei Zhang

This paper presents a novel hierarchical alignment model (HAM) that learns multi-granularity visual and linguistic representations in an end-to-end manner.

Sentence • Visual Grounding • +1

SegViT: Semantic Segmentation with Plain Vision Transformers

1 code implementation • 12 Oct 2022 • BoWen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, Yifan Liu

We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose SegViT.

Ranked #4 on Semantic Segmentation on COCO-Stuff test

Segmentation • Semantic Segmentation

177

SoccerNet 2022 Challenges Results

7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei Zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting • Camera Calibration • +3

Meta-Ensemble Parameter Learning

no code implementations • 5 Oct 2022 • Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Knowledge distillation is an approach that allows a single model to efficiently capture the approximate performance of an ensemble, but it scales poorly because re-training is required whenever new teacher models are introduced.

Knowledge Distillation • Meta-Learning

Weakly Supervised Semantic Segmentation via Progressive Patch Learning

1 code implementation • 16 Sep 2022 • Jinlong Li, Zequn Jie, Xu Wang, Yu Zhou, Xiaolin Wei, Lin Ma

"Progressive Patch Learning" further extends the feature destruction and patch learning to multi-level granularities in a progressive manner.

Weakly supervised Semantic Segmentation • Weakly-Supervised Semantic Segmentation

Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation

1 code implementation • 16 Sep 2022 • Jinlong Li, Zequn Jie, Xu Wang, Xiaolin Wei, Lin Ma

To tackle this issue, this paper proposes an Expansion and Shrinkage scheme based on offset learning in deformable convolution, which sequentially improves the recall and precision of the located object in two respective stages.

Object • Weakly supervised Semantic Segmentation • +1

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

7 code implementations • 7 Sep 2022 • Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, Yiduo Li, Bo Zhang, Yufei Liang, Linyuan Zhou, Xiaoming Xu, Xiangxiang Chu, Xiaoming Wei, Xiaolin Wei

The YOLO community has flourished, extending its use to a multitude of hardware platforms and a wide range of scenarios.

Ranked #14 on Object Detection on COCO-O

Object Detection • Quantization

12,069

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

1 code implementation • 11 Aug 2022 • Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Si Liu

To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination.

Panoptic Segmentation • Segmentation • +1

Efficient Modeling of Future Context for Image Captioning

1 code implementation • 22 Jul 2022 • Zhengcong Fei, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Existing approaches to image captioning usually generate the sentence word by word from left to right, conditioned only on local context comprising the given image and the previously generated words.

Image Captioning • Sentence • +1

MT-Net Submission to the Waymo 3D Detection Leaderboard

no code implementations • 11 Jul 2022 • Shaoxiang Chen, Zequn Jie, Xiaolin Wei, Lin Ma

In this technical report, we introduce our submission to the Waymo 3D Detection leaderboard.

3D Object Detection

Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images

no code implementations • 27 May 2022 • Zhi Tian, Xiangxiang Chu, Xiaoming Wang, Xiaolin Wei, Chunhua Shen

In this work, we tackle this challenging issue with a novel range view projection mechanism, and for the first time demonstrate the benefits of fusing multi-frame point clouds for a range-view based detector.

3D Object Detection • Autonomous Driving • +2

Learn to Cluster Faces via Pairwise Classification

no code implementations • ICCV 2021 • Junfu Liu, Di Qiu, Pengfei Yan, Xiaolin Wei

However, they usually suffer from excessive memory consumption especially on large-scale graphs, and rely on empirical thresholds to determine the connectivities between samples in inference, which restricts their applications in various real-world scenes.

Classification • Clustering • +1

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations • 30 Mar 2022 • Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling • Object

278

InsCon: Instance Consistency Feature Representation via Self-Supervised Learning

no code implementations • 15 Mar 2022 • Junwei Yang, Ke Zhang, Zhaolin Cui, Jinming Su, Junfeng Luo, Xiaolin Wei

On the other hand, InsCon introduces the pull and push of cell-instance, which utilizes cell consistency to enhance fine-grained feature representation for precise boundary localization.

Contrastive Learning • Image Classification • +6

Contrastive Attention Network with Dense Field Estimation for Face Completion

no code implementations • 20 Dec 2021 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Gengyun Jia, Zhenhua Chai, Xiaolin Wei

This multi-scale architecture helps the decoder translate the discriminative representations learned by the encoders into images.

Face Recognition • Facial Inpainting

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

1 code implementation • 9 Oct 2021 • Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma

Referring Image Segmentation (RIS) aims to segment the target object in an image referred to by a given natural language expression.

Image Segmentation • Retrieval • +2

Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning

no code implementations • ICCV 2021 • Junkai Huang, Chaowei Fang, Weikai Chen, Zhenhua Chai, Xiaolin Wei, Pengxu Wei, Liang Lin, Guanbin Li

Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.

Binary Classification

Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation

1 code implementation • CVPR 2021 • Tong Wu, Junshi Huang, Guangyu Gao, Xiaoming Wei, Xiaolin Wei, Xuan Luo, Chi Harold Liu

In inference, we directly use the activation masks from the DA layer as pseudo-labels for segmentation.

Segmentation • Weakly supervised Semantic Segmentation • +1

Structure Guided Lane Detection

1 code implementation • 12 May 2021 • Jinming Su, Chao Chen, Ke Zhang, Junfeng Luo, Xiaoming Wei, Xiaolin Wei

Next, multi-level structural constraints are used to improve the perception of lanes.

Ranked #29 on Lane Detection on CULane

Autonomous Driving • Lane Detection

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

8 code implementations • NeurIPS 2021 • Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.

Ranked #48 on Semantic Segmentation on ADE20K val

Image Classification • Semantic Segmentation

29,789

Rethinking BiSeNet For Real-time Semantic Segmentation

6 code implementations • CVPR 2021 • Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai, Junfeng Luo, Xiaolin Wei

BiSeNet has proven to be a popular two-stream network for real-time segmentation.

Ranked #8 on Real-Time Semantic Segmentation on Cityscapes test

Dichotomous Image Segmentation • Image Classification • +3

8,260

Large Scale Visual Food Recognition

no code implementations • 30 Mar 2021 • Weiqing Min, Zhiling Wang, Yuxin Liu, Mengjiang Luo, Liping Kang, Xiaoming Wei, Xiaolin Wei, Shuqiang Jiang

Food2K can be further explored to benefit more food-relevant tasks, including emerging and more complex ones (e.g., nutritional understanding of food), and models trained on Food2K can be expected to serve as backbones that improve performance on such tasks.

Fine-Grained Visual Recognition • Food Recognition • +3

Rethinking the Optimization of Average Precision: Only Penalizing Negative Instances before Positive Ones is Enough

2 code implementations • 9 Feb 2021 • Zhuo Li, Weiqing Min, Jiajun Song, Yaohui Zhu, Liping Kang, Xiaoming Wei, Xiaolin Wei, Shuqiang Jiang

Limited by the definition of AP, such methods consider both negative and positive instances ranked before each positive instance.

Ranked #3 on Vehicle Re-Identification on VehicleID Large

Image Retrieval • Retrieval • +1

Scene Text Detection with Scribble Lines

no code implementations • 9 Dec 2020 • Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai

It is a general labeling method for texts with various shapes and requires low labeling costs.

Scene Text Detection • Text Detection

Free-Form Image Inpainting via Contrastive Attention Network

no code implementations • 29 Oct 2020 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

It is difficult for encoders to capture such powerful representations under this complex situation.

Image Inpainting

DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

1 code implementation • ICLR 2021 • Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun Lu, Xiaolin Wei, Junchi Yan

We call this approach DARTS-.

Ranked #20 on Neural Architecture Search on NAS-Bench-201, CIFAR-10

Neural Architecture Search

Query Twice: Dual Mixture Attention Meta Learning for Video Summarization

no code implementations • 19 Aug 2020 • Junyan Wang, Yang Bai, Yang Long, Bingzhang Hu, Zhenhua Chai, Yu Guan, Xiaolin Wei

Video summarization aims to select representative frames to retain high-level information, which is usually solved by predicting the segment-wise importance score via a softmax function.

Meta-Learning • Video Summarization

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

no code implementations • 13 Aug 2020 • Weiqing Min, Linhu Liu, Zhiling Wang, Zhengdong Luo, Xiaoming Wei, Xiaolin Wei, Shuqiang Jiang

To encourage further progress in food recognition, we introduce the dataset ISIA Food-500, with 500 categories drawn from the list in Wikipedia and 399,726 images, a more comprehensive food dataset that surpasses existing popular benchmark datasets in category coverage and data volume.

Food Recognition • Management

FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition

no code implementations • 22 Jul 2020 • Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai

In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them on local devices.

Federated Learning • Privacy Preserving • +1

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

no code implementations • 5 Apr 2020 • Qi Song, Qianyi Jiang, Nan Li, Rui Zhang, Xiaolin Wei

In this paper, we elaborately design a Rectified Attentional Double Supervised Network (ReADS) for general scene text recognition.

Scene Text Recognition • valid
