Search Results for author: Xianbiao Qi

Found 29 papers, 9 papers with code

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

no code implementations • 18 Mar 2024 • Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang

To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility.

Image Inpainting Video Alignment +2

AcademicGPT: Empowering Academic Research

no code implementations • 21 Nov 2023 • Shufa Wei, Xiaolong Xu, Xianbiao Qi, Xi Yin, Jun Xia, Jingyi Ren, Peijun Tang, Yuxiang Zhong, Yihao Chen, Xiaoqin Ren, Yuxin Liang, Liankai Huang, Kai Xie, Weikang Gui, Wei Tan, Shuanglong Sun, Yongquan Hu, Qinxian Liu, Nanjin Li, Chihao Dai, Lihua Wang, Xiaohui Liu, Lei Zhang, Yutao Xie

Our training corpus mainly consists of academic papers, theses, content from some academic domains, high-quality Chinese data, and others.

General Knowledge Question Answering

TOSS: High-quality Text-guided Novel View Synthesis from a Single Image

no code implementations • 16 Oct 2023 • Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum

In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image.

Image-to-Image Translation Novel View Synthesis

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

no code implementations • 5 Sep 2023 • Bojia Zi, Xianbiao Qi, Lingzhi Wang, Jianan Wang, Kam-Fai Wong, Lei Zhang

In this paper, we present Delta-LoRA, which is a novel parameter-efficient approach to fine-tune large language models (LLMs).

DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation

no code implementations • 21 Jun 2023 • Yukun Huang, Jianan Wang, Yukai Shi, Xianbiao Qi, Zheng-Jun Zha, Lei Zhang

Text-to-image diffusion models pre-trained on billions of image-text pairs have recently enabled text-to-3D content creation by optimizing a randomly initialized Neural Radiance Fields (NeRF) with score distillation.

Image Generation Text to 3D

Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant

no code implementations • 15 Jun 2023 • Xianbiao Qi, Jianan Wang, Lei Zhang

This article provides a comprehensive understanding of optimization in deep learning, with a primary focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively.

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

1,820

LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

1 code implementation • 19 Apr 2023 • Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang

In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability.

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

1 code implementation • CVPR 2023 • Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang

In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training, respectively.

Contrastive Learning
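
To make the scaling claim above concrete, here is a minimal arithmetic sketch. It is not the DisCo-CLIP implementation: it assumes the dominant cost is a single similarity (logits) matrix stored in float32, that the batch divides evenly across GPUs, and the function name `similarity_matrix_bytes` is hypothetical.

```python
def similarity_matrix_bytes(batch_size: int, num_gpus: int = 1,
                            bytes_per_element: int = 4) -> int:
    """Bytes one GPU needs to hold its share of the B x B similarity matrix.

    Assumption (not from the paper's code): with num_gpus == 1 the full
    B x B matrix is stored; otherwise each GPU stores only a (B / N) x B
    block, which gives the O(B^2 / N) scaling. float32 is assumed by default.
    """
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    rows_per_gpu = batch_size // num_gpus
    return rows_per_gpu * batch_size * bytes_per_element


if __name__ == "__main__":
    B, N = 32768, 64  # illustrative values only
    full = similarity_matrix_bytes(B)
    split = similarity_matrix_bytes(B, N)
    print(f"full matrix per GPU:  {full / 1024**3:.0f} GiB")
    print(f"split block per GPU:  {split / 1024**2:.0f} MiB")
```

Under these assumptions the per-GPU footprint shrinks by exactly a factor of $N$; real training also stores features, gradients, and activations, which this sketch ignores.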

Exploring Vision Transformers as Diffusion Learners

no code implementations • 28 Dec 2022 • He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan Yao, Lei Zhang

We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND).

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

7 code implementations • ICLR 2022 • Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.

Ranked #11 on 2D Object Detection on SARDet-100K

Object Detection

1,820

1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection

1 code implementation • 12 Jul 2021 • Yuxiang Zhong, Xianbiao Qi, Shanjun Li, Dengyi Gu, Yihao Chen, Peiyang Ning, Rong Xiao

In this technical report, we present our 1st place solution for the ICDAR 2021 competition on mathematical formula detection (MFD).

118

Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model

no code implementations • 24 Jun 2021 • Yixuan Qiao, Hao Chen, Jun Wang, Yihao Chen, Xianbin Ye, Ziliang Li, Xianbiao Qi, Peng Gao, Guotong Xie

TextVQA requires models to read and reason about text in images to answer questions about them.

Language Modelling Masked Language Modeling +2

PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex

no code implementations • 5 May 2021 • Yelin He, Xianbiao Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, Rong Xiao

This paper presents our solution for the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX.

Data Augmentation Scene Text Recognition

PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML

2 code implementations • 5 May 2021 • Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao

In our method, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized based on MASTER [1], a robust image text recognition algorithm.

Ranked #1 on Table Recognition on PubTabNet

Line Detection Table Recognition

38,490

Learning Graph Normalization for Graph Neural Networks

1 code implementation • 24 Sep 2020 • Yihao Chen, Xin Tang, Xianbiao Qi, Chun-Guang Li, Rong Xiao

We conduct extensive experiments on benchmark datasets for different tasks, including node classification, link prediction, graph classification and graph regression, and confirm that the learned graph normalization leads to competitive results and that the learned weights suggest the appropriate normalization techniques for the specific task.

Graph Classification Graph Regression +2

117

Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition

no code implementations • 23 Sep 2020 • Bingcong Li, Xin Tang, Xianbiao Qi, Yihao Chen, Rong Xiao

Thus, we propose a lightweight scene text recognition model named Hamming OCR.

Optical Character Recognition (OCR) Scene Text Recognition

PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

2 code implementations • 16 Apr 2020 • Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, Rong Xiao

Computer vision with state-of-the-art deep learning models has recently achieved huge success in the field of Optical Character Recognition (OCR), including text detection and recognition tasks.

Graph Learning Key Information Extraction +3

541

Neural Mesh Refiner for 6-DoF Pose Estimation

no code implementations • 17 Mar 2020 • Di Wu, Yihao Chen, Xianbiao Qi, Yongjian Yu, Weixuan Chen, Rong Xiao

We utilise the overlay between the accurate mask prediction and less accurate mesh prediction to iteratively optimise the direct regressed 6D pose information with a focus on translation estimation.

Autonomous Driving Instance Segmentation +4

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

7 code implementations • 7 Oct 2019 • Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai

Attention-based scene text recognizers, which leverage a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture, have achieved huge success.

Scene Text Recognition

4,075

Self-Supervised Convolutional Subspace Clustering Network

no code implementations • CVPR 2019 • Junjian Zhang, Chun-Guang Li, Chong You, Xianbiao Qi, Honggang Zhang, Jun Guo, Zhouchen Lin

However, the applicability of subspace clustering has been limited because practical visual data in raw form do not necessarily lie in such linear subspaces.

Ranked #2 on Image Clustering on Extended Yale-B

Clustering Image Clustering

Homocentric Hypersphere Feature Embedding for Person Re-identification

no code implementations • 24 Apr 2018 • Wangmeng Xiang, Jianqiang Huang, Xianbiao Qi, Xian-Sheng Hua, Lei Zhang

Person re-identification (Person ReID) is a challenging task due to the large variations in camera viewpoint, lighting, resolution, and human pose.

Person Re-Identification

Face Recognition via Centralized Coordinate Learning

no code implementations • 17 Jan 2018 • Xianbiao Qi, Lei Zhang

Owing to the rapid development of deep neural network (DNN) techniques and the emergence of large-scale face databases, face recognition has achieved great success in recent years.

Classification Face Recognition +1

3D Surface Detail Enhancement From a Single Normal Map

no code implementations • ICCV 2017 • Wuyuan Xie, Miaohui Wang, Xianbiao Qi, Lei Zhang

In 3D reconstruction, the obtained surface details are mainly limited to the visual sensor due to sampling and quantization in the digitalization process.

3D Reconstruction Quantization

Probing the Intra-Component Correlations within Fisher Vector for Material Classification

no code implementations • 15 Apr 2016 • Xiaopeng Hong, Xianbiao Qi, Guoying Zhao, Matti Pietikäinen

Fisher vector (FV) has become a popular image representation.

General Classification Material Classification

HEp-2 Cell Classification: The Role of Gaussian Scale Space Theory as A Pre-processing Approach

no code implementations • 8 Sep 2015 • Xianbiao Qi, Guoying Zhao, Jie Chen, Matti Pietikäinen

We validate the GSS pre-processing under the Local Binary Pattern (LBP) and the Bag-of-Words (BoW) frameworks.

General Classification

LOAD: Local Orientation Adaptive Descriptor for Texture and Material Classification

no code implementations • 22 Apr 2015 • Xianbiao Qi, Guoying Zhao, Linlin Shen, Qingquan Li, Matti Pietikäinen

It is worth mentioning that we achieve a 65.4% classification accuracy, which is, to the best of our knowledge, the highest record by far, on the Flickr Material Database by using a single feature.

General Classification Material Classification +2

HEp-2 Cell Classification via Fusing Texture and Shape Information

no code implementations • 16 Feb 2015 • Xianbiao Qi, Guoying Zhao, Chun-Guang Li, Jun Guo, Matti Pietikäinen

Indirect Immunofluorescence (IIF) HEp-2 cell images provide effective evidence for the diagnosis of autoimmune diseases.

Classification General Classification

Dynamic texture and scene classification by transferring deep image features

no code implementations • 1 Feb 2015 • Xianbiao Qi, Chun-Guang Li, Guoying Zhao, Xiaopeng Hong, Matti Pietikäinen

Moreover, we explore two different implementations of the TCoF scheme, i.e., the spatial TCoF and the temporal TCoF, in which the mean-removed frames and the difference between two adjacent frames are used as the inputs of the ConvNet, respectively.

Classification General Classification +2
