Search Results for author: Deqiang Jiang

Found 19 papers, 5 papers with code

Hierarchical Multi-label Text Classification with Horizontal and Vertical Category Correlations

no code implementations • EMNLP 2021 • Linli Xu, Sijie Teng, Ruoyu Zhao, Junliang Guo, Chi Xiao, Deqiang Jiang, Bo Ren

Hierarchical multi-label text classification (HMTC) deals with the challenging task where an instance can be assigned to multiple hierarchically structured categories at the same time.

Multi-Label Text Classification
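
Hierarchical multi-label predictions are often post-processed so that they respect the label tree: if a category is assigned, its ancestors are assigned too. The sketch below illustrates only that generic consistency step on a hypothetical toy taxonomy. `PARENT` and `close_under_ancestors` are illustrative names. It is not the horizontal/vertical correlation model proposed in the paper.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy taxonomy: child -> parent (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "Science": None,
    "Physics": "Science",
    "Quantum": "Physics",
    "Sports": None,
    "Football": "Sports",
}

def close_under_ancestors(labels: Iterable[str],
                          parent: Dict[str, Optional[str]]) -> Set[str]:
    """Extend a predicted label set with every ancestor of every label,
    so the prediction is consistent with the hierarchy."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up the tree until we reach a root or a node already added
        # (whose ancestors were then added along with it).
        while node is not None and node not in closed:
            closed.add(node)
            node = parent[node]
    return closed

print(sorted(close_under_ancestors(["Quantum", "Football"], PARENT)))
# ['Football', 'Physics', 'Quantum', 'Science', 'Sports']
```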

Semantic-Preserving Abstractive Text Summarization with Siamese Generative Adversarial Net

no code implementations • Findings (NAACL) 2022 • Xin Sheng, Linli Xu, Yinlong Xu, Deqiang Jiang, Bo Ren

We propose a novel siamese generative adversarial net for abstractive text summarization (SSPGAN), which can preserve the main semantics of the source text.

Abstractive Text Summarization

HRVDA: High-Resolution Visual Document Assistant

no code implementations • 10 Apr 2024 • Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu

In addition, we construct a document-oriented visual instruction tuning dataset and apply a multi-stage training strategy to enhance the model's document modeling capabilities.

Document Understanding

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

no code implementations • 29 Feb 2024 • Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

This suggests that contrastive learning between the visual holistic representations and the multimodal fine-grained features of document objects can help the vision encoder acquire more effective visual cues, thereby enhancing the comprehension of text-rich documents in LVLMs.

Contrastive Learning, Document Understanding

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

Multi-modal Large Language Models (MLLMs) endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

no code implementations • ICCV 2023 • Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun

We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, a task with applications in document analysis, retrieval, and office automation.

Document Understanding, Retrieval

Looking and Listening: Audio Guided Text Recognition

1 code implementation • 6 Jun 2023 • Wenwen Yu, MingYu Liu, Biao Yang, Enming Zhang, Deqiang Jiang, Xing Sun, Yuliang Liu, Xiang Bai

Text recognition in the wild is a long-standing problem in computer vision.

Scene Text Recognition

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

1 code implementation • 12 May 2023 • Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai

We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that their performance drops noticeably from SROIE (a widely used English dataset) to our proposed dataset, owing to the larger variance in layouts and entities.

Contrastive Learning, Optical Character Recognition (OCR)

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

no code implementations • 16 Mar 2023 • Hao Liu, Xin Li, Mingming Gong, Bing Liu, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Xing Sun

Recently, the Table Structure Recognition (TSR) task, which aims to convert table structures into machine-readable formats, has received increasing interest in the community.

Turning a CLIP Model into a Scene Text Detector

1 code implementation • CVPR 2023 • Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai

Recently, pretraining approaches based on vision-language models have made effective progress in the field of text detection.

Domain Adaptation, Scene Text Detection

TaCo: Textual Attribute Recognition via Contrastive Learning

no code implementations • 22 Aug 2022 • Chang Nie, Yiqing Hu, Yanqiu Qu, Hao Liu, Deqiang Jiang, Bo Ren

To realize this goal, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning, to fully unlock the pre-training potential.

Attribute, Contrastive Learning

GMN: Generative Multi-modal Network for Practical Document Information Extraction

no code implementations • NAACL 2022 • Haoyu Cao, Jiefeng Ma, Antai Guo, Yiqing Hu, Hao Liu, Deqiang Jiang, Yinsong Liu, Bo Ren

Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world.

Optical Character Recognition (OCR)

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

no code implementations • 4 Jul 2022 • Ye Liu, Lingfeng Qiao, Di Yin, Zhuoxuan Jiang, Xinghua Jiang, Deqiang Jiang, Bo Ren

In this paper, taking an alternative perspective to overcome the above challenges, we unite these two tasks into one task through a new form of shot-link prediction: a link connects two adjacent shots, indicating that they belong to the same scene or category.

Scene Segmentation
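
A small decoding sketch makes the shot-link formulation concrete. It assumes the model has already produced one binary link decision for each pair of adjacent shots, with 1 meaning "same scene" and 0 meaning "boundary". Under that assumption, runs of linked shots collapse into scenes. `links_to_segments` is a hypothetical helper, not code from the paper, and it covers only decoding, not how the links are predicted.

```python
from typing import List, Sequence

def links_to_segments(links: Sequence[int]) -> List[List[int]]:
    """Group shot indices 0..len(links) into scenes.

    links[i] == 1 means shot i and shot i + 1 belong to the same scene;
    links[i] == 0 marks a boundary between them.
    """
    segments: List[List[int]] = [[0]]
    for i, link in enumerate(links):
        if link:
            segments[-1].append(i + 1)   # continue the current scene
        else:
            segments.append([i + 1])     # start a new scene
    return segments

print(links_to_segments([1, 1, 0, 1, 0]))
# [[0, 1, 2], [3, 4], [5]]
```

Six shots with five links and two zeros split into three scenes. Reading a 1 as "same category" would presumably group shots for the classification task in the same way.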

Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation

no code implementations • 22 May 2022 • Jiquan Li, Junliang Guo, Yongxin Zhu, Xin Sheng, Deqiang Jiang, Bo Ren, Linli Xu

The task of Grammatical Error Correction (GEC) has received remarkable attention with wide applications in Natural Language Processing (NLP) in recent years.

Grammatical Error Correction, Sentence

Relational Representation Learning in Visually-Rich Documents

no code implementations • 5 May 2022 • Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren

To deal with the unpredictable definition of relations, we propose a novel contrastive learning task named Relational Consistency Modeling (RCM), which harnesses the fact that existing relations should be consistent in differently augmented positive views.

Contrastive Learning, Key Information Extraction
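
A simple consistency penalty can illustrate the idea that relations should agree across augmented views. The sketch below assumes each view yields one embedding per document entity. It builds a pairwise cosine-similarity matrix for each view and measures how far the two matrices disagree. This squared-error measure is a simplified stand-in chosen for illustration, not the contrastive RCM objective from the paper. All names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def relation_matrix(entities: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between entity embeddings in one view."""
    return [[cosine(u, v) for v in entities] for u in entities]

def relation_consistency(view_a: Sequence[Vector], view_b: Sequence[Vector]) -> float:
    """Mean squared difference between the relation matrices of two augmented
    views of the same entities; 0.0 means the relations agree exactly."""
    ra, rb = relation_matrix(view_a), relation_matrix(view_b)
    n = len(ra)
    return sum((ra[i][j] - rb[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

Rescaling the embeddings leaves cosine similarities unchanged, so `relation_consistency([[1, 0], [0, 1]], [[2, 0], [0, 3]])` is 0.0. Collapsing both entities onto the same direction in one view produces a positive penalty.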

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

no code implementations • 18 Apr 2022 • Hao Liu, Xinghua Jiang, Xin Li, Antai Guo, Deqiang Jiang, Bo Ren

The self-supervised Masked Image Modeling (MIM) schema, which follows a "mask-and-reconstruct" pipeline of recovering content from masked images, has recently attracted increasing interest in the multimedia community, owing to its excellent ability to learn visual representations from unlabeled data.
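
The "mask-and-reconstruct" pipeline can be sketched in two pieces: hide a random subset of image patches, then score the reconstruction only on the hidden ones. This is the generic MIM recipe. It uses a hypothetical 75% mask ratio and represents each patch as a single scalar so the sketch stays self-contained. It does not model the frequency-domain design of the Geminated Gestalt Autoencoder.

```python
import random
from typing import List, Sequence

def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> List[bool]:
    """Return one flag per patch; True means the patch is masked (hidden)."""
    num_masked = int(num_patches * mask_ratio)
    rng = random.Random(seed)
    # Sample distinct patch indices to hide from the encoder.
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean squared error over masked positions only (the 'reconstruct' step)."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0
```

For a 14 x 14 grid of 196 patches, `random_patch_mask(196, 0.75)` hides 147 of them. Visible patches do not contribute to `masked_mse`, so the model is scored only on what it had to infer.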

Neural Collaborative Graph Machines for Table Structure Recognition

no code implementations • CVPR 2022 • Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren

We also show that the proposed NCGM can modulate the collaborative patterns of different modalities conditioned on the context of intra-modality cues, which is vital for diversified table cases.

Ranked #6 on Table Recognition on PubTabNet

Table Recognition

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

1 code implementation • CVPR 2022 • Hao Liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang, Bo Ren

To balance efficiency and performance, a group of works perform the self-attention (SA) operation only within local patches and discard global contextual information, which is indispensable for visual recognition tasks.

Object Detection
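
The trade-off described in the snippet comes from restricting self-attention to local windows. The sketch below partitions a token grid into non-overlapping windows. Under local attention, each token interacts only with the tokens in its own window. `window_partition` is a hypothetical helper that illustrates generic windowing, not NomMer's architecture.

```python
from typing import List

def window_partition(height: int, width: int, window: int) -> List[List[int]]:
    """Split a height x width grid of tokens (row-major indices) into
    non-overlapping window x window groups. Under local self-attention,
    a token attends only to the tokens in its own group."""
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    groups: List[List[int]] = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            groups.append([
                (top + dy) * width + (left + dx)
                for dy in range(window)
                for dx in range(window)
            ])
    return groups

print(window_partition(4, 4, 2))
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

On this 4 x 4 grid, tokens 0 and 15 never share a window, so local attention alone cannot relate them. This is the global context such methods abandon. In exchange, each token attends to 4 tokens instead of 16, which gives 16 * 4 = 64 attention pairs rather than 16 * 16 = 256.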

PuzzleNet: Scene Text Detection by Segment Context Graph Learning

no code implementations • 26 Feb 2020 • Hao Liu, Antai Guo, Deqiang Jiang, Yiqing Hu, Bo Ren

Recently, a series of decomposition-based scene text detection methods has achieved impressive progress by decomposing challenging text regions into pieces and linking them in a bottom-up manner.

Graph Learning, Scene Text Detection
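
The bottom-up step of decomposition-based detectors links pieces back into whole text regions. This amounts to finding connected components over the predicted segment links. Below is a minimal union-find sketch. It assumes the links have already been predicted as index pairs. `group_segments` is hypothetical, and PuzzleNet's learned segment context graph is not modeled here.

```python
from typing import Dict, List, Sequence, Tuple

def group_segments(num_segments: int, links: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Merge text segments into text instances: segments joined by a
    predicted link, directly or through a chain of links, end up in the
    same group. Uses union-find with path halving."""
    parent = list(range(num_segments))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # union the two components

    groups: Dict[int, List[int]] = {}
    for s in range(num_segments):
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())

print(group_segments(6, [(0, 1), (1, 2), (4, 5)]))
# [[0, 1, 2], [3], [4, 5]]
```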
