Search Results for author: Tengchao Lv

Found 14 papers, 8 papers with code

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding

no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities.

document understanding

Paper
Add Code

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

no code implementations • 28 Nov 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text.

Language Modelling Large Language Model +1

Paper
Add Code

Kosmos-2.5: A Multimodal Literate Model

no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei

We present Kosmos-2. 5, a multimodal literate model for machine reading of text-intensive images.

Reading Comprehension Text Generation

Paper
Add Code

TextDiffuser: Diffusion Models as Text Painters

no code implementations • NeurIPS 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.

Optical Character Recognition (OCR)

Paper
Add Code

Language Is Not All You Need: Aligning Perception with Language Models

1 code implementation • NeurIPS 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.

Image Captioning Language Modelling +4

18,434

Paper
Code

XDoc: Unified Pre-training for Cross-Format Document Understanding

1 code implementation • 6 Oct 2022 • Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

The surge of pre-training has witnessed the rapid development of document understanding recently.

Ranked #7 on Semantic entity labeling on FUNSD

document understanding Semantic entity labeling

18,450

Paper
Code

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

2 code implementations • 18 Apr 2022 • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei

In this paper, we propose \textbf{LayoutLMv3} to pre-train multimodal Transformers for Document AI with unified text and image masking.

Ranked #1 on Key Information Extraction on EPHOIE

Document AI Document Image Classification +10

125,725

Paper
Code

DiT: Self-supervised Pre-training for Document Image Transformer

3 code implementations • 4 Mar 2022 • Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

Ranked #1 on Table Detection on ICDAR 2019

Document AI Document Image Classification +4

125,725

Paper
Code

Document AI: Benchmarks, Models and Applications

no code implementations • 16 Nov 2021 • Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei

Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.

Document AI Document Image Classification +3

Paper
Add Code

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

2 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Text recognition is a long-standing research problem for document digitalization.

Ranked #3 on Handwritten Text Recognition on IAM

Handwritten Text Recognition Language Modelling +4

125,725

Paper
Code

VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

1 code implementation • 10 Jun 2021 • Tengchao Lv, Lei Cui, Momcilo Vasilijevic, Furu Wei

Video transcript summarization is a fundamental task for video understanding.

Segmentation Text Summarization +1

Paper
Code

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

6 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Ranked #13 on Document Image Classification on RVL-CDIP

Document Image Classification document understanding

125,725

Paper
Code

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

5 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Ranked #1 on Key Information Extraction on SROIE

Document Image Classification Document Layout Analysis +6

125,725

Paper
Code

Hierarchical Attention Prototypical Networks for Few-Shot Text Classification

no code implementations • IJCNLP 2019 • Shengli Sun, Qingfeng Sun, Kevin Zhou, Tengchao Lv

Most of the current effective methods for text classification tasks are based on large-scale labeled data and a great number of parameters, but when the supervised training data are few and difficult to be collected, these models are not available.

Few-Shot Text Classification General Classification +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.