TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Document Layout Analysis	PubLayNet val	DETR	Text	0.947	# 4
Document Layout Analysis	PubLayNet val	DETR	Title	0.918	# 4
Document Layout Analysis	PubLayNet val	DETR	List	0.964	# 3
Document Layout Analysis	PubLayNet val	DETR	Table	0.981	# 1
Document Layout Analysis	PubLayNet val	DETR	Figure	0.975	# 1
Document Layout Analysis	PubLayNet val	DETR	Overall	0.957	# 3
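
The Overall value in the table appears to equal the unweighted mean of the five per-class values. This is an observation from the listed numbers, not a statement of how the benchmark computes the score. A minimal Python check, with the per-class values copied from the rows above and a hypothetical helper name `macro_average`:

```python
# Per-class scores copied from the PubLayNet val rows above.
per_class = {
    "Text": 0.947,
    "Title": 0.918,
    "List": 0.964,
    "Table": 0.981,
    "Figure": 0.975,
}


def macro_average(scores):
    """Unweighted mean over classes (assumed aggregation, not confirmed by the page)."""
    return sum(scores.values()) / len(scores)


# Round to the three decimals used in the table before comparing.
overall = round(macro_average(per_class), 3)
print(overall)
```

Rounded to three decimals, the result can be compared directly with the Overall row.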

Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images

23 Jun 2023 · Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, Muhammad Zeshan Afzal

This paper takes an important step toward bridging the performance gap between DETR and R-CNN for graphical object detection. Existing graphical object detection approaches have benefited from recent advances in CNN-based object detection and have achieved remarkable progress. More recently, Transformer-based detectors have considerably boosted generic object detection performance; by using object queries, they eliminate the need for hand-crafted features and for post-processing steps such as Non-Maximum Suppression (NMS). However, the effectiveness of these transformer-based detection algorithms has yet to be verified for graphical object detection. Inspired by the latest advancements in DETR, we employ the existing detection transformer with a few modifications for graphical object detection. We modify the object queries in different ways: using points, using anchor boxes, and adding positive and negative noise to the anchors to boost performance. These modifications allow for better handling of objects with varying sizes and aspect ratios, more robustness to small variations in object positions and sizes, and improved discrimination between objects and non-objects. We evaluate our approach on four graphical object datasets: PubTables, TableBank, NTable and PubLayNet. After integrating the query modifications into DETR, we outperform prior works and achieve new state-of-the-art results, with mAP of 96.9%, 95.7% and 99.3% on TableBank, PubLayNet and PubTables, respectively. The results from extensive ablations show that transformer-based methods are effective for document analysis, as they are in other applications. We hope this study draws more attention to research on detection transformers in document image analysis.
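
The abstract mentions adding positive and negative noise to anchor boxes but does not say how. As a rough illustration only, the sketch below follows the contrastive-denoising idea used in some DETR variants: a positive query is a ground-truth box perturbed by a small amount, and a negative query is the same box perturbed by a larger amount. The function names, the `(cx, cy, w, h)` box format, and the thresholds `lam_pos` and `lam_neg` are all assumptions made for this example, not details taken from the paper.

```python
import random


def noised_anchor(box, low, high, rng):
    """Perturb a (cx, cy, w, h) box by a random fraction in [low, high] of its size."""
    cx, cy, w, h = box

    def signed(extent):
        # Random direction times a random magnitude scaled by the given extent.
        return rng.choice((-1.0, 1.0)) * rng.uniform(low, high) * extent

    return (
        cx + signed(w / 2),           # centre shifts by at most high * w / 2
        cy + signed(h / 2),
        max(w + signed(w), 1e-6),     # keep width and height strictly positive
        max(h + signed(h), 1e-6),
    )


def make_noised_queries(gt_boxes, lam_pos=0.2, lam_neg=0.6, seed=0):
    """Return (positives, negatives): lightly and heavily noised copies of each box.

    lam_pos and lam_neg are hypothetical noise thresholds with lam_pos < lam_neg < 1.
    """
    rng = random.Random(seed)
    positives = [noised_anchor(b, 0.0, lam_pos, rng) for b in gt_boxes]
    negatives = [noised_anchor(b, lam_pos, lam_neg, rng) for b in gt_boxes]
    return positives, negatives


if __name__ == "__main__":
    gt = [(0.5, 0.5, 0.4, 0.2)]  # one normalised ground-truth box
    pos, neg = make_noised_queries(gt)
    print("positive:", pos[0])
    print("negative:", neg[0])
```

In a training setup of this kind, positive queries would typically be supervised to reconstruct the original box and negative queries to predict "no object", which is one way to encourage discrimination between objects and non-objects.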

Code

No code implementations yet.

Tasks

Document Layout Analysis

Object Detection

Datasets

MS COCO

PubLayNet

TableBank

Results from the Paper

Ranked #3 on Document Layout Analysis on PubLayNet val

Methods

Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • DETR • Dropout • Feedforward Network • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
