CORD-r

We introduce FUNSD-r and CORD-r in Token Path Prediction. They are revised VrD-NER datasets that reflect the real-world scenarios of named entity recognition (NER) on scanned visually-rich documents (VrDs).

In FUNSD and CORD, the segment layout annotations are aligned with the labeled entities. As a result, they do not reflect the reading order issue of NER on scanned VrDs, where an entity's words need not appear as one contiguous, correctly ordered span in the OCR output, and so they are unsuitable for evaluating current methods. For FUNSD-r and CORD-r, we automatically reannotate the layouts with the PP-OCRv3 OCR system and then manually reannotate the named entities as word sequences based on the new layout annotations. Their segment layout annotations therefore match real-world situations, and entity mentions are labeled at the word level.
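To make the reading order issue concrete, here is a minimal sketch. It assumes a document is serialized by reading segments top-to-bottom and words left-to-right, and that an entity is stored as an ordered list of word indices. A conventional BIO sequence tagger can only emit entities that occupy one contiguous run of that serialized order; the helper `is_contiguous_in_order` (a hypothetical name) checks whether a word-sequence annotation meets that condition. The toy words and indices are invented for illustration and are not taken from CORD-r.

```python
from typing import Sequence


def is_contiguous_in_order(entity_word_ids: Sequence[int],
                           reading_order: Sequence[int]) -> bool:
    """Return True if the entity's words form one contiguous, correctly ordered
    run in the serialized reading order, i.e. a span a BIO tagger could emit."""
    if not entity_word_ids:
        return False
    # Map each word index to its position in the serialized sequence.
    position = {word_id: pos for pos, word_id in enumerate(reading_order)}
    positions = [position[w] for w in entity_word_ids]
    start = positions[0]
    # Every word must sit directly after the previous one in reading order.
    return positions == list(range(start, start + len(positions)))


# Toy receipt (invented, not from CORD-r): an item name wraps onto a second
# line, while its count and price stay on the first line.
#   segment 1: "ICED" (0)  "2" (1)  "36,000" (2)
#   segment 2: "LATTE" (3)
reading_order = [0, 1, 2, 3]  # segments top-to-bottom, words left-to-right

print(is_contiguous_in_order([0, 3], reading_order))  # item name "ICED LATTE" -> False
print(is_contiguous_in_order([2], reading_order))     # price "36,000" -> True
```

The item name is a valid word-sequence annotation, yet it cannot be expressed as a single BIO span over this order. Segment layouts that are aligned with the labeled entities, as in the original FUNSD and CORD, never produce such cases.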

CORD-r consists of 999 document samples. Each sample includes the document image, segment-level and word-level layout annotations, and labeled entities from 30 categories. For detailed summary statistics, please refer to the original paper.
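The sample contents listed above can be pictured as a small schema. This is a hypothetical sketch, not the released file format: the class names, field names, `image_path`, the label strings, and the (x0, y0, x1, y1) box convention are all assumptions made for illustration. What it captures is that layout exists at two granularities (segments and words) while an entity is an ordered list of word indices, so one mention may draw words from more than one segment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box convention (not from the dataset): (x0, y0, x1, y1) in image pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class Word:
    text: str
    bbox: BBox


@dataclass
class Segment:
    text: str
    bbox: BBox
    word_ids: List[int]  # indices into Document.words


@dataclass
class Entity:
    label: str           # one of the 30 entity categories (illustrative strings here)
    word_ids: List[int]  # ordered word sequence that forms the mention


@dataclass
class Document:
    image_path: str
    words: List[Word]
    segments: List[Segment]
    entities: List[Entity] = field(default_factory=list)

    def entity_text(self, entity: Entity) -> str:
        """Join the entity's words in their annotated order."""
        return " ".join(self.words[i].text for i in entity.word_ids)

    def validate(self) -> None:
        """Raise ValueError if a segment or entity points at a missing word."""
        n = len(self.words)
        for seg in self.segments:
            if any(not 0 <= i < n for i in seg.word_ids):
                raise ValueError(f"segment {seg.text!r} references a missing word")
        for ent in self.entities:
            if any(not 0 <= i < n for i in ent.word_ids):
                raise ValueError(f"entity {ent.label!r} references a missing word")


if __name__ == "__main__":
    # Invented toy sample: the item name spans two OCR segments.
    words = [Word("ICED", (10, 10, 50, 22)), Word("2", (120, 10, 130, 22)),
             Word("36,000", (160, 10, 220, 22)), Word("LATTE", (10, 26, 60, 38))]
    segments = [Segment("ICED 2 36,000", (10, 10, 220, 22), [0, 1, 2]),
                Segment("LATTE", (10, 26, 60, 38), [3])]
    doc = Document("receipt_0001.png", words, segments,
                   [Entity("menu.nm", [0, 3]), Entity("menu.price", [2])])
    doc.validate()
    print(doc.entity_text(doc.entities[0]))  # ICED LATTE
```

Keeping entities as word-index lists rather than segment references is what lets a mention such as "ICED LATTE" cross a segment boundary.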

Benchmarks

Task	Dataset Variant	Best Model
Named Entity Recognition (NER)	CORD-r	TPP (Token Path Prediction)

Tasks

Named Entity Recognition (NER)

Similar Datasets

EPHOIE

FUNSD

CORD

ReadingBank

License

CC-BY-4.0

Modalities

Images
Texts

Languages

English