CORD-r

We introduce FUNSD-r and CORD-r in Token Path Prediction. They are revised VrD-NER datasets that reflect real-world scenarios of named entity recognition (NER) on scanned visually-rich documents (VrDs).

In FUNSD and CORD, segment layout annotations are aligned with the labeled entities. As a result, these datasets do not expose the reading-order issue that arises in NER on scanned VrDs, and they are unsuitable for evaluating current methods. In FUNSD-r and CORD-r, we automatically re-annotate the layouts using the PP-OCRv3 OCR system and manually re-annotate the named entities as word sequences based on the new layout annotations. Their segment layout annotations therefore match real-world conditions, and entity mentions are labeled at the word level.

CORD-r consists of 999 document samples, each including the image, layout annotations of segments and words, and labeled entities from 30 categories. For detailed summary statistics, please refer to the original paper.
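
To make the sample structure concrete, here is a minimal Python sketch of how one such sample could be represented in memory. It is not the dataset's actual file format: every class name, field name, the bounding-box convention, and the label "ITEM_NAME" are assumptions made for illustration. The one property it is meant to show is that an entity is a sequence of words, which need not be contiguous in the OCR segment order.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed convention: (x0, y0, x1, y1) in pixels. Not taken from the dataset.
BBox = Tuple[int, int, int, int]


@dataclass
class Word:
    text: str
    box: BBox


@dataclass
class Segment:
    """One OCR-detected text segment with its words (hypothetical schema)."""
    words: List[Word]
    box: BBox


@dataclass
class Entity:
    """An entity mention labeled as a word sequence (hypothetical schema)."""
    label: str
    word_ids: List[int]  # indices into Sample.words(), in entity reading order


@dataclass
class Sample:
    image_path: str
    segments: List[Segment]
    entities: List[Entity] = field(default_factory=list)

    def words(self) -> List[Word]:
        # Flatten words in the order the segments are stored.
        return [w for seg in self.segments for w in seg.words]

    def entity_text(self, entity: Entity) -> str:
        ws = self.words()
        return " ".join(ws[i].text for i in entity.word_ids)


# Toy example: the OCR layout places the price between two words of one entity.
sample = Sample(
    image_path="receipt.png",  # placeholder path
    segments=[
        Segment([Word("Iced", (0, 0, 40, 20)), Word("2.50", (200, 0, 240, 20))],
                (0, 0, 240, 20)),
        Segment([Word("Latte", (0, 22, 50, 42))], (0, 22, 50, 42)),
    ],
    entities=[Entity(label="ITEM_NAME", word_ids=[0, 2])],  # placeholder label
)

print(sample.entity_text(sample.entities[0]))
```

In this toy sample the entity covers words 0 and 2 and skips word 1, the kind of case that a contiguous-span labeling over the segment order cannot express directly.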

License


  • CC-BY-4.0
