POIE (Products for OCR and Information Extraction)

Products for OCR and Information Extraction (POIE) dataset derives from camera images of various products in the real world. The images are carefully selected and manually annotated. Our labeling team consists of 8 experienced labelers. We first crop the nutrition tables from product images and adopt multiple commercial OCR engines (Azure and Baidu OCR) for pre-labeling. Then we use LabelMe to manually check the annotation of the location as well as transcription of every text box, and the values of entities for all the text in the images and repaired the OCR errors found. After discarding low-quality and blurred images, we obtain 3,000 images with 111,155 text instances.

from https://github.com/jfkuang/cfam

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


License


  • Unknown

Modalities


Languages