ST-VQA (Scene Text Visual Question Answering) aims to highlight the importance of exploiting the high-level semantic information present in images as textual cues in the VQA process.
74 PAPERS • NO BENCHMARKS YET
TextOCR is a dataset for benchmarking text recognition on arbitrarily shaped scene text in natural images. It provides ~1M high-quality word annotations on TextVQA images, enabling end-to-end reasoning on downstream tasks such as visual question answering and image captioning.
21 PAPERS • NO BENCHMARKS YET
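Word-level annotations like TextOCR's are typically shipped as a single JSON file mapping annotation ids to records. A minimal sketch of collecting the legible words for one image, assuming a COCO-Text-style layout with an `anns` dict and `image_id`, `utf8_string`, and `bbox` fields, and assuming `"."` marks an illegible word — all of these are assumptions about the schema, not the official release format:

```python
def words_for_image(annots: dict, image_id: str) -> list[dict]:
    """Collect legible word annotations for one image.

    Assumes a COCO-Text-like layout: annots["anns"] maps annotation ids
    to records with "image_id", "utf8_string" (the word), and "bbox"
    (x, y, w, h). "." is assumed to mark illegible words. These field
    names and conventions are assumptions, not the official schema.
    """
    return [
        ann for ann in annots["anns"].values()
        if ann["image_id"] == image_id
        and ann.get("utf8_string") not in (None, "", ".")
    ]

# Tiny in-memory stand-in for the real annotation file
# (which would normally be loaded with json.load).
sample = {
    "anns": {
        "a1": {"image_id": "img_001", "utf8_string": "STOP", "bbox": [10, 20, 40, 15]},
        "a2": {"image_id": "img_001", "utf8_string": ".", "bbox": [0, 0, 5, 5]},
        "a3": {"image_id": "img_002", "utf8_string": "EXIT", "bbox": [5, 5, 30, 12]},
    }
}

words = words_for_image(sample, "img_001")
print([w["utf8_string"] for w in words])  # the illegible "." entry is dropped
```

The same pattern extends to building recognition training crops: each surviving record's `bbox` gives the region to cut from the image, and `utf8_string` the target transcription.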
Twitter100k is a large-scale dataset for weakly supervised cross-media retrieval.
4 PAPERS • NO BENCHMARKS YET
A dataset of over five million images spanning five domains: synthetic, document, street view, handwritten, and car license plate.
2 PAPERS • 2 BENCHMARKS
A dataset of optical images of printed circuit boards with detailed annotations of all text, logos, and surface-mount devices (SMDs). It contains several hundred samples spanning a wide variety of manufacturing locations, sizes, node technologies, and applications.
1 PAPER • NO BENCHMARKS YET
The UTRSet-Synth dataset is introduced as a complementary training resource to the UTRSet-Real Dataset, specifically designed to enhance the effectiveness of Urdu OCR models. It is a high-quality synthetic dataset comprising 20,000 lines that closely resemble real-world representations of Urdu text.