HierText

HierText is the first dataset featuring hierarchical annotations of text in natural scenes and documents. It contains 11,639 images selected from the Open Images dataset, with high-quality annotations at the word (~1.2M words), line, and paragraph levels. A text line is a sequence of words that are spatially aligned, close together, and logically connected. Lines that belong to the same semantic topic and are geometrically coherent form a paragraph. Images in HierText are rich in text, averaging more than 100 words per image.
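
The word, line, and paragraph hierarchy described above can be pictured as a small nested data structure. The following is a minimal sketch for illustration only: the class names (Word, Line, Paragraph), the field names, and the helpers count_words and box are assumptions made here, not the dataset's actual annotation schema or loader API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinate


@dataclass
class Word:
    """Lowest level: one word with its outline polygon (hypothetical fields)."""
    text: str
    vertices: List[Point]


@dataclass
class Line:
    """Middle level: spatially aligned, logically connected words."""
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)


@dataclass
class Paragraph:
    """Top level: lines sharing a topic and a coherent layout."""
    lines: List[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


def count_words(paragraphs: List[Paragraph]) -> int:
    """Count the words across all paragraphs of one image."""
    return sum(len(line.words) for p in paragraphs for line in p.lines)


def box(x: int, y: int, w: int = 40, h: int = 12) -> List[Point]:
    """Hypothetical helper: axis-aligned rectangle as a 4-point polygon."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


# A made-up two-line sign, not taken from the dataset.
sign = Paragraph(lines=[
    Line(words=[Word("OPEN", box(0, 0)), Word("DAILY", box(50, 0))]),
    Line(words=[Word("9", box(0, 20)), Word("TO", box(50, 20)), Word("5", box(100, 20))]),
])

print(sign.text)
print(count_words([sign]))
```

Nesting words inside lines and lines inside paragraphs keeps each level recoverable from the one above it, so a per-image word count is a simple traversal. As a rough consistency check on the figures quoted above, about 1.2M words over 11,639 images works out to roughly 103 words per image.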

Benchmarks

Task	Dataset Variant	Best Model
Hierarchical Text Segmentation	HierText	Hi-SAM

Tasks

Hierarchical Text Segmentation

Similar Datasets

TextSeg

License

Unknown
