TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Hierarchical Text Segmentation	HierText	Hi-SAM	F-score (average)	81.87	# 1
Hierarchical Text Segmentation	HierText	Hi-SAM	F-score (stroke)	83.36	# 1
Hierarchical Text Segmentation	HierText	Hi-SAM	F-score (word)	82.86	# 1
Hierarchical Text Segmentation	HierText	Hi-SAM	F-score (text-line)	85.30	# 1
Hierarchical Text Segmentation	HierText	Hi-SAM	F-score (para., layout)	75.97	# 1
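
The reported average matches the unweighted mean of the four per-level F-scores in the table above. The short check below shows the arithmetic; the dictionary and variable names are illustrative, and that the leaderboard computes the average this way is an assumption inferred from the numbers, not something the page states.

```python
# Per-level F-scores for Hi-SAM on HierText, copied from the table above.
scores = {
    "stroke": 83.36,
    "word": 82.86,
    "text-line": 85.30,
    "para., layout": 75.97,
}

# Assumption: "F-score (average)" is the plain mean of the four levels.
average = sum(scores.values()) / len(scores)
print(round(average, 2))
```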

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hi-sam-marrying-segment-anything-model-for/hierarchical-text-segmentation-on-hiertext)](https://paperswithcode.com/sota/hierarchical-text-segmentation-on-hiertext?p=hi-sam-marrying-segment-anything-model-for)`

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

31 Jan 2024 · Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, BaoCai Yin, Cong Liu, Bo Du, DaCheng Tao

The Segment Anything Model (SAM), a profound vision foundation model pre-trained on a large-scale dataset, breaks the boundaries of general segmentation and sparks various downstream applications. This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation. Hi-SAM excels in text segmentation across four hierarchies (stroke, word, text-line, and paragraph) while also realizing layout analysis. Specifically, we first turn SAM into a high-quality text stroke segmentation (TSS) model through a parameter-efficient fine-tuning approach. We use this TSS model to iteratively generate the text stroke labels in a semi-automatic manner, unifying labels across the four text hierarchies in the HierText dataset. Subsequently, with these complete labels, we launch the end-to-end trainable Hi-SAM based on the TSS architecture with a customized hierarchical mask decoder. During inference, Hi-SAM offers both an automatic mask generation (AMG) mode and a promptable segmentation mode. In AMG mode, Hi-SAM first segments text stroke foreground masks, then samples foreground points for hierarchical text mask generation, achieving layout analysis in passing. In promptable mode, Hi-SAM provides word, text-line, and paragraph masks with a single point click. Experimental results show the state-of-the-art performance of our TSS model: 84.86% fgIOU on Total-Text and 88.96% fgIOU on TextSeg for text stroke segmentation. Moreover, compared to the previous specialist for joint hierarchical detection and layout analysis on HierText, Hi-SAM achieves significant improvements: 4.73% PQ and 5.39% F1 on the text-line level, and 5.49% PQ and 7.39% F1 on the paragraph level layout analysis, while requiring 20× fewer training epochs. The code is available at https://github.com/ymy-k/Hi-SAM.
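
As a rough illustration of two ideas in the abstract, the sketch below computes a foreground IoU between two binary masks and samples point prompts from a stroke foreground mask, as the AMG mode is described as doing. It is not taken from the Hi-SAM repository: the function names `fg_iou` and `sample_foreground_points` are hypothetical, fgIOU is assumed to be intersection over union of the foreground class, and uniform random sampling is an assumption, since the abstract does not say how points are chosen.

```python
import random


def fg_iou(pred, gt):
    """Foreground IoU between two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def sample_foreground_points(mask, k, seed=0):
    """Pick up to k (row, col) coordinates uniformly from foreground pixels."""
    fg = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rng = random.Random(seed)
    return rng.sample(fg, min(k, len(fg)))


# Toy 3x4 masks: 1 marks a text-stroke pixel.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
gt = [[0, 1, 1, 1],
      [0, 1, 0, 0],
      [0, 0, 0, 0]]

iou = fg_iou(pred, gt)  # 3 shared pixels out of 5 in the union
points = sample_foreground_points(pred, k=2)  # candidate point prompts
print(iou, points)
```

In the pipeline the abstract describes, each sampled point would then be passed to the hierarchical mask decoder to obtain word, text-line, and paragraph masks; that step depends on the model itself and is left out here.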

Code

ymy-k/hi-sam (official) · 127 stars

Tasks

Hierarchical Text Segmentation

Segmentation

Text Segmentation

Datasets

Total-Text

TextSeg

HierText

Results from the Paper

Ranked #1 on Hierarchical Text Segmentation on HierText

Methods

SAM
