Scene Text Detection
91 papers with code • 9 benchmarks • 15 datasets
Scene Text Detection is a computer vision task that involves automatically identifying and localizing text within natural images or videos. The goal of scene text detection is to develop algorithms that can robustly detect and label text with bounding boxes in uncontrolled and complex environments, where text appears on objects such as street signs, billboards, or license plates.
Source: ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection
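To make "label text with bounding boxes" concrete, here is a minimal sketch of one way a predicted box can be compared against an annotated one. It assumes axis-aligned boxes stored as (x_min, y_min, x_max, y_max) and uses intersection-over-union as the overlap measure. Both the box format and the function names are illustrative assumptions, not something the description above specifies; arbitrary-shaped detectors typically output quadrilaterals or polygons instead.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap region (zero if the boxes are disjoint).
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and annotated boxes for one text instance.
predicted = (0.0, 0.0, 2.0, 2.0)
annotated = (1.0, 1.0, 3.0, 3.0)
overlap = iou(predicted, annotated)
```

A detection is then usually counted as correct when the overlap exceeds a chosen threshold; the threshold itself is a benchmark-specific choice.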
Most implemented papers
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
Synthetic data has been a critical tool for training scene text detection and recognition models.
SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task.
Scene Text Detection with Supervised Pyramid Context Network
We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.
ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views
Hence, we collect and annotate the ShopSign dataset to advance research in Chinese scene text detection and recognition.
PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network
With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations involved, which guarantees high efficiency.
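The idea of gathering classification vectors from a two-dimensional map and decoding them into symbols can be sketched roughly as follows. This is a generic greedy CTC-style decode, not PGNet's actual PG-CTC implementation: the function names, the blank index, the toy alphabet, and the list of gather points are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank class


def gather_points(
    score_map: Sequence[Sequence[Sequence[float]]],
    points: Sequence[Tuple[int, int]],
) -> List[Sequence[float]]:
    """Pick the per-class score vector at each (row, col) point of a 2D map."""
    return [score_map[r][c] for r, c in points]


def ctc_greedy_decode(vectors: Sequence[Sequence[float]], alphabet: str) -> str:
    """Take the argmax at each step, merge consecutive repeats, drop blanks."""
    decoded = []
    prev = BLANK
    for vec in vectors:
        idx = max(range(len(vec)), key=vec.__getitem__)
        if idx != BLANK and idx != prev:
            # Class index 0 is the blank, so symbols start at index 1.
            decoded.append(alphabet[idx - 1])
        prev = idx
    return "".join(decoded)


# Toy 1x5 score map with three classes: blank, 'a', 'b' (hypothetical values).
score_map = [[
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]]
# Hypothetical points along the center line of one text instance.
points = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]
text = ctc_greedy_decode(gather_points(score_map, points), "ab")
```

Because decoding reads only the vectors at the gathered points, a scheme like this needs no region cropping or box suppression step, which is consistent with the efficiency argument in the abstract.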
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
End-to-end scene text spotting has attracted great attention in recent years due to the success of excavating the intrinsic synergy of the scene text detection and recognition.
Towards End-to-End Unified Scene Text Detection and Layout Analysis
In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis.
Vision-Language Pre-Training for Boosting Scene Text Detectors
In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.
SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression
In light of this, we constrain the incorporation of segmentation branches to the first few decoder layers and employ progressive regression refinement in subsequent layers, achieving performance gains while minimizing computational load from the mask. Furthermore, we propose a Mask-informed Query Enhancement module.