no code implementations • 24 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
1 code implementation • 21 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai
By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.
1 code implementation • 12 May 2023 • Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai
We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that their performance drops noticeably from SROIE (a widely used English dataset) to our proposed dataset, owing to the larger variance in layouts and entities.
1 code implementation • 1 Jul 2022 • Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai
Inspired by the observation that humans learn to recognize text through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.
no code implementations • CVPR 2022 • Jingqun Tang, Wenqing Zhang, Hongye Liu, Mingkun Yang, Bo Jiang, Guanglong Hu, Xiang Bai
Different from previous approaches that learn robust deep representations of scene text in a holistic manner, our method performs scene text detection based on a few representative features, which avoids the disturbance by background and reduces the computational cost.
no code implementations • CVPR 2021 • Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo
Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.
no code implementations • 20 May 2021 • Ran Zhu, Mingkun Yang, Wang Liu, Rujun Song, Bo Yan, Zhuoling Xiao
Visual odometry (VO), which estimates the position and orientation of a moving object by analyzing image sequences captured by on-board cameras, has been well investigated with the rising interest in autonomous driving.
1 code implementation • CVPR 2021 • Hao Wang, Xiang Bai, Mingkun Yang, Shenggao Zhu, Jing Wang, Wenyu Liu
Such a task is usually realized by matching a query text to the recognized words output by an end-to-end scene text spotter.
1 code implementation • ECCV 2020 • Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai
In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.
no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.
no code implementations • 21 Nov 2019 • Hao Wang, Pu Lu, Hui Zhang, Mingkun Yang, Xiang Bai, Yongchao Xu, Mengchao He, Yongpan Wang, Wenyu Liu
Recently, end-to-end text spotting, which aims to detect and recognize text in cluttered images simultaneously, has attracted growing interest in computer vision.
no code implementations • ICCV 2019 • Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai
Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.
3 code implementations • TPAMI 2018 • Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai
Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications.
1 code implementation • 29 Nov 2017 • Xiang Bai, Mingkun Yang, Tengteng Huang, Zhiyong Dou, Rui Yu, Yongchao Xu
Recently, many methods of person re-identification (Re-ID) rely on part-based feature representation to learn a discriminative pedestrian descriptor.
5 code implementations • 31 Aug 2017 • Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai
This report introduces RCTW, a new competition that focuses on Chinese text reading.
no code implementations • 15 Apr 2017 • Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo
Then, we combine the word embedding of the recognized words and the deep visual features into a single representation, which is optimized by a convolutional neural network for fine-grained image classification.