Search Results for author: Cong Yao

Found 57 papers, 30 papers with code

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation 28 Mar 2024 Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

no code implementations 20 Mar 2024 Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary.

Zero-Shot Learning

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

no code implementations 3 Jan 2024 Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.

regression

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

1 code implementation 19 Dec 2023 Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin

Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images.

Contrastive Learning Denoising +3

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

1 code implementation 19 Oct 2023 Cong Yao

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.

Document Layout Analysis document understanding +4

Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

1 code implementation 25 Jul 2023 Cheng Da, Peng Wang, Cong Yao

Specifically, MGP-STR achieves an average recognition accuracy of 94% on standard benchmarks for scene text recognition.

Language Modelling Optical Character Recognition (OCR) +1

Conditional Text Image Generation with Diffusion Models

no code implementations CVPR 2023 Yuanzhi Zhu, Zhaohai Li, Tianwei Wang, Mengchao He, Cong Yao

Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and annotating enough real text images.

Domain Adaptation Image Generation

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

1 code implementation CVPR 2023 Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.

Document AI entity_extraction +3

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

no code implementations CVPR 2023 Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.

Text Spotting

LORE: Logical Location Regression Network for Table Structure Recognition

1 code implementation 7 Mar 2023 Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.

regression Table Recognition

Levenshtein OCR

2 code implementations 8 Sep 2022 Cheng Da, Peng Wang, Cong Yao

A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented.

Imitation Learning Optical Character Recognition (OCR) +1

Multi-Granularity Prediction for Scene Text Recognition

2 code implementations 8 Sep 2022 Peng Wang, Cheng Da, Cong Yao

In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods.

Ranked #1 on Scene Text Recognition on Uber-Text (using extra training data)

Language Modelling Optical Character Recognition (OCR) +1

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

no code implementations 27 Jun 2022 Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si

Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks.

Document Classification document understanding +2

Vision-Language Pre-Training for Boosting Scene Text Detectors

2 code implementations CVPR 2022 Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.

Contrastive Learning Language Modelling +4

Revisiting Document Image Dewarping by Grid Regularization

no code implementations CVPR 2022 Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia

This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization.

Local Distortion Optical Flow Estimation

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

5 code implementations 21 Feb 2022 Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.

Binarization Model Optimization +3

Facial Attribute Transformers for Precise and Robust Makeup Transfer

no code implementations 7 Apr 2021 Zhaoyi Wan, Haoran Chen, Jielei Zhang, Wentao Jiang, Cong Yao, Jiebo Luo

In this paper, we address the problem of makeup transfer, which aims at transplanting the makeup from the reference face to the source face while preserving the identity of the source.

Attribute Face Generation

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations CVPR 2021 Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.

Scene Text Detection Text Detection

Slender Object Detection: Diagnoses and Improvements

1 code implementation 17 Nov 2020 Zhaoyi Wan, Yimin Chen, Sutao Deng, Kunpeng Chen, Cong Yao, Jiebo Luo

In this paper, we are concerned with the detection of a particular type of objects with extreme aspect ratios, namely slender objects.

Object object-detection +1

Differentiable Feature Aggregation Search for Knowledge Distillation

no code implementations ECCV 2020 Yushuo Guan, Pengyu Zhao, Bingxuan Wang, Yuanxing Zhang, Cong Yao, Kaigui Bian, Jian Tang

To tackle both the efficiency and the effectiveness of knowledge distillation, we introduce feature aggregation to imitate multi-teacher distillation in the single-teacher distillation framework by extracting informative supervision from multiple teacher feature maps.

Knowledge Distillation Model Compression +1

On Vocabulary Reliance in Scene Text Recognition

no code implementations CVPR 2020 Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

This remedy alleviates the problem of vocabulary reliance and improves the overall scene text recognition performance.

Scene Text Recognition

A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling

no code implementations 10 Feb 2020 Shangbang Long, Yushuo Guan, Kaigui Bian, Cong Yao

Irregular scene text recognition has attracted much attention from the research community, mainly due to the complexity of text shapes in natural scenes.

Scene Text Recognition

Real-time Scene Text Detection with Differentiable Binarization

15 code implementations 20 Nov 2019 Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text.

Binarization Optical Character Recognition (OCR) +3

Rethinking Irregular Scene Text Recognition

1 code implementation 30 Aug 2019 Shangbang Long, Yushuo Guan, Bingxuan Wang, Kaigui Bian, Cong Yao

Reading text from natural images is challenging due to the great variety in text font, color, size, complex background, etc.

Scene Text Detection

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation ECCV 2018 Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition Semantic Segmentation +2

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations ICCV 2019 MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

2D-CTC for Scene Text Recognition

no code implementations 23 Jul 2019 Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, Cong Yao

Scene text recognition has been an important, active research topic in computer vision for years.

Scene Text Recognition speech-recognition +1

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

1 code implementation 13 Jul 2019 Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai

Different from the previous methods which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as an entirety.

Image Generation Scene Text Detection +1

Scene Text Detection with Supervised Pyramid Context Network

2 code implementations 21 Nov 2018 Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li

We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.

Instance Segmentation Scene Text Detection +2

Scene Text Detection and Recognition: The Deep Learning Era

1 code implementation 10 Nov 2018 Shangbang Long, Xin He, Cong Yao

As an important research area in computer vision, scene text detection and recognition has been inescapably influenced by this wave of revolution, consequentially entering the era of deep learning.

Scene Text Detection Text Detection

Scene Text Recognition from Two-Dimensional Perspective

no code implementations 18 Sep 2018 Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Scene Text Recognition Semantic Segmentation +4

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

3 code implementations ECCV 2018 Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao

Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks.

Curved Text Detection Text Detection

Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis

no code implementations 27 Jun 2017 Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu

In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images with specified style from standard font (e.g.

Image-to-Image Translation Translation

Point Linking Network for Object Detection

no code implementations 12 Jun 2017 Xinggang Wang, Kaibing Chen, Zilong Huang, Cong Yao, Wenyu Liu

Deep ConvNet-based object detectors, e.g., Faster R-CNN, YOLO and SSD, mainly focus on regressing the coordinates of bounding boxes.

Object object-detection +1

Training Bit Fully Convolutional Network for Fast Semantic Segmentation

no code implementations 1 Dec 2016 He Wen, Shuchang Zhou, Zhe Liang, Yuxiang Zhang, Dieqiao Feng, Xinyu Zhou, Cong Yao

Fully convolutional neural networks give accurate, per-pixel prediction for input images and have applications like semantic segmentation.

Segmentation Semantic Segmentation

Effective Quantization Methods for Recurrent Neural Networks

2 code implementations 30 Nov 2016 Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou

In addition, we propose balanced quantization methods for weights to further reduce performance degradation.

Quantization

Scene Text Detection via Holistic, Multi-Channel Prediction

no code implementations 29 Jun 2016 Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, Zhimin Cao

Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge.

Scene Text Detection Semantic Segmentation +1

Incidental Scene Text Understanding: Recent Progresses on ICDAR 2015 Robust Reading Competition Challenge 4

no code implementations 30 Nov 2015 Cong Yao, Jia-Nan Wu, Xinyu Zhou, Chi Zhang, Shuchang Zhou, Zhimin Cao, Qi Yin

Different from focused texts present in natural images, which are captured with user's intention and intervention, incidental texts usually exhibit much more diversity, variability and complexity, thus posing significant difficulties and challenges for scene text detection and recognition algorithms.

Scene Text Detection Text Detection

Relaxed Multiple-Instance SVM with Application to Object Discovery

no code implementations ICCV 2015 Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai

Multiple-instance learning (MIL) has served as an important tool for a wide range of vision applications, for instance, image classification, object detection, and visual tracking.

General Classification Image Classification +6

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

83 code implementations 21 Jul 2015 Baoguang Shi, Xiang Bai, Cong Yao

In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition.

Optical Character Recognition (OCR) Scene Text Recognition

ICDAR 2015 Text Reading in the Wild Competition

no code implementations 10 Jun 2015 Xinyu Zhou, Shuchang Zhou, Cong Yao, Zhimin Cao, Qi Yin

Recently, text detection and recognition in natural scenes have become increasingly popular in the computer vision community as well as the document analysis community.

Text Detection

Symmetry-Based Text Line Detection in Natural Scenes

no code implementations CVPR 2015 Zheng Zhang, Wei Shen, Cong Yao, Xiang Bai

Recently, a variety of real-world applications have triggered huge demand for techniques that can extract textual information from natural scenes.

Line Detection Scene Text Detection +1

Automatic Script Identification in the Wild

no code implementations 12 May 2015 Baoguang Shi, Cong Yao, Chengquan Zhang, Xiaowei Guo, Feiyue Huang, Xiang Bai

With the rapid increase of transnational communication and cooperation, people frequently encounter multilingual scenarios in various situations.

General Classification Image Classification

Deep Learning Representation using Autoencoder for 3D Shape Retrieval

no code implementations 25 Sep 2014 Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai

By combining the global deep learning representation and the local descriptor representation, our method can obtain state-of-the-art performance on 3D shape retrieval benchmarks.

3D Shape Classification 3D Shape Recognition +5

Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition

no code implementations CVPR 2014 Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu

Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision.

Scene Text Detection Scene Text Recognition +1
