Search Results for author: Bozhi Luan

Found 2 papers, 1 paper with code

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

1 code implementation • 15 Apr 2024 • Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li

The image overview stage provides a comprehensive understanding of the global scene information, and the coarse localization stage approximates the image area containing the answer based on the question asked.

Question Answering, Visual Question Answering (VQA)

Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model

No code implementations • 31 May 2023 • Haisong Ding, Bozhi Luan, Dongnan Gui, Kai Chen, Qiang Huo

This model conditions on a printed glyph image and creates mappings between printed characters and handwritten images, thus enabling the generation of photo-realistic handwritten samples with diverse styles and unseen text contents.

Denoising, Optical Character Recognition (OCR)
