no code implementations • COLING 2022 • Xin Sheng, Linli Xu, Yinlong Xu, Changcun Bao, Huang Chen, Bo Ren
The discriminator of CoCGAN discriminates the authenticity of given samples and optimizes a contrastive learning objective to capture both more flexible data-to-class relations and data-to-data relations among training samples.
no code implementations • ICCV 2023 • Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun
We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, with applications including document analysis, retrieval, and office automation.
no code implementations • 4 Apr 2023 • Yongxin Zhu, Zhen Liu, Yukang Liang, Xin Li, Hao Liu, Changcun Bao, Linli Xu
Unlike conventional STVQA models, which treat the linguistic semantics and visual semantics of scene text as two separate features, the "Locate Then Generate" (LTG) paradigm proposed in this paper explicitly unifies these two semantics, using the spatial bounding box as a bridge between them.