1 code implementation • 9 Jun 2023 • Fuxiao Liu, Hao Tan, Chris Tensmeyer
In this work, we propose DocumentCLIP, a salience-aware contrastive learning framework that encourages vision-language pretraining models to comprehend the interaction between images and longer text within documents.
no code implementations • 27 Nov 2022 • Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu
In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.
2 code implementations • 30 Mar 2022 • Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, Vlad Morariu
Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks.
Ranked #31 on Visual Question Answering (VQA) on DocVQA test
no code implementations • CVPR 2022 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.
2 code implementations • 27 Nov 2021 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
Ranked #2 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
no code implementations • 18 Apr 2021 • Kai Li, Curtis Wigington, Chris Tensmeyer, Vlad I. Morariu, Handong Zhao, Varun Manjunatha, Nikolaos Barmpalios, Yun Fu
Contrasted with prior work, this paper provides a complementary solution to align domains by learning the same auxiliary tasks in both domains simultaneously.
1 code implementation • 1 Sep 2020 • Brian Davis, Chris Tensmeyer, Brian Price, Curtis Wigington, Bryan Morse, Rajiv Jain
This paper presents a GAN for generating images of handwritten lines conditioned on arbitrary text and latent style vectors.
1 code implementation • CVPR 2020 • Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu
We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation.
3 code implementations • 5 Sep 2019 • Brian Davis, Bryan Morse, Scott Cohen, Brian Price, Chris Tensmeyer
Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts.
1 code implementation • ECCV 2018 • Curtis Wigington, Chris Tensmeyer, Brian Davis, William Barrett, Brian Price, Scott Cohen
Despite decades of research, offline handwriting recognition (HWR) of degraded historical documents remains a challenging problem which, if solved, could greatly improve the searchability of online cultural heritage archives.
Ranked #12 on Handwritten Text Recognition on IAM
no code implementations • 4 Aug 2018 • Chris Tensmeyer, Curtis Wigington, Brian Davis, Seth Stewart, Tony Martinez, William Barrett
Training state-of-the-art offline handwriting recognition (HWR) models requires large labeled datasets, but such datasets are not available in all languages and domains due to the high cost of manual labeling. We address this problem by showing how high-resource languages can be leveraged to help train models for low-resource languages. We propose a transfer learning methodology in which we adapt HWR models trained on a source language to a target language that uses the same writing script. This methodology requires only labeled data in the source language, unlabeled data in the target language, and a language model of the target language.
3 code implementations • 5 Sep 2017 • Chris Tensmeyer, Brian Davis, Curtis Wigington, Iain Lee, Bill Barrett
When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image.
no code implementations • 11 Aug 2017 • Chris Tensmeyer, Daniel Saunders, Tony Martinez
This same method also achieves the highest reported accuracy of 86.6% in predicting paleographic scribal script classes at the page level on medieval Latin manuscripts.
no code implementations • 10 Aug 2017 • Chris Tensmeyer, Tony Martinez
Convolutional Neural Networks (CNNs) are state-of-the-art models for document image classification tasks.
Ranked #28 on Document Image Classification on RVL-CDIP
no code implementations • 10 Aug 2017 • Chris Tensmeyer, Tony Martinez
Binarization of degraded historical manuscript images is an important pre-processing step for many document processing tasks.