Search Results for author: Abhishek Arora

Found 6 papers, 2 papers with code

EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge

no code implementations · 16 Oct 2023 · Tom Bryan, Jacob Carlson, Abhishek Arora, Melissa Dell

Given the diversity and sheer quantity of public domain texts, liberating them at scale requires optical character recognition (OCR) that is accurate, extremely cheap to deploy, and sample-efficient to customize to novel collections, languages, and character sets.

Image Retrieval, Language Modelling, +3

LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models

1 code implementation · 2 Sep 2023 · Abhishek Arora, Melissa Dell

By combining transformer language models with intuitive APIs that will be familiar to many users of popular string matching packages, LinkTransformer aims to democratize the benefits of LLMs among those who may be less familiar with deep learning frameworks.

Blocking, Language Modelling, +3

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

no code implementations · NeurIPS 2023 · Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring

The resulting American Stories dataset provides high quality data that could be used for pre-training a large language model to achieve better understanding of historical English and historical world knowledge.

Language Modelling, Large Language Model, +3

Quantifying Character Similarity with Vision Transformers

1 code implementation · 24 May 2023 · Xinmei Yang, Abhishek Arora, Shao-Yu Jheng, Melissa Dell

Not all character substitutions are equally probable, and for some settings there are widely used handcrafted lists denoting which string substitutions are more likely; these lists improve the accuracy of string matching.

Optical Character Recognition (OCR)

Linking Representations with Multimodal Contrastive Learning

no code implementations · 7 Apr 2023 · Abhishek Arora, Xinmei Yang, Shao-Yu Jheng, Melissa Dell

CLIPPINGS outperforms widely used string matching methods by a wide margin and also outperforms unimodal methods.

Contrastive Learning, Optical Character Recognition (OCR)
