Search Results for author: Abhishek Arora

Found 6 papers, 2 papers with code

Subtl.ai at the FinSBD-2 task: Document Structure Identification by Paying Attention

no code implementations • FinNLP (COLING) 2020 • Abhishek Arora, Aman Khullar, Sarath Chandra Pakala, Vishnu Ramesh, Manish Shrivastava

EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge

no code implementations • 16 Oct 2023 • Tom Bryan, Jacob Carlson, Abhishek Arora, Melissa Dell

Given the diversity and sheer quantity of public domain texts, liberating them at scale requires optical character recognition (OCR) that is accurate, extremely cheap to deploy, and sample-efficient to customize to novel collections, languages, and character sets.

Image Retrieval, Language Modelling, +3

LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models

1 code implementation • 2 Sep 2023 • Abhishek Arora, Melissa Dell

By combining transformer language models with intuitive APIs that will be familiar to many users of popular string matching packages, LinkTransformer aims to democratize the benefits of LLMs among those who may be less familiar with deep learning frameworks.

Blocking, Language Modelling, +3

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

no code implementations • NeurIPS 2023 • Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring

The resulting American Stories dataset provides high-quality data that could be used for pre-training a large language model to achieve better understanding of historical English and historical world knowledge.

Language Modelling, Large Language Model, +3

Quantifying Character Similarity with Vision Transformers

1 code implementation • 24 May 2023 • Xinmei Yang, Abhishek Arora, Shao-Yu Jheng, Melissa Dell

Not all character substitutions are equally probable, and in some settings there are widely used handcrafted lists of the more likely string substitutions, which improve the accuracy of string matching.

Optical Character Recognition (OCR)

Linking Representations with Multimodal Contrastive Learning

no code implementations • 7 Apr 2023 • Abhishek Arora, Xinmei Yang, Shao-Yu Jheng, Melissa Dell

CLIPPINGS outperforms widely used string matching methods by a wide margin and also outperforms unimodal methods.

Contrastive Learning, Optical Character Recognition (OCR)
