no code implementations • EMNLP (insights) 2020 • Catherine Finegan-Dollak, Ashish Verma
Clustering documents by type—grouping invoices with invoices and articles with articles—is a desirable first step for organizing large collections of document scans.
1 code implementation • 26 May 2023 • Fnu Mohbat, Mohammed J. Zaki, Catherine Finegan-Dollak, Ashish Verma
Visual document classifiers have shown impressive performance on in-distribution test sets.
no code implementations • 1 Sep 2021 • Anik Saha, Catherine Finegan-Dollak, Ashish Verma
Natural language processing for document scans and PDFs has the potential to enormously improve the efficiency of business processes.
no code implementations • ACL 2020 • Michael Desmond, Catherine Finegan-Dollak, Jeff Boston, Matt Arnold
Label noise (incorrectly or ambiguously labeled training examples) can negatively impact model performance.
1 code implementation • ACL 2018 • Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev
Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work.
Ranked #1 on SQL Parsing on IMDb
no code implementations • NAACL 2018 • Youxuan Jiang, Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Walter Lasecki
Most summarization research focuses on summarizing the entire given text, but in practice readers are often interested in only one aspect of the document or conversation.