Search Results for author: Peter Staar

Found 13 papers, 5 papers with code

INDUS: Effective and Efficient Language Models for Scientific Applications

no code implementations • 17 May 2024 • Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kayleen Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grazes, Megan Ansdel, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, Michele Dolfi, Rafael Teixeira de Lima, Panos Vegenas, S. Karthik Mukkavilli, Peter Staar, Sanaz Vahidinia, Ryan McGranaghan, Armin Mehrabian, Tsendgar Lee

Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks.

KVP10k: A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents

1 code implementation • 1 May 2024 • Oshri Naparstek, Roi Pony, Inbar Shapira, Foad Abo Dahood, Ophir Azulai, Yevgeny Yaroker, Nadav Rubinstein, Maksym Lysak, Peter Staar, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Elad Amrani, Idan Friedman, Orit Prince, Yevgeny Burshtein, Adi Raz Goldfarb, Udi Barzelay

In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains.

Key Information Extraction

ESG Accountability Made Easy: DocQA at Your Service

no code implementations • 30 Nov 2023 • Lokesh Mishra, Cesar Berrospi, Kasper Dinkla, Diego Antognini, Francesco Fusco, Benedikt Bothur, Maksym Lysak, Nikolaos Livathinos, Ahmed Nassar, Panagiotis Vagenas, Lucas Morin, Christoph Auer, Michele Dolfi, Peter Staar

We present Deep Search DocQA.

Question Answering

MolGrapher: Graph-based Visual Recognition of Chemical Structures

1 code implementation • ICCV 2023 • Lucas Morin, Martin Danelljan, Maria Isabel Agea, Ahmed Nassar, Valery Weber, Ingmar Meijer, Peter Staar, Fisher Yu

In addition, we introduce a large-scale benchmark of annotated real molecule images, USPTO-30K, to spur research on this critical topic.

ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents

no code implementations • 24 May 2023 • Christoph Auer, Ahmed Nassar, Maksym Lysak, Michele Dolfi, Nikolaos Livathinos, Peter Staar

The results demonstrate substantial progress towards achieving robust and highly generalizing methods for document layout understanding.

Data Augmentation

Optimized Table Tokenization for Table Structure Recognition

no code implementations • 5 May 2023 • Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Peter Staar

The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average.

Unsupervised Term Extraction for Highly Technical Domains

no code implementations • 24 Oct 2022 • Francesco Fusco, Peter Staar, Diego Antognini

Developing term extractors that are able to generalize across very diverse and potentially highly technical domains is challenging, as annotations for domains requiring in-depth expertise are scarce and expensive to obtain.

Sentence, Term Extraction

BusiNet -- a Light and Fast Text Detection Network for Business Documents

no code implementations • 4 Jul 2022 • Oshri Naparstek, Ophir Azulai, Daniel Rotman, Yevgeny Burshtein, Peter Staar, Udi Barzelay

Business documents often include sensitive information and as such they cannot be uploaded to a cloud service for OCR.

Optical Character Recognition (OCR)

TableFormer: Table Structure Understanding with Transformers

1 code implementation • CVPR 2022 • Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar

In this way, we can obtain the content of the table cells from programmatic PDFs directly from the PDF source and avoid training custom OCR decoders.

Decoder, Object Detection

pNLP-Mixer: an Efficient all-MLP Architecture for Language

1 code implementation • 9 Feb 2022 • Francesco Fusco, Damian Pascual, Peter Staar, Diego Antognini

Large pre-trained language models based on transformer architecture have drastically changed the natural language processing (NLP) landscape.

Intent Classification

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation • CVPR 2022 • Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization, Self-Supervised Learning

Robust PDF Document Conversion Using Recurrent Neural Networks

no code implementations • 18 Feb 2021 • Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, Peter Staar

In this paper, we present a novel approach to document structure recovery in PDF using recurrent neural networks to process the low-level PDF data representation directly, instead of relying on a visual re-interpretation of the rendered PDF page, as has been proposed in previous literature.

Feature Engineering, Information Retrieval

An Information Extraction and Knowledge Graph Platform for Accelerating Biochemical Discoveries

no code implementations • 19 Jul 2019 • Matteo Manica, Christoph Auer, Valery Weber, Federico Zipoli, Michele Dolfi, Peter Staar, Teodoro Laino, Costas Bekas, Akihiro Fujita, Hiroki Toda, Shuichi Hirose, Yasumitsu Orii

Information extraction and data mining in biochemical literature is a daunting task that demands resource-intensive computation and appropriate means to scale knowledge ingestion.
