Search Results for author: Vitor Jeronymo

Found 10 papers, 7 papers with code

InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval

1 code implementation • 10 Jul 2023 • Hugo Abonizio, Luiz Bonifacio, Vitor Jeronymo, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Our toolkit not only reproduces the InPars method and partially reproduces Promptagator, but also provides a plug-and-play functionality allowing the use of different LLMs, exploring filtering methods and finetuning various reranker models on the generated data.

Information Retrieval Retrieval +1

156

Paper
Code

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang

The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.

Cross-Lingual Information Retrieval Retrieval

Paper
Add Code

NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval

1 code implementation • 28 Mar 2023 • Vitor Jeronymo, Roberto Lotufo, Rodrigo Nogueira

This paper reports on a study of cross-lingual information retrieval (CLIR) using the mT5-XXL reranker on the NeuCLIR track of TREC 2022.

Cross-Lingual Information Retrieval Retrieval

Paper
Code

InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval

1 code implementation • 4 Jan 2023 • Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents.

Information Retrieval Retrieval

156

Paper
Code

In Defense of Cross-Encoders for Zero-Shot Retrieval

1 code implementation • 12 Dec 2022 • Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.

Retrieval

Paper
Code

mRobust04: A Multilingual Version of the TREC Robust 2004 Benchmark

no code implementations • 27 Sep 2022 • Vitor Jeronymo, Mauricio Nascimento, Roberto Lotufo, Rodrigo Nogueira

Robust 2004 is an information retrieval benchmark whose large number of judgments per query make it a reliable evaluation dataset.

Information Retrieval Retrieval

Paper
Add Code

A Boring-yet-effective Approach for the Product Ranking Task of the Amazon KDD Cup 2022

no code implementations • 9 Aug 2022 • Vitor Jeronymo, Guilherme Rosa, Surya Kallumadi, Roberto Lotufo, Rodrigo Nogueira

In this work we describe our submission to the product ranking task of the Amazon KDD Cup 2022.

Retrieval

Paper
Add Code

No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval

1 code implementation • 6 Jun 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

This has made distilled and dense models, due to latency constraints, the go-to choice for deployment in real-world retrieval applications.

Ranked #1 on Citation Prediction on SciDocs (BEIR)

Argument Retrieval Biomedical Information Retrieval +9

Paper
Code

Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task

1 code implementation • 30 May 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Roberto Lotufo, Rodrigo Nogueira

Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios.

Language Modelling

Paper
Code

mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset

1 code implementation • 31 Aug 2021 • Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages that was created using machine translation.

Information Retrieval Machine Translation +4

128

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.