1 code implementation • 18 Dec 2023 • Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, Jimmy Lin
We measure LLM robustness using two metrics: (i) hallucination rate, which measures a model's tendency to hallucinate an answer when the answer is not present in the passages of the non-relevant subset, and (ii) error rate, which measures a model's inability to recognize the relevant passages in the relevant subset.
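As a minimal sketch of how these two rates could be computed, assuming per-query binary judgments over the two evaluation subsets (the function and field layout below are hypothetical, not from the paper):

```python
# Hedged sketch: robustness rates for an LLM judged on two subsets.
def robustness_rates(non_relevant_judgments, relevant_judgments):
    """non_relevant_judgments: list of bools, True if the model hallucinated an
    answer even though no supporting passage exists (non-relevant subset).
    relevant_judgments: list of bools, True if the model failed to recognize
    the passage that does contain the answer (relevant subset)."""
    hallucination_rate = sum(non_relevant_judgments) / len(non_relevant_judgments)
    error_rate = sum(relevant_judgments) / len(relevant_judgments)
    return hallucination_rate, error_rate

# Example: hallucinated on 3 of 10 unanswerable queries, missed the relevant
# passage on 2 of 10 answerable ones.
print(robustness_rates([True] * 3 + [False] * 7, [True] * 2 + [False] * 8))  # (0.3, 0.2)
```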
1 code implementation • 31 Jul 2023 • Ehsan Kamalloo, Aref Jafari, Xinyu Zhang, Nandan Thakur, Jimmy Lin
In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations.
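Purely as an illustration of the task format, one attributed information-seeking example could be organized along these lines (the field names are hypothetical, not the released HAGRID schema):

```python
# Hypothetical shape of one query with candidate quotes and an attributed answer.
example = {
    "query": "When was the Eiffel Tower completed?",
    "quotes": [  # candidate passages retrieved as supporting evidence
        {"id": "p1", "text": "The Eiffel Tower was completed in March 1889."},
    ],
    "answer": "It was completed in March 1889 [p1].",  # explanation citing the quote
}
```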
2 code implementations • 13 Jun 2023 • Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, Jimmy Lin
BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations.
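For context, the usual zero-shot evaluation loop with the public beir package looks roughly like this; the dataset and dense model below are common examples, not necessarily those studied in the paper:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download one BEIR dataset (SciFact) and load its corpus, queries, and qrels.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Zero-shot evaluation of an off-the-shelf dense retriever.
model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="dot")
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```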
1 code implementation • 11 May 2023 • Ehsan Kamalloo, Nouha Dziri, Charles L. A. Clarke, Davood Rafiei
The recent success of large language models (LLMs) for QA aggravates lexical matching failures since candidate answers become longer, thereby making matching with the gold answers even more challenging.
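A toy illustration of the failure mode: with a long LLM response, strict exact match misses an answer that is plainly contained in the output (the normalization helper below is a standard SQuAD-style heuristic, not the paper's evaluator):

```python
import re
import string

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

gold = "Ottawa"
llm_answer = "The capital of Canada is Ottawa, which sits on the Ottawa River."

exact_match = normalize(llm_answer) == normalize(gold)    # False: lexical match fails
lenient_match = normalize(gold) in normalize(llm_answer)  # True: answer is present
print(exact_match, lenient_match)
```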
no code implementations • 10 May 2023 • Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin
The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs.
no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang
The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.
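A concrete (hypothetical) picture of the CLIR setup: a multilingual encoder embeds a query in one language and documents in another into a shared space and ranks by similarity. The model name below is just one publicly available choice, not necessarily one used in the paper:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

query = "Who wrote Don Quixote?"  # English query
docs = [
    "Don Quijote fue escrito por Miguel de Cervantes.",  # Spanish document
    "La tour Eiffel se trouve à Paris.",                 # French document
]

q_emb = model.encode(query, convert_to_tensor=True)
d_emb = model.encode(docs, convert_to_tensor=True)
scores = util.cos_sim(q_emb, d_emb)  # higher score = better cross-lingual match
print(scores)
```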
1 code implementation • 18 Oct 2022 • Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, Jimmy Lin
MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world.
1 code implementation • ACM International Conference on Information & Knowledge Management (CIKM) 2022 • Mehdi Akbarian Rastaghi, Ehsan Kamalloo, Davood Rafiei
The paradigm of fine-tuning Pre-trained Language Models (PLMs) has been successful in Entity Matching (EM).
Ranked #3 on Entity Resolution on Amazon-Google
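The fine-tuning recipe typically casts entity matching as binary sequence-pair classification; a minimal sketch with HuggingFace Transformers follows (the backbone and record serialization are illustrative, not the paper's exact setup):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Serialize two entity records and let the PLM judge whether they refer to the same entity.
left = "title: Canon EOS 5D camera price: 2499"
right = "title: Canon EOS 5D DSLR price: 2450"
inputs = tokenizer(left, right, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
prob_match = torch.softmax(logits, dim=-1)[0, 1].item()  # probability of "match"
print(prob_match)
```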
1 code implementation • 22 Apr 2022 • Nouha Dziri, Ehsan Kamalloo, Sivan Milton, Osmar Zaiane, Mo Yu, Edoardo M. Ponti, Siva Reddy
The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources.
1 code implementation • Findings (ACL) 2022 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi
From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial data augmentation (DA).
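In spirit, the selection step can be sketched as scoring every augmented candidate with the current model's loss and keeping only the hardest ones (a simplified sketch; the pool, k, and loss are placeholders rather than the paper's exact procedure):

```python
import torch
import torch.nn.functional as F

def select_worst_case(model, augmented_inputs, labels, k):
    """Keep the k augmented samples (encoded as a tensor batch) on which the
    current model incurs the highest loss."""
    with torch.no_grad():
        logits = model(augmented_inputs)
        losses = F.cross_entropy(logits, labels, reduction="none")  # per-sample loss
    top_k = torch.topk(losses, k).indices
    return augmented_inputs[top_k], labels[top_k]
```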
1 code implementation • Findings (ACL) 2021 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi
We exploit a semi-supervised approach based on knowledge distillation (KD) to train a model on augmented data.
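A minimal sketch of a KD objective on augmented (possibly unlabeled) samples: the student is trained to match the teacher's softened output distribution. The temperature below is an illustrative default, not the paper's configuration:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation loss on augmented samples (no gold labels needed)."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student distributions, scaled by T^2.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
```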
1 code implementation • NAACL 2019 • Nouha Dziri, Ehsan Kamalloo, Kory W. Mathewson, Osmar Zaiane
Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers.
1 code implementation • WS 2019 • Nouha Dziri, Ehsan Kamalloo, Kory W. Mathewson, Osmar Zaiane
Our model builds upon the basic Seq2Seq model, augmenting it with a hierarchical joint attention mechanism that incorporates topical concepts and previous interactions into response generation.
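One way to picture the joint attention idea (not the paper's exact architecture) is a module that attends separately over previous-interaction states and topical concept embeddings, then fuses both with the decoder state:

```python
import torch
import torch.nn as nn

class JointAttention(nn.Module):
    """Illustrative joint attention over (i) context states from previous
    interactions and (ii) topical concept embeddings, fused per decoding step."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.ctx_attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.topic_attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.combine = nn.Linear(3 * hidden_dim, hidden_dim)

    def forward(self, decoder_state, context_states, topic_embeddings):
        # decoder_state: (batch, 1, hidden); context/topic: (batch, seq, hidden)
        ctx_vec, _ = self.ctx_attn(decoder_state, context_states, context_states)
        topic_vec, _ = self.topic_attn(decoder_state, topic_embeddings, topic_embeddings)
        fused = torch.cat([decoder_state, ctx_vec, topic_vec], dim=-1)
        return torch.tanh(self.combine(fused))
```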
1 code implementation • 4 May 2018 • Ehsan Kamalloo, Davood Rafiei
The evaluation shows that our method outperforms the unsupervised technique, as well as Reuters OpenCalais and the Google Cloud Natural Language API, on all three corpora. Our method also performs close to the state-of-the-art supervised method, and outperforms it when 40% or more of the toponyms in the test data are unseen in the training data.