no code implementations • 5 Apr 2024 • João Coelho, Bruno Martins, João Magalhães, Jamie Callan, Chenyan Xiong
This study investigates the existence of positional biases in Transformer-based models for text representation learning, particularly in the context of web document retrieval.
no code implementations • 6 Feb 2024 • Harshit Mehrotra, Jamie Callan, Zhen Fan
The ClueWeb22 dataset, containing nearly 10 billion documents, was released in 2022 to support academic and industry research.
1 code implementation • 11 May 2023 • Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig
In this work, we provide a generalized view of active retrieval-augmented generation: methods that actively decide when and what to retrieve over the course of generation.
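The decide-when-to-retrieve loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `retrieve` and `generate_sentence` are hypothetical callables standing in for a search API and a language model that reports a confidence score for each tentative sentence.

```python
def active_rag_generate(question, retrieve, generate_sentence, threshold=0.6):
    # Active-RAG loop (sketch): generate sentence by sentence; when the
    # model's confidence in a tentative sentence is low, retrieve with
    # that sentence as the query and regenerate it with the new context.
    answer, context = [], []
    while True:
        sentence, confidence, done = generate_sentence(question, answer, context)
        if confidence < threshold:
            context = retrieve(sentence)  # retrieve only when needed
            sentence, confidence, done = generate_sentence(question, answer, context)
        answer.append(sentence)
        if done:
            return " ".join(answer)
```

The key design point is that retrieval is triggered by low generation confidence rather than happening once up front.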
2 code implementations • 20 Dec 2022 • Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan
Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document.
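The HyDE pipeline (generate a hypothetical document, embed it, retrieve real documents near that embedding) can be sketched with toy stand-ins. Here `fake_generate` replaces the instruction-following LM and a bag-of-words `embed` replaces the dense encoder; both are assumptions for illustration only.

```python
import math
import re
from collections import Counter

def fake_generate(query):
    # Stand-in for an instruction-following LM (e.g. InstructGPT): in HyDE,
    # the model writes a *hypothetical* answer document for the query.
    return f"A passage answering the question: {query}. It discusses relevant details."

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_search(query, corpus):
    # 1) generate a hypothetical document, 2) embed it,
    # 3) return the real document nearest to that embedding.
    hypo_vec = embed(fake_generate(query))
    return max(corpus, key=lambda doc: cosine(hypo_vec, embed(doc)))
```

The hypothetical document may contain errors; retrieval only relies on it landing near relevant real documents in embedding space.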
1 code implementation • 5 Dec 2022 • Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie Callan, Graham Neubig
Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers.
Ranked #1 on Passage Retrieval on Natural Questions
no code implementations • 29 Nov 2022 • Arnold Overwijk, Chenyan Xiong, Xiao Liu, Cameron VandenBerg, Jamie Callan
ClueWeb22, the newest iteration of the ClueWeb line of datasets, provides 10 billion web pages affiliated with rich information.
2 code implementations • 18 Nov 2022 • Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig
Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both to understand the problem description by decomposing it into steps and to solve each step of the problem.
Ranked #17 on Arithmetic Reasoning on GSM8K
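A program-aided variant of this idea can be sketched as follows: the LM emits Python statements as its reasoning chain, and the interpreter does the arithmetic. `fake_llm_program` is a canned stand-in for the model's output on a GSM8K-style word problem.

```python
def fake_llm_program(question):
    # Stand-in for an LLM prompted to write its reasoning steps as code.
    # The canned program below corresponds to a GSM8K-style word problem.
    return (
        "eggs_per_day = 16\n"
        "eaten = 3\n"
        "baked = 4\n"
        "price = 2\n"
        "answer = (eggs_per_day - eaten - baked) * price\n"
    )

def program_aided_solve(question):
    # Offload computation to the Python interpreter, so the LM only has
    # to decompose the problem into steps, not carry out the arithmetic.
    namespace = {}
    exec(fake_llm_program(question), {}, namespace)
    return namespace["answer"]
```

The division of labor is the point: decomposition stays with the LM, execution moves to the interpreter, eliminating arithmetic slips in the reasoning chain.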
1 code implementation • 9 May 2022 • Luyu Gao, Jamie Callan
In this paper, we propose instead to model full query-to-document interaction, leveraging the attention operation and modular Transformer re-ranker framework.
1 code implementation • 11 Mar 2022 • Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan
In this paper, we present Tevatron, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity.
2 code implementations • 30 Aug 2021 • HongChien Yu, Chenyan Xiong, Jamie Callan
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
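ANCE-PRF learns to fold feedback documents into the query representation with a Transformer encoder; as a rough stand-in, the classic Rocchio update illustrates the underlying idea of moving the query vector toward the pseudo-relevant documents. The weights `alpha` and `beta` are illustrative, not the paper's.

```python
def prf_update(query_vec, feedback_vecs, alpha=1.0, beta=0.5):
    # Rocchio-style pseudo relevance feedback in embedding space:
    # shift the query vector toward the centroid of the top-ranked
    # (pseudo-relevant) documents. ANCE-PRF instead *learns* this
    # combination with a Transformer query encoder.
    dim = len(query_vec)
    centroid = [sum(v[i] for v in feedback_vecs) / len(feedback_vecs)
                for i in range(dim)]
    return [alpha * query_vec[i] + beta * centroid[i] for i in range(dim)]
```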
1 code implementation • ACL 2022 • Luyu Gao, Jamie Callan
Recent research demonstrates the effectiveness of using fine-tuned language models (LMs) for dense retrieval.
1 code implementation • EMNLP 2021 • Luyu Gao, Jamie Callan
Pre-trained Transformer language models (LM) have become go-to text representation encoders.
1 code implementation • NAACL 2021 • Luyu Gao, Zhuyun Dai, Jamie Callan
Classical information retrieval systems such as BM25 rely on exact lexical match and carry out search efficiently with an inverted list index.
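The classical setup the abstract contrasts against can be sketched in a few lines: an inverted list index mapping each term to its postings, scored with the standard BM25 formula. The toy corpus and parameter values (`k1=1.2`, `b=0.75`) are illustrative defaults.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    # Inverted list index: term -> postings list of (doc_id, term_frequency).
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for term, tf in Counter(doc.lower().split()).items():
            index[term].append((doc_id, tf))
    return index

def bm25_search(query, docs, k1=1.2, b=0.75):
    index = build_index(docs)
    n = len(docs)
    avgdl = sum(len(d.split()) for d in docs) / n
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue  # exact lexical match only: unseen terms contribute nothing
        idf = math.log(1 + (n - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings:
            dl = len(docs[doc_id].split())
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return sorted(scores.items(), key=lambda x: -x[1])
```

Only documents sharing a literal query term are scored at all, which is exactly the vocabulary-mismatch limitation that soft-match models target.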
1 code implementation • 21 Jan 2021 • Luyu Gao, Zhuyun Dai, Jamie Callan
Pre-trained deep language models (LMs) have advanced the state-of-the-art of text retrieval.
no code implementations • 21 Jan 2021 • Luís Borges, Bruno Martins, Jamie Callan
Our work experimentally assesses the benefits of model ensembling within the context of neural methods for passage reranking.
1 code implementation • 20 Jan 2021 • HongChien Yu, Zhuyun Dai, Jamie Callan
Most research on pseudo relevance feedback (PRF) has been done in vector space and probabilistic retrieval models.
5 code implementations • ACL (RepL4NLP) 2021 • Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan
Contrastive learning has been applied successfully to learn vector representations of text.
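The contrastive objective the abstract refers to is typically an in-batch InfoNCE loss; a minimal pure-Python sketch is below. The vectors and temperature are illustrative; a real implementation computes this over encoder outputs with automatic differentiation.

```python
import math

def info_nce_loss(query_vecs, pos_vecs, temperature=1.0):
    # In-batch contrastive (InfoNCE) loss: each query's positive is its
    # paired vector; every other positive in the batch acts as a negative.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    loss = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, p) / temperature for p in pos_vecs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += log_denom - logits[i]  # -log softmax of the true pair
    return loss / len(query_vecs)
```

Since the negatives come from the batch itself, the quality of this loss grows with batch size, which is why memory-efficient large-batch training matters for contrastive text representation learning.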
no code implementations • Findings of the Association for Computational Linguistics 2020 • Vaibhav Kumar, Jamie Callan
Given an input question, it uses a BERT-based classifier (trained with weak supervision) to de-contextualize the input by selecting relevant terms from the dialog history.
no code implementations • 19 Aug 2020 • Shuo Zhang, Krisztian Balog, Jamie Callan
Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities.
no code implementations • 18 Aug 2020 • Vaibhav Kumar, Vikas Raunak, Jamie Callan
Given a natural language query, teaching machines to ask clarifying questions is of immense utility in practical natural language processing systems.
no code implementations • 21 Jul 2020 • Luyu Gao, Zhuyun Dai, Jamie Callan
Deep language models such as BERT, pre-trained on large corpora, have given a huge performance boost to state-of-the-art information retrieval ranking systems.
1 code implementation • 23 May 2020 • Shuo Zhang, Zhuyun Dai, Krisztian Balog, Jamie Callan
We propose to generate natural language summaries as answers to describe the complex information contained in a table.
no code implementations • 29 Apr 2020 • Luyu Gao, Zhuyun Dai, Tongfei Chen, Zhen Fan, Benjamin Van Durme, Jamie Callan
This paper presents CLEAR, a retrieval model that seeks to complement classical lexical exact-match models such as BM25 with semantic matching signals from a neural embedding matching model.
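At scoring time, a CLEAR-style hybrid reduces to fusing the two signals over the union of candidates; a sketch is below. The interpolation `weight` and score dictionaries are illustrative, and the paper's distinctive part (training the embedding model on the lexical model's residual errors) is only noted in comments, not implemented.

```python
def hybrid_rank(lexical_scores, semantic_scores, weight=0.5):
    # Fuse an exact-match (BM25-like) score with a neural embedding
    # match score; documents surfaced by either signal are considered.
    # CLEAR trains the embedding model on the lexical model's residual
    # errors so the two signals complement each other; here we only
    # sketch the score combination.
    doc_ids = set(lexical_scores) | set(semantic_scores)
    fused = {d: lexical_scores.get(d, 0.0) + weight * semantic_scores.get(d, 0.0)
             for d in doc_ids}
    return sorted(fused, key=fused.get, reverse=True)
```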
no code implementations • EMNLP 2020 • Luyu Gao, Zhuyun Dai, Jamie Callan
Recent innovations in Transformer-based ranking models have advanced the state-of-the-art in information retrieval.
1 code implementation • 30 Mar 2020 • Jeffrey Dalton, Chenyan Xiong, Jamie Callan
A common theme through the runs is the use of BERT-based neural reranking methods.
2 code implementations • 23 Oct 2019 • Zhuyun Dai, Jamie Callan
When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval.
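The trick that makes the learned weights compatible with an ordinary inverted index is quantizing each predicted importance into an integer that takes the place of term frequency. A sketch, with hand-picked stand-in weights where a real system would use the model's predictions:

```python
def pseudo_tf_from_weights(term_weights, scale=100):
    # DeepCT-Index idea (sketch): replace raw term frequency with a
    # learned importance weight, quantized to an integer so it fits in
    # the tf field of an unmodified inverted index. The input weights
    # here are stand-ins for the model-predicted values.
    return {term: max(1, round(w * scale))
            for term, w in term_weights.items() if w > 0}
```

Existing retrieval infrastructure (BM25 scoring, posting-list traversal) then works unchanged on the re-weighted index.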
1 code implementation • 22 May 2019 • Zhuyun Dai, Jamie Callan
Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations.
Ranked #5 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • 27 Sep 2018 • Mary Arpita Pyreddy, Varshini Ramaseshan, Narendra Nath Joshi, Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu
This paper studies the consistency of the kernel-based neural ranking model K-NRM, a recent state-of-the-art neural IR model, which is important for reproducible research and for deployment in industry.
no code implementations • 3 May 2018 • Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Tie-Yan Liu
The salience model also improves ad hoc search accuracy, providing effective ranking features by modeling the salience of query entities in candidate documents.
no code implementations • WSDM 2018 • Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu
This paper presents Conv-KNRM, a Convolutional Kernel-based Neural Ranking Model that models n-gram soft matches for ad-hoc search.
no code implementations • 20 Jun 2017 • Chenyan Xiong, Jamie Callan, Tie-Yan Liu
This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval.
1 code implementation • 20 Jun 2017 • Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, Russell Power
Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
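The kernel-pooling step can be sketched directly from that description: given the translation (similarity) matrix, each RBF kernel softly counts matches at one similarity level, and the log soft-TFs are summed over query terms to give one feature per kernel. The kernel means and width below are illustrative values, not the trained ones.

```python
import math

def kernel_pooling(sim_matrix, mus=(-0.9, -0.3, 0.3, 0.9, 1.0), sigma=0.1):
    # sim_matrix[i][j] is the embedding similarity between query term i
    # and document term j. Each RBF kernel (mean mu) softly counts the
    # matches at its similarity level; summing log soft-TF over query
    # terms yields one soft-match feature per kernel, which a
    # learning-to-rank layer would then combine into a score.
    features = []
    for mu in mus:
        total = 0.0
        for row in sim_matrix:
            soft_tf = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            total += math.log(max(soft_tf, 1e-10))
        features.append(total)
    return features
```

A document with an exact match activates the mu=1.0 kernel strongly, while near-synonyms activate the intermediate kernels, which is what makes the match "soft".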
no code implementations • 1 Jul 2013 • Bhavana Dalvi, William W. Cohen, Jamie Callan
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus.
no code implementations • 1 Jul 2013 • Bhavana Dalvi, William W. Cohen, Jamie Callan
In multiclass semi-supervised learning (SSL), it is sometimes the case that the number of classes present in the data is not known, and hence no labeled examples are provided for some classes.