1 code implementation • LREC 2022 • Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder
To complement this evaluation, we propose a dynamic thresholding technique that adjusts the classifier’s sensitivity as a function of the number of posts a user has.
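The abstract does not specify the exact schedule, so the following is a minimal, hypothetical sketch of such a dynamic threshold: the decision threshold starts conservative for users with few posts and relaxes toward a base value as more evidence accumulates. All function names and the exponential-decay schedule are assumptions for illustration.

```python
def dynamic_threshold(num_posts: int, base: float = 0.5,
                      shift: float = 0.2, half_life: int = 10) -> float:
    """Hypothetical schedule: start at base + shift (conservative when a
    user has little evidence) and decay toward `base` as posts accumulate."""
    decay = 0.5 ** (num_posts / half_life)
    return base + shift * decay

def classify(score: float, num_posts: int) -> bool:
    """Flag a user only when the classifier score clears the
    evidence-dependent threshold."""
    return score >= dynamic_threshold(num_posts)
```

The direction of the adjustment (tightening vs. relaxing with more posts) is a design choice; the paper's actual schedule may differ.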
no code implementations • NAACL (CLPsych) 2021 • Sean MacAvaney, Anjali Mittu, Glen Coppersmith, Jeff Leintz, Philip Resnik
Progress on NLP for mental health — indeed, for healthcare in general — is hampered by obstacles to shared, community-level access to relevant data.
1 code implementation • 2 May 2024 • Andrew Parry, Thomas Jaenich, Sean MacAvaney, Iadh Ounis
In re-ranking, we investigate operating points of adaptive re-ranking with different first stages to find the point in graph traversal where the first stage no longer has an effect on the performance of the overall retrieval pipeline.
no code implementations • 2 May 2024 • James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users.
1 code implementation • 1 May 2024 • Andrew Parry, Sean MacAvaney, Debasis Ganguly
We demonstrate such defects by showing that non-relevant text, such as promotional content, can be easily injected into a document without adversely affecting its position in search results.
no code implementations • 23 Apr 2024 • Sean MacAvaney, Nicola Tonellotto
The PLAID (Performance-optimized Late Interaction Driver) algorithm for ColBERTv2 uses clustered term representations to retrieve and progressively prune documents for final (exact) document scoring.
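A heavily simplified sketch of the centroid-based prune-then-score idea (not the actual PLAID implementation, which operates on compressed token embeddings): documents are represented by the centroid ids their token vectors map to, an approximate score from query-centroid similarities prunes the pool, and only survivors receive exact scoring. All data structures here are assumptions.

```python
def approx_score(query_centroid_sims, doc_centroids):
    """For each query term, take its best similarity to any centroid the
    document touches, and sum over query terms (a cheap upper-bound proxy)."""
    return sum(max(sims[c] for c in doc_centroids) for sims in query_centroid_sims)

def progressive_prune(docs, query_centroid_sims, exact_scorer, keep=2):
    """Stage 1: approximate scoring via centroids; stage 2: exact scoring
    only on the surviving candidates."""
    approx = {d: approx_score(query_centroid_sims, cents) for d, cents in docs.items()}
    survivors = sorted(approx, key=approx.get, reverse=True)[:keep]
    return sorted(((d, exact_scorer(d)) for d in survivors), key=lambda x: -x[1])
```

Documents pruned in stage 1 never reach the exact scorer, which is where both the speedup and the approximation error come from.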
no code implementations • 11 Apr 2024 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
The principal tasks are ranked retrieval of news in one of the three languages, using English topics.
1 code implementation • 29 Mar 2024 • Aleksandr V. Petrov, Sean MacAvaney, Craig Macdonald
However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window.
1 code implementation • 22 Mar 2024 • Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini
First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions.
1 code implementation • 12 Mar 2024 • Andrew Parry, Maik Fröbe, Sean MacAvaney, Martin Potthast, Matthias Hagen
Modern sequence-to-sequence relevance models like monoT5 can effectively capture complex textual interactions between queries and documents through cross-encoding.
no code implementations • 4 Mar 2024 • Saran Pandian, Debasis Ganguly, Sean MacAvaney
While the increasing complexity of search models has demonstrated improvements in effectiveness (measured in terms of the relevance of the top-retrieved results), a question worthy of thorough inspection is: "how explainable are these models?"
no code implementations • 20 Jan 2024 • Suchana Datta, Debasis Ganguly, Sean MacAvaney, Derek Greene
Additionally, to further improve retrieval effectiveness with this selective PRF approach, we make use of the model's confidence estimates to combine the information from the original and expanded queries.
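The combination step can be sketched as a confidence-weighted interpolation of the scores from the original and PRF-expanded queries. The linear fusion below is an assumption for illustration; the paper's actual combination may differ.

```python
def fuse_scores(orig_scores, prf_scores, confidence):
    """Hypothetical fusion: trust the expanded query's scores only to the
    degree the model is confident in its pseudo-relevance feedback."""
    docs = set(orig_scores) | set(prf_scores)
    return {d: (1 - confidence) * orig_scores.get(d, 0.0)
               + confidence * prf_scores.get(d, 0.0)
            for d in docs}
```

With confidence near 0 the fused ranking falls back to the original query, which is the "selective" behaviour the sentence describes.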
1 code implementation • 1 Aug 2023 • Andreas Chari, Sean MacAvaney, Iadh Ounis
One advantage of neural ranking models is that they are meant to generalise well in situations of synonymity, i.e., where two words have similar or identical meanings.
no code implementations • 1 Aug 2023 • Xiao Wang, Sean MacAvaney, Craig Macdonald, Iadh Ounis
GenQR directly reformulates the user's input query, while GenPRF provides additional context for the query by making use of pseudo-relevance feedback information.
no code implementations • 31 Jul 2023 • Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder
We introduce 'LADR' (Lexically-Accelerated Dense Retrieval), a simple-yet-effective approach that improves the efficiency of existing dense retrieval models without compromising on retrieval effectiveness.
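A simplified sketch in the spirit of LADR (not its actual implementation): seed the candidate pool with a cheap lexical retriever's top results, expand the pool with each seed's precomputed document neighbours, and only then score with the dense model. The `neighbors` graph and `dense_scorer` are assumed inputs.

```python
def ladr_candidates(lexical_top, neighbors, dense_scorer, depth=2):
    """Lexically-seeded candidate generation: dense scoring is restricted
    to lexical hits plus up to `depth` neighbours of each hit."""
    pool = set(lexical_top)
    for d in lexical_top:
        pool.update(neighbors.get(d, [])[:depth])
    scores = {d: dense_scorer(d) for d in pool}
    return sorted(scores, key=scores.get, reverse=True)
```

The efficiency win comes from never running the dense scorer over the full corpus; the neighbour expansion is what recovers documents the lexical seed alone would miss.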
no code implementations • 29 Jun 2023 • Iain Mackie, Shubham Chatterjee, Sean MacAvaney, Jeffrey Dalton
First, we demonstrate that applying a strong neural re-ranker before sparse or dense PRF can improve the retrieval effectiveness by 5-8%.
no code implementations • 16 Jun 2023 • Sean MacAvaney, Xi Wang
Model distillation has emerged as a prominent technique to improve neural search models.
1 code implementation • 30 May 2023 • Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast
Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures.
1 code implementation • 29 May 2023 • Thong Nguyen, Sean MacAvaney, Andrew Yates
We investigate existing aggregation approaches for adapting LSR to longer documents and find that proximal scoring is crucial for LSR to handle long documents.
no code implementations • 24 Apr 2023 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval.
1 code implementation • 23 Mar 2023 • Thong Nguyen, Sean MacAvaney, Andrew Yates
We then reproduce all prominent methods using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect effectiveness and efficiency.
1 code implementation • 22 Feb 2023 • Sean MacAvaney, Luca Soldaini
We then explore various approaches for predicting the relevance of unjudged documents with respect to a query and the known relevant document, including nearest neighbor, supervised, and prompting techniques.
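The nearest-neighbour variant can be sketched as follows: an unjudged document is predicted relevant when its embedding lies close to the one known relevant document. The cosine heuristic and the 0.8 threshold are assumptions for illustration, not the paper's tuned settings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nn_judgments(known_rel_vec, unjudged, threshold=0.8):
    """Hypothetical nearest-neighbour heuristic: label an unjudged document
    relevant when it is sufficiently similar to the known relevant one."""
    return {doc: cosine(vec, known_rel_vec) >= threshold
            for doc, vec in unjudged.items()}
```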
1 code implementation • 9 Jan 2023 • Mitko Gospodinov, Sean MacAvaney, Craig Macdonald
Doc2Query, the process of expanding the content of a document before indexing using a sequence-to-sequence model, has emerged as a prominent technique for improving the first-stage retrieval effectiveness of search engines.
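The indexing-time expansion itself is simple to sketch: generated queries are appended to the document text before it enters the index. Here `query_generator` stands in for the sequence-to-sequence model (e.g. a T5 checkpoint) and is an assumed callable.

```python
def expand_document(doc_text, query_generator, n_queries=3):
    """Doc2Query-style expansion: append n generated queries to the
    document text so that lexical indexing sees the extra terms."""
    queries = query_generator(doc_text, n_queries)
    return doc_text + " " + " ".join(queries)
```

At query time nothing changes; the benefit comes entirely from the enriched index vocabulary.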
1 code implementation • 18 Aug 2022 • Sean MacAvaney, Nicola Tonellotto, Craig Macdonald
Search systems often employ a re-ranking pipeline, wherein documents (or passages) from an initial pool of candidates are assigned new ranking scores.
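A generic two-stage pipeline of this kind can be sketched as: take the top-k candidates from a cheap first stage, then assign new scores with a stronger model. `neural_scorer` is an assumed stand-in for the expensive re-ranker.

```python
def rerank(query, candidates, first_stage_scores, neural_scorer, k=100):
    """Two-stage retrieval sketch: pool the top-k first-stage candidates,
    then re-rank that pool with a stronger (assumed) scoring model."""
    pool = sorted(candidates, key=first_stage_scores.get, reverse=True)[:k]
    rescored = {d: neural_scorer(query, d) for d in pool}
    return sorted(rescored, key=rescored.get, reverse=True)
```

The pool depth k trades recall (documents outside the pool can never be re-ranked) against the cost of the second stage.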
2 code implementations • 9 May 2022 • Iain Mackie, Paul Owoicho, Carlos Gemmell, Sophie Fischer, Sean MacAvaney, Jeffrey Dalton
We also show that the manual query reformulations significantly improve document ranking and entity ranking performance.
1 code implementation • 27 Apr 2022 • Prashansa Gupta, Sean MacAvaney
We observe that this bias could be present in the popular MS MARCO dataset, given that annotators could not find answers to 38-45% of the queries, leading to these queries being discarded in training and evaluation processes.
no code implementations • 21 Jan 2022 • Sean MacAvaney, Craig Macdonald, Iadh Ounis
Given that web documents are prone to change over time, we study the differences present between a version of the corpus containing documents as they appeared in 2017 (which has been used by several recent works) and a new version we construct that includes documents close to how they appeared at the time the query log was produced (2006).
no code implementations • 26 Nov 2021 • Sean MacAvaney, Craig Macdonald, Iadh Ounis
We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval.
no code implementations • 31 Aug 2021 • Shameem A. Puthiya Parambath, Christos Anagnostopoulos, Roderick Murray-Smith, Sean MacAvaney, Evangelos Zervas
We show that such a selection strategy often results in higher cumulative regret; to this end, we propose a selection strategy based on the maximum utility of the arms.
no code implementations • 9 Aug 2021 • Sean MacAvaney, Craig Macdonald, Roderick Murray-Smith, Iadh Ounis
Existing approaches often rely on massive query logs and interaction data to generate a variety of possible query intents, which then can be used to re-rank documents.
no code implementations • 3 May 2021 • Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder
We indeed find that the pre-trained BERT model reduces review cost by 10% to 15% in TAR workflows simulated on the RCV1-v2 newswire collection.
1 code implementation • 3 Mar 2021 • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
Managing the data for Information Retrieval (IR) experiments can be challenging.
no code implementations • EACL (WASSA) 2021 • Tong Xiang, Sean MacAvaney, Eugene Yang, Nazli Goharian
Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans.
2 code implementations • 2 Nov 2020 • Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
no code implementations • EMNLP 2020 • Sean MacAvaney, Arman Cohan, Nazli Goharian
With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus.
1 code implementation • 20 Aug 2020 • Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, Yingfei Sun
In this work, we explore strategies for aggregating relevance signals from a document's passages into a final ranking score.
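Common aggregation strategies of the kind the sentence refers to can be sketched directly; the three options below (max, sum, first passage) are standard choices in the passage-based ranking literature, not necessarily the full set the paper evaluates.

```python
def aggregate_passage_scores(passage_scores, strategy="max"):
    """Turn per-passage relevance scores into a single document score
    using one of several simple aggregation strategies."""
    if strategy == "max":
        return max(passage_scores)
    if strategy == "sum":
        return sum(passage_scores)
    if strategy == "first":
        return passage_scores[0]
    raise ValueError(f"unknown strategy: {strategy}")
```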
Ranked #2 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • SEMEVAL 2020 • Sajad Sotudeh, Tong Xiang, Hao-Ren Yao, Sean MacAvaney, Eugene Yang, Nazli Goharian, Ophir Frieder
Offensive language detection is an important and challenging task in natural language processing.
no code implementations • 18 May 2020 • Sean MacAvaney, Franck Dernoncourt, Walter Chang, Nazli Goharian, Ophir Frieder
We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking.
1 code implementation • 5 May 2020 • Sean MacAvaney, Arman Cohan, Nazli Goharian
In this work, we present a search system called SLEDGE, which utilizes SciBERT to effectively re-rank articles.
1 code implementation • 29 Apr 2020 • Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches.
1 code implementation • 29 Apr 2020 • Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking.
1 code implementation • 29 Apr 2020 • Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
We show that the proposed heuristics can be used to build a training curriculum that down-weights difficult samples early in the training process.
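The down-weighting idea can be sketched as a per-sample loss weight that suppresses difficult samples early and converges to uniform weighting after a warm-up period. The linear ramp and the warm-up length are assumptions for illustration; the paper defines its own heuristics and schedule.

```python
def curriculum_weight(difficulty, epoch, warmup_epochs=5):
    """Difficulty-based curriculum sketch: samples with difficulty near 1
    get small weights early in training; after warm-up, all samples
    contribute equally (weight 1.0)."""
    progress = min(epoch / warmup_epochs, 1.0)
    return (1.0 - difficulty) + difficulty * progress
```

In training, this weight would multiply each sample's loss, so "difficult" pairs only shape the model once it has learned from the easy ones.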
no code implementations • 18 Jan 2020 • Sean MacAvaney, Arman Cohan, Nazli Goharian, Ross Filice
This allows medical practitioners to easily identify and learn from the reports in which their interpretation most substantially differed from that of the attending physician (who finalized the report).
1 code implementation • 30 Dec 2019 • Sean MacAvaney, Luca Soldaini, Nazli Goharian
While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages.
no code implementations • 14 May 2019 • Sean MacAvaney, Sajad Sotudeh, Arman Cohan, Nazli Goharian, Ish Talati, Ross W. Filice
Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors.
7 code implementations • 15 Apr 2019 • Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian
We call this joint approach CEDR (Contextualized Embeddings for Document Ranking).
Ranked #3 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • WS 2018 • Sean MacAvaney, Bart Desmet, Arman Cohan, Luca Soldaini, Andrew Yates, Ayah Zirikly, Nazli Goharian
Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media.
no code implementations • COLING 2018 • Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney, Nazli Goharian
Mental health is a significant and growing public health concern.
no code implementations • NAACL 2018 • Sean MacAvaney, Amir Zeldes
We investigate the effect of various dependency-based word embeddings on distinguishing between functional and domain similarity, word similarity rankings, and two downstream tasks in English.
1 code implementation • SEMEVAL 2018 • Sean MacAvaney, Luca Soldaini, Arman Cohan, Nazli Goharian
SemEval 2018 Task 7 focuses on relation extraction and classification in scientific literature.
no code implementations • SEMEVAL 2017 • Sean MacAvaney, Arman Cohan, Nazli Goharian
Clinical TempEval 2017 (SemEval 2017 Task 12) addresses the task of cross-domain temporal extraction from clinical text.
1 code implementation • 1 Jul 2017 • Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder
One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training.