1 code implementation • LREC 2022 • Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder
To complement this evaluation, we propose a dynamic thresholding technique that adjusts the classifier’s sensitivity as a function of the number of posts a user has.
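The abstract does not specify the exact schedule, so the following is a minimal, hypothetical sketch of such a dynamic threshold: the decision threshold starts conservative for users with few posts and relaxes toward a base value as more evidence accumulates. All function names and the exponential-decay schedule are assumptions for illustration.

```python
def dynamic_threshold(num_posts: int, base: float = 0.5,
                      shift: float = 0.2, half_life: int = 10) -> float:
    """Hypothetical schedule: start at base + shift (conservative when a
    user has little evidence) and decay toward `base` as posts accumulate."""
    decay = 0.5 ** (num_posts / half_life)
    return base + shift * decay

def classify(score: float, num_posts: int) -> bool:
    """Flag a user only when the classifier score clears the
    evidence-dependent threshold."""
    return score >= dynamic_threshold(num_posts)
```

The direction of the adjustment (tightening vs. relaxing with more posts) is a design choice; the paper's actual schedule may differ.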
no code implementations • NAACL (CLPsych) 2021 • Sean MacAvaney, Anjali Mittu, Glen Coppersmith, Jeff Leintz, Philip Resnik
Progress on NLP for mental health — indeed, for healthcare in general — is hampered by obstacles to shared, community-level access to relevant data.
1 code implementation • 2 May 2024 • Andrew Parry, Thomas Jaenich, Sean MacAvaney, Iadh Ounis
In re-ranking, we investigate operating points of adaptive re-ranking with different first stages to find the point in graph traversal where the first stage no longer has an effect on the performance of the overall retrieval pipeline.
no code implementations • 2 May 2024 • James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users.
1 code implementation • 1 May 2024 • Andrew Parry, Sean MacAvaney, Debasis Ganguly
We demonstrate such defects by showing that non-relevant text, such as promotional content, can be easily injected into a document without adversely affecting its position in search results.
no code implementations • 23 Apr 2024 • Sean MacAvaney, Nicola Tonellotto
The PLAID (Performance-optimized Late Interaction Driver) algorithm for ColBERTv2 uses clustered term representations to retrieve and progressively prune documents for final (exact) document scoring.
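A heavily simplified sketch of the centroid-based prune-then-score idea (not the actual PLAID implementation, which operates on compressed token embeddings): documents are represented by the centroid ids their token vectors map to, an approximate score from query-centroid similarities prunes the pool, and only survivors receive exact scoring. All data structures here are assumptions.

```python
def approx_score(query_centroid_sims, doc_centroids):
    """For each query term, take its best similarity to any centroid the
    document touches, and sum over query terms (a cheap upper-bound proxy)."""
    return sum(max(sims[c] for c in doc_centroids) for sims in query_centroid_sims)

def progressive_prune(docs, query_centroid_sims, exact_scorer, keep=2):
    """Stage 1: approximate scoring via centroids; stage 2: exact scoring
    only on the surviving candidates."""
    approx = {d: approx_score(query_centroid_sims, cents) for d, cents in docs.items()}
    survivors = sorted(approx, key=approx.get, reverse=True)[:keep]
    return sorted(((d, exact_scorer(d)) for d in survivors), key=lambda x: -x[1])
```

Documents pruned in stage 1 never reach the exact scorer, which is where both the speedup and the approximation error come from.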
no code implementations • 11 Apr 2024 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
The principal tasks are ranked retrieval of news in one of the three languages, using English topics.
1 code implementation • 29 Mar 2024 • Aleksandr V. Petrov, Sean MacAvaney, Craig Macdonald
However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window.
1 code implementation • 22 Mar 2024 • Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini
First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions.
1 code implementation • 12 Mar 2024 • Andrew Parry, Maik Fröbe, Sean MacAvaney, Martin Potthast, Matthias Hagen
Modern sequence-to-sequence relevance models like monoT5 can effectively capture complex textual interactions between queries and documents through cross-encoding.
no code implementations • 4 Mar 2024 • Saran Pandian, Debasis Ganguly, Sean MacAvaney
While the increasing complexity of search models has demonstrated improvements in effectiveness (measured in terms of the relevance of the top-retrieved results), a question worthy of thorough inspection is: "how explainable are these models?"
no code implementations • 20 Jan 2024 • Suchana Datta, Debasis Ganguly, Sean MacAvaney, Derek Greene
Additionally, to further improve retrieval effectiveness with this selective PRF approach, we make use of the model's confidence estimates to combine the information from the original and expanded queries.
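The combination step can be sketched as a confidence-weighted interpolation of the scores from the original and PRF-expanded queries. The linear fusion below is an assumption for illustration; the paper's actual combination may differ.

```python
def fuse_scores(orig_scores, prf_scores, confidence):
    """Hypothetical fusion: trust the expanded query's scores only to the
    degree the model is confident in its pseudo-relevance feedback."""
    docs = set(orig_scores) | set(prf_scores)
    return {d: (1 - confidence) * orig_scores.get(d, 0.0)
               + confidence * prf_scores.get(d, 0.0)
            for d in docs}
```

With confidence near 0 the fused ranking falls back to the original query, which is the "selective" behaviour the sentence describes.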
1 code implementation • 1 Aug 2023 • Andreas Chari, Sean MacAvaney, Iadh Ounis
One advantage of neural ranking models is that they are meant to generalise well in situations of synonymity, i.e., where two words have similar or identical meanings.
no code implementations • 1 Aug 2023 • Xiao Wang, Sean MacAvaney, Craig Macdonald, Iadh Ounis
GenQR directly reformulates the user's input query, while GenPRF provides additional context for the query by making use of pseudo-relevance feedback information.
no code implementations • 31 Jul 2023 • Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder
We introduce 'LADR' (Lexically-Accelerated Dense Retrieval), a simple-yet-effective approach that improves the efficiency of existing dense retrieval models without compromising on retrieval effectiveness.
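A simplified sketch in the spirit of LADR (not its actual implementation): seed the candidate pool with a cheap lexical retriever's top results, expand the pool with each seed's precomputed document neighbours, and only then score with the dense model. The `neighbors` graph and `dense_scorer` are assumed inputs.

```python
def ladr_candidates(lexical_top, neighbors, dense_scorer, depth=2):
    """Lexically-seeded candidate generation: dense scoring is restricted
    to lexical hits plus up to `depth` neighbours of each hit."""
    pool = set(lexical_top)
    for d in lexical_top:
        pool.update(neighbors.get(d, [])[:depth])
    scores = {d: dense_scorer(d) for d in pool}
    return sorted(scores, key=scores.get, reverse=True)
```

The efficiency win comes from never running the dense scorer over the full corpus; the neighbour expansion is what recovers documents the lexical seed alone would miss.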
no code implementations • 29 Jun 2023 • Iain Mackie, Shubham Chatterjee, Sean MacAvaney, Jeffrey Dalton
First, we demonstrate that applying a strong neural re-ranker before sparse or dense PRF can improve the retrieval effectiveness by 5-8%.
no code implementations • 16 Jun 2023 • Sean MacAvaney, Xi Wang
Model distillation has emerged as a prominent technique to improve neural search models.
1 code implementation • 30 May 2023 • Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast
Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures.
1 code implementation • 29 May 2023 • Thong Nguyen, Sean MacAvaney, Andrew Yates
We investigate existing aggregation approaches for adapting LSR to longer documents and find that proximal scoring is crucial for LSR to handle long documents.
no code implementations • 24 Apr 2023 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval.
1 code implementation • 23 Mar 2023 • Thong Nguyen, Sean MacAvaney, Andrew Yates
We then reproduce all prominent methods using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect effectiveness and efficiency.
1 code implementation • 22 Feb 2023 • Sean MacAvaney, Luca Soldaini
We then explore various approaches for predicting the relevance of unjudged documents with respect to a query and the known relevant document, including nearest neighbor, supervised, and prompting techniques.
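The nearest-neighbour variant can be sketched as follows: an unjudged document is predicted relevant when its embedding lies close to the one known relevant document. The cosine heuristic and the 0.8 threshold are assumptions for illustration, not the paper's tuned settings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nn_judgments(known_rel_vec, unjudged, threshold=0.8):
    """Hypothetical nearest-neighbour heuristic: label an unjudged document
    relevant when it is sufficiently similar to the known relevant one."""
    return {doc: cosine(vec, known_rel_vec) >= threshold
            for doc, vec in unjudged.items()}
```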
1 code implementation • 9 Jan 2023 • Mitko Gospodinov, Sean MacAvaney, Craig Macdonald
Doc2Query, the process of expanding the content of a document before indexing using a sequence-to-sequence model, has emerged as a prominent technique for improving the first-stage retrieval effectiveness of search engines.
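The indexing-time expansion itself is simple to sketch: generated queries are appended to the document text before it enters the index. Here `query_generator` stands in for the sequence-to-sequence model (e.g. a T5 checkpoint) and is an assumed callable.

```python
def expand_document(doc_text, query_generator, n_queries=3):
    """Doc2Query-style expansion: append n generated queries to the
    document text so that lexical indexing sees the extra terms."""
    queries = query_generator(doc_text, n_queries)
    return doc_text + " " + " ".join(queries)
```

At query time nothing changes; the benefit comes entirely from the enriched index vocabulary.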
1 code implementation • 18 Aug 2022 • Sean MacAvaney, Nicola Tonellotto, Craig Macdonald
Search systems often employ a re-ranking pipeline, wherein documents (or passages) from an initial pool of candidates are assigned new ranking scores.
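A generic two-stage pipeline of this kind can be sketched as: take the top-k candidates from a cheap first stage, then assign new scores with a stronger model. `neural_scorer` is an assumed stand-in for the expensive re-ranker.

```python
def rerank(query, candidates, first_stage_scores, neural_scorer, k=100):
    """Two-stage retrieval sketch: pool the top-k first-stage candidates,
    then re-rank that pool with a stronger (assumed) scoring model."""
    pool = sorted(candidates, key=first_stage_scores.get, reverse=True)[:k]
    rescored = {d: neural_scorer(query, d) for d in pool}
    return sorted(rescored, key=rescored.get, reverse=True)
```

The pool depth k trades recall (documents outside the pool can never be re-ranked) against the cost of the second stage.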
2 code implementations • 9 May 2022 • Iain Mackie, Paul Owoicho, Carlos Gemmell, Sophie Fischer, Sean MacAvaney, Jeffrey Dalton
We also show that the manual query reformulations significantly improve document ranking and entity ranking performance.
1 code implementation • 27 Apr 2022 • Prashansa Gupta, Sean MacAvaney
We observe that this bias could be present in the popular MS MARCO dataset, given that annotators could not find answers to 38-45% of the queries, leading to these queries being discarded in training and evaluation processes.
no code implementations • 21 Jan 2022 • Sean MacAvaney, Craig Macdonald, Iadh Ounis
Given that web documents are prone to change over time, we study the differences present between a version of the corpus containing documents as they appeared in 2017 (which has been used by several recent works) and a new version we construct that includes documents close to how they appeared at the time the query log was produced (2006).
no code implementations • 26 Nov 2021 • Sean MacAvaney, Craig Macdonald, Iadh Ounis
We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval.
no code implementations • 31 Aug 2021 • Shameem A. Puthiya Parambath, Christos Anagnostopoulos, Roderick Murray-Smith, Sean MacAvaney, Evangelos Zervas
We show that such a selection strategy often results in higher cumulative regret; to this end, we propose a selection strategy based on the maximum utility of the arms.
no code implementations • 9 Aug 2021 • Sean MacAvaney, Craig Macdonald, Roderick Murray-Smith, Iadh Ounis
Existing approaches often rely on massive query logs and interaction data to generate a variety of possible query intents, which then can be used to re-rank documents.
no code implementations • 3 May 2021 • Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder
We indeed find that the pre-trained BERT model reduces review cost by 10% to 15% in TAR workflows simulated on the RCV1-v2 newswire collection.
1 code implementation • 3 Mar 2021 • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
Managing the data for Information Retrieval (IR) experiments can be challenging.
no code implementations • EACL (WASSA) 2021 • Tong Xiang, Sean MacAvaney, Eugene Yang, Nazli Goharian
Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans.
2 code implementations • 2 Nov 2020 • Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
no code implementations • EMNLP 2020 • Sean MacAvaney, Arman Cohan, Nazli Goharian
With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus.
1 code implementation • 20 Aug 2020 • Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, Yingfei Sun
In this work, we explore strategies for aggregating relevance signals from a document's passages into a final ranking score.
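Common aggregation strategies of the kind the sentence refers to can be sketched directly; the three options below (max, sum, first passage) are standard choices in the passage-based ranking literature, not necessarily the full set the paper evaluates.

```python
def aggregate_passage_scores(passage_scores, strategy="max"):
    """Turn per-passage relevance scores into a single document score
    using one of several simple aggregation strategies."""
    if strategy == "max":
        return max(passage_scores)
    if strategy == "sum":
        return sum(passage_scores)
    if strategy == "first":
        return passage_scores[0]
    raise ValueError(f"unknown strategy: {strategy}")
```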
Ranked #2 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • SEMEVAL 2020 • Sajad Sotudeh, Tong Xiang, Hao-Ren Yao, Sean MacAvaney, Eugene Yang, Nazli Goharian, Ophir Frieder
Offensive language detection is an important and challenging task in natural language processing.
no code implementations • 18 May 2020 • Sean MacAvaney, Franck Dernoncourt, Walter Chang, Nazli Goharian, Ophir Frieder
We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking.
1 code implementation • 5 May 2020 • Sean MacAvaney, Arman Cohan, Nazli Goharian
In this work, we present a search system called SLEDGE, which utilizes SciBERT to effectively re-rank articles.
1 code implementation • 29 Apr 2020 • Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches.
1 code implementation • 29 Apr 2020 • Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking.
1 code implementation • 29 Apr 2020 • Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
We show that the proposed heuristics can be used to build a training curriculum that down-weights difficult samples early in the training process.
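The down-weighting idea can be sketched as a per-sample loss weight that suppresses difficult samples early and converges to uniform weighting after a warm-up period. The linear ramp and the warm-up length are assumptions for illustration; the paper defines its own heuristics and schedule.

```python
def curriculum_weight(difficulty, epoch, warmup_epochs=5):
    """Difficulty-based curriculum sketch: samples with difficulty near 1
    get small weights early in training; after warm-up, all samples
    contribute equally (weight 1.0)."""
    progress = min(epoch / warmup_epochs, 1.0)
    return (1.0 - difficulty) + difficulty * progress
```

In training, this weight would multiply each sample's loss, so "difficult" pairs only shape the model once it has learned from the easy ones.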
no code implementations • 18 Jan 2020 • Sean MacAvaney, Arman Cohan, Nazli Goharian, Ross Filice
This allows medical practitioners to easily identify and learn from the reports in which their interpretation most substantially differed from that of the attending physician (who finalized the report).
1 code implementation • 30 Dec 2019 • Sean MacAvaney, Luca Soldaini, Nazli Goharian
While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages.
no code implementations • 14 May 2019 • Sean MacAvaney, Sajad Sotudeh, Arman Cohan, Nazli Goharian, Ish Talati, Ross W. Filice
Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors.
7 code implementations • 15 Apr 2019 • Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian
We call this joint approach CEDR (Contextualized Embeddings for Document Ranking).
Ranked #3 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • WS 2018 • Sean MacAvaney, Bart Desmet, Arman Cohan, Luca Soldaini, Andrew Yates, Ayah Zirikly, Nazli Goharian
Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media.
no code implementations • COLING 2018 • Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney, Nazli Goharian
Mental health is a significant and growing public health concern.
no code implementations • NAACL 2018 • Sean MacAvaney, Amir Zeldes
We investigate the effect of various dependency-based word embeddings on distinguishing between functional and domain similarity, word similarity rankings, and two downstream tasks in English.
1 code implementation • SEMEVAL 2018 • Sean MacAvaney, Luca Soldaini, Arman Cohan, Nazli Goharian
SemEval 2018 Task 7 focuses on relation extraction and classification in scientific literature.
no code implementations • SEMEVAL 2017 • Sean MacAvaney, Arman Cohan, Nazli Goharian
Clinical TempEval 2017 (SemEval 2017 Task 12) addresses the task of cross-domain temporal extraction from clinical text.
1 code implementation • 1 Jul 2017 • Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder
One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training.