Search Results for author: Allan Hanbury

Found 43 papers, 26 papers with code

Benchmark for Research Theme Classification of Scholarly Documents

1 code implementation sdp (COLING) 2022 Óscar E. Mendoza, Wojciech Kusa, Alaa El-Ebshihy, Ronin Wu, David Pride, Petr Knoth, Drahomira Herrmannova, Florina Piroi, Gabriella Pasi, Allan Hanbury

We present a new gold-standard dataset and a benchmark for the Research Theme Identification task, a sub-task of the Scholarly Knowledge Graph Generation shared task, at the 3rd Workshop on Scholarly Document Processing.

Classification Graph Generation

DreamDrug - A crowdsourced NER dataset for detecting drugs in darknet markets

no code implementations WNUT (ACL) 2021 Johannes Bogensperger, Sven Schlarb, Allan Hanbury, Gábor Recski

We present DreamDrug, a crowdsourced dataset for detecting mentions of drugs in noisy user-generated item listings from darknet markets.

NER

Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

no code implementations 12 Sep 2023 Sophia Althammer, Guido Zuccon, Sebastian Hofstätter, Suzan Verberne, Allan Hanbury

We further find that gains provided by AL strategies come at the expense of more assessments (thus higher annotation costs) and AL strategies underperform random selection when comparing effectiveness given a fixed annotation cost.

Active Learning Domain Adaptation

CRUISE-Screening: Living Literature Reviews Toolbox

1 code implementation 4 Sep 2023 Wojciech Kusa, Petr Knoth, Allan Hanbury

To this end, we developed CRUISE-Screening, a web-based application for conducting living literature reviews - a type of literature review that is continuously updated to reflect the latest research in a particular field.

Question Answering text-classification +1

Effective Matching of Patients to Clinical Trials using Entity Extraction and Neural Re-ranking

no code implementations 1 Jul 2023 Wojciech Kusa, Óscar E. Mendoza, Petr Knoth, Gabriella Pasi, Allan Hanbury

Our approach involves two key components in a pipeline-based model: (i) a data enrichment technique for enhancing both queries and documents during the first retrieval stage, and (ii) a novel re-ranking schema that uses a Transformer network in a setup adapted to this task by leveraging the structure of the CT documents.

Descriptive named-entity-recognition +5

Outcome-based Evaluation of Systematic Review Automation

no code implementations 30 Jun 2023 Wojciech Kusa, Guido Zuccon, Petr Knoth, Allan Hanbury

We find that accounting for the difference in review outcomes leads to a different assessment of the quality of a system than if traditional evaluation measures were used.

TAR

Statute-enhanced lexical retrieval of court cases for COLIEE 2022

no code implementations 17 Apr 2023 Tobias Fink, Gabor Recski, Wojciech Kusa, Allan Hanbury

We discuss our experiments for COLIEE Task 1, a court case retrieval competition using cases from the Federal Court of Canada.

Retrieval

Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems

no code implementations 26 Jun 2022 Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, Allan Hanbury

Recently, several dense retrieval (DR) models have demonstrated performance competitive with the term-based retrieval methods that are ubiquitous in search systems.

Retrieval

Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

no code implementations 24 Mar 2022 Sebastian Hofstätter, Omar Khattab, Sophia Althammer, Mete Sertkan, Allan Hanbury

Recent progress in neural information retrieval has demonstrated large gains in effectiveness, while often sacrificing the efficiency and interpretability of the neural model compared to classical approaches.

Information Retrieval Retrieval

Automation of Citation Screening for Systematic Literature Reviews using Neural Networks: A Replicability Study

1 code implementation 19 Jan 2022 Wojciech Kusa, Allan Hanbury, Petr Knoth

In this work, we conduct a replicability study of the first two deep learning papers for citation screening and evaluate their performance on 23 publicly available datasets.

Document Classification Word Embeddings

PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval

1 code implementation 5 Jan 2022 Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, Suzan Verberne, Allan Hanbury

However, in the web domain we are in a setting with large amounts of training data and a query-to-passage or a query-to-document retrieval task.

Passage Retrieval Retrieval

Establishing Strong Baselines for TripClick Health Retrieval

2 code implementations 2 Jan 2022 Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection.

Re-Ranking Retrieval

A Time-Optimized Content Creation Workflow for Remote Teaching

1 code implementation 11 Oct 2021 Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

We describe our workflow to create an engaging remote learning experience for a university course, while minimizing the post-production time of the educators.

DoSSIER@COLIEE 2021: Leveraging dense retrieval and summarization-based re-ranking for case law retrieval

1 code implementation 9 Aug 2021 Sophia Althammer, Arian Askari, Suzan Verberne, Allan Hanbury

We address this challenge by combining lexical and dense retrieval methods on the paragraph-level of the cases for the first stage retrieval.

Passage Retrieval Re-Ranking +1

Linguistically Informed Masking for Representation Learning in the Patent Domain

1 code implementation 10 Jun 2021 Sophia Althammer, Mark Buckley, Sebastian Hofstätter, Allan Hanbury

Domain-specific contextualized language models have demonstrated substantial effectiveness gains for domain-specific downstream tasks, like similarity matching, entity recognition or information retrieval.

Domain Adaptation Information Retrieval +2

Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

1 code implementation 20 May 2021 Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury

An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers.

Document Ranking Knowledge Distillation +1

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

4 code implementations 14 Apr 2021 Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury

A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.

Re-Ranking Retrieval +2

Mitigating the Position Bias of Transformer Models in Passage Re-Ranking

1 code implementation 18 Jan 2021 Sebastian Hofstätter, Aldo Lipani, Sophia Althammer, Markus Zlabinger, Allan Hanbury

In this work we analyze position bias on datasets, the contextualized representations, and their effect on retrieval results.

Passage Re-Ranking Position +4

Cross-domain Retrieval in the Legal and Patent Domains: a Reproducibility Study

1 code implementation 21 Dec 2020 Sophia Althammer, Sebastian Hofstätter, Allan Hanbury

For reproducibility and transparency as well as to benefit the community we make our source code and the trained models publicly available.

Information Retrieval Language Modelling +1

Effective Crowd-Annotation of Participants, Interventions, and Outcomes in the Text of Clinical Trial Reports

2 code implementations Findings of the Association for Computational Linguistics 2020 Markus Zlabinger, Marta Sabou, Sebastian Hofstätter, Allan Hanbury

Obtaining such a corpus from crowdworkers, however, has been shown to be ineffective since (i) workers usually lack domain-specific expertise to conduct the task with sufficient quality, and (ii) the standard approach of annotating entire abstracts of trial reports as one task-instance (i.e., HIT) leads to an uneven distribution in task effort.

Sentence text similarity

Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

1 code implementation 6 Oct 2020 Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, Allan Hanbury

Based on this finding, we propose a cross-architecture training procedure with a margin focused loss (Margin-MSE), that adapts knowledge distillation to the varying score output distributions of different BERT and non-BERT passage ranking architectures.

Knowledge Distillation Passage Ranking +3
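
As a rough illustration of the margin-focused idea in the entry above, the sketch below compares the student's score margin between a relevant and a non-relevant passage with the teacher's margin, instead of comparing raw scores. This is why differing score ranges between architectures do not matter. The function name, argument names, and toy scores are assumptions made for illustration, not the authors' released code.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument is a list of scores, one per (query, relevant passage,
    non-relevant passage) training triple. All names are hypothetical.
    """
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        student_margin = sp - sn  # how far the student separates the pair
        teacher_margin = tp - tn  # how far the teacher separates the pair
        total += (student_margin - teacher_margin) ** 2
    return total / len(student_pos)


# The teacher scores live on a different scale, but the margins agree,
# so the loss is zero.
print(margin_mse([2.0], [1.0], [12.0], [11.0]))
```

Because only score differences enter the loss, a student can be trained from a teacher whose absolute score range is very different from its own.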

Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering

1 code implementation 12 Aug 2020 Sebastian Hofstätter, Markus Zlabinger, Mete Sertkan, Michael Schröder, Allan Hanbury

We extend the ranked retrieval annotations of the Deep Learning track of TREC 2019 with passage and word level graded relevance annotations for all relevant documents.

Document Ranking Question Answering +1

DEXA: Supporting Non-Expert Annotators with Dynamic Examples from Experts

1 code implementation 17 May 2020 Markus Zlabinger, Marta Sabou, Sebastian Hofstätter, Mete Sertkan, Allan Hanbury

of 0.68 to experts in DEXA vs. 0.40 in CONTROL); (ii) already three per majority voting aggregated annotations of the DEXA approach reach substantial agreements to experts of 0.78/0.75/0.69 for P/I/O (in CONTROL 0.73/0.58/0.46).

Avg Sentence +1

Local Self-Attention over Long Text for Efficient Document Retrieval

1 code implementation 11 May 2020 Sebastian Hofstätter, Hamed Zamani, Bhaskar Mitra, Nick Craswell, Allan Hanbury

In this work, we propose a local self-attention which considers a moving window over the document terms and for each term attends only to other terms in the same window.

Document Ranking Retrieval
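
One simple way to picture the windowed attention described in the entry above is a band-shaped mask: each term may only attend to terms within a fixed distance of its own position. The sketch below builds such a mask and applies it in a softmax. The symmetric `half_width` parameter and both function names are assumptions for illustration, and the paper's exact windowing scheme may differ.

```python
import math


def window_mask(n_terms, half_width):
    """mask[i][j] is True when term j lies inside the window around term i."""
    return [[abs(i - j) <= half_width for j in range(n_terms)]
            for i in range(n_terms)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to positions outside the window."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # A term is always inside its own window, so `allowed` is never empty.
        allowed = [s for s, keep in zip(row_scores, row_mask) if keep]
        top = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - top) if keep else 0.0
                for s, keep in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Five terms, each attending to itself and one neighbour on either side.
mask = window_mask(5, 1)
attention = masked_softmax([[0.0] * 5 for _ in range(5)], mask)
```

With a fixed window, the number of score entries that matter grows linearly with document length rather than quadratically, which is the efficiency argument behind local attention.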

Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking

1 code implementation 4 Feb 2020 Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury

In addition, to gain insight into TK, we perform a clustered query analysis of TK's results, highlighting its strengths and weaknesses on queries with different types of information need and we show how to interpret the cause of ranking differences of two documents by comparing their internal scores.

Re-Ranking Word Embeddings

DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations

no code implementations 15 Jan 2020 Markus Zlabinger, Sebastian Hofstätter, Navid Rekabsaz, Allan Hanbury

While existing disease-symptom relationship extraction methods are used as the foundation in the various medical tasks, no collection is available to systematically evaluate the performance of such methods.

Medical Diagnosis Word Embeddings

Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-Ranking Results

1 code implementation 10 Dec 2019 Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury

In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results.

Information Retrieval Re-Ranking +1

TU Wien @ TREC Deep Learning '19 -- Simple Contextualization for Re-ranking

1 code implementation 3 Dec 2019 Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury

The usage of neural network models puts multiple objectives in conflict with each other: Ideally we would like to create a neural model that is effective, efficient, and interpretable at the same time.

Document Ranking Passage Ranking +2

Deep Learning architectures for generalized immunofluorescence based nuclear image segmentation

1 code implementation 30 Jul 2019 Florian Kromp, Lukas Fischer, Eva Bozsaky, Inge Ambros, Wolfgang Doerr, Sabine Taschner-Mandl, Peter Ambros, Allan Hanbury

In this work, we aim to evaluate the performance of state-of-the-art deep learning architectures to segment nuclei in fluorescence images of various tissue origins and sample preparation types without post-processing.

Image Segmentation object-detection +3

Let's measure run time! Extending the IR replicability infrastructure to include performance aspects

no code implementations 10 Jul 2019 Sebastian Hofstätter, Allan Hanbury

Establishing a docker-based replicability infrastructure offers the community a great opportunity: measuring the run time of information retrieval systems.

Information Retrieval Re-Ranking +1

On the Effect of Low-Frequency Terms on Neural-IR Models

1 code implementation 29 Apr 2019 Sebastian Hofstätter, Navid Rekabsaz, Carsten Eickhoff, Allan Hanbury

Low-frequency terms are a recurring challenge for information retrieval models; neural IR frameworks in particular struggle to adequately capture infrequently observed words.

Passage Retrieval Retrieval +1

Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence

no code implementations 13 Dec 2018 Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury

The common approach to measuring such biases using a corpus is by calculating the similarities between the embedding vector of a word (like nurse) and the vectors of the representative words of the concepts of interest (such as genders).

Word Embeddings
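
The common approach that the snippet above describes can be sketched as a difference of average cosine similarities: how close a word's vector is to the representative words of one concept versus another. The function names and the two-dimensional toy vectors are made up for illustration. The code shows only the baseline approach, not the smoothed first-order co-occurrence method named in the paper's title.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word_vec, concept_vecs):
    """Average similarity of a word to the representative words of a concept."""
    return sum(cosine(word_vec, c) for c in concept_vecs) / len(concept_vecs)


def bias_score(word_vec, concept_a_vecs, concept_b_vecs):
    """Positive values mean the word sits closer to concept A than to concept B."""
    return (mean_similarity(word_vec, concept_a_vecs)
            - mean_similarity(word_vec, concept_b_vecs))


# Toy, hand-written vectors (not real embeddings).
toy = {"nurse": [1.0, 0.0], "she": [1.0, 0.0], "he": [0.0, 1.0]}
score = bias_score(toy["nurse"], [toy["she"]], [toy["he"]])
```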

Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian

no code implementations 16 Nov 2017 Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Andres Duque

We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian.

Semantic Similarity Semantic Textual Similarity +1

Toward Incorporation of Relevant Documents in word2vec

no code implementations 20 Jul 2017 Navid Rekabsaz, Bhaskar Mitra, Mihai Lupu, Allan Hanbury

As an alternative, explicit word representations propose vectors whose dimensions are easily interpretable, and recent methods show competitive performance to the dense vectors.

Information Retrieval Retrieval +1

Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

no code implementations 20 Jun 2016 Navid Rekabsaz, Mihai Lupu, Allan Hanbury

Word embedding, especially with its recent developments, promises a quantification of the similarity between terms.

Information Retrieval Retrieval

Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation

1 code implementation LREC 2016 Navid Rekabsaz, Serwah Sabetghadam, Mihai Lupu, Linda Andersson, Allan Hanbury

In this paper, we address the shortage of evaluation benchmarks on Persian (Farsi) language by creating and making available a new benchmark for English to Persian Cross Lingual Word Sense Disambiguation (CL-WSD).

Word Sense Disambiguation
