Search Results for author: Alistair Willis

Found 10 papers, 1 paper with code

Identifying Annotator Bias: A new IRT-based method for bias identification

no code implementations • COLING 2020 • Jacopo Amidei, Paul Piwek, Alistair Willis

Our interpretation of IRT offers an original bias identification method that can be used to compare annotators' bias and characterise annotation disagreement.

Agreement is overrated: A plea for correlation to assess human evaluation reliability

no code implementations • WS 2019 • Jacopo Amidei, Paul Piwek, Alistair Willis

Following Sampson and Babarczy (2008), Lommel et al. (2014), Joshi et al. (2016) and Amidei et al. (2018b), such phenomena can be explained in terms of irreducible human language variability.

NLG evaluation

The use of rating and Likert scales in Natural Language Generation human evaluation tasks: A review and some recommendations

no code implementations • WS 2019 • Jacopo Amidei, Paul Piwek, Alistair Willis

Rating and Likert scales are widely used in evaluation experiments to measure the quality of Natural Language Generation (NLG) systems.

NLG evaluation, Text Generation

Evaluation methodologies in Automatic Question Generation 2013-2018

no code implementations • WS 2018 • Jacopo Amidei, Paul Piwek, Alistair Willis

In the last few years Automatic Question Generation (AQG) has attracted increasing interest.

Question Generation

Rethinking the Agreement in Human Evaluation Tasks

no code implementations • COLING 2018 • Jacopo Amidei, Paul Piwek, Alistair Willis

For this reason, we believe that annotation schemes for natural language generation tasks that are aimed at evaluating language quality need to be treated with great care.

Dialogue Generation, Question Generation

Search Personalization with Embeddings

1 code implementation • 12 Dec 2016 • Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis

Recent research has shown that the performance of search personalization depends on the richness of user profiles which normally represent the user's topical interests.
