WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context

We present WiC-TSV, a new multi-domain evaluation benchmark for Word Sense Disambiguation. More specifically, we introduce a framework for Target Sense Verification of Words in Context. Its distinguishing features are its formulation as a binary classification task, which makes it independent of external sense inventories, and its coverage of various domains. This makes the dataset highly flexible for evaluating a diverse set of models and systems within and across domains. WiC-TSV provides three evaluation settings, which differ in the input signals provided to the model. We set baseline performance on the dataset using state-of-the-art language models. Experimental results show that although these models perform decently on the task, a gap remains between machine and human performance, especially in out-of-domain settings. WiC-TSV data is available at https://competitions.codalab.org/competitions/23683
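
To make the binary formulation concrete, the following is a minimal sketch of how a target sense verification instance could be represented and scored. The class name, the field names (including the `definition` and `hypernyms` signals), and the three toy instances are assumptions made up for illustration; they are not taken from the dataset or from the authors' code. The `all_true` predictor mirrors the "All true" baseline that appears in the results.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class TSVInstance:
    """One verification example (hypothetical schema, for illustration only)."""
    context: str               # sentence containing the target word
    target: str                # the word whose sense is being verified
    definition: str            # assumed sense signal: a short gloss
    hypernyms: Sequence[str]   # assumed sense signal: broader terms
    label: bool                # True if the target is used in the given sense


def accuracy(instances: Sequence[TSVInstance],
             predict: Callable[[TSVInstance], bool]) -> float:
    """Fraction of instances whose predicted label matches the gold label."""
    if not instances:
        raise ValueError("accuracy is undefined for an empty dataset")
    correct = sum(predict(inst) == inst.label for inst in instances)
    return correct / len(instances)


def all_true(_inst: TSVInstance) -> bool:
    """Trivial baseline: always answer that the target sense applies."""
    return True


# Toy data invented for this sketch; not drawn from WiC-TSV.
BANK_GLOSS = "a financial institution that accepts deposits"
toy = [
    TSVInstance("She deposited the cheque at the bank.", "bank",
                BANK_GLOSS, ("financial institution",), True),
    TSVInstance("They had a picnic on the bank of the river.", "bank",
                BANK_GLOSS, ("financial institution",), False),
    TSVInstance("The bank approved the loan.", "bank",
                BANK_GLOSS, ("financial institution",), True),
]

if __name__ == "__main__":
    print(f"all-true accuracy on toy data: {accuracy(toy, all_true):.3f}")
```

Because each instance carries its own sense description, a predictor only needs to answer yes or no, and no lookup in an external sense inventory is required.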

EACL 2021

Datasets


Introduced in the Paper:

WiC-TSV

Used in the Paper:

SuperGLUE WiC

Results from the Paper


Accuracy on WiC-TSV, with global rank in parentheses. The leaderboard lists the same results under both the Word Sense Disambiguation and Entity Linking tasks.

Model              Setting   All          General purpose   Domain specific
Human              Task 3    85.3 (#1)    82.1 (#1)         89.2 (#1)
Bert-base          Task 1    75.3 (#4)    73.3 (#4)         77.9 (#3)
Bert-base          Task 2    71.7 (#2)    68.6 (#1)         74.7 (#2)
Bert-base          Task 3    76.6 (#3)    73.5 (#3)         80.4 (#3)
Unsupervised Bert  Task 1    54.4 (#5)    49.2 (#7)         60.6 (#5)
Unsupervised Bert  Task 2    62.8 (#3)    57.6 (#3)         69.1 (#3)
Unsupervised Bert  Task 3    60.5 (#5)    54.4 (#6)         67.9 (#4)
FastText           Task 1    53.7 (#6)    56.2 (#5)         50.6 (#6)
FastText           Task 2    52.7 (#4)    56.8 (#4)         47.7 (#4)
FastText           Task 3    53.4 (#6)    57.1 (#5)         49.0 (#6)
All true           Task 1    50.8 (#7)    53.8 (#6)         47.0 (#7)
All true           Task 2    50.8 (#5)    53.8 (#5)         47.0 (#5)
All true           Task 3    50.8 (#7)    53.8 (#7)         47.0 (#7)
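
The gap between machine and human performance can be read directly off the Task 3 figures listed above. The snippet below is a small sketch that subtracts each model's Task 3 accuracy from the human score; the dictionary layout and the function name `gap_to_human` are illustrative choices, while the numbers are copied from the results on this page.

```python
# Task 3 accuracy per evaluation subset, copied from the results listed above.
TASK3_ACCURACY = {
    "Human":             {"all": 85.3, "general purpose": 82.1, "domain specific": 89.2},
    "Bert-base":         {"all": 76.6, "general purpose": 73.5, "domain specific": 80.4},
    "Unsupervised Bert": {"all": 60.5, "general purpose": 54.4, "domain specific": 67.9},
    "FastText":          {"all": 53.4, "general purpose": 57.1, "domain specific": 49.0},
    "All true":          {"all": 50.8, "general purpose": 53.8, "domain specific": 47.0},
}


def gap_to_human(model: str) -> dict:
    """Human accuracy minus model accuracy, in accuracy points, per subset."""
    human = TASK3_ACCURACY["Human"]
    scores = TASK3_ACCURACY[model]
    # Round to one decimal to match the precision of the reported figures.
    return {subset: round(human[subset] - scores[subset], 1) for subset in human}


if __name__ == "__main__":
    for name in TASK3_ACCURACY:
        if name != "Human":
            print(name, gap_to_human(name))
```

Human scores are only reported for Task 3, so the comparison is limited to that setting.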

Methods