1 code implementation • 31 Aug 2023 • Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).
1 code implementation • ACL 2021 • Philippe Laban, Luke Dai, Lucas Bandarkar, Marti A. Hearst
The Shuffle Test is the most common task to evaluate whether NLP models can measure coherence in text.
1 code implementation • NAACL 2021 • Philippe Laban, Lucas Bandarkar, Marti A. Hearst
Recent progress in Natural Language Understanding (NLU) has seen the latest models surpass human performance on many standard tasks.