1 code implementation • 18 Apr 2024 • Xenia Ohmer, Elia Bruni, Dieuwke Hupkes
The staggering pace with which the capabilities of large language models (LLMs) are increasing, as measured by a range of commonly used natural language understanding (NLU) benchmarks, raises many questions regarding what "understanding" means for a language model and how it compares to human understanding.
no code implementations • 15 Nov 2023 • Serwan Jassim, Mario Holubar, Annika Richter, Cornelius Wolff, Xenia Ohmer, Elia Bruni
Our evaluation reveals significant shortcomings in the language grounding and intuitive physics capabilities of these models.
1 code implementation • 21 Sep 2023 • Leon Ackermann, Xenia Ohmer
We show that prompts tuned for a specific task transfer to tasks of the same type but exhibit limited robustness to adversarial data.
1 code implementation • 19 May 2023 • Xenia Ohmer, Elia Bruni, Dieuwke Hupkes
Given the staggering pace at which the capabilities of large language models (LLMs) are increasing, creating future-proof evaluation sets to assess their understanding becomes more and more challenging.
1 code implementation • COLING 2022 • Xenia Ohmer, Marko Duda, Elia Bruni
We develop a novel communication game, the hierarchical reference game, to study the emergence of such reference systems in artificial agents.