1 code implementation • ACL 2018 • Rotem Dror, Gili Baumer, Segev Shlomov, Roi Reichart
We establish the fundamental concepts of significance testing and discuss the specific aspects of NLP tasks, experimental setups and evaluation measures that affect the choice of significance tests in NLP research.
1 code implementation • TACL 2017 • Rotem Dror, Gili Baumer, Marina Bogomolov, Roi Reichart
With the ever-growing amounts of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure consistent performance across heterogeneous setups.