Search Results for author: Tsetsuukhei Delgerbaatar

Found 1 paper, 1 paper with code

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge

1 code implementation • 20 Apr 2024 • Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella

Our empirical findings show that the accuracy of UniMorph Labeller is 98%, and that, in all language models studied (including ALBERT, BERT, RoBERTa, and DeBERTa), alien tokenization leads to poorer generalizations compared to morphological tokenization for semantic compositionality of word meanings.

Text Classification
