Beyond Characters: Subword-level Morpheme Segmentation
This paper presents DeepSPIN's submissions to the SIGMORPHON 2022 Shared Task on Morpheme Segmentation. We make three submissions, all to the word-level subtask. First, we show that entmax-based sparse sequence-to-sequence models deliver large improvements over conventional softmax-based models, echoing results from other tasks. Then, we challenge the assumption that models for morphological tasks should be trained at the character level by building a transformer that generates morphemes as sequences of unigram-language-model-induced subwords. This subword transformer outperforms all of our character-level models and wins the word-level subtask. Although we did not make an official submission to the sentence-level subtask, we show that this subword-based approach is highly effective there as well.
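The "entmax-based sparse" models mentioned above replace softmax with the α-entmax transformation, which can assign exactly zero probability to low-scoring outputs. The following is a minimal NumPy sketch of α-entmax computed by bisection on its threshold; the function name `entmax_bisect` and the bisection approach are illustrative assumptions, not the paper's actual implementation (DeepSPIN's work typically uses α = 1.5).

```python
import numpy as np

def entmax_bisect(z, alpha=1.5, n_iter=50):
    """Illustrative alpha-entmax: p_i = [(alpha-1)*z_i - tau]_+ ** (1/(alpha-1)),
    with the threshold tau chosen by bisection so that p sums to 1."""
    z = (alpha - 1.0) * np.asarray(z, dtype=float)
    # tau lies in [max(z) - 1, max(z)]: the (monotonically decreasing) sum
    # of probabilities is at least 1 at the lower end and 0 at the upper end.
    lo, hi = z.max() - 1.0, z.max()
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        p = np.clip(z - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() < 1.0:
            hi = tau  # tau too large: probabilities sum below 1
        else:
            lo = tau
    p = np.clip(z - (lo + hi) / 2.0, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()

logits = np.array([1.0, 0.0, -1.0])
p = entmax_bisect(logits)                   # sparse: last entry is exactly 0
q = np.exp(logits) / np.exp(logits).sum()   # softmax: every entry is positive
```

Unlike softmax, which always produces a fully dense distribution, 1.5-entmax here zeroes out the lowest-scoring entry entirely, which is the sparsity property the abstract appeals to.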
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Morpheme Segmentation | UniMorph 4.0 | Subword-ULM transformer (DeepSPIN-3; soft attention, 1.5-entmax) | macro avg (subtask 1) | 97.29 | #1
Morpheme Segmentation | UniMorph 4.0 | Char LSTM (DeepSPIN-2; soft attention, 1.5-entmax) | macro avg (subtask 1) | 97.15 | #2
Morpheme Segmentation | UniMorph 4.0 | Char LSTM (DeepSPIN-1; soft attention) | macro avg (subtask 1) | 96.32 | #4
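The winning DeepSPIN-3 model generates morphemes as sequences of unigram-LM subwords. At decoding-vocabulary level, segmenting a word under a unigram language model reduces to Viterbi search for the highest-scoring sequence of pieces. The sketch below shows that search over a toy hand-written vocabulary; the vocabulary, scores, and function name are hypothetical, standing in for a trained unigram LM such as SentencePiece's.

```python
import math

def viterbi_segment(word, logp, max_piece_len=6):
    """Segment `word` into the subword sequence maximizing the sum of
    unigram log-probabilities (a toy stand-in for a trained unigram LM)."""
    n = len(word)
    # best[i] = (best score of word[:i], start index of the final piece)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_piece_len), i):
            piece = word[j:i]
            if piece in logp:
                score = best[j][0] + logp[piece]
                if score > best[i][0]:
                    best[i] = (score, j)
    pieces, i = [], n          # backtrack from the end of the word
    while i > 0:
        j = best[i][1]
        pieces.append(word[j:i])
        i = j
    return pieces[::-1]

# Toy vocabulary: morpheme-like pieces score far better than the
# single-character fallbacks, so the decoder recovers them.
vocab = {"un": -2.0, "lock": -3.0, "able": -2.5}
vocab.update({c: -6.0 for c in "abcdefghijklmnopqrstuvwxyz"})
print(viterbi_segment("unlockable", vocab))  # ['un', 'lock', 'able']
```

Because the induced subwords often align with real morphemes, a transformer operating over these pieces sees shorter sequences than a character-level model while keeping morpheme boundaries largely intact.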