Tokenization

39 papers with code · Natural Language Processing

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018

WS 2018 awslabs/sockeye

In total we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task.

MACHINE TRANSLATION TOKENIZATION

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

LREC 2018 bheinzerling/bpemb

We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE).

ENTITY TYPING TOKENIZATION WORD EMBEDDINGS
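
For readers unfamiliar with Byte-Pair Encoding, the following is a minimal sketch of how BPE merges can be learned from a word-frequency table. It is an illustration only, not BPEmb's own code; the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`) and the toy corpus are hypothetical and chosen for this example.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Greedily merge the most frequent adjacent pair, num_merges times."""
    # Start from single characters as the initial symbols.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Hypothetical toy corpus: word -> frequency.
toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_corpus, 3)
print(merges)
print(vocab)
```

On this toy corpus the frequent suffix "est" is merged into a single symbol within two steps, which is the behaviour that lets subword vocabularies cover rare words with a small number of units.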

A Call for Clarity in Reporting BLEU Scores

WS 2018 mjpost/sacreBLEU

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric.

MACHINE TRANSLATION TOKENIZATION
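
One commonly cited source of such inconsistency is tokenization: the same hypothesis and reference can yield different n-gram precisions depending on how the text is split. The sketch below illustrates this with the clipped n-gram precision that underlies BLEU. It does not use sacreBLEU and is not the full BLEU metric; the two tokenizers and the example sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, ref, n):
    """Clipped n-gram precision of hyp against a single reference."""
    hyp_counts = ngrams(hyp, n)
    ref_counts = ngrams(ref, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in ref.
    clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return clipped / total


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.split()


def punct_tokenize(text):
    """Split words and punctuation marks into separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical sentence pair that differs only in final punctuation.
hypothesis = "The cat sat on the mat."
reference = "The cat sat on the mat !"

for name, tok in [("whitespace", whitespace_tokenize), ("punct", punct_tokenize)]:
    hyp, ref = tok(hypothesis), tok(reference)
    p1 = modified_precision(hyp, ref, 1)
    p2 = modified_precision(hyp, ref, 2)
    print(f"{name}: unigram={p1:.3f} bigram={p2:.3f}")
```

With whitespace splitting, "mat." fails to match "mat", so both precisions come out lower than when punctuation is split off. Scores computed under different tokenization schemes are therefore not directly comparable.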

NLP-Cube: End-to-End Raw Text Processing With Neural Networks

CONLL 2018 adobe/NLP-Cube

We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.

LEMMATIZATION TOKENIZATION

Juman++: A Morphological Analysis Toolkit for Scriptio Continua

EMNLP 2018 ku-nlp/jumanpp

We present a three-part toolkit for developing morphological analyzers for languages without natural word boundaries.

ART ANALYSIS LANGUAGE MODELLING MORPHOLOGICAL ANALYSIS PART-OF-SPEECH TAGGING TOKENIZATION