4 dataset results for Word Embeddings AND Texts AND German

WikiMatrix is a dataset of parallel sentences in the textual content of Wikipedia for all possible language pairs. The mined data consists of:

85 PAPERS • NO BENCHMARKS YET

CELEX

CELEX database comprises three different searchable lexical databases, Dutch, English and German. The lexical data contained in each database is divided into five categories: orthography, phonology, morphology, syntax (word class) and word frequency.

57 PAPERS • NO BENCHMARKS YET

WikiNEuRal

WikiNEuRal is a high-quality automatically-generated dataset for Multilingual Named Entity Recognition.

5 PAPERS • NO BENCHMARKS YET

WikiSem500

The WikiSem500 dataset contains around 500 per-language cluster groups for English, Spanish, German, Chinese, and Japanese (a total of 13,314 test cases).

4 PAPERS • NO BENCHMARKS YET

Datasets

4 dataset results for Word Embeddings AND Texts AND German