Dialect Identification

32 papers with code • 0 benchmarks • 3 datasets

Dialectal Arabic Identification

Most implemented papers

GlotLID: Language Identification for Low-Resource Languages

cisnlp/glotlid 24 Oct 2023

Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages.

AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset

MohamedHadjAmeur/AraCOVID19-MFH 7 May 2021

This paper releases "AraCOVID19-MFH", a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.

Automatic Dialect Detection in Arabic Broadcast Speech

Qatar-Computing-Research-Institute/dialectID 23 Sep 2015

We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

clips/dutchembeddings LREC 2016

With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.

A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects

boknilev/dsl-char-cnn WS 2016

Discriminating between closely-related language varieties is considered a challenging and important task.

Speech Recognition Challenge in the Wild: Arabic MGB-3

qcri/dialectID 21 Sep 2017

Two hours of audio per dialect were released for development and a further two hours were used for evaluation.

CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing

CAMeL-Lab/camel_tools LREC 2020

We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python.

Multi-Dialect Arabic BERT for Country-Level Dialect Identification

mawdoo3/Multi-dialect-Arabic-BERT COLING (WANLP) 2020

Our winning solution itself came in the form of an ensemble of different training iterations of our pre-trained BERT model, which achieved a micro-averaged F1-score of 26.78% on the subtask at hand.
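Micro-averaged F1 pools true positives, false positives and false negatives across all dialect classes before computing precision and recall. The sketch below shows that computation. It is a generic illustration and not the authors' or the shared task's official scorer. The country labels are hypothetical toy data. In a single-label setting, every example receives exactly one predicted class, so micro-F1 reduces to plain accuracy.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for single-label, multi-class predictions.

    Generic sketch: counts are pooled over all labels before
    precision and recall are computed.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for label in set(gold) | set(pred):
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1  # correct prediction of this label
            elif p == label:
                fp += 1  # label predicted, but gold is different
            elif g == label:
                fn += 1  # gold is this label, but it was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy country-level labels, for illustration only.
gold = ["Egypt", "Iraq", "Egypt", "Morocco"]
pred = ["Egypt", "Egypt", "Egypt", "Iraq"]
print(micro_f1(gold, pred))  # 0.5, the same as accuracy (2 of 4 correct)
```

Because each wrong prediction adds exactly one false positive and one false negative, pooled precision equals pooled recall here.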

The Unreasonable Effectiveness of Machine Learning in Moldavian versus Romanian Dialect Identification

raduionescu/MOROCO-Tweets 30 Jul 2020

We conduct a subjective evaluation by human annotators, showing that humans attain much lower accuracy rates compared to machine learning (ML) models.

Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments

UBC-NLP/microdialects EMNLP 2020

Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties.