Language Identification

119 papers with code • 5 benchmarks • 18 datasets

Language identification is the task of determining the language of a text.
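
As a rough illustration of the task, the sketch below labels a text by comparing its character trigram counts against per-language profiles using cosine similarity. It is not taken from any paper listed on this page. The names `char_ngrams`, `cosine`, `identify`, `SAMPLES`, and `PROFILES` are hypothetical, and the three one-sentence training samples are toy data chosen only to keep the example self-contained.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy training text (an assumption for illustration; real systems use large corpora).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma del texto",
}

# One trigram profile per language, built once from the samples above.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


if __name__ == "__main__":
    print(identify("this is the text"))
    print(identify("das ist der hund"))
```

With profiles this small the classifier is only reliable on inputs that share vocabulary with the samples. Short, noisy inputs such as social media messages are where this kind of baseline tends to struggle.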

Most implemented papers

AdelaideCyC at SemEval-2020 Task 12: Ensemble of Classifiers for Offensive Language Detection in Social Media

2024-MindSpore-1/Code6 SEMEVAL 2020

This paper describes the systems our team (AdelaideCyC) has developed for SemEval Task 12 (OffensEval 2020) to detect offensive language in social media.

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

pytorch/fairseq 17 Nov 2021

On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.

KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection

michalspiegel/imgtb 21 Feb 2024

SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection.

Finding Structure in Text, Genome and Other Symbolic Sequences

rn123/japanese_text_analysis 8 Jul 2012

A variety of applications for these methods are examined in detail.

TweetCaT: a tool for building Twitter corpora of smaller languages

nljubesi/tweetcat LREC 2014

This paper presents TweetCaT, an open-source Python tool for building Twitter corpora that was designed for smaller languages.

Automatic Dialect Detection in Arabic Broadcast Speech

Qatar-Computing-Research-Institute/dialectID 23 Sep 2015

We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.

A Semisupervised Approach for Language Identification based on Ladder Networks

udibr/LRE 1 Apr 2016

In this study we address the problem of training a neural network for language identification using both labeled and unlabeled speech samples in the form of i-vectors.

Hierarchical Character-Word Models for Language Identification

ajaech/twitter_langid WS 2016

Social media messages' brevity and unconventional spelling pose a challenge to language identification.