Language Identification

123 papers with code • 6 benchmarks • 19 datasets

Language identification is the task of determining the language of a text.
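
As a rough illustration of the task, the sketch below scores a text against per-language character trigram profiles and picks the closest one by cosine similarity. It is a minimal toy baseline, not the method of any paper or leaderboard entry listed on this page; the `SAMPLES` training sentences and the helper names `char_ngrams`, `cosine`, and `identify` are hypothetical and exist only for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy training text; a real system would use far more data.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is a short "
          "english sentence about the weather",
    "de": "der schnelle braune fuchs springt über den faulen hund und das "
          "ist ein kurzer deutscher satz über das wetter",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et ceci "
          "est une courte phrase française sur le temps",
}

# One trigram profile per language, built once from the toy samples.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language code whose profile is most similar to the text."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))
```

With these toy profiles, `identify("das ist ein hund")` selects `"de"`. Very short or mixed-language inputs are much harder, which is part of what the benchmarks on this page measure.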

Benchmarks

These leaderboards are used to track progress in Language Identification.

Dataset	Best Model
VoxLingua107	XLS-R
OpenSubtitles	Apple bi-LSTM
Universal Dependencies	Apple bi-LSTM
Nordic Language Identification	FastText
GlotLID-C	GlotLID
VoxForge	ConformerG-P

Libraries

Use these libraries to find Language Identification models and implementations.

facebookresearch/fairseq • 2 papers • 29,265

pytorch/fairseq • 2 papers • 29,264

Most implemented papers

HeLI, a Word-Based Backoff Method for Language Identification

tosaja/HeLI • WS 2016

The shared task comprised a total of 8 tracks, of which we participated in 7.

LanideNN: Multilingual Language Identification on Character Window

tomkocmi/LanideNN • EACL 2017

In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text.

Discriminating between Similar Languages using Weighted Subword Features

adbar/vardial-experiments • WS 2017

The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.

Joint UD Parsing of Norwegian Bokmål and Nynorsk

erikve/bm-nn-parsing • WS 2017

Language Identification Using Deep Convolutional Recurrent Neural Networks

HPI-DeepLearning/crnn-lid • 16 Aug 2017

Language Identification (LID) systems are used to classify the spoken language from a given audio sample and are typically the first step for many spoken language processing tasks, such as Automatic Speech Recognition (ASR) systems.

A study of N-gram and Embedding Representations for Native Language Identification

nishkalavallabhi/NLIST2017 • WS 2017

We report on our experiments with N-gram and embedding based feature representations for Native Language Identification (NLI) as a part of the NLI Shared Task 2017 (team name: NLI-ISU).

Improved Text Language Identification for the South African Languages

praekelt/feersum-lid-shared-task • 1 Nov 2017

Virtual assistants and text chatbots have recently been gaining popularity.

Automatic Language Identification in Texts: A Survey

Dagobert42/langID-NLP • 22 Apr 2018

Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in.

Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks

SamihYounes/senti-cnn • LREC 2018

What's in a Domain? Learning Domain-Robust Text Representations using Adversarial Training

lrank/Domain_Robust_Text_Representation • NAACL 2018

Most real-world language problems require learning from heterogeneous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training.
