Language Identification

123 papers with code • 6 benchmarks • 19 datasets

Language identification is the task of determining the language of a text.
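
As a rough illustration of the task, the sketch below scores a text against per-language character trigram counts with add-one smoothing and returns the best-scoring language. It is a minimal toy under stated assumptions, not the method of any paper listed on this page: the names char_ngrams, train, identify and TOY_SAMPLES are hypothetical, and the three training sentences are made up for the example.

```python
import math
from collections import Counter

# Hypothetical toy training data: one short sentence per language.
TOY_SAMPLES = [
    ("en", "the quick brown fox jumps over the lazy dog and the cat"),
    ("de", "der schnelle braune fuchs springt über den faulen hund und die katze"),
    ("fr", "le renard brun rapide saute par-dessus le chien paresseux et le chat"),
]


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Count character n-grams per language from (language, text) pairs."""
    counts = {}
    for lang, text in samples:
        counts.setdefault(lang, Counter()).update(char_ngrams(text, n))
    return counts


def identify(text, counts, n=3):
    """Return the language whose smoothed n-gram model gives the text the highest log-score."""
    # Shared vocabulary of all n-grams seen in any language, used for add-one smoothing.
    vocab = set().union(*counts.values())
    best_lang, best_score = None, -math.inf
    for lang, grams in counts.items():
        total = sum(grams.values())
        # The extra +1 in the denominator reserves mass for n-grams never seen in training.
        score = sum(
            math.log((grams[g] + 1) / (total + len(vocab) + 1))
            for g in char_ngrams(text, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


model = train(TOY_SAMPLES)
print(identify("the dog and the cat", model))
```

With so little training data the sketch only separates inputs that closely resemble the toy sentences; the systems listed below train on far larger corpora and use richer features.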

Most implemented papers

HeLI, a Word-Based Backoff Method for Language Identification

tosaja/HeLI WS 2016

The shared task comprised a total of 8 tracks, of which we participated in 7.

LanideNN: Multilingual Language Identification on Character Window

tomkocmi/LanideNN EACL 2017

In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text.

Discriminating between Similar Languages using Weighted Subword Features

adbar/vardial-experiments WS 2017

The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.

Language Identification Using Deep Convolutional Recurrent Neural Networks

HPI-DeepLearning/crnn-lid 16 Aug 2017

Language Identification (LID) systems are used to classify the spoken language from a given audio sample and are typically the first step for many spoken language processing tasks, such as Automatic Speech Recognition (ASR) systems.

A study of N-gram and Embedding Representations for Native Language Identification

nishkalavallabhi/NLIST2017 WS 2017

We report on our experiments with N-gram and embedding based feature representations for Native Language Identification (NLI) as a part of the NLI Shared Task 2017 (team name: NLI-ISU).

Improved Text Language Identification for the South African Languages

praekelt/feersum-lid-shared-task 1 Nov 2017

Virtual assistants and text chatbots have recently been gaining popularity.

Automatic Language Identification in Texts: A Survey

Dagobert42/langID-NLP 22 Apr 2018

Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in.

What's in a Domain? Learning Domain-Robust Text Representations using Adversarial Training

lrank/Domain_Robust_Text_Representation NAACL 2018

Most real-world language problems require learning from heterogeneous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training.