Language identification is the task of determining the language of a text.
|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
Language Identification (LID) systems are used to classify the spoken language from a given audio sample and are typically the first step for many spoken language processing tasks, such as Automatic Speech Recognition (ASR) systems.
While applications of transfer learning are common in the fields of computer vision and natural language processing, audio- and speech processing are surprisingly lacking readily available and transferable models.
In this study we address the problem of training a neuralnetwork for language identification using both labeled and unlabeled speech samples in the form of i-vectors.
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text.
Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.
Most real world language problems require learning from heterogenous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training.
Social media messages' brevity and unconventional spelling pose a challenge to language identification.