Hierarchical Character-Word Models for Language Identification

WS 2016 · Aaron Jaech, George Mulcaire, Shobhit Hathi, Mari Ostendorf, Noah A. Smith

The brevity and unconventional spelling of social media messages pose a challenge to language identification. We introduce a hierarchical model that learns character-level and contextualized word-level representations for language identification. Our method performs well against strong baselines and can also reveal code-switching.
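
To make the two-level structure concrete, here is a minimal, untrained sketch of how character vectors might be pooled into word vectors, mixed with neighbouring words, and scored per word. Everything in it is an assumption for illustration: the label set `LANGS`, the size `DIM`, mean pooling, the neighbour-averaging context step, and the random weights are hypothetical stand-ins, not the encoders or parameters described by the authors.

```python
import math
import random

LANGS = ["en", "es"]  # hypothetical label set
DIM = 8               # hypothetical embedding size

rng = random.Random(0)
_char_table = {}

def char_vec(ch):
    """Look up (or lazily create) a random embedding for one character."""
    if ch not in _char_table:
        _char_table[ch] = [rng.uniform(-1, 1) for _ in range(DIM)]
    return _char_table[ch]

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def word_vec(word):
    """Character level: pool character embeddings into one word vector."""
    return mean([char_vec(ch) for ch in word])

def contextualize(word_vecs):
    """Word level: mix each word vector with its immediate neighbours."""
    out = []
    for i in range(len(word_vecs)):
        window = word_vecs[max(0, i - 1): i + 2]
        out.append(mean(window))
    return out

# Untrained output layer: one random weight vector per language (assumption).
W = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in LANGS]

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(message):
    """Return (message-level distribution, per-word distributions)."""
    words = message.split()
    ctx = contextualize([word_vec(w) for w in words])
    per_word = [
        softmax([sum(w_i * x_i for w_i, x_i in zip(w, v)) for w in W])
        for v in ctx
    ]
    return mean(per_word), per_word

msg_probs, word_probs = predict("hola my amigo")
```

Because the weights are random, the probabilities carry no meaning; the point is only the data flow. In a trained model of this shape, the per-word distributions in `word_probs` are where a switch between languages inside one message would become visible, while `msg_probs` gives the message-level decision.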