Prevalence of code mixing in semi-formal patient communication in low resource languages of South Africa

13 Nov 2019  ·  Monika Obrocka, Charles Copley, Themba Gqaza, Eli Grant

In this paper, we address the problem of code-mixing in resource-poor language settings. We examine a dataset of 182k unique questions submitted by users of the MomConnect helpdesk, part of a national-scale public health platform in South Africa. We find evidence of code-switching at a level of approximately 10% within this dataset, a level that is likely to pose challenges for future services. We use a natural language processing library (Polyglot) that supports detection of 196 languages, and we attempt to evaluate its performance at identifying English, isiZulu, and code-mixed questions.
