DLRG@DravidianLangTech-EACL2021: Transformer based approach for Offensive Language Identification on Code-Mixed Tamil

EACL (DravidianLangTech) 2021 · Ratnavel Rajalakshmi, Yashwant Reddy, Lokesh Kumar

Advances in the Internet have had a huge impact on how people communicate and on their lifestyle. People express their opinions on products, politics, movies, etc. on social media. Even though English is predominantly used, many people now prefer to tweet in their native language, sometimes combining it with English. Sentiment analysis of such code-mixed tweets is challenging because of the large vocabulary, the grammar, and the colloquial usage of many words. In this paper, a transformer-based language model is applied to analyse the sentiment of tweets in Tanglish, a combination of Tamil and English. This work was submitted to the shared task at DravidianLangTech-EACL2021. The experimental results show that an F1 score of 64% was achieved in detecting hate speech in code-mixed Tamil-English tweets using a bidirectional transformer model.
