Offensive language detection in Arabic using ULMFiT

LREC 2020 · Mohamed Abdellatif, Ahmed Elgammal

In this paper, we approach the OffenseEval 2020 shared task (Mubarak et al., 2020) using ULMFiT (Howard and Ruder, 2018). We start from a model pre-trained on Arabic Wikipedia (Khooli, 2019) and fine-tune it on the target dataset. The task dataset is highly imbalanced. We train forward and backward models and ensemble their results. We report the confusion matrix, accuracy, precision, recall, and F1 on the development set, and summarized results on the test set. Transfer learning with ULMFiT shows potential for Arabic text classification.

References

H. Mubarak, K. Darwish, W. Magdy, T. Elsayed, and H. Al-Khalifa. Overview of OSACT4 Arabic offensive language detection shared task. 4, 2020.
J. Howard and S. Ruder. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018.
A. Khooli. Applied data science. https://github.com/abedkhooli/ds2, 2019.
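
The ensembling and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the forward and backward models are combined, so averaging their predicted probabilities is an assumption here, as are the 0.5 decision threshold, the helper names, and the toy numbers, none of which come from the paper.

```python
# Minimal sketch: combine two classifiers' offensive-class probabilities and
# score the result. Averaging and the 0.5 threshold are assumptions, not the
# paper's confirmed procedure; all data below is made up for illustration.

def ensemble_average(p_forward, p_backward):
    """Average the per-example probabilities of the two models."""
    return [(f + b) / 2.0 for f, b in zip(p_forward, p_backward)]

def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for the positive class (1)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

def classification_scores(c):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy, imbalanced example: 2 offensive (1) out of 8 texts.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
p_forward = [0.9, 0.2, 0.6, 0.1, 0.3, 0.2, 0.1, 0.4]   # hypothetical outputs
p_backward = [0.7, 0.1, 0.2, 0.3, 0.5, 0.1, 0.2, 0.8]  # hypothetical outputs

p_ensemble = ensemble_average(p_forward, p_backward)
y_pred = [1 if p >= 0.5 else 0 for p in p_ensemble]
counts = confusion_counts(y_true, y_pred)
scores = classification_scores(counts)
print(counts)
print(scores)
```

On imbalanced data such as this task's, accuracy alone can look strong while the minority class is mostly missed, which is why precision, recall, and F1 are reported alongside it.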
