AROT-COV23: A Dataset of 500K Original Arabic Tweets on COVID-19

4th Workshop on African Natural Language Processing 2023 · Cheng Xu, Nan Yan

This paper presents AROT-COV23 (ARabic Original Tweets on COVID-19 as of 2023), a dataset of about 500,000 original Arabic tweets related to COVID-19, covering January 2020 to January 2023. The dataset is analyzed with a corpus-based approach to identify common themes and trends and to gain insight into how Arabic Twitter users have discussed the pandemic. The results are discussed in terms of their implications for Natural Language Processing (NLP) in Africa and for understanding the role of Twitter in spreading COVID-19-related information in the region.
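The abstract does not spell out which corpus-based techniques were used. As an illustration only, the sketch below shows one common corpus-based starting point: counting term frequencies across tweets to surface recurring themes, and counting a single term per month to trace a trend. It is not the authors' pipeline. The normalization (stripping tatweel and diacritics), the Arabic-letter tokenizer, the stopword handling, and the `(month, text)` record format are all hypothetical assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical normalization: drop tatweel (U+0640) and short-vowel diacritics
# (U+064B-U+0652) so vocalized and unvocalized spellings count as one token.
DIACRITICS = re.compile(r"[\u0640\u064B-\u0652]")
# Hypothetical tokenizer: a token is a run of basic Arabic letters
# (U+0621-U+064A); Latin text, digits, URLs, and emoji are skipped.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def tokenize(text):
    """Normalize one tweet and return its Arabic tokens."""
    return ARABIC_WORD.findall(DIACRITICS.sub("", text))


def top_terms(tweets, stopwords=frozenset(), k=10):
    """Return the k most frequent non-stopword tokens across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(k)


def monthly_term_counts(records, term):
    """Count occurrences of `term` per month.

    `records` is an iterable of (month, text) pairs, where month is an
    assumed "YYYY-MM" string; the result is ordered by month.
    """
    by_month = Counter()
    for month, text in records:
        by_month[month] += tokenize(text).count(term)
    return dict(sorted(by_month.items()))


if __name__ == "__main__":
    tweets = ["كورونا لقاح كورونا", "لقاح في مصر"]
    print(top_terms(tweets, stopwords={"في"}, k=2))
    records = [("2020-03", "كورونا كورونا"), ("2021-01", "لقاح كورونا")]
    print(monthly_term_counts(records, "كورونا"))
```

A real analysis of Arabic tweets would likely need fuller normalization (for example, unifying alef and ya variants) and a curated stopword list; this sketch keeps only the minimum needed to show the counting idea.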
