Automatic Detecting for Health-related Twitter Data with BioBERT

SMM4H (COLING) 2020  ·  Yang Bai, Xiaobing Zhou

Social media data used for health applications is typically large in volume and written by users, which poses challenges for NLP such as colloquial language, spelling errors, and novel or creative phrases. In this paper, we describe our system submitted to the SMM4H 2020 (Social Media Mining for Health Applications) Shared Task, which consists of five subtasks. We participated in subtask 1, subtask 2-English, and subtask 5. Our final submission is an ensemble of several fine-tuned transformer-based models. We show that this approach performs well on imbalanced datasets (for example, the class ratio is 1:10 in subtask 2) but performs poorly on extremely imbalanced datasets (for example, the class ratio is 1:400 in subtask 1). In subtask 1, our result is below the average score; in subtask 2-English, it is above the average score; and in subtask 5, it achieves the highest score. The code is available online.
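
The abstract does not say how the fine-tuned models are combined or which metric is reported, so the following is only a minimal sketch under stated assumptions: the ensemble is taken to be soft voting (averaging each model's positive-class probability), and the metric is taken to be F1 on the positive class. The function names `soft_vote` and `positive_f1`, the 0.5 threshold, and the toy probabilities are all hypothetical and are not drawn from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_per_model: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average positive-class probabilities across models, then threshold.

    Assumption: each inner sequence holds one model's probability of the
    positive class for the same ordered list of tweets.
    """
    if not prob_per_model:
        raise ValueError("need at least one model")
    n_examples = len(prob_per_model[0])
    if any(len(p) != n_examples for p in prob_per_model):
        raise ValueError("all models must score the same number of examples")
    preds = []
    for i in range(n_examples):
        mean_prob = sum(p[i] for p in prob_per_model) / len(prob_per_model)
        preds.append(1 if mean_prob >= threshold else 0)
    return preds


def positive_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class (label 1); returns 0.0 with no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy probabilities from three hypothetical fine-tuned models on three tweets.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.3],
    [0.8, 0.1, 0.5],
]
ensemble_preds = soft_vote(model_probs)
ensemble_f1 = positive_f1([1, 0, 1], ensemble_preds)

# With a 1:400 class ratio, always predicting the majority class is highly
# accurate yet scores zero positive-class F1.
skewed_labels = [1] + [0] * 400
majority_f1 = positive_f1(skewed_labels, [0] * 401)
```

Under these assumptions, the last two lines illustrate why an extreme ratio such as 1:400 is hard: a classifier that never predicts the positive class is almost always right but earns no positive-class F1, so the threshold and the handling of rare positives matter far more than they do at 1:10.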
