Detecting Bot Behaviour in Social Media using Digital DNA Compression

A major challenge faced by online social networks such as Facebook and Twitter is the remarkable rise of fake and automated bot accounts over the last few years. Some of these accounts have been reported to engage in undesirable activities such as spamming, political campaigning and spreading falsehood on the platform. We present an approach to detect bot-like behaviour among Twitter accounts by analyzing their past tweeting activity. We build upon an existing technique of analysis of Twitter accounts called Digital DNA. Digital DNA models the behaviour of Twitter accounts by encoding the post history of a user account as a sequence of characters analogous to an actual DNA sequence. In our approach, we employ a lossless compression algorithm on these Digital DNA sequences and use the compression statistics as a measure of predictability in the behaviour of a group of Twitter accounts. We leverage the information conveyed by the compression statistics to visually represent the posting behaviour by a simple two dimensional scatter plot and categorize the user accounts as bots and genuine users by using an off-the-shelf implementation of the logistic regression classification algorithm.

PDF Abstract

Datasets


Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Twitter Bot Detection MIB Dataset DNA String Compression - Compression Ratio Accuracy 0.984 # 1

Methods