Echo Corpus

A large dataset of over 18,000,000 English tweets posted by ∼7K echo users was constructed in the following manner:

1. Base Corpus: We obtained access to a random sample of 10% of all public tweets posted in May and June 2016, the peak of echo use.
2. Raw Echo Corpus: Searching the base corpus, we extracted all tweets containing the echo symbol, resulting in 803,539 tweets posted by 418,624 users. After filtering out non-English tweets and users who used the echo fewer than three times, we were left with ∼7K users.
3. Echo Corpus: We used the Twitter API to obtain the most recent tweets (up to 3.2K) of each of the users remaining in the English list. This process resulted in ∼18M tweets posted by 7,073 users. Some of the accounts we found using the echo were already suspended or deleted at the time of collection, so their tweets were not retrievable.

Relevant footnotes:
- The echo is found in tweets written in multiple languages, particularly in East-Asian languages, whose user base is known for heavy use of ASCII art and kaomoji (McCulloch 2019).
- The data was collected in December 2016, amidst reports on the trending ‘echo’.

Description taken from paper: Arviv, E., Hanouna, S., & Tsur, O. (2020). It's a Thin Line Between Love and Hate: Using the Echo in Modeling Dynamics of Racist Online Communities. ArXiv, abs/2012.01133.
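The user filtering described in step 2 (Raw Echo Corpus) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the echo is written as triple parentheses around a name, that each tweet is a dict with hypothetical `user`, `lang`, and `text` fields, and that the three-use threshold is counted over English tweets only; the description does not specify any of these details.

```python
import re
from collections import Counter

# Assumption: the echo is written as triple parentheses around a name,
# e.g. "(((name)))". The exact pattern used by the authors is not given.
ECHO_PATTERN = re.compile(r"\(\(\(.+?\)\)\)")


def find_echo_users(tweets, min_uses=3, lang="en"):
    """Return the set of users with at least `min_uses` echo tweets in `lang`.

    `tweets` is an iterable of dicts with hypothetical keys
    "user", "lang", and "text".
    """
    counts = Counter()
    for tweet in tweets:
        # Drop tweets that are not in the target language.
        if tweet["lang"] != lang:
            continue
        # Count one echo use per tweet that contains the symbol.
        if ECHO_PATTERN.search(tweet["text"]):
            counts[tweet["user"]] += 1
    # Keep only users who reached the usage threshold.
    return {user for user, n in counts.items() if n >= min_uses}
```

Counting one use per tweet, rather than one per echo occurrence, is a design choice made here for simplicity; the source only says users who used the echo fewer than three times were removed.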
