Full description
Abstract: In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following four types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language-independent features based on the social spread of trends and categorize them using the typology. Our method provides an efficient way to accurately categorize trending topics without the need for external data, enabling news organizations to discover breaking news in real time, or to quickly identify viral memes that might inform marketing decisions, among other uses. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter, as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.
This tar.gz dataset package contains:
- README: A README file describing the collection (similar to this webpage).
- TT-annotations.csv: A comma-separated-value file containing the 1,036 annotated trending topics. Each line corresponds to a trending topic and has four columns: an md5 hash (used as an ID to identify the tweets associated with each TT), the date when the TT was crawled (in yyyyMMdd format), the trending topic name (as it appears on Twitter), and the manual annotation. The manual annotation is one of the four classes in the taxonomy: news, ongoing-event, meme, or commemorative.
- tweets: A folder containing the tweets associated with each of the trending topics in the TT-annotations.csv file described above. Each file (named with a TT md5 hash) corresponds to one trending topic. The files in this folder are in a format similar to that of the TREC Microblog Corpus (tab-separated-value files where the first column contains the tweet ID and the second the author's screen name).
In order to respect Twitter's TOS, tweets are not redistributed; only tweet IDs and author screen names are provided. Tweet texts can be downloaded using any of the following tools:
- SemEval-2013 Task 2 Download script (in Python): http://www.cs.york.ac.uk/semeval-2013/task2/index.php?id=data
- RepLab 2013 Twitter Texts Downloader (in Java): http://nlp.uned.es/replab2013/replab2013_twitter_texts_downloader_latest.tar.gz
- TREC Microblog Track (in Java): https://github.com/lintool/twitter-tools
Subjects
Automatic classification |
Information and Computing Sciences |
Information Retrieval and Web Search |
Library and Information Studies |
Real time processing |
Taxonomy
Identifiers
- Local : eb5aef0a3b2dbc4b87da4b71f44e2028
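The file layout described under "Full description" can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset package: the names `TrendingTopic`, `read_annotations`, and `read_tweet_ids` are hypothetical, the sample rows are invented for illustration, and the code assumes that fields containing commas are quoted and that any row whose second column is not a yyyyMMdd date (for example, a header row, if one exists) can be skipped.

```python
import csv
import io
from datetime import date, datetime
from typing import Iterable, List, NamedTuple, Tuple


class TrendingTopic(NamedTuple):
    tt_id: str      # md5 hash; also the name of the matching file in tweets/
    crawled: date   # crawl date, parsed from the yyyyMMdd column
    name: str       # trending topic name as it appears on Twitter
    label: str      # news, ongoing-event, meme or commemorative


def read_annotations(lines: Iterable[str]) -> List[TrendingTopic]:
    """Parse rows laid out like TT-annotations.csv (four columns per row)."""
    topics = []
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # assumption: rows with another column count are skipped
        tt_id, date_str, name, label = (field.strip() for field in row)
        try:
            crawled = datetime.strptime(date_str, "%Y%m%d").date()
        except ValueError:
            continue  # assumption: header or malformed rows are skipped
        topics.append(TrendingTopic(tt_id, crawled, name, label))
    return topics


def read_tweet_ids(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated lines: tweet ID first, author screen name second."""
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[0]:
            pairs.append((parts[0], parts[1]))
    return pairs


if __name__ == "__main__":
    # Invented sample data standing in for the real files.
    annotations = io.StringIO(
        "0123456789abcdef0123456789abcdef,20120101,#SampleTopic,meme\n"
    )
    for topic in read_annotations(annotations):
        print(topic.tt_id, topic.crawled.isoformat(), topic.name, topic.label)

    tweets = io.StringIO("1000000001\tsample_user\n")
    print(read_tweet_ids(tweets))
```

With the real package, the same functions could be given open file handles, for example `open("TT-annotations.csv", newline="", encoding="utf-8")` and the file in the tweets folder whose name matches a topic's md5 hash. The encoding is an assumption, as the description does not state it.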