Data

Real-Time Classification of Twitter Trends Dataset

RMIT University, Australia
Damiano Spina (Principal investigator)
Date: 2018
Source: http://nlp.uned.es/~damiano/datasets/TT-classification.html
Related resource: https://dx.doi.org/10.1002/asi.23186
Type: dataset
Language: English
Subjects: Automatic classification; Taxonomy; Real time processing; Information Retrieval and Web Search; Information and Computing Sciences; Library and Information Studies

Licence & Rights:

CC BY-NC: Attribution-Noncommercial 3.0 AU
http://creativecommons.org/licenses/by-nc/3.0/au

All rights reserved

Access:

Data available at http://nlp.uned.es/~damiano/datasets/TT-classification.html

Full description

Abstract: In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following 4 types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language-independent features based on the social spread of trends and categorize them using the typology. Our method provides an efficient way to accurately categorize trending topics without need of external data, enabling news organizations to discover breaking news in real-time, or to quickly identify viral memes that might inform marketing decisions, among others. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.

The tar.gz dataset package contains:

README: A README file describing the collection (similar to this webpage).
TT-annotations.csv: A comma-separated-value file containing the 1,036 annotated trending topics (TTs). Each line corresponds to one trending topic and has four columns: an MD5 hash (used as the ID that links the topic to its tweets), the date on which the TT was crawled (in yyyyMMdd format), the trending topic name (as it appears on Twitter), and the manual annotation. The manual annotation is one of the four classes in the taxonomy: news, ongoing-event, meme or commemorative.
tweets: A folder containing the tweets associated with each of the trending topics in TT-annotations.csv. Each file is named with a TT MD5 hash and corresponds to one trending topic. The files follow a format similar to that of the TREC Microblog Corpus: tab-separated-value files in which the first column contains the tweet ID and the second the author's screen name.

To respect Twitter's Terms of Service, tweet texts are not redistributed; only tweet IDs and author screen names are provided. Tweet texts can be downloaded using any of the following tools:

SemEval-2013 Task 2 Download script (in Python)
http://www.cs.york.ac.uk/semeval-2013/task2/index.php?id=data
RepLab 2013 Twitter Texts Downloader (in Java)
http://nlp.uned.es/replab2013/replab2013_twitter_texts_downloader_latest.tar.gz
TREC Microblog Track (in Java)
https://github.com/lintool/twitter-tools
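
The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset package: the function names, the `Trend` record, and the sample rows are hypothetical and for illustration only. It assumes that topic names containing commas are quoted in the CSV, and, since the description does not say whether TT-annotations.csv has a header row, it skips any row whose second column is not a yyyyMMdd date.

```python
import csv
from collections import Counter
from datetime import date, datetime
from typing import Iterable, List, NamedTuple, Tuple


class Trend(NamedTuple):
    """One row of TT-annotations.csv (hypothetical record type)."""
    tt_id: str      # MD5 hash; also the file name inside the tweets folder
    crawled: date   # crawl date, parsed from yyyyMMdd
    name: str       # trending topic name as it appears on Twitter
    label: str      # news, ongoing-event, meme or commemorative


def parse_annotations(lines: Iterable[str]) -> List[Trend]:
    """Parse rows of (MD5 hash, yyyyMMdd date, topic name, annotation)."""
    trends = []
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # blank or malformed line
        tt_id, day, name, label = row
        try:
            crawled = datetime.strptime(day, "%Y%m%d").date()
        except ValueError:
            continue  # assumption: a non-date here means a header row
        trends.append(Trend(tt_id, crawled, name, label))
    return trends


def parse_tweet_refs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated rows into (tweet ID, author screen name) pairs."""
    refs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            refs.append((parts[0], parts[1]))  # ignore any extra columns
    return refs


# Made-up sample rows; real IDs, dates and topics come from the package.
sample_annotations = [
    "0123456789abcdef0123456789abcdef,20110405,#sampletopic,meme\n",
    "fedcba9876543210fedcba9876543210,20110406,Sample Event,news\n",
]
sample_tweets = ["1234567890\texample_user\n"]

trends = parse_annotations(sample_annotations)
print(Counter(t.label for t in trends))  # topics per class
print(parse_tweet_refs(sample_tweets))
```

With the extracted package, the same functions can be applied to an open file handle, for example `parse_annotations(open("TT-annotations.csv", encoding="utf-8", newline=""))`; the file encoding is likewise an assumption. Each `tt_id` then names the file in the tweets folder to pass to `parse_tweet_refs`.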

This dataset is part of a larger collection

Identifiers
  • Local : eb5aef0a3b2dbc4b87da4b71f44e2028