Data from: A Corpus for Entity Profiling in Microblog Posts

RMIT University, Australia

Damiano Spina (Associated with)

Full description

On this page you can find the datasets presented in the paper A Corpus for Entity Profiling in Microblog Posts. They comprise two manually annotated corpora for evaluating the task of identifying aspects on Twitter, both based on the WePS-3 ORM task dataset. The aspects dataset was annotated using a pooling methodology, for which we implemented various methods for automatically extracting aspects relevant to an entity from tweets.

The dataset is organized into the following three files:

1. aspects_terms_annotations.tsv: A tab-separated values file containing the annotations. Each line corresponds to a term, and the columns include the entity name, the term itself, and the assessments given by the three judges (J1, J2 and J3). Assessments are encoded as follows: 1 = relevant, 2 = not relevant, 3 = competitor, 4 = unknown.

2. aspects_goldstandard_qrels: This file contains the terms annotated as relevant/competitor by two or more judges. It is a typical TREC qrels file, so it can be used as the gold standard in evaluation tools such as trec_eval.

3. aspects_queries_ids.tsv: A table that maps each query_id used in the qrels file above to the company name in the WePS-3 ORM task dataset.
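As an illustration of the annotation encoding described above, the following Python sketch selects the terms that at least two of the three judges marked as relevant (1) or competitor (3). It is only a sketch under stated assumptions, not the procedure used to build aspects_goldstandard_qrels: the column order (entity, term, J1, J2, J3), the absence of a header row, the reading of "two or more judges" as two or more votes in either positive category, the function name majority_positive_terms, and the sample rows are all hypothetical.

```python
import csv
import io

# Assessment codes as listed in the dataset description.
RELEVANT, NOT_RELEVANT, COMPETITOR, UNKNOWN = 1, 2, 3, 4
POSITIVE = {RELEVANT, COMPETITOR}


def majority_positive_terms(tsv_text, min_votes=2):
    """Return (entity, term) pairs with at least `min_votes` positive votes.

    Assumption: each row has five tab-separated columns in the order
    entity, term, J1, J2, J3. Rows that do not fit are skipped.
    """
    selected = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # blank or malformed line
        entity, term = row[0], row[1]
        try:
            votes = [int(v) for v in row[2:5]]
        except ValueError:
            continue  # non-numeric assessments, e.g. a header row
        if sum(v in POSITIVE for v in votes) >= min_votes:
            selected.append((entity, term))
    return selected


# Made-up rows for illustration only; they are not taken from the dataset.
sample = (
    "apple\tiphone\t1\t1\t2\n"
    "apple\tpie\t2\t2\t4\n"
    "apple\tsamsung\t3\t1\t2\n"
)
result = majority_positive_terms(sample)
print(result)
```

To apply the same check to the real file, the contents of aspects_terms_annotations.tsv can be read into a string and passed to the function, after confirming the actual column layout.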

This dataset is part of a larger collection

Identifiers

Local : c0e59e9fe4c07c3976417b5f222f690b