Full description
On this page you can find the datasets presented in the paper A Corpus for Entity Profiling in Microblog Posts. It includes two manually annotated corpora for evaluating the task of identifying aspects on Twitter, both based on the WePS-3 ORM task dataset. The aspects dataset was annotated using a pooling methodology, for which we implemented various methods for automatically extracting aspects relevant to an entity from tweets. The dataset is organized in the following three files:
1. aspects_terms_annotations.tsv: A tab-separated values file containing the annotations. Each line corresponds to a term, and the columns include the entity name, the term itself, and the assessments given by the three judges (J1, J2 and J3). Assessments are encoded as follows: 1 = relevant, 2 = not relevant, 3 = competitor, 4 = unknown.
2. aspects_goldstandard_qrels: This file contains the terms annotated as relevant/competitor by two or more judges. It is a standard TREC qrels file, so it can be used as the gold standard in evaluation tools such as trec_eval.
3. aspects_queries_ids.tsv: A table that maps each query_id used in the qrels file above to the company name in the WePS-3 ORM task dataset.
Subjects
Automatic classification |
Information and Computing Sciences |
Information Retrieval and Web Search |
Library and Information Studies |
Real time processing |
Taxonomy |
Twitter |
Identifiers
- Local : c0e59e9fe4c07c3976417b5f222f690b
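As a rough illustration of how the annotations file described under Full description relates to the gold standard, the following Python sketch selects the terms that two or more judges marked as relevant or competitor. It rests on several assumptions not stated in the description: the column order is taken to be entity, term, J1, J2, J3; a judge counts as a positive vote when the code is either 1 or 3; and the function name gold_terms and the sample rows are hypothetical, not taken from the dataset.

```python
import csv
import io

# Assessment codes from the description: 1 = relevant, 3 = competitor.
POSITIVE = {"1", "3"}


def gold_terms(tsv_file, min_judges=2):
    """Return (entity, term) pairs with at least min_judges positive votes.

    Assumes columns are: entity, term, J1, J2, J3 (an assumption, since
    the description does not fix the column order).
    """
    gold = []
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        entity, term, *judgments = row[:5]
        # A header row, if present, gets zero votes and is skipped naturally.
        votes = sum(1 for j in judgments if j.strip() in POSITIVE)
        if votes >= min_judges:
            gold.append((entity, term))
    return gold


# Hypothetical rows, for illustration only.
sample = io.StringIO(
    "acme\tbattery\t1\t1\t2\n"
    "acme\tweather\t2\t4\t2\n"
    "acme\trivalcorp\t3\t1\t4\n"
)
print(gold_terms(sample))
```

For real use, open aspects_terms_annotations.tsv with newline="" and pass the file object in place of the sample. The distributed aspects_goldstandard_qrels file remains the authoritative gold standard.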