Data

Data from: A Corpus for Entity Profiling in Microblog Posts

RMIT University, Australia
Damiano Spina (Associated with)
Access the data: http://nlp.uned.es/~damiano/datasets/entityProfiling_ORM_Twitter.html

Date: 2018
Type: dataset
Language: English
Subjects: Twitter; Automatic classification; Taxonomy; Real time processing; Information Retrieval and Web Search; INFORMATION AND COMPUTING SCIENCES; LIBRARY AND INFORMATION STUDIES

Licence & Rights:

CC BY-NC: Attribution-Noncommercial 3.0 AU
http://creativecommons.org/licenses/by-nc/3.0/au

All rights reserved

Access:

Data available in link

Full description

This page provides the datasets presented in the paper A Corpus for Entity Profiling in Microblog Posts. It includes two manually annotated corpora for evaluating the task of identifying aspects on Twitter, both based on the WePS-3 ORM task dataset. The aspects dataset was annotated using a pooling methodology, for which we implemented various methods for automatically extracting aspects from tweets that are relevant to an entity. The dataset is organized in the following three files:

1. aspects_terms_annotations.tsv: A tab-separated values file containing the annotations. Each line corresponds to a term, and the columns give the entity name, the term itself, and the assessments given by the three judges (J1, J2 and J3). Assessments are encoded as follows: 1 = relevant, 2 = not relevant, 3 = competitor, 4 = unknown.
2. aspects_goldstandard_qrels: This file contains the terms annotated as relevant/competitor by two or more judges. It is a typical TREC qrels file, so it can be used as the gold standard in evaluation tools such as trec_eval.
3. aspects_queries_ids.tsv: A table that maps each query_id used in the qrels file above to the company name in the WePS-3 ORM task dataset.
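
The relationship between the annotations file and the gold standard can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: that the TSV has exactly five columns in the order entity, term, J1, J2, J3 with no header row; that a judge counts toward the gold standard when the assessment is either 1 (relevant) or 3 (competitor); and that the term itself serves as the document identifier in the qrels line. The sample rows, the entity name ExampleCo and the query id q1 are made up for illustration.

```python
import csv
import io

# Assessment codes as listed in the dataset description.
RELEVANT, NOT_RELEVANT, COMPETITOR, UNKNOWN = 1, 2, 3, 4
POSITIVE = {RELEVANT, COMPETITOR}


def gold_terms(tsv_text, min_judges=2):
    """Return (entity, term) pairs judged relevant or competitor
    by at least `min_judges` of the three judges.

    Assumption: five tab-separated columns, no header row:
    entity, term, J1, J2, J3.
    """
    gold = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip blank or malformed lines
        entity, term, *votes = row
        positive = sum(1 for v in votes if int(v) in POSITIVE)
        if positive >= min_judges:
            gold.append((entity, term))
    return gold


def to_qrels_lines(gold, query_ids):
    """Format pairs as TREC qrels lines: query_id, iteration, doc_id, relevance.

    Assumption: the term is used as the doc_id and contains no whitespace.
    """
    return [f"{query_ids[entity]} 0 {term} 1" for entity, term in gold]


# Made-up rows, not taken from the dataset.
sample = (
    "ExampleCo\tbattery\t1\t1\t2\n"
    "ExampleCo\tweather\t2\t2\t4\n"
    "ExampleCo\trivalphone\t3\t1\t2\n"
)
gold = gold_terms(sample)
lines = to_qrels_lines(gold, {"ExampleCo": "q1"})
```

In practice the query_ids mapping would be read from aspects_queries_ids.tsv, whose exact column layout should be checked against the file before relying on this sketch.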

This dataset is part of a larger collection


Identifiers
  • Local : c0e59e9fe4c07c3976417b5f222f690b