Full description
The files in this repository form part of the UQV100 test collection, together with a set of additional related resources used in the creation of the collection. People will require their own access to the underlying ClueWeb12 corpus in order to retrieve any of the documents referenced by id in this collection. The files in this test collection are as follows:

uqv100-allfiles.zip : (zip archive) All of the files making up this collection and listed below, including this README.txt.

uqv100-backstories.tsv : (tab-separated) The information need narratives (backstories) created from TREC topic/subtopic descriptions, which were then used as seeds for surrogate users to provide queries and effort estimates. The first column contains the unique UQV100 id in the form UQV100.xyz. An AuthorId column is present simply to retain original data sources. Other columns should be self-explanatory from their names.

uqv100-query-variations-and-estimates.tsv : (tab-separated) A master file of raw and processed data, with an entry for each query variation obtained (after junk worker data was removed), including the raw, normalized, and normalized-and-spell-corrected forms of each query. The individual document and query expectation estimates are provided, as well as per-backstory averages of these values.

uqv100-item-data-raw-labels.tsv : (tab-separated) All individual crowd-judge labeled items, one line per judgment. This data was used as the input to the aggregated label files below, and can also be used in experiments for computing alternative aggregate labels over the data; note, however, that there is no gold-standard "truth" against which such aggregates can be compared. Judge ids are anonymized but consistent. (Caveat: judging of different pools of documents was done at different times, and with the additional qrels a single judge may end up with at least two anonymized ids as a result of the post-processing of their judging contributions to the pools.) The time taken for each judgment (in ms) is included. Items judged in the initial single-judge round for the depth-10 pooling of the Indri-BM25 system carry the tag "notShown" in the RealUrl field; all subsequent judging also displayed the real URL of the corresponding ClueWeb12 document. All labeled documents are contained within the ClueWeb12-Category B dataset.

uqv100-item-data-median-labels.tsv : (tab-separated) All judged documents, with their median labels as aggregated from the up-to-3 crowd judges. Additional information is included, such as the backstory, the render and original URLs, and the judging label names. Documents from the pools that could not be labeled due to page load, login, foreign-language or other issues are marked as -100 (Unjudged). All labeled documents are contained within the ClueWeb12-Category B dataset.

uqv100-item-data-median-cbcc-bcc-majority-labels.tsv : (tab-separated) All labeled documents, with their median labels as aggregated from the up-to-3 crowd judges, together with the output of three alternative label aggregation algorithms: Community-based Bayesian Classifier Combination (CommunityBCCLabel), Bayesian Classifier Combination (BCCLabel), and a simple Majority Vote (MajorityVoteLabel). We are grateful to Matteo Venanzi for his implementation of these additional algorithms. The Community-based BCC algorithm estimated the ideal number of communities as 8. Additional information is included, such as the backstory, the render and original URLs, and the judging label names. In contrast to uqv100-item-data-median-labels.tsv, only documents that were labeled are included; that is, there are no items with a -100 (Unjudged) tag associated with them. All labeled documents are contained within the ClueWeb12-Category B dataset.
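The raw and aggregated label files above can be used to recompute or experiment with alternative aggregations. As a minimal sketch only, the following Python/pandas fragment computes median and majority-vote labels from the raw per-judgment file; the column names UQV100Id, DocId, and Label are assumptions made here for illustration, so check the actual header row of uqv100-item-data-raw-labels.tsv before use.

    # Minimal sketch: roll the raw per-judge labels up into per-document aggregates.
    # NOTE: the column names "UQV100Id", "DocId" and "Label" are assumed for
    # illustration; substitute the actual header names from the TSV file.
    import pandas as pd

    raw = pd.read_csv("uqv100-item-data-raw-labels.tsv", sep="\t")

    grouped = raw.groupby(["UQV100Id", "DocId"])["Label"]
    median_labels = grouped.median()                           # median of the up-to-3 judges
    majority_labels = grouped.agg(lambda s: s.mode().iloc[0])  # simple majority vote
                                                               # (ties resolve to the lowest label)

    print(median_labels.head())
    print(majority_labels.head())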
uqv100-item-data-median-cbcc-bcc-majority-labels-combined.tsv : (tab-separated) All of the data from the previous file, plus the same format of data applied to all the judged documents from the additional system runs included in this version of the test collection (Indri-LM, Atire, Terrier-DFR and Terrier-PL2). Data is sorted by topic and docid.

uqv100-qrels-median-labels.txt : (space-separated) A file in TREC qrels format (no header) for the median labels as aggregated from the up-to-3 crowd judges. Each line gives the backstory key (consisting of topic and subtopic, or 0 if there is no subtopic), the document id, and the label. Labels are expressed in the range -1 to 4, with -100 if a document was unjudged for some reason (e.g. the page could not be loaded or required sign-in). All labeled documents are contained within the ClueWeb12-Category B dataset.

uqv100-pool-docs-depth10.tsv : (tab-separated) The documents collected by the pooling algorithms, with some additional data: the minimum rank position at which each document was found, and the total number of times it was contributed to the pool across the user query variations. Pooling was carried out from runs over the ClueWeb12-Category B index. This pool file relates only to the initial Indri-BM25 system run.

uqv100-pool-docs-depth11plus.tsv : (tab-separated) The documents collected by the pooling algorithms after the initial depth-10 pooling. The primary selection mechanism was to identify the documents that would contribute most to reducing the residual uncertainty in the calculation of INST(T) for the runs. Pooling was carried out from runs over the ClueWeb12-Category B index. This pool file relates only to the initial Indri-BM25 system run.

uqv100-relevance-guidelines-v1.pdf : (PDF) The initial set of guidelines used for an initial single-judge evaluation of the depth-10 pool of documents. This version of the guidelines is included only for reference; all subsequent judging used the v2 guidelines document, which showed the underlying document URL to the judge to help them form an assessment.
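As a further minimal sketch, the qrels file can be read in Python as shown below, discarding the -100 (Unjudged) entries; the layout is assumed here to follow the usual whitespace-separated TREC qrels conventions, with the document id and the label as the final two fields, so verify the column order against the file itself.

    # Minimal sketch: load uqv100-qrels-median-labels.txt into a nested dict and
    # drop entries that could not be judged (label -100).
    # Assumption: whitespace-separated lines with the document id and the label as
    # the last two fields, as in standard TREC qrels; verify before relying on it.
    from collections import defaultdict

    qrels = defaultdict(dict)  # backstory key -> {docid: label}
    with open("uqv100-qrels-median-labels.txt") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            key, docid, label = parts[0], parts[-2], int(parts[-1])
            if label == -100:  # page failed to load, required sign-in, etc.
                continue
            qrels[key][docid] = label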
Abstract

We describe the UQV100 test collection, designed to incorporate variability from users. Information need "backstories" were written for 100 topics (or sub-topics) from the TREC 2013 and 2014 Web Tracks. Crowd workers were asked to read the backstories and provide the queries they would use, plus effort estimates of how many useful documents they would have to read to satisfy the need. A total of 10,835 queries were collected from 263 workers. After normalization and spell-correction, 5,764 unique variations remained; these were then used to construct a document pool via Indri-BM25 over the ClueWeb12 corpus. Qualified crowd workers made relevance judgments relative to the backstories, using a relevance scale similar to the original TREC approach; first to a pool depth of ten per query, then deeper on a set of targeted documents. The backstories, query variations, normalized and spell-corrected queries, effort estimates, run outputs, and relevance judgments are made available collectively as the UQV100 test collection. We also make available the judging guidelines and the gold hits we used for crowd-worker qualification and spam detection. We believe this test collection will unlock new opportunities for novel investigation and analysis, including problems such as task-intent retrieval performance and consistency (independent of query variation), query clustering, query difficulty prediction, and relevance feedback, among others.

Subjects
Information and Computing Sciences
Information Retrieval and Web Search
Library and Information Studies
Normalized and spell-corrected queries
Query clustering
Query difficulty prediction
Query variations
Relevance feedback
Relevance scale
TREC
Task-intent retrieval performance
Test collection
Variability
Identifiers
- Local : 35ac40f62ee4dd3a0cd4fafd446c0742
