Full description
This record contains the Wand and Block-Max Wand (BMW) code used for the experiments in the WSDM 2017 paper "A comparison of document-at-a-time and score-at-a-time query evaluation". All of the systems were implemented on the same underlying index, an index from the ATIRE search engine, so this ATIRE index must be built before the Wand/BMW indexes can be built. We provide end-to-end scripts that build the appropriate ATIRE index, construct the Wand/BMW indexes from it, and then run the queries. Please see: run_gov2.sh and run_clueweb_09.sh. Note that the ATIRE syntax differs between the GOV2 and ClueWeb collections, and that the run_clueweb_09.sh script also works for ClueWeb12.
Abstract
We present an empirical comparison between document-at-a-time (DAAT) and score-at-a-time (SAAT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct comparison: such a study has been difficult due to vastly different query evaluation mechanics and index organizations. Our work controls for score quantization, document processing, compression, implementation language, implementation effort, and a number of details, arriving at an empirical evaluation that fairly characterizes the performance of three specific techniques: WAND (DAAT), BMW (DAAT), and JASS (SAAT). Experiments reveal a number of interesting findings. The performance gap between WAND and BMW is not as clear as the literature suggests, and both methods are susceptible to tail queries that may take orders of magnitude longer than the median query to execute. Surprisingly, approximate query evaluation in WAND and BMW does not significantly reduce the risk of these tail queries. Overall, JASS is slightly slower than either WAND or BMW, but exhibits much lower variance in query latencies and is much less susceptible to tail query effects. Furthermore, JASS query latency is not particularly sensitive to the retrieval depth, making it an appealing solution for performance-sensitive applications where bounds on query latencies are desirable.
Subjects
Efficiency
Experimentation
Information and Computing Sciences
Information Retrieval and Web Search
Library and Information Studies
Measurement
Identifiers
- Local : cb43266c2cd7e0630d0baed1a3f31dba