Decision Track web site
The document set for the track is
ClueWeb12-B13.
Test topics
File containing raw judgments (three judgments per document)
File containing just the relevance judgments, for use with standard trec_eval
File containing three judgments, with efficacy mapped to correctness, for use with extended trec_eval
Accepted medical opinion regarding the efficacy of treatments
Assessors made three judgments per document: a relevance judgment, an effectiveness judgment, and a credibility judgment. Relevance was judged on a three-way scale:
- 0: not relevant
- 1: relevant
- 2: highly relevant
The other two judgments were made only if the document was judged to be Relevant or Highly Relevant.
Effectiveness judgments can have the following values:
- -2: should have been judged but mistakenly was not
- -1: relevance was 0, so not judged
- 0: judged as no info
- 1: judged ineffective
- 2: judged inconclusive
- 3: judged effective
Credibility judgments can have the following values:
- -2: should have been judged, but mistakenly was not
- -1: relevance was 0, so not judged
- 0: not credible
- 1: credible
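The code lists above can be captured as lookup tables, which also makes it easy to check the rule that the effectiveness and credibility judgments exist only for relevant documents. The following is a minimal sketch; the names `EFFECTIVENESS_LABELS`, `CREDIBILITY_LABELS`, and `is_consistent` are illustrative and not part of any track software.

```python
# Labels taken from the judgment descriptions above.
RELEVANCE_LABELS = {0: "not relevant", 1: "relevant", 2: "highly relevant"}

EFFECTIVENESS_LABELS = {
    -2: "should have been judged but mistakenly was not",
    -1: "not judged (relevance was 0)",
    0: "no info",
    1: "ineffective",
    2: "inconclusive",
    3: "effective",
}

CREDIBILITY_LABELS = {
    -2: "should have been judged but mistakenly was not",
    -1: "not judged (relevance was 0)",
    0: "not credible",
    1: "credible",
}


def is_consistent(relevance: int, effectiveness: int, credibility: int) -> bool:
    """Check a judgment triple against the rules described above.

    Assumption: a non-relevant document carries -1 for both other
    judgments, and a relevant document never carries -1.
    """
    if relevance not in RELEVANCE_LABELS:
        return False
    if effectiveness not in EFFECTIVENESS_LABELS:
        return False
    if credibility not in CREDIBILITY_LABELS:
        return False
    if relevance == 0:
        return effectiveness == -1 and credibility == -1
    return effectiveness != -1 and credibility != -1
```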
Each line of the raw judgment file has the format
topicid 0 docid relevance-judgment effectiveness-judgment credibility-judgment
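A line in this format could be parsed along the following lines. This is a sketch that assumes the six fields are whitespace-separated; `RawJudgment` and `parse_raw_line` are hypothetical names, and the document identifier in the usage note is made up for illustration.

```python
from typing import NamedTuple


class RawJudgment(NamedTuple):
    topic_id: str
    doc_id: str
    relevance: int
    effectiveness: int
    credibility: int


def parse_raw_line(line: str) -> RawJudgment:
    """Parse one line: topicid 0 docid relevance effectiveness credibility."""
    fields = line.split()  # assumption: fields are separated by whitespace
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    topic_id, _constant, doc_id, rel, eff, cred = fields
    return RawJudgment(topic_id, doc_id, int(rel), int(eff), int(cred))
```

For example, `parse_raw_line("1 0 example-doc-0001 2 3 1")` yields a record with relevance 2, effectiveness 3, and credibility 1.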
The relevance-only qrels file ("2019qrels_relevance.txt") is a standard
trec_eval qrels file: it contains only the relevance judgment.
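The relationship between the two files can be illustrated by dropping the last two fields of a raw judgment line, which leaves the four-field layout that standard trec_eval qrels use (topic, iteration, document, relevance). This is only a sketch: the helper name is hypothetical, and the source does not say that the distributed file was produced this way.

```python
def raw_to_relevance_qrels(raw_line: str) -> str:
    """Keep the first four fields of a six-field raw judgment line."""
    fields = raw_line.split()  # assumption: whitespace-separated fields
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {raw_line!r}")
    topic_id, iteration, doc_id, relevance = fields[:4]
    return f"{topic_id} {iteration} {doc_id} {relevance}"
```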
The file in which efficacy is mapped to correctness ("2019qrels_correctness.txt")
is a three-judgments qrels file where the efficacy judgment has been mapped
to a correctness aspect. This qrels file is the judgment file to use with the
extended trec_eval program that computes three-aspect measures. Correctness is
a match between the generally accepted medical opinion for the question asked
in the topic and the document's claim about the treatment's effectiveness.
In the file containing the accepted opinions ("2019topics_efficacy.txt"),
-1 means the treatment is believed to be Not Helpful, 0 means that the evidence
is Inconclusive, and 1 means the treatment is believed to be Helpful.
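The idea of correctness as a match can be sketched as follows. The exact mapping used to build "2019qrels_correctness.txt" is not spelled out here, so the pairing below (ineffective with Not Helpful, inconclusive with Inconclusive, effective with Helpful) is an assumption, as are the names `EFFECTIVENESS_TO_STANCE` and `is_correct`.

```python
from typing import Optional

# Assumed pairing of effectiveness judgments with accepted-opinion codes:
# 1 (ineffective) -> -1 (Not Helpful), 2 (inconclusive) -> 0 (Inconclusive),
# 3 (effective) -> 1 (Helpful).
EFFECTIVENESS_TO_STANCE = {1: -1, 2: 0, 3: 1}


def is_correct(effectiveness: int, accepted_opinion: int) -> Optional[bool]:
    """Return True or False when the document makes a claim, else None.

    None covers effectiveness values with no claim to compare:
    -2, -1 (not judged) and 0 (no info).
    """
    stance = EFFECTIVENESS_TO_STANCE.get(effectiveness)
    if stance is None:
        return None
    return stance == accepted_opinion
```

Under these assumptions, a document judged effective (3) for a topic whose accepted opinion is Helpful (1) counts as correct, while one judged ineffective (1) for the same topic does not.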
While there are 51 topics in the set of test topics, there are judgments for
only 50 topics (topic 14 was not assessed due to budget constraints).