Consistency
Mean Kendall τ between system rankings produced from different qrel sets: 0.938
Similar results held for:
different query sets
different evaluation measures
different assessor types
single opinion vs. group opinion judgments
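As a rough illustration of the statistic quoted above, the sketch below computes Kendall's τ between system rankings and averages it over every pair of rankings. It is only a sketch under stated assumptions: it assumes each ranking orders the same systems with no ties (the tau-a form), the slide does not say which variant or averaging scheme was used, and the function names and system labels are hypothetical.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau between two tie-free rankings of the same systems.

    Each ranking is a list of system identifiers, best system first.
    Returns a value in [-1, 1]: 1 for identical order, -1 for reversed order.
    """
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same systems without repeats")
    if len(ranking_a) < 2:
        raise ValueError("need at least two systems to compare")

    # Position of every system in the second ranking.
    pos_b = {system: i for i, system in enumerate(ranking_b)}

    concordant = discordant = 0
    # combinations() yields pairs (x, y) where x precedes y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1  # both rankings agree on the order of x and y
        else:
            discordant += 1  # the rankings disagree on this pair

    return (concordant - discordant) / (concordant + discordant)


def mean_pairwise_tau(rankings):
    """Average Kendall tau over all pairs of rankings (one per qrel set)."""
    taus = [kendall_tau(a, b) for a, b in combinations(rankings, 2)]
    if not taus:
        raise ValueError("need at least two rankings")
    return sum(taus) / len(taus)


# Hypothetical rankings of four systems under three different qrel sets.
qrels_1 = ["A", "B", "C", "D"]
qrels_2 = ["A", "C", "B", "D"]
qrels_3 = ["B", "A", "C", "D"]

tau_12 = kendall_tau(qrels_1, qrels_2)
mean_tau = mean_pairwise_tau([qrels_1, qrels_2, qrels_3])
```

A single swap of adjacent systems among four gives τ = 4/6, so values near 0.938 correspond to rankings that differ in only a small fraction of system pairs.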