Cranfield Tradition
Despite the abstraction, laboratory tests are useful
- evaluation technology is predictive (i.e., results transfer to operational settings)
- different sets of relevance judgments produce different absolute scores, but they almost always produce the same relative ranking of systems
- this assumes systems are compared by scores averaged over a set of queries, not by individual queries
- incomplete judgments are acceptable if the judged sample is unbiased with respect to the systems tested
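
The claim about absolute versus comparative scores can be illustrated with a small sketch. Everything in it is a made-up assumption for illustration: the two systems "A" and "B", the document IDs, the two judgment sets (`qrels_1`, `qrels_2`), and the choice of mean average precision as the measure. It is not data from any real test collection.

```python
# Toy illustration: two hypothetical assessors disagree on relevance,
# so absolute scores change, but the ordering of the systems does not.

def average_precision(ranking, relevant):
    """Average precision of one ranked list against a set of relevant docs."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """Average of per-query average precision over all judged queries."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)

def system_order(rankings, qrels):
    """Return (systems sorted best first, score per system)."""
    scores = {name: mean_average_precision(runs, qrels)
              for name, runs in rankings.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical ranked output of two systems for two queries.
rankings = {
    "A": {"q1": ["d1", "d3", "d2", "d4"], "q2": ["d5", "d7", "d6", "d8"]},
    "B": {"q1": ["d3", "d4", "d1", "d2"], "q2": ["d7", "d5", "d8", "d6"]},
}

# Two hypothetical judgment sets that disagree about what is relevant.
qrels_1 = {"q1": {"d1", "d2"}, "q2": {"d5"}}
qrels_2 = {"q1": {"d2"}, "q2": {"d5", "d6"}}

order_1, scores_1 = system_order(rankings, qrels_1)
order_2, scores_2 = system_order(rankings, qrels_2)

print(order_1, scores_1)
print(order_2, scores_2)
```

In this toy case system A scores about 0.92 under the first judgment set and about 0.58 under the second, while B scores about 0.46 and 0.38, so A is ranked ahead of B both times. With only two queries this shows the idea, not the evidence; the bullet above is a statement about averages over much larger query sets.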