Cranfield Tradition
Note the emphasis on comparative evaluation:
- absolute values of effectiveness measures are not meaningful on their own
- absolute values change as the relevance judgments change
- the theoretical maximum of 1.0 for both recall and precision is not attainable even by humans (inter-assessor agreement suggests roughly 65% precision at 65% recall)
- evaluation results are comparable only when they come from the same collection
- a subset of a collection counts as a different collection
- comparisons across different TREC collections are therefore invalid
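As a toy illustration of why only comparisons are meaningful, the sketch below computes set-based precision and recall for two systems under two sets of relevance judgments for the same query. The systems "A" and "B", the assessors, and all document IDs are hypothetical data invented for this example. The absolute scores shift when the judgments change, while the comparison (A beats B on precision) happens to hold under both assessors here; this sketch does not show that such stability holds in general.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical retrieved sets from two systems for the same query.
runs = {
    "A": ["d1", "d2", "d3", "d4"],
    "B": ["d1", "d5", "d6", "d7"],
}

# Hypothetical relevance judgments from two assessors on the same collection.
judgments = {
    "assessor_1": {"d1", "d2", "d3", "d8"},
    "assessor_2": {"d1", "d2", "d4", "d5", "d9"},
}

# Score every (assessor, system) pair.
results = {
    (assessor, system): precision_recall(retrieved, relevant)
    for assessor, relevant in judgments.items()
    for system, retrieved in runs.items()
}

for (assessor, system), (p, r) in results.items():
    print(f"{assessor} {system}: P={p:.2f} R={r:.2f}")
```

Under assessor_1, A scores P=0.75, R=0.75 and B scores P=0.25, R=0.25; under assessor_2, A scores P=0.75, R=0.60 and B scores P=0.50, R=0.40. Neither set of absolute numbers is "the" effectiveness of A or B, which is why results are reported as comparisons on a fixed collection and judgment set.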