TREC-9 Track Evaluation
One assessor judged the pool for each question
- 3-way judgments: incorrect, correct, not supported
- for “strict” evaluation, not supported counts as wrong
- mean pool size: 309.4 pairs
- same criteria as last year:
  - no distracting information
  - units required
  - answers supported by document accepted even if document is wrong
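
A minimal sketch of how the 3-way judgments above could be collapsed into right/wrong. Only the "strict" rule (not supported counts as wrong) comes from the slide. The non-strict mode, where not supported counts as right, is an assumption added for contrast, and the names `Judgment`, `is_right`, and `fraction_right` are hypothetical, not part of any TREC tooling.

```python
from enum import Enum


class Judgment(Enum):
    """The three assessor judgments listed above."""
    INCORRECT = "incorrect"
    CORRECT = "correct"
    NOT_SUPPORTED = "not supported"


def is_right(judgment: Judgment, strict: bool = True) -> bool:
    """Collapse a 3-way judgment into right/wrong.

    strict=True follows the slide: not supported counts as wrong.
    strict=False is an assumed alternative: not supported counts as right.
    """
    if judgment is Judgment.CORRECT:
        return True
    if judgment is Judgment.NOT_SUPPORTED:
        return not strict
    return False


def fraction_right(judgments, strict: bool = True) -> float:
    """Fraction of judged responses that count as right (0.0 if none)."""
    judgments = list(judgments)
    if not judgments:
        return 0.0
    return sum(is_right(j, strict) for j in judgments) / len(judgments)
```

For example, a pool judged correct, not supported, incorrect, correct gives 0.5 under the strict rule and 0.75 under the assumed non-strict rule. This illustrates only the judgment mapping, not the track's official scoring measure.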