Interactive Retrieval Evaluation
Very difficult to do well
Two particular problems:
- modern systems are too good:
  - effectiveness measures are limited by user agreement with relevance judgments
- evaluation usually assumes naïve users:
  - variation in performance among users is enormous
  - this assumption isn’t realistic
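One way to see why assessor agreement caps measured effectiveness is to score one assessor's relevance judgments against another's, as if the second assessor were a perfect "system". The sketch below is illustrative only: the function name `agreement_ceiling` and the two judgment sets are hypothetical, and overlap is computed as |A ∩ B| / |A ∪ B|. A system that reproduced assessor B exactly would still be scored below 1.0 whenever the evaluation uses assessor A's judgments.

```python
from __future__ import annotations


def agreement_ceiling(judge_a: set[str], judge_b: set[str]) -> dict[str, float]:
    """Score assessor B's relevant set against assessor A's (hypothetical helper).

    Treats A's judgments as ground truth and B's as a system's retrieved set.
    Returns precision, recall, and overlap (|A & B| / |A | B|).
    """
    common = judge_a & judge_b
    precision = len(common) / len(judge_b) if judge_b else 0.0
    recall = len(common) / len(judge_a) if judge_a else 0.0
    union = judge_a | judge_b
    # Two empty sets are treated as full agreement.
    overlap = len(common) / len(union) if union else 1.0
    return {"precision": precision, "recall": recall, "overlap": overlap}


# Hypothetical relevance judgments from two assessors for one query.
assessor_a = {"d1", "d2", "d3", "d4"}
assessor_b = {"d2", "d3", "d4", "d5", "d6"}

print(agreement_ceiling(assessor_a, assessor_b))
# → {'precision': 0.6, 'recall': 0.75, 'overlap': 0.5}
```

With these toy sets, even a system that matches assessor B perfectly would reach only 0.6 precision and 0.75 recall when measured against assessor A.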