QA Judgments
Judging correctness, not relevance
Assessors differ in their opinions of what constitutes a correct answer
granularity of names, dates
assumed context
Comparative evaluation stable despite those differences
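One way to picture the stability claim is the sketch below. It scores a few systems under two assessors' judgments and compares the resulting system rankings. The system names, the judgment data, and the choice of Kendall's tau as the agreement measure are all illustrative assumptions; the slide does not say which measure was used.

```python
from itertools import combinations

def accuracy(judgments):
    """Fraction of questions judged correct (1) for one system."""
    return sum(judgments) / len(judgments)

def ranking(scores):
    """System names ordered from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items (no ties)."""
    pos_a = {s: i for i, s in enumerate(rank_a)}
    pos_b = {s: i for i, s in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Positive product: both rankings order the pair x, y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical judgments: 1 = correct, 0 = incorrect, five questions per system.
assessor_1 = {"sysA": [1, 1, 1, 1, 0], "sysB": [1, 1, 0, 0, 0], "sysC": [1, 0, 0, 0, 0]}
assessor_2 = {"sysA": [1, 1, 1, 0, 0], "sysB": [1, 1, 0, 0, 0], "sysC": [0, 0, 0, 0, 0]}

scores_1 = {s: accuracy(j) for s, j in assessor_1.items()}
scores_2 = {s: accuracy(j) for s, j in assessor_2.items()}

# The absolute scores differ between assessors, but the ordering of systems does not.
tau = kendall_tau(ranking(scores_1), ranking(scores_2))
```

In this made-up data the two assessors disagree on individual answers, so the per-system scores differ, yet both rank the systems in the same order and tau comes out at its maximum of 1.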