The evaluation actually used for the TREC 2011 Entity track REF task differed from what the original guidelines indicated it would be. In particular, judgments of unsupported were not incorporated into the evaluation. Runs were evaluated using standard trec_eval, with home pages of entities in the appropriate relationship to the source entity marked relevant and everything else marked not relevant. Note that this implies that a submission entry of [homepage,supportdoc] that was judged as unsupported was nonetheless counted as correct in the official evaluation.

The judgments came from two sources: answers found by the assessors during topic development and pooled results from participants. System result pools were created to depth 30. All 12 runs submitted to the task were pooled: the union of distinct pairs in the top 30 results of a topic across all runs constituted the pool for that topic. (Note that this was a mistake: the pools should have been formed from all distinct triples that include the support document.)

There are two judgment files posted on the web site for the REF task. The standard trec_eval qrels file (refqrels.txt) is of the form

    topic 0 docid judgment

where, as stated above, judgment is 1 if docid is a correct homepage and 0 otherwise.

The "judgment file" (refjudge.txt) is of the form

    topic docid supportdoc judgment namestring nameclass namecorrectness

In this case, judgment can be

     0: not a homepage of a correct entity,
     1: a homepage of a correct entity, and supportdoc makes that clear,
    -2: a known homepage of a correct entity, but supportdoc does not support that answer.

Because of the mistake when building the pools, the supportdoc that appears in the judgment file is an arbitrary choice among all of the documents returned as support for the same pair. That is, if multiple runs returned the same pair in the top 30 results but with different support documents, only one of those support documents will appear in the judgment file, and a judgment of unsupported/relevant applies only to that particular support document.

'namestring' is the name string supplied in the submission (or "dummy" if no name was supplied). 'namecorrectness' is

    0: if that is not a correct name for the entity represented by the docid,
    1: if it is a correct, but inexact, name for the entity,
    2: if it is exactly correct.

'nameclass' is 0 unless multiple entities were judged relevant for the topic. If there were multiple relevant entities, the nameclass variable is used to indicate whether particular name strings are synonyms for the same real-world entity. That is, if two distinct strings are assigned the same class value (when there are multiple class values), the strings are aliases for the same entity.

The judgment file was not used in any scoring in the track, but it provides additional information about the quality of the submitted name strings. Naturally, there are judgments only for those entries that made it into the pools.
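As a concrete illustration of the file layouts above, the following Python sketch reads the two files and cross-checks them against the statement that unsupported homepages were still counted as correct in the qrels. It is only a sketch under the assumptions stated here: the columns are whitespace-separated exactly as described, the unsupported judgment is recorded literally as -2, and only the namestring field may itself contain spaces (so each judgment-file line is split from both ends). The file paths and the summary it prints are illustrative, not part of the track's official tooling.

    #!/usr/bin/env python3
    # Sketch: cross-check refqrels.txt against refjudge.txt for the REF task.
    from collections import Counter, defaultdict

    def load_qrels(path="refqrels.txt"):
        """Lines of: topic 0 docid judgment  ->  {(topic, docid): judgment}"""
        qrels = {}
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                topic, _iter, docid, judgment = line.split()
                qrels[(topic, docid)] = int(judgment)
        return qrels

    def load_judgments(path="refjudge.txt"):
        """Lines of: topic docid supportdoc judgment namestring nameclass namecorrectness"""
        rows = []
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                fields = line.split()
                topic, docid, supportdoc, judgment = fields[:4]
                nameclass, namecorrectness = fields[-2:]
                namestring = " ".join(fields[4:-2])   # namestring may contain spaces
                rows.append((topic, docid, supportdoc, int(judgment),
                             namestring, int(nameclass), int(namecorrectness)))
        return rows

    if __name__ == "__main__":
        qrels = load_qrels()
        per_topic = defaultdict(Counter)
        for topic, docid, supportdoc, judgment, *_ in load_judgments():
            per_topic[topic][judgment] += 1
            # Both supported (1) and unsupported (-2) homepages should appear
            # as relevant (1) in the qrels actually used for scoring.
            if judgment != 0 and qrels.get((topic, docid)) != 1:
                print(f"mismatch: {topic} {docid} judged {judgment} "
                      f"but qrels says {qrels.get((topic, docid))}")
        for topic in sorted(per_topic):
            c = per_topic[topic]
            print(topic, "supported:", c[1],
                  "correct but unsupported:", c[-2], "incorrect:", c[0])

Splitting each judgment-file line from both ends is only a guard against name strings that contain whitespace; if the file guarantees single-token name strings, a plain split into seven fields would do.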