The evaluation actually used for the TREC 2011 Entity track REF task differed from what the original guidelines indicated it would be. In particular, judgments of unsupported were not incorporated into the evaluation. Runs were evaluated using standard trec_eval, with home pages of entities in the appropriate relationship to the source entity marked relevant and everything else marked not relevant. Note that this implies that a submission entry of [homepage,supportdoc] that was judged as unsupported was nonetheless counted as correct in the official evaluation.

The judgments came from two sources: answers found by the assessors during topic development and pooled results from participants. System result pools were created to depth 30. All 12 runs submitted to the task were pooled: the union of distinct pairs in the top 30 results of a topic across all runs constituted the pool for that topic. (Note that this was a mistake: the pools should have been formed from all distinct triples that include the support document.)

There are two judgment files posted on the web site for the REF task. The standard trec_eval qrels file (refqrels.txt) is of the form

    topic 0 docid judgment

where, as stated above, judgment is 1 if docid is a correct homepage and 0 otherwise.

The "judgment file" (refjudge.txt) is of the form

    topic docid supportdoc judgment namestring nameclass namecorrectness

In this case, judgment can be

     0: not a homepage of a correct entity,
     1: a homepage of a correct entity, and supportdoc makes that clear,
    -2: a known homepage of a correct entity, but supportdoc does not support that answer.

Because of the mistake when building the pools, the supportdoc that appears in the judgment file is an arbitrary choice among all of the documents returned as support for the same pair. That is, if multiple runs returned the same pair in the top 30 results but with different support documents, only one of those support documents will appear in the judgment file, and a judgment of unsupported/relevant applies only to that particular support document.

'namestring' is the name string supplied in the submission (or "dummy" if no name was supplied). 'namecorrectness' is

    0: if that is not a correct name for the entity represented by the docid,
    1: if it is a correct, but inexact, name for the entity,
    2: if it is exactly correct.

'nameclass' is 0 unless multiple entities were judged relevant for the topic. If there were multiple relevant entities, the nameclass variable is used to indicate whether particular name strings are synonyms for the same real-world entity. That is, if two distinct strings are assigned the same class value (when there are multiple class values), the strings are aliases for the same entity.

The judgment file was not used in any scoring in the track, but it provides additional information about the quality of the submitted name strings. Naturally, there are judgments only for those entries that made it into the pools.
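As a concrete illustration of the file layouts above, the following Python sketch reads the two files and cross-checks them against the statement that unsupported homepages were still counted as correct in the qrels. It is only a sketch under the assumptions stated here: the columns are whitespace-separated exactly as described, the unsupported judgment is recorded literally as -2, and only the namestring field may itself contain spaces (so each judgment-file line is split from both ends). The file paths and the summary it prints are illustrative, not part of the track's official tooling.

    #!/usr/bin/env python3
    # Sketch: cross-check refqrels.txt against refjudge.txt for the REF task.
    from collections import Counter, defaultdict

    def load_qrels(path="refqrels.txt"):
        """Lines of: topic 0 docid judgment  ->  {(topic, docid): judgment}"""
        qrels = {}
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                topic, _iter, docid, judgment = line.split()
                qrels[(topic, docid)] = int(judgment)
        return qrels

    def load_judgments(path="refjudge.txt"):
        """Lines of: topic docid supportdoc judgment namestring nameclass namecorrectness"""
        rows = []
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                fields = line.split()
                topic, docid, supportdoc, judgment = fields[:4]
                nameclass, namecorrectness = fields[-2:]
                namestring = " ".join(fields[4:-2])   # namestring may contain spaces
                rows.append((topic, docid, supportdoc, int(judgment),
                             namestring, int(nameclass), int(namecorrectness)))
        return rows

    if __name__ == "__main__":
        qrels = load_qrels()
        per_topic = defaultdict(Counter)
        for topic, docid, supportdoc, judgment, *_ in load_judgments():
            per_topic[topic][judgment] += 1
            # Both supported (1) and unsupported (-2) homepages should appear
            # as relevant (1) in the qrels actually used for scoring.
            if judgment != 0 and qrels.get((topic, docid)) != 1:
                print(f"mismatch: {topic} {docid} judged {judgment} "
                      f"but qrels says {qrels.get((topic, docid))}")
        for topic in sorted(per_topic):
            c = per_topic[topic]
            print(topic, "supported:", c[1],
                  "correct but unsupported:", c[-2], "incorrect:", c[0])

Splitting each judgment-file line from both ends is only a guard against name strings that contain whitespace; if the file guarantees single-token name strings, a plain split into seven fields would do.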