The TREC 2011 Session Track released 76 query sessions for 61 topics (some topics had more than one session corresponding to them). Each topic had a number of subtopics to guide users through the search that produced the released sessions; user queries in a session may therefore correspond to one or more subtopics.

The relevance judgments file (judgments.txt) contains judgments for all 61 topics. Each document was judged against each subtopic as well as against the general topic, on a 5-grade scale:

  -2: page is a spam document
   0: not relevant
   1: relevant
   2: highly relevant
   3: for navigational subtopics, this is precisely the right page

The last query of each session (the query over which runs were submitted) may correspond only to a particular subtopic of a given topic. For instance, the last query of session number 1, "peace corp application", clearly corresponds to the subtopic "Find information about jobs with the Peace Corps, such as criteria for applying, salary/stipend, and available positions."

The track took two *extreme* approaches to evaluating runs:
(a) computing evaluation scores by counting as relevant all documents that are relevant to any subtopic and/or the general topic; if a document is relevant to more than one subtopic, the maximum grade is taken as the relevance grade of the document. This is the "allsubtopic" condition.
(b) computing evaluation scores by counting as relevant only those documents that are relevant to the subtopic(s) to which the last query corresponds; as before, if the last query corresponds to more than one subtopic and a document is relevant to more than one of these subtopics, the maximum grade is taken as the relevance grade of the document. This is the "lastquery" condition.

The mapping between last query and subtopic is provided in the file "sessionlastquery_subtopic_map.txt"; each line of the mapping file maps a session to the subtopic(s) its last query corresponds to.
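The "maximum grade" rule under the two conditions can be sketched as follows; this is an illustration, not the track's own code, and the function and parameter names (including treating an unjudged document as grade 0) are assumptions:

```python
def doc_grade(subtopic_grades, relevant_subtopics=None):
    """Relevance grade of a document under the two track conditions.

    subtopic_grades: dict mapping subtopic id -> judged grade.
    relevant_subtopics: None for the "allsubtopic" condition (every
    judgment counts); for the "lastquery" condition, the set of
    subtopic ids that the session's last query corresponds to.
    """
    if relevant_subtopics is not None:
        # lastquery condition: keep only judgments for the mapped subtopics
        subtopic_grades = {s: g for s, g in subtopic_grades.items()
                           if s in relevant_subtopics}
    # Maximum grade across the remaining subtopic judgments;
    # grade 0 (not relevant) if none remain (an assumption).
    return max(subtopic_grades.values(), default=0)
```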
The session_eval.pl evaluation script will evaluate a retrieval result in session track format (a "run") when invoked in a directory that contains the judgment file, the subtopic mapping file, and the run file. For example:

  perl session_eval.pl -q 1 -qrels judgments.txt -runs myruns        (for allsubtopic results)
  perl session_eval.pl -q 1 -s 1 -qrels judgments.txt -runs myruns   (for lastquery results)

The script computes:
(a) Expected Reciprocal Rank (ERR), as defined by Chapelle et al. at CIKM 2009,
(b) ERR@10,
(c) ERR normalised by the maximum ERR per query (nERR),
(d) nERR@10,
(e) nDCG,
(f) nDCG@10,
(g) Average Precision (AP), and
(h) Graded Average Precision (GAP), as defined by Robertson et al. at SIGIR 2010.
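As a rough illustration of the first metric, ERR as defined by Chapelle et al. can be sketched in Python (this is not the track's Perl script, and mapping spam judgments of -2 to non-relevant is an assumption):

```python
def err(grades, max_grade=3, k=None):
    """Expected Reciprocal Rank (Chapelle et al., CIKM 2009).

    grades: relevance grades of the ranked documents, top first.
    k: optional cutoff, e.g. k=10 for ERR@10.
    """
    if k is not None:
        grades = grades[:k]
    p_continue = 1.0  # probability the user reaches this rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        # Stop probability from the grade; spam (-2) treated as grade 0.
        r = (2 ** max(g, 0) - 1) / 2 ** max_grade
        score += p_continue * r / rank
        p_continue *= 1 - r
    return score
```

For example, a single perfect document at rank 1 gives err([3]) = 7/8, while pushing it to rank 2 behind a non-relevant document halves the score; nERR would divide by the maximum ERR attainable for the query.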