The TREC 2011 Session Track released 76 query sessions for 61 topics (some topics had more than one session corresponding to them). Each topic had a number of subtopics to guide users through the search that produced the released sessions; user queries in a session may therefore correspond to one or more subtopics.

The relevance judgments file (judgments.txt) contains judgments for all 61 topics. Each document was judged against each subtopic as well as against the general topic, on a 5-grade scale:

  -2: page is a spam document
   0: not relevant
   1: relevant
   2: highly relevant
   3: for navigational subtopics, this is precisely the right page

The last query of each session (the query over which runs were submitted) may correspond only to a particular subtopic of a given topic. For instance, the last query of session number 1, "peace corp application", clearly corresponds to the subtopic "Find information about jobs with the Peace Corps, such as criteria for applying, salary/stipend, and available positions."

The track took two *extreme* approaches to evaluating runs:
(a) computing evaluation scores by counting as relevant all documents that are relevant to any subtopic and/or the general topic; if a document is relevant to more than one subtopic, the maximum grade is taken as the relevance grade of the document. This is the "allsubtopic" condition.
(b) computing evaluation scores by counting as relevant only those documents that are relevant to the subtopic(s) to which the last query corresponds; as before, if the last query corresponds to more than one subtopic and a document is relevant to more than one of these subtopics, the maximum grade is taken as the relevance grade of the document. This is the "lastquery" condition.

The mapping between last query and subtopic is provided in the file "sessionlastquery_subtopic_map.txt"; each line of the mapping file maps a session to the subtopic(s) its last query corresponds to.
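The "maximum grade" rule under the two conditions can be sketched as follows; this is an illustration, not the track's own code, and the function and parameter names (including treating an unjudged document as grade 0) are assumptions:

```python
def doc_grade(subtopic_grades, relevant_subtopics=None):
    """Relevance grade of a document under the two track conditions.

    subtopic_grades: dict mapping subtopic id -> judged grade.
    relevant_subtopics: None for the "allsubtopic" condition (every
    judgment counts); for the "lastquery" condition, the set of
    subtopic ids that the session's last query corresponds to.
    """
    if relevant_subtopics is not None:
        # lastquery condition: keep only judgments for the mapped subtopics
        subtopic_grades = {s: g for s, g in subtopic_grades.items()
                           if s in relevant_subtopics}
    # Maximum grade across the remaining subtopic judgments;
    # grade 0 (not relevant) if none remain (an assumption).
    return max(subtopic_grades.values(), default=0)
```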
The session_eval.pl evaluation script will evaluate a retrieval result in session track format (a "run") when invoked in a directory that contains the judgment file, the subtopic mapping file, and the run file. For example:

  perl session_eval.pl -q 1 -qrels judgments.txt -runs myruns        (for allsubtopic results)
  perl session_eval.pl -q 1 -s 1 -qrels judgments.txt -runs myruns   (for lastquery results)

The script computes:
(a) Expected Reciprocal Rank (ERR), as defined by Chapelle et al. at CIKM 2009,
(b) ERR@10,
(c) ERR normalised by the maximum ERR per query (nERR),
(d) nERR@10,
(e) nDCG,
(f) nDCG@10,
(g) Average Precision (AP), and
(h) Graded Average Precision (GAP), as defined by Robertson et al. at SIGIR 2010.
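As a rough illustration of the first metric, ERR as defined by Chapelle et al. can be sketched in Python (this is not the track's Perl script, and mapping spam judgments of -2 to non-relevant is an assumption):

```python
def err(grades, max_grade=3, k=None):
    """Expected Reciprocal Rank (Chapelle et al., CIKM 2009).

    grades: relevance grades of the ranked documents, top first.
    k: optional cutoff, e.g. k=10 for ERR@10.
    """
    if k is not None:
        grades = grades[:k]
    p_continue = 1.0  # probability the user reaches this rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        # Stop probability from the grade; spam (-2) treated as grade 0.
        r = (2 ** max(g, 0) - 1) / 2 ** max_grade
        score += p_continue * r / rank
        p_continue *= 1 - r
    return score
```

For example, a single perfect document at rank 1 gives err([3]) = 7/8, while pushing it to rank 2 behind a non-relevant document halves the score; nERR would divide by the maximum ERR attainable for the query.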