The Session Track released 133 query sessions for 69 topics (some topics had more than one session corresponding to them), with only the first 87 sessions, covering 49 topics, used for evaluation. NIST provided judgments for all 49 topics. All submitted runs were pooled to depth 10, and the pooled URLs were judged against the overall topic. Judging was conducted on a six-grade scale: spam (-2), not relevant (0), relevant (1), highly relevant (4), key (2), and navigational (3). When computing the different measures, these relevance judgments were mapped as follows: spam -> 0, not relevant -> 0, relevant -> 1, highly relevant -> 2, key -> 3, and navigational -> 4. Unlike Session 2011, and as in Session 2012, relevance was defined against the entire topic rather than against separate subtopics (due to the nature of this year's topics). Note that we did not apply any special treatment to duplicate documents, i.e. documents in the ranked lists for the current query that had been returned (and clicked by users) earlier in the session.

Based on the qrels provided by NIST, we evaluated runs using eight measures: (a) Average Precision (average_precision), (b) Expected Reciprocal Rank (err), as defined by Chapelle et al. (CIKM 2009), (c) ERR@10 (err_at_k), (d) nDCG (ndcg), (e) nDCG@10 (ndcg_at_k), (f) ERR normalised by the maximum ERR per query (nerr), (g) nERR@10 (nerr_at_k), and (h) Precision@10 (precision_at_k).
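To make the grade mapping and the gain-based measures concrete, the sketch below (a minimal Python illustration, not the official track evaluation code) applies the raw-label-to-gain mapping described above and computes ERR following the Chapelle et al. definition, together with nDCG@k and nERR@k for a single ranked list. The function names, and the use of an ideal ranking over all judged documents as the normaliser for nDCG and nERR, are illustrative assumptions.

    # Sketch only: maps raw qrel labels to the gains described above and computes
    # ERR@k, nDCG@k and nERR@k for one ranked list. Not the official evaluation script.
    from math import log2

    # raw qrel label -> mapped gain (per the mapping in the text; spam and
    # not relevant both map to 0)
    GRADE_MAP = {
        -2: 0,  # spam
         0: 0,  # not relevant
         1: 1,  # relevant
         4: 2,  # highly relevant
         2: 3,  # key
         3: 4,  # navigational
    }
    G_MAX = 4  # highest mapped gain (navigational)

    def err_at_k(gains, k=None):
        """Expected Reciprocal Rank (Chapelle et al., CIKM 2009) over mapped gains."""
        gains = gains if k is None else gains[:k]
        err, p_not_stopped = 0.0, 1.0
        for rank, g in enumerate(gains, start=1):
            r = (2 ** g - 1) / 2 ** G_MAX      # probability the user is satisfied here
            err += p_not_stopped * r / rank
            p_not_stopped *= 1 - r
        return err

    def dcg_at_k(gains, k=None):
        gains = gains if k is None else gains[:k]
        return sum((2 ** g - 1) / log2(rank + 1) for rank, g in enumerate(gains, start=1))

    def ndcg_at_k(ranked_gains, judged_gains, k=10):
        # assumption: normalise by an ideal ordering of all judged documents for the topic
        ideal = sorted(judged_gains, reverse=True)
        denom = dcg_at_k(ideal, k)
        return dcg_at_k(ranked_gains, k) / denom if denom > 0 else 0.0

    def nerr_at_k(ranked_gains, judged_gains, k=10):
        # nERR: ERR divided by the maximum achievable ERR for the query
        ideal = sorted(judged_gains, reverse=True)
        denom = err_at_k(ideal, k)
        return err_at_k(ranked_gains, k) / denom if denom > 0 else 0.0

    # Usage with hypothetical raw labels: a ranked list and the full judged pool for a topic.
    ranked = [GRADE_MAP[g] for g in (3, 1, -2, 4, 0, 2)]
    pool   = [GRADE_MAP[g] for g in (3, 4, 2, 2, 1, 1, 0, 0, -2)]
    print(err_at_k(ranked, 10), ndcg_at_k(ranked, pool, 10), nerr_at_k(ranked, pool, 10))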