TREC 2024 Lateral Reading Track – Evaluation Data Files README

The TREC 2024 Lateral Reading Track focuses on supporting a lateral reading approach to help readers evaluate the trustworthiness of online news. It features two tasks: Task 1 (Question Generation) and Task 2 (Document Retrieval). Task 1 required participants to generate questions that a reader should ask when assessing the credibility of a given news article. Task 2 required participants to retrieve relevant documents from the ClueWeb22-B-English web corpus that would help answer those questions. The track used 50 topics (target news articles, each representing a widely discussed event from 2021–2022). For more information, see the official track website: https://trec-lateral-reading.github.io/.

File Descriptions (example parsing sketches for each file appear at the end of this README):

- assessor_questions.csv: This CSV file contains questions written by NIST assessors for each of the 50 topics. These questions were created according to the lateral reading principles outlined in the assessment guidelines. Each topic had a primary assessor (and up to two secondary assessors) who each wrote up to 10 questions about that article. Due to time and budget constraints, some topics have fewer than three sets of assessor questions (i.e., only one or two assessors contributed). The questions were manually cleaned to remove spelling and grammatical errors, and assessors are anonymized in this file. The CSV columns are: topic ID, assessor ID, rank (1–10, the question's rank as ordered by that assessor), and question text.

- 2024-question-assessment.txt: This text file contains the official evaluation judgments for Task 1 (Question Generation). Each line corresponds to one evaluated question, with tab-separated fields: topic ID, document ID (the ClueWeb22 ID of the topic article), run index (the order in which that run appeared in the list of runs to be assessed), run ID (the submission tag), question rank (the question's position in that run), quality score, redundancy score, and question text. For each article, the order of runs was randomized during evaluation to mitigate ordering bias. The quality score is an integer from -1 to 4 indicating how well the question meets the criteria: -1 = "Flawed", 0 = "Not Helpful", 1 = "Okay", 2 = "Good", 3 = "Very Good", 4 = "Excellent". The redundancy score indicates whether the question was marked as redundant (0 = "No", 1 = "Yes"); note that these redundancy judgments were not used in the final evaluation scoring. (For details on the assessment criteria, see the assessor guidelines: https://trec-lateral-reading.github.io/assessing_instructions.pdf.)

- 2024-retrieval-qrels.txt: This file contains graded relevance judgments for Task 2 (Document Retrieval). It follows the standard TREC qrels format: each line has a question ID, a placeholder column (e.g., "0"), a document ID, and a relevance score. The first part of the question ID is the document ID of the topic news article. The document ID is the ClueWeb22-B identifier of a retrieved document. The relevance score is a grade in {0, 1, 2} indicating how useful that document was for answering the topic's question: 0 = "Not Useful" (irrelevant or not helpful), 1 = "Useful" (contains some relevant information), and 2 = "Very Useful" (directly answers the question or provides key evidence).

Notes:

- Partial assessor coverage: Not all topics have three full sets of assessor questions; some have only one or two assessor question lists due to limited assessor availability.
- Assessor questions not used in 2024 evaluation: The assessor-written questions (in assessor_questions.csv) were not used to evaluate participant submissions in 2024. These questions are provided for reference and future use.

- Redundancy labels excluded from scoring: The redundancy judgments in the question assessment file were collected but found to be inconsistent, so they were excluded from the official Task 1 scoring and metrics.

- Underlying corpus: All document retrieval results are based on the ClueWeb22-B-English corpus, which consists of approximately 87 million English web pages from early 2022.
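
Example parsing sketches (Python):

The snippets below are minimal, unofficial sketches for loading the three files described above. They assume the files sit in the working directory and that the column/field order matches the descriptions in this README; they are illustrations, not official track scripts.

Reading assessor_questions.csv into a per-topic list of questions (whether the file includes a header row is not specified here; malformed or short rows are simply skipped):

    import csv
    from collections import defaultdict

    questions_by_topic = defaultdict(list)

    with open("assessor_questions.csv", newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 4:
                continue  # skip blank or malformed lines
            # Columns: topic ID, assessor ID, rank, question text.
            topic_id, assessor_id, rank, question = row[0], row[1], row[2], row[3]
            questions_by_topic[topic_id].append((assessor_id, rank, question))

    for topic_id, questions in sorted(questions_by_topic.items()):
        print(f"{topic_id}: {len(questions)} questions")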
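
Summarizing the Task 1 judgments in 2024-question-assessment.txt, using the tab-separated field order listed above (mean quality per run is shown only as an illustration, not as the official track metric):

    from collections import defaultdict

    quality_by_run = defaultdict(list)

    with open("2024-question-assessment.txt", encoding="utf-8") as f:
        for line in f:
            # Fields: topic ID, document ID, run index, run ID, question rank,
            # quality score, redundancy score, question text.
            fields = line.rstrip("\n").split("\t", 7)
            if len(fields) < 8:
                continue  # skip blank or malformed lines
            run_id, quality = fields[3], int(fields[5])
            quality_by_run[run_id].append(quality)

    for run_id, scores in sorted(quality_by_run.items()):
        print(f"{run_id}\t{sum(scores) / len(scores):.3f}")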
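
Loading the Task 2 judgments in 2024-retrieval-qrels.txt into a mapping from question ID to judged documents, following the standard whitespace-separated qrels layout described above:

    from collections import defaultdict

    qrels = defaultdict(dict)

    with open("2024-retrieval-qrels.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip blank or malformed lines
            question_id, _unused, doc_id, grade = parts
            qrels[question_id][doc_id] = int(grade)

    # Illustration: count the "Very Useful" (grade 2) documents per question.
    for question_id, judged in sorted(qrels.items()):
        very_useful = sum(1 for g in judged.values() if g == 2)
        print(f"{question_id}\t{very_useful}")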