TREC disks 4 and 5

This document set includes material from the Financial Times Limited (1991, 1992, 1993, 1994), the Congressional Record of the 103rd Congress (1993), the Federal Register (1994), the Foreign Broadcast Information Service (1996), and the Los Angeles Times (1989, 1990).

The document set has been used in a variety of TREC and TAC tasks, including the TREC 6-8 ad hoc collections, the TREC 8-9 question answering tracks, and the TREC Robust track.

Some of the documents in the data set are copyrighted by the original source of the material and thus the documents must be licensed for research use. The organizational agreement linked below describes the conditions under which the document set may be obtained. To receive access to the dataset, send a request to NIST with the signed organizational agreement attached.

Organizational agreement
This agreement must be signed by the person responsible for the data at your organization, and sent to NIST.
Individual agreement
This agreement must be signed by all researchers using the TREC Collection at your organization, and kept on file at your organization.

Getting the corpus

  1. Download and print the Organizational and Individual agreement forms above.
  2. Send a scan of the Organizational form to NIST to:
    In your email include the following:
    Subject: request for TREC disks 4 and 5
  3. Complete the Individual agreement form and keep it on file at your organization.
  4. Subject to our approval, a download URL, login, and password will be sent via email. Please allow seven business days for a response.

This page created on February 5, 2019
Last updated on Wednesday, 17-Jul-2019 11:29:52 MDT