Data - Non-English Documents

The collections listed below are in languages other than English and are, or have been, used for TREC purposes. In many cases, these collections were created for a specific task and are not as broadly available or supported as the English collections.

Topics and relevance assessments are available at http://trec.nist.gov/data.html for TREC participants.

The Chinese collection consists of articles selected from the People's Daily newspaper and the Xinhua newswire. Collection LDC2000T52 (TREC Mandarin) must be obtained from the Linguistic Data Consortium. Do not use LDC95T13 (Mandarin Chinese News Text); it is a different version.

The Spanish collection consists of text from El Norte, a Mexican newspaper from Monterrey, and additional text from the 1994 newswire from Agence France Presse. Collection LDC2000T51 (Spanish News Text) must be obtained from the Linguistic Data Consortium.

Last updated: Thursday, 23-Feb-2017 11:04:21 MST
Date created: Tuesday, 01-Aug-00
trec@nist.gov