Below are titles of non-English collections that have been
used for TREC purposes. In many cases, these collections have
been created for a specific task, and are not as broadly available
or supported as the English collections.
Topics and
relevance assessments are also available.
- The Arabic collection consists of articles selected
from the Agence France Presse (AFP) Arabic newswire.
Collection LDC2001T55 (Arabic Newswire Part 1) must be obtained from
the Linguistic Data Consortium.
- The Chinese collection consists of articles selected from the People's Daily
newspaper and the Xinhua newswire.
Collection LDC2000T52 (TREC Mandarin) must be obtained from the Linguistic Data Consortium. Do not use LDC95T13 (Mandarin Chinese News Text); it is a different version.
- The Spanish collection consists of a Mexican newspaper
from Monterrey (El Norte) and additional text from the 1994 newswire
from Agence France Presse. Collection LDC2000T51 (Spanish News Text) must be obtained from the Linguistic Data Consortium.
- The set of documents used in the TREC 6-8 Cross-Language tracks, consisting of
documents from Schweizerische Depeschenagentur (English, German, French, and Italian), is no longer available.