Guidelines for TREC-8





Guidelines for Constructing and Manipulating the System Data Structures

The system data structures are defined to consist of the original documents, any new structures built automatically from the documents (such as inverted files, thesauri, conceptual networks, etc.), and any new structures built manually from the documents (such as thesauri, synonym lists, knowledge bases, rules, etc.).
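As an illustration of an automatically built system data structure, the sketch below constructs a minimal inverted file (term-to-document mapping) from raw documents. It is a toy example, not any particular TREC system's implementation; the document ids are made up.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of documents containing it.

    `docs` maps a document id to its raw text. Tokenization here is
    a naive lowercase split, purely for illustration.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical document collection (ids and texts are invented).
docs = {
    "FT911-1": "oil prices rise sharply",
    "FT911-2": "crude oil exports fall",
}
index = build_inverted_index(docs)
print(sorted(index["oil"]))  # ['FT911-1', 'FT911-2']
```

Under these guidelines, such a structure may be built from the training data but must be frozen before the test topics are seen.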
  1. System data structures can be built using the initial training data (the documents, the topics, and the relevance judgments). These structures may not be modified in response to the new ad hoc test topics. For example, you may not add topic words that are missing from your dictionary, nor may the system data structures be based in any way on the results of retrieving documents for the test topics and having a human look at the retrieved documents. The ad hoc task represents the real-world problem of an ordinary user posing a question to the system. If an ordinary user could not make the change to the system, you should not make it after receiving the topics. A corollary of this rule is that your system may not be tuned to the test topics.

Guidelines for Constructing the Queries

There are many possible methods for converting the supplied topics into queries that your system can execute. We have broadly defined two generic methods, "automatic" and "manual", based on whether manual intervention is used. When more than one set of results is submitted, the different sets may correspond to different query construction methods or, if desired, may be variants within the same method.

The manual query construction method includes BOTH runs in which the queries are constructed manually and then run without looking at the results AND runs in which the results are used to alter the queries using some manual operation. The distinction being made here is between runs in which there is no human involvement (automatic query construction) and runs in which there is some type of human involvement (manual query construction). These same query construction definitions apply to those tracks for which they are appropriate (cross-language, high precision, etc.).

To further clarify this, here are some example query construction methodologies, and their correct query construction classification. Note that these are only examples; many other methods may be used for automatic or manual query construction.
  1. queries constructed automatically from the topics, results of these queries sent to NIST --> automatic query construction
  2. queries constructed automatically from the topics, then expanded by a method that takes terms automatically from the top 30 documents (no human involved) --> automatic query construction
  3. queries constructed manually from the topics, results of these queries sent to NIST --> manual query construction
  4. queries constructed automatically from the topics, then modified by human selection of terms suggested from the top 30 documents --> manual query construction
  5. queries constructed manually from the topics, then queries "tweaked" based on looking at the results --> manual query construction
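Example 2 above, automatic construction followed by automatic expansion from the top-ranked documents, is commonly called pseudo-relevance feedback. The sketch below shows the general idea under simplifying assumptions: the stopword list, the overlap scoring, and the parameters (top-k documents, number of added terms) are all invented for illustration and do not correspond to any specific TREC system.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "of", "in", "and", "to", "on", "as", "from"}

def topic_to_query(topic_text):
    # Automatic query construction: tokenize the topic, drop stopwords.
    return [t for t in topic_text.lower().split() if t not in STOPWORDS]

def score(query, doc_text):
    # Trivial term-overlap score, for illustration only.
    terms = set(doc_text.lower().split())
    return sum(1 for q in query if q in terms)

def expand(query, docs, k=2, n_terms=2):
    """Pseudo-relevance feedback: rank documents against the initial
    query, then append the most frequent new terms from the top-k
    documents. No human is involved, so the run stays automatic."""
    ranked = sorted(docs.items(), key=lambda d: score(query, d[1]),
                    reverse=True)
    counts = Counter()
    for _, text in ranked[:k]:
        counts.update(t for t in text.lower().split()
                      if t not in STOPWORDS and t not in query)
    return query + [t for t, _ in counts.most_common(n_terms)]

# Hypothetical topic and collection.
docs = {
    "D1": "oil prices rise as opec cuts output",
    "D2": "opec output cuts lift crude oil prices",
    "D3": "football scores from the weekend",
}
query = topic_to_query("the impact of oil prices")
expanded = expand(query, docs)
```

If a human instead inspected the top documents and chose which suggested terms to keep (example 4), the identical pipeline would become a manual run.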

Note that by including all types of human-involved runs in the manual query construction method we make it harder to compare work within this query construction method. Therefore groups are strongly encouraged (as in the past) to determine what constitutes a base run for their experiments and to do these runs (officially or unofficially) to allow useful interpretation of the results. (For those of you new to TREC, unofficial runs are those not turned in to NIST but evaluated locally using the trec_eval program (trec_eval_latest.tar.gz). See previous TREC papers such as the ones from UMass, City, or Cornell (on the TREC web site) for good examples of the use of base runs.) Additionally, it is very important to fill in the portion of the system descriptions that deals with timing and the type of person doing the manual query construction, so that there can be some measure of the manual effort required for these runs.
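Whether a run is official or evaluated locally with trec_eval, the results file uses the standard TREC run format: one line per retrieved document, containing the topic number, the literal string Q0, the document number, the rank, the similarity score, and the run tag. The sketch below writes such lines; the topic number, document ids, scores, and run tag are invented for illustration.

```python
def format_trec_run(topic_id, ranked, run_tag):
    """Render ranked results as TREC run-format lines:
    topic Q0 docno rank score run_tag."""
    lines = []
    for rank, (docno, score) in enumerate(ranked, start=1):
        lines.append(f"{topic_id} Q0 {docno} {rank} {score:.4f} {run_tag}")
    return lines

# Hypothetical ranked output for one topic.
lines = format_trec_run(401, [("FT911-2", 7.25), ("FT911-1", 3.5)], "myrun1")
print(lines[0])  # 401 Q0 FT911-2 1 7.2500 myrun1
```

A file of such lines, together with the relevance judgments (qrels), is what trec_eval consumes to produce the standard recall/precision measures.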

Last updated: Tuesday, 08-Aug-2006 15:52:43 UTC
Date created: Tuesday, 01-Aug-00
[email protected]