TREC 2002 Novelty Track Guidelines
Summary
The TREC 2002 novelty track is designed to investigate systems' abilities to locate relevant AND new information within the ranked set of documents retrieved in answer to a TREC topic. This track is new for TREC 2002 and should be regarded as an interesting (and hopefully fun) learning experience.
Goal
Currently, systems return ranked lists of documents as the answer to an information request. The TREC question-answering track takes this a major step forward, but only for direct questions with short, fact-based answers. Another approach to providing answers would be to return only new AND relevant sentences (within context) rather than whole documents containing duplicate and extraneous information.
A possible application scenario is a smart "next" button that walks a user down the ranked list by jumping to the next new and relevant sentence. The user could then view that sentence and, if interested, also read the surrounding sentences. Alternatively, this task could be viewed as finding key sentences that could be useful as "hot spots" for collecting information to summarize an answer of length X to an information request.
Tasks
Participants will return two lists of doc id/sentence number pairs for each topic, one list corresponding to all the relevant sentences and the second list (a subset of the first) containing only those sentences that contain new information.
Further details
The document id/sentence numbers must be in order of occurrence for each document in the ranked lists. Documents with no relevant/relevant-new sentences should not be included in the appropriate lists.
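The constraints on the two lists can be checked mechanically before submission. The following is a minimal sketch, not an official checker. It assumes that each list is held in memory as (document id, sentence number) pairs, and it reads the ordering rule as "documents in ranked order, sentences in order of occurrence within each document". The function name, the data layout, and the document ids are hypothetical.

```python
from typing import List, Tuple

# Hypothetical in-memory representation: (document id, sentence number).
SentenceRef = Tuple[str, int]


def check_topic_lists(
    ranked_docs: List[str],
    relevant: List[SentenceRef],
    new: List[SentenceRef],
) -> List[str]:
    """Return a list of problems for one topic; an empty list means none found."""
    problems = []

    # The new list must be a subset of the relevant list.
    if not set(new) <= set(relevant):
        problems.append("new list is not a subset of the relevant list")

    # Position of each document in the ranked list provided for the topic.
    rank = {doc: position for position, doc in enumerate(ranked_docs)}

    for name, refs in (("relevant", relevant), ("new", new)):
        unknown = [doc for doc, _ in refs if doc not in rank]
        if unknown:
            problems.append(f"{name} list contains unknown documents: {unknown}")

        # Assumed ordering: by document rank, then by sentence number.
        # Strictly increasing keys also rule out duplicate pairs.
        keys = [(rank[doc], sent) for doc, sent in refs if doc in rank]
        if any(a >= b for a, b in zip(keys, keys[1:])):
            problems.append(f"{name} list is out of order or has duplicates")

    return problems


if __name__ == "__main__":
    ranked = ["docB", "docA"]  # made-up document ids
    relevant = [("docB", 2), ("docB", 5), ("docA", 1)]
    new = [("docB", 2), ("docA", 1)]
    print(check_topic_lists(ranked, relevant, new))
```

Documents with no selected sentences simply never appear in a list, so no explicit check is needed for that rule.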
Format of results
The format of the results should be useful for machine processing rather than human readability, so the format of the sample relevant and new files is not appropriate. We will be using a variation of the TREC ad hoc format.
Input Data
There are 50 topics taken from TRECs 6, 7, and 8 (topics 300-450). The documents are a subset of the relevant documents for those topics (all documents are from disks 4 and 5). Participants are provided with a ranked list of relevant documents, with between 10 and 25 relevant documents per topic. Additionally these documents are automatically segmented into sentences.
The documents are on a protected web site located at http://trec.nist.gov/novelty/. The web site is protected since it contains document text and we must be sure you have legitimate access to the document text before you can access it. To get the access sequence for the protected site, send an email message to Lori Buckland, firstname.lastname@example.org requesting access. Lori will check our records to make sure we have signed data use forms for the TREC disks (disks 4&5) from your organization and respond with the access sequence. Please note that this is a manual process, and Lori will respond to requests during her normal mail-answering routine. Do not expect an instantaneous response. In particular, do not wait until the night before the deadline and expect to get access to the test data.
Further details
The topics are a modified version of the original TREC topic statement for each topic. This version includes the original topic fields plus two new fields. See the sample area of the web site for four examples.
The first field, tagged
The second new field is
Each document set is the set/subset of relevant documents to be used for mining sentences. These documents have been automatically segmented with the tags as used in the sample documents on the web site.
Restrictions on how to do the task
This task should be done completely automatically. Any fields in the topic can be used; any other resources may be used. It should be assumed that the set of relevant documents is available as an ordered set, i.e. the entire set may be used in deciding the sentence sets. However, the topics must be processed independently. Both of these restrictions reflect the reality of the application.
Evaluation
NIST had the TREC assessors do this task manually, creating first a file of relevant sentences and then reducing that file to those that are new. Both files were saved. Two different assessors did this task for each topic. See the attached instructions to assessors.
Scoring
The sentences selected manually by the NIST assessors will be considered the truth data. To avoid confusion, this set of sentences is called RELEVANT in the discussion below. Agreement between these sentences and those found by the systems will be used as input for recall and precision.
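As an illustration of how agreement with RELEVANT feeds recall and precision, here is a minimal sketch using the standard set-based definitions. It is not the official scoring code. The function name, the (document id, sentence number) representation, and the choice to return 0.0 when a set is empty are all assumptions made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one sentence: (document id, sentence number).
SentenceRef = Tuple[str, int]


def precision_recall(
    selected: Iterable[SentenceRef],
    truth: Iterable[SentenceRef],
) -> Tuple[float, float]:
    """Set-based precision and recall of system sentences against truth data."""
    selected_set = set(selected)
    truth_set = set(truth)
    matched = len(selected_set & truth_set)

    # Assumed convention: an empty denominator yields 0.0.
    precision = matched / len(selected_set) if selected_set else 0.0
    recall = matched / len(truth_set) if truth_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example data for a single topic.
    system = [("d1", 1), ("d1", 2), ("d2", 4)]
    relevant_truth = [("d1", 2), ("d2", 4), ("d2", 5), ("d3", 1)]
    print(precision_recall(system, relevant_truth))
```

The same function applies unchanged to the new-sentence list, with the new-RELEVANT set passed as the truth data.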
Because of issues in agreement between the two humans, several variations of metrics will be used. The variations to be used will deal with how the RELEVANT (and new-RELEVANT) sentences are chosen.
Obviously we could choose one set of assessors as the official one (similar to TREC ad hoc), and use the second set only for human agreement measurements. But alternatively we could use any of the following combinations:
Human1, Human2, Union12, Intersection12, Minimum(Human1,Human2), same for Maximum, some edit distance function from one of these
We will use the Minimum as the main score, with NIST also computing some of the others for comparison purposes. The reason for this is that the biggest disagreement between assessors had to do with how many sequential sentences they took as relevant. Often they included sentences "for context", even though the instructions tried to discourage this. By taking the minimum of two assessors, we can avoid many of the disagreements. This definition also matches better with the stated goals of the track.
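The set-based combinations listed above can be sketched as follows. This is an illustration under stated assumptions, not the NIST procedure: "Minimum" is read here as the judgment set of whichever assessor selected fewer sentences, "Maximum" as the larger of the two, and ties go to Human1 arbitrarily. The edit distance variant is left out because the text does not specify it. The function name and data layout are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation of one sentence: (document id, sentence number).
SentenceRef = Tuple[str, int]


def truth_variants(
    human1: Iterable[SentenceRef],
    human2: Iterable[SentenceRef],
) -> Dict[str, Set[SentenceRef]]:
    """Build candidate truth sets for one topic from two assessors' judgments."""
    h1, h2 = set(human1), set(human2)

    # Assumption: Minimum/Maximum pick a whole assessor set by size;
    # a tie is resolved in favour of Human1 for both.
    smaller, larger = (h1, h2) if len(h1) <= len(h2) else (h2, h1)

    return {
        "Human1": h1,
        "Human2": h2,
        "Union12": h1 | h2,
        "Intersection12": h1 & h2,
        "Minimum": smaller,
        "Maximum": larger,
    }


if __name__ == "__main__":
    # Made-up judgments: the first assessor kept an extra "context" sentence.
    assessor_a = [("d1", 3), ("d1", 4), ("d2", 1)]
    assessor_b = [("d1", 3), ("d2", 1)]
    for name, sentences in truth_variants(assessor_a, assessor_b).items():
        print(name, sorted(sentences))
```

Under this reading, a sentence that only one assessor kept "for context" drops out of the Minimum set whenever that assessor is the one with the longer list, which is the kind of disagreement the main score is meant to avoid.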
Last updated: Wednesday, 12-Feb-2003 11:33:11 MST
Date created: Wednesday, 19-June-02