TREC 2002 Novelty Track Guidelines



Summary

The TREC 2002 novelty track is designed to investigate systems' abilities to locate relevant AND new information within the ranked set of documents retrieved in answer to a TREC topic. This track is new for TREC 2002 and should be regarded as an interesting (and hopefully fun) learning experience.

Test data released: June 21, 2002
Results due date: September 3, 2002
Runs allowed: maximum of 5 runs per group


Goal

Currently, systems return ranked lists of documents as the answer to an information request. The TREC question-answering track takes this a major step forward, but only for direct, short, fact-based questions. Another approach to providing answers would be to return only new AND relevant sentences (within context) rather than whole documents containing duplicate and extraneous information.

A possible application scenario would be a smart "next" button that walks a user down the ranked list by jumping to the next new and relevant sentence. The user could then view that sentence and, if interested, also read the surrounding sentences. Alternatively, this task could be viewed as finding key sentences that could be useful as "hot spots" for collecting information to summarize an answer of length X to an information request.

Tasks

Participants will return two lists of doc id/sentence number pairs for each topic, one list corresponding to all the relevant sentences and the second list (a subset of the first) containing only those sentences that contain new information.

Further details

The document id/sentence number pairs must be in order of occurrence for each document in the ranked list. Documents with no relevant (or no relevant-new) sentences should simply be omitted from the corresponding list.

Format of results

The format of the results should be useful for machine processing rather than human readability, so the format of the sample relevant and new files is not appropriate. We will be using a variation of the TREC ad hoc format.
Required format (using the sample data)

    303 relevant 1 FT924-286 46 nist1
    303 relevant 2 FT924-286 48 nist1
    303 relevant 3 FT924-286 49 nist1
    303 relevant 4 FT931-6554 7 nist1
    .
    .
    303 relevant 16 LA122990-0029 14 nist1
    303 new 1 FT924-286 46 nist1
    303 new 2 FT924-286 48 nist1
    303 new 3 FT924-286 49 nist1
    303 new 4 FT931-6554 7 nist1
    .
    .
    303 new 15 LA112190-0043 15 nist1
    next topic
    ..
    There should be one file per run, ordered by topic number, including both the relevant
        and new lists for each topic number.
    Field 1 -- topic number
    Field 2 -- "relevant" or "new"
    Field 3 -- order of document/sentence pair within the current list
    Field 4 -- document id (exactly as it appears in the document's id tag)
    Field 5 -- sentence number (exactly as it appears in the sentence tag)
    Field 6 -- the run tag; this should be a maximum of 12 characters, letters and digits only;
        it should be unique to the group, the type of run, and the year
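
As a concrete illustration, the following Python sketch writes a run file in this six-field format. Python is not required by the track; the write_run helper, the output file name, and the sample values are purely illustrative, and "nist1" stands in for your own run tag.

    # Sketch: write one run file in the required six-field format.
    # The topic/docid/sentence values are placeholders taken from the
    # sample above; replace "nist1" with your own run tag (max 12
    # characters, letters and digits only).
    def write_run(path, run_tag, results):
        # results maps topic number -> {"relevant": [...], "new": [...]},
        # each a list of (docid, sentence_number) pairs already in
        # order of occurrence.
        with open(path, "w") as out:
            for topic in sorted(results):
                for list_name in ("relevant", "new"):
                    pairs = results[topic][list_name]
                    for rank, (docid, sent) in enumerate(pairs, start=1):
                        out.write(f"{topic} {list_name} {rank} {docid} {sent} {run_tag}\n")

    write_run("myrun.txt", "nist1", {
        303: {"relevant": [("FT924-286", 46), ("FT924-286", 48)],
              "new": [("FT924-286", 46)]},
    })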

Input Data

There are 50 topics taken from TRECs 6, 7, and 8 (topics 301-450). The documents are a subset of the relevant documents for those topics (all documents are from disks 4 and 5). Participants are provided with a ranked list of relevant documents, with between 10 and 25 relevant documents per topic. Additionally, these documents have been automatically segmented into sentences.

The documents are on a protected web site located at http://trec.nist.gov/novelty/. The web site is protected because it contains document text, and we must be sure you have legitimate access to that text before you can access the site. To get the access sequence for the protected site, send an email message to Lori Buckland (lori.buckland@nist.gov) requesting access. Lori will check our records to make sure we have signed data use forms for the TREC disks (disks 4&5) from your organization and respond with the access sequence. Please note that this is a manual process, and Lori will respond to requests during her normal mail-answering routine. Do not expect an instantaneous response. In particular, do not wait until the night before the deadline and expect to get access to the test data.

Further details

The topics are modified versions of the original TREC topic statements. This version includes the original topic fields plus two new fields. See the sample area of the web site for four examples.

The first new field contains a revised description. This is what the assessor used as the information need when building the relevant (and new) sets. Often it is the same as the original description, but many times it is not: the assessors were not supposed to use the narrative field, yet the revised description often includes whatever piece of the narrative they did use.

The second new field contains an ordered list of the relevant documents that are to be mined for relevant and new sentences.

Each document set is the subset of relevant documents to be used for mining sentences. These documents have been automatically segmented into sentences, using the same tags as in the sample documents on the web site.
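
For example, sentence numbers might be pulled out of a segmented document with something like the sketch below. The <s num="..."> tag used here is only a stand-in, since the actual markup is shown in the sample documents rather than in these guidelines; adjust the pattern to the real tags.

    # Hypothetical sketch only: the <s num="..."> tag below is an assumed
    # placeholder for the sentence markup used in the sample documents.
    import re

    SENTENCE = re.compile(r'<s num="(\d+)">(.*?)</s>', re.DOTALL)

    def sentences(doc_text):
        # Yield (sentence_number, sentence_text) pairs from one segmented
        # document, collapsing internal whitespace.
        for num, text in SENTENCE.findall(doc_text):
            yield int(num), " ".join(text.split())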

Restrictions on how to do the task

This task must be done completely automatically. Any fields in the topic may be used, as may any other resources. The set of relevant documents should be assumed to be available as an ordered set; that is, the entire set may be used in deciding the sentence sets. However, the topics must be processed independently. Both of these restrictions reflect the reality of the application.

Evaluation

NIST had the TREC assessors do this task manually, creating first a file of relevant sentences and then reducing that file to those that are new. Both files were saved. Two different assessors did this task for each topic. See the attached instructions to assessors.

Scoring

The sentences selected manually by the NIST assessors will be considered the truth data. To avoid confusion, this set of sentences is called RELEVANT in the discussion below. Agreement between these sentences and those found by the systems will be used to compute recall and precision.

For the relevant-sentence task:

Recall = #RELEVANT matched / #RELEVANT
Precision = #RELEVANT matched / #sentences submitted

For the new-sentence task:

Recall = #new-RELEVANT matched / #new-RELEVANT
Precision = #new-RELEVANT matched / #sentences submitted
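
In other words, a submitted list is scored as a set of document id/sentence number pairs against the corresponding truth set, as in this sketch (the function name is illustrative):

    # Sketch: sentence-level recall and precision against the truth data.
    # Both arguments are collections of (docid, sentence_number) pairs; the
    # same code applies to the new-sentence task with new-RELEVANT as truth.
    def recall_precision(submitted, truth):
        submitted, truth = set(submitted), set(truth)
        matched = len(submitted & truth)
        recall = matched / len(truth) if truth else 0.0
        precision = matched / len(submitted) if submitted else 0.0
        return recall, precision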


Because the two human assessors do not agree completely, several variations of these metrics will be used. The variations differ in how the RELEVANT (and new-RELEVANT) sets are chosen.

Obviously we could choose one assessor's judgments as the official set (similar to TREC ad hoc) and use the second set only for human agreement measurements. But alternatively we could use any of the following combinations:

Human1, Human2, Union12, Intersection12, Minimum(Human1, Human2), Maximum(Human1, Human2), or some edit distance function of one of these

We will use the Minimum as the main score, with NIST also computing some of the others for comparison purposes. The reason for this is that the biggest disagreement between assessors had to do with how many sequential sentences they took as relevant. Often they included sentences "for context", even though the instructions tried to discourage this. By taking the minimum of two assessors, we can avoid many of the disagreements. This definition also matches better with the stated goals of the track.
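
A sketch of how some of these truth-set variants could be formed from two assessors' judgments (each a set of document id/sentence number pairs for one topic) is given below. The reading of "Minimum" as the smaller of the two sets is an assumption on our part, not an official definition.

    # Sketch: candidate truth sets built from two assessors' judgments.
    # human1 and human2 are sets of (docid, sentence_number) pairs.
    def truth_variants(human1, human2):
        return {
            "Human1": human1,
            "Human2": human2,
            "Union12": human1 | human2,
            "Intersection12": human1 & human2,
            # Assumption: "Minimum"/"Maximum" taken as the assessor with
            # the smaller/larger set for this topic.
            "Minimum": min(human1, human2, key=len),
            "Maximum": max(human1, human2, key=len),
        }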

Definition for new and relevant

You are trying to create a list of sentences that:
  1. are relevant to the question or request made in the description section of the topic,
  2. are relevant independently of any surrounding sentences, and
  3. provide new information that has not been found in any previously picked sentences.

Instructions to assessors

  1. order printed documents according to the ranked list
  2. using the description part of the topic only, go through each printed document and mark in yellow all sentences that directly provide information requested by the description.
    Do not mark sentences that are introductory or explanatory in nature. In particular, if there is a set of sentences that provide a single piece of information, only select the sentence that provides the most detail in that set. If two adjacent sentences are needed to provide a single piece of information because of an unusual sentence construction or error in the sentence segmentor, mark both.
  3. go to the computer and pull up the online version of your documents. Go through each document, selecting the sentences that you have previously marked (you can change your mind). Save this edited version as "relevant".
  4. now go through the online version looking for duplicate information. Order is important here: if a piece of information has already been picked, then repeats of that same information should be deleted. Instances that give further details of that information should be retained, but instances that summarize details seen earlier should be deleted. Save this second edited version as "new".

Last updated: Wednesday, 12-Feb-2003 13:33:11 EST
Date created: Wednesday, 19-June-02
trec@nist.gov