TREC 2004 Novelty Track Guidelines
The Novelty Track is designed to investigate systems' abilities to locate relevant AND new information within a set of documents relevant to a TREC topic. Systems are given the topic and a set of relevant documents ordered by date, and must identify sentences containing relevant and/or new information in those documents.
For information on past Novelty Tracks, see the overviews:
Currently systems return ranked lists of documents as the answer for an information request. The TREC question-answering track takes this a major step forward, but only for direct questions and only for short, fact-based questions. Another approach to providing answers would be to return only new AND relevant sentences (within context) rather than whole documents containing duplicate and extraneous information.
A possible application scenario would be a smart "next" button that walks a user down the ranked list by jumping to the next new and relevant sentence. The user could then view that sentence and, if interested, also read the surrounding sentences. Alternatively, this task could be viewed as finding key sentences that could serve as "hot spots" for collecting information to summarize an answer of length X to an information request.
This year there will be four tasks which vary the kinds of data available to the systems and the kinds of results that need to be returned. There will be fifty topics, each with 25 relevant documents selected by the assessor who wrote the topic, as well as zero or more documents which were judged irrelevant. The documents are split into sentences.
The four tasks are, for each topic:
Participants are free to participate in any or all tasks. You may submit a maximum of five runs per task.
Topics and Documents
This year, the track will be using fifty new topics (numbered N51-N100) developed using the AQUAINT collection. AQUAINT contains newswire articles from three different wires: New York Times News Service, AP, and Xinhua News Service. All three sources have documents covering the period June 1998 through September 2000; additionally, the Xinhua collection goes back to January 1996.
The topics are evenly divided between two topic types:
For each topic, the assessor has selected 25 relevant documents and some number (possibly zero) of irrelevant documents from the collection. They are probably not the only documents for that topic, nor are they necessarily the best. You will be provided with those documents concatenated together in chronological order and separated into individual sentences. Each sentence is tagged with a source document ID and a sequence number.
The documents are on a protected web site located at http://trec.nist.gov/novelty/. The web site is protected since it contains document text and we must be sure you have legitimate access to the document text before you can access it. To get the access sequence for the protected site, send an email message to Lori Buckland, firstname.lastname@example.org requesting access. Lori will check our records to make sure we have signed data use forms for the AQUAINT data from your organization and respond with the access sequence. Please note that this is a manual process, and Lori will respond to requests during her normal mail-answering routine. Do not expect an instantaneous response. In particular, do not wait until the night before the deadline and expect to get access to the test data.
Task and training data restrictions
This task should be done completely automatically. Any fields in the topic can be used. It should be assumed that the set of relevant documents is available as an ordered set, i.e. the entire set may be used in deciding the sentence sets. However, the topics must be processed independently. Both these restrictions reflect the reality of the application.
You are free to use any other TREC documents or training data you would like. Although there are probably other relevant documents in the collection, NIST will not be providing further qrels. You will be asked when runs are submitted to describe additional data used.
Tasks 2 and 3 cannot be ordered such that all the test data is hidden from both tasks. Therefore, you are expected to keep the training and test sentences separate between your task 2 and 3 runs. Other training data may be kept in common, but do NOT (for example) submit a task 3 run which takes advantage of the relevant sentences released for task 2.
The topics and judgments for last year's Novelty Track data are available from the TREC web site (LINK). Keep in mind that last year, all documents were judged relevant, whereas this year there are irrelevant documents mixed in. Nevertheless, you may find the data useful for designing and/or training your system.
Format of results
Participants will return either one or two lists of doc id/sentence number pairs for each topic, one list corresponding to all the relevant sentences and the second list (a subset of the first) containing only those sentences that contain new information.
Only submit the sentences required for each task! For task 1, a run submission should have both relevant and novel sentences, but for task 2, a run should only contain novel sentences. Don't include any data given by NIST, only include the output your system is required to produce.
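For runs that return both lists, the requirement that the novel list be a subset of the relevant list can be checked before submission. The following is a minimal sketch, not part of the official guidelines: the function name `check_novel_subset`, the `(document ID, sentence number)` tuple representation, and the sample IDs are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: a sentence is identified by
# (document ID, sentence number). This is illustrative only,
# not the official submission format.
SentenceId = Tuple[str, int]


def check_novel_subset(relevant: Iterable[SentenceId],
                       novel: Iterable[SentenceId]) -> Set[SentenceId]:
    """Return the novel sentences that are missing from the relevant list.

    An empty result means the novel list is a subset of the relevant list.
    """
    return set(novel) - set(relevant)


# Hypothetical document IDs and sentence numbers, for illustration only.
relevant = [("DOC-A", 3), ("DOC-A", 7), ("DOC-B", 1)]
novel = [("DOC-A", 3), ("DOC-B", 2)]

stray = check_novel_subset(relevant, novel)
print(sorted(stray))  # [('DOC-B', 2)]
```

A non-empty result points at sentences marked novel that were never marked relevant, which would violate the subset requirement for that topic.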
Results must be submitted in the following format. This format is a variation of the TREC ad hoc format, and is identical to last year's format without the sequence number field.
Evaluation
The sentences selected manually by the NIST assessors will be considered the truth data. To avoid confusion, this set of sentences is called RELEVANT in the discussion below. Agreement between these sentences and those found by the systems will be used as input for recall and precision.
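As a rough illustration of how agreement with the truth data turns into precision, recall, and the F measure with beta=1 that this section defines, consider the sketch below. It is not the official scoring code: the function name `f_measure`, the tuple representation of a sentence, the sample IDs, and the choice to return 0.0 when a set is empty are all assumptions.

```python
from typing import Iterable, Tuple

# Assumed representation: (document ID, sentence number).
SentenceId = Tuple[str, int]


def f_measure(retrieved: Iterable[SentenceId],
              truth: Iterable[SentenceId]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) for one topic, with beta = 1.

    Assumption: any measure whose denominator is zero is reported as 0.0.
    """
    retrieved_set, truth_set = set(retrieved), set(truth)
    matched = len(retrieved_set & truth_set)  # relevant sentences retrieved

    precision = matched / len(retrieved_set) if retrieved_set else 0.0
    recall = matched / len(truth_set) if truth_set else 0.0

    # Equivalent to 2 * P * R / (P + R), written with raw counts.
    denominator = len(retrieved_set) + len(truth_set)
    f = 2 * matched / denominator if denominator else 0.0
    return precision, recall, f


# Hypothetical sentence IDs, for illustration only.
truth = [("DOC-A", 3), ("DOC-A", 7), ("DOC-B", 1), ("DOC-B", 4)]
retrieved = [("DOC-A", 3), ("DOC-B", 1), ("DOC-B", 2)]

precision, recall, f = f_measure(retrieved, truth)
print(f"P={precision:.4f} R={recall:.4f} F={f:.4f}")
# P=0.6667 R=0.5000 F=0.5714
```

Here two of the three retrieved sentences appear in the truth set of four, so F = 2 * 2 / (3 + 4) = 4/7. For the novel sentence selection tasks, the same computation would apply with the set of new sentences as the truth data.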
The official measure for the Novelty track will be the F measure (with beta=1, equal emphasis on recall and precision):
\[ F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

Alternatively, this can be formulated as

\[ F = \frac{2 \times (\text{No. relevant sentences retrieved})}{(\text{No. retrieved sentences}) + (\text{No. relevant sentences})} \]

(For novel sentence selection tasks, substitute "new" for "relevant".)

You are trying to create a list of sentences that are:
Last updated: Tuesday, 13-May-03 11:11:00
Date created: Tuesday, 13-May-03