TREC 2003 Novelty Track Guidelines
Summary

The Novelty Track is designed to investigate systems' abilities to locate relevant AND new information within a set of documents relevant to a TREC topic. Systems are given the topic and a set of relevant documents ordered by date, and must identify sentences containing relevant and/or new information in those documents. The Novelty Track took place for the first time in TREC 2002. This year there are several changes to note: the track uses fifty new topics developed over the AQUAINT collection, each with exactly 25 relevant documents selected by the topic author; there are four tasks; the relevant documents are ordered chronologically rather than by retrieval score; and the truth data comes from the topic author alone rather than from multiple assessors.
Due dates:
Goal

Currently, systems return ranked lists of documents as the answer to an information request. The TREC question answering track takes this a major step forward, but only for direct, short, fact-based questions. Another approach to providing answers would be to return only new AND relevant sentences (within context) rather than whole documents containing duplicate and extraneous information. One possible application scenario would be a smart "next" button that walks the user down the ranked list by jumping to the next new and relevant sentence. The user could then view that sentence and, if interested, also read the surrounding sentences. Alternatively, this task could be viewed as finding key sentences that could serve as "hot spots" for collecting information to summarize an answer of length X to an information request.
Tasks

This year there will be four tasks, which vary the kinds of data available to the systems and the kinds of results that need to be returned. There will be fifty topics, each with 25 relevant documents selected by the assessor who wrote the topic. The documents are split into sentences. The four tasks are, for each topic:

Task 1. Given the set of documents for the topic, identify all relevant and novel sentences.
Task 2. Given the relevant sentences in all documents, identify all novel sentences.
Task 3. Given the relevant and novel sentences in the first 5 documents only, find the relevant and novel sentences in the remaining documents.
Task 4. Given the relevant sentences in all documents and the novel sentences in the first 5 documents, find the novel sentences in the remaining documents.
Participants are free to participate in any or all tasks. You may submit a maximum of five runs per task.

Topics and Documents

This year the track will be using fifty new topics developed using the AQUAINT collection. AQUAINT contains newswire articles from three different wires: the New York Times News Service, the Associated Press, and the Xinhua News Service. All three sources have documents covering the period June 1998 through September 2000; additionally, the Xinhua collection goes back to January 1996. This year's topics are of two types: event topics and opinion topics.
For each topic, the assessor has selected 25 relevant documents from the collection. They are probably not the only relevant documents for that topic, nor are they necessarily the best. You will be provided with those documents concatenated together in chronological order and separated into individual sentences. Each sentence is tagged with a source document ID and a sequence number.

The documents are on a protected web site located at http://trec.nist.gov/novelty/. The web site is protected because it contains document text, and we must be sure you have legitimate access to the document text before you can access it. To get the access sequence for the protected site, send an email message to Lori Buckland (lori.buckland@nist.gov) requesting access. Lori will check our records to make sure we have signed data use forms for the AQUAINT data from your organization and respond with the access sequence. Please note that this is a manual process, and Lori will respond to requests during her normal mail-answering routine. Do not expect an instantaneous response. In particular, do not wait until the night before the deadline and expect to get access to the test data.

Task and training data restrictions

This task should be done completely automatically. Any fields in the topic can be used. It should be assumed that the set of relevant documents is available as an ordered set, i.e., the entire set may be used in deciding the sentence sets. However, the topics must be processed independently. Both of these restrictions reflect the reality of the application. You are free to use any other TREC documents or training data you would like. Although there are probably other relevant documents in the collection, NIST will not be providing further qrels. You will be asked when runs are submitted to describe any additional data used.

Tasks 2 and 3 cannot be ordered such that all the test data is hidden from both tasks. Therefore, you are expected to keep the training and test sentences separate between your task 2 and task 3 runs. Other training data may be kept in common, but do NOT (for example) submit a task 3 run that takes advantage of the relevant sentences released for task 2.

The topics and judgments for last year's Novelty Track are available from the TREC web site (LINK). Keep in mind that there are several important differences: last year's documents were ordered by retrieval status value rather than chronologically; this year's truth data is from the topic author rather than from a minimum of two assessors; every topic has 25 relevant documents selected for the task; etc. Nevertheless, you may find the data useful for designing and/or training your system.

Format of results

Participants will return either one or two lists of document ID/sentence number pairs for each topic: one list corresponding to all the relevant sentences, and a second list (a subset of the first) containing only those sentences that contain new information. Only submit the sentences required for each task! For task 1, a run submission should have both relevant and novel sentences, but for task 2, a run should contain only novel sentences. Do not include any data given by NIST; include only the output your system is required to produce. Results must be submitted in the following format. This format is a variation of the TREC ad hoc format, and is identical to last year's format without the sequence number field.
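Because the exact submission syntax is not reproduced in this section, the following is a minimal, format-agnostic sketch in Python of the constraints stated above: the novel list must be a subset of the relevant list, and only the lists required by each task should be submitted. The function name, document IDs, and sentence numbers are hypothetical and for illustration only.

    # Illustrative pre-submission check for one topic's output. All names,
    # document IDs, and sentence numbers here are hypothetical; the official
    # submission format is defined by NIST and is not reproduced here.

    from typing import Optional, Set, Tuple

    Sentence = Tuple[str, int]   # (source document ID, sentence number)


    def check_topic_output(relevant: Optional[Set[Sentence]],
                           novel: Set[Sentence]) -> None:
        """Apply the constraints stated under 'Format of results'."""
        # The novel list is defined as a subset of the relevant list.
        if relevant is not None and not novel <= relevant:
            raise ValueError("every novel sentence must also appear in the relevant list")


    # Task 1: both the relevant and the novel list are returned for the topic.
    relevant = {("NYT19990705.0042", 17), ("XIE19980601.0001", 3)}
    novel = {("XIE19980601.0001", 3)}
    check_topic_output(relevant, novel)

    # Task 2: the relevant sentences are supplied by NIST, so only the novel
    # list is submitted; do not echo the NIST-supplied data back.
    check_topic_output(None, novel)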
Evaluation

The sentences selected manually by the NIST assessors will be considered the truth data. To avoid confusion, this set of sentences is called RELEVANT in the discussion below. Agreement between these sentences and those found by the systems will be used as input to recall and precision.
Note that this differs from last year, when two assessors judged each topic and the truth data was taken from the assessor who judged the fewest sentences as novel. This year, the truth data is from the assessor who authored the topic. This means that the truth data this year probably contains more relevant and novel sentences on average than last year.

The official measure for the Novelty track will be the F measure (with beta = 1, i.e., equal emphasis on recall and precision):

        2 * Precision * Recall
    F = ----------------------
         Precision + Recall

Alternatively, this can be formulated as

        2 * (No. relevant sentences retrieved)
    F = ----------------------------------------------------
        (No. retrieved sentences) + (No. relevant sentences)

(For the novel sentence selection tasks, substitute "new" for "relevant".) A small worked example of this computation appears after the definitions below.

Definition for new and relevant

You are trying to create a list of sentences that are:
1. relevant to the request made in the topic, and
2. for the new-sentence lists, provide information not already found in any earlier sentence in the document set.
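To make the evaluation concrete, here is a minimal sketch, in Python, of the F computation defined under Evaluation above, applied to one topic's relevant-sentence selection; for the novel-sentence tasks, the same computation applies with the truth set of new sentences. The document IDs and sentence numbers are invented for illustration.

    # Minimal sketch of the F measure (beta = 1) for one topic's sentence
    # selection. Document IDs and sentence numbers are made up.

    def f_measure(retrieved, truth):
        """F = 2 * Precision * Recall / (Precision + Recall)."""
        hits = len(retrieved & truth)      # retrieved sentences that match the truth data
        if hits == 0:
            return 0.0
        precision = hits / len(retrieved)
        recall = hits / len(truth)
        return 2 * precision * recall / (precision + recall)


    truth = {("NYT19990705.0042", 17), ("XIE19980601.0001", 3), ("APW20000101.0007", 9)}
    retrieved = {("NYT19990705.0042", 17), ("XIE19980601.0001", 4)}

    # One of the two retrieved sentences matches the three truth sentences:
    # P = 1/2, R = 1/3, so F = 2 * 1 / (2 + 3) = 0.4 by either formulation.
    print(f_measure(retrieved, truth))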