TREC 2003 Novelty Track Guidelines

Tracks home

National Institute of Standards and Technology Home Page

Summary

The Novelty Track is designed to investigate systems' abilities to locate relevant AND new information within a set of documents relevant to a TREC topic. Systems are given the topic and a set of relevant documents ordered by date, and must identify sentences containing relevant and/or new information in those documents.

The Novelty Track took place for the first time in TREC 2002. This year, there are several changes to note:

All new topics, with document relevance judgments and sentence relevance and novelty judgments done by the topic author,
Documents will be ordered chronologically,
There are several subtasks within the track. Depending on the task, your system may have to identify relevant sentences, novel sentences, or both, and
The results submission format has changed slightly.

These changes are detailed below.

Due dates:

Test data released: June 2, 2003

Result due date: September 3/10/17, 2002

Runs allowed: maximum of 5 runs per group per task

Goal

Currently systems return ranked lists of documents as the answer for an information request. The TREC question-answering track takes this a major step forward, but only for direct questions and only for short, fact-based questions. Another approach to providing answers would be to return only new AND relevant sentences (within context) rather than whole documents containing duplicate and extraneous information.

A possible application scenario here would be to envision a smart "next" button that walked a user down the ranked list by hitting the next new and relevant sentence. The user could then view that sentence and if interested, also read the surrounding sentences. Alternatively this task could be viewed as finding key sentences that could be useful as "hot spots" for collecting information to summarize an answer of length X to an information request.

Tasks

This year there will be four tasks which vary the kinds of data available to the systems and the kinds of results that need to be returned. There will be fifty topics, each with 25 relevant documents selected by the assessor who wrote the topic. The documents are split into sentences.

The four tasks are, for each topic:

Given the set of 25 relevant documents for the topic, identify all relevant and novel sentences. This is last year's task.
(This task will be due first, on September 3, 2003.)
After the first due date, NIST will release the full set of relevant sentences for all 25 documents. Given all relevant sentences, identify all novel sentences.
We will also release the novel sentences for the first 5 documents. Given the relevant and novel sentences in the first 5 documents ONLY, find the relevant and novel sentences in the remaining 20 documents.
(Tasks 2 and 3 will be due second, on September 10, 2003.)
Given all relevant sentences from all documents, and the novel sentences from the first 5 documents, find the novel sentences in the last 20 documents.
(Task 4 will be due last, on September 17, 2003.)

Participants are free to participate in any or all tasks. You may submit a maximum of five runs per task.

Topics and Documents

This year, the track will be using fifty new topics developed using the AQUAINT collection. AQUAINT contains newswire articles from three different wires: New York Times News Service, AP, and Xinhua News Service. All three sources have documents covering the period June 1998 through September 2000; additionally, the Xinhua collection goes back to January 1996.

This year's topics are of two types:

Event topics are about a particular event that occurred within the time period of the collection. Relevant information pertains specifically to the event.
Opinion topics are about different opinions and points of view on an issue. Relevant information takes the form of opinions on the issue reported or expressed in the articles.

The topics have traditional TREC topics statements with a title, description, and narrative.

For each topic, the assessor has selected 25 relevant documents from the collection. They are probably not the only relevant documents for that topic, nor are they necessarily the best. You will be provided with those documents concatenated together in chronological order and separated into individual sentences. Each sentence is tagged with a source document ID and a sequence number.

The documents are on a protected web site located at http://trec.nist.gov/novelty/. The web site is protected since it contains document text and we must be sure you have legitimate access to the document text before you can access it. To get the access sequence for the protected site, send an email message to Lori Buckland, [email protected] requesting access. Lori will check our records to make sure we have signed data use forms for the AQUAINT data from your organization and respond with the access sequence. Please note that this is a manual process, and Lori will respond to requests during her normal mail-answering routine. Do not expect an instantaneous response. In particular, do not wait until the night before the deadline and expect to get access to the test data.

Task and training data restrictions

This task should be done completely automatically. Any fields in the topic can be used. It should be assumed that the set of relevant documents are available as an ordered set, i.e. the entire set may be used in deciding the sentence sets. However the topics must be processed independently. Both these restrictions reflect the reality of the application.

You are free to use any other TREC documents or training data you would like. Although there are probably other relevant documents in the collection, NIST will not be providing further qrels. You will be asked when runs are submitted to describe additional data used.

Tasks 2 and 3 cannot be ordered such that all the test data is hidden from both tasks. Therefore, you are expected to keep the training and test sentences separate between your task 2 and 3 runs. Other training data may be kept in common, but do NOT (for example) submit a task 3 run which takes advantage of the relevant sentences released for task 2.

The topics and judgments for last year's Novelty Track data is available from the TREC web site (LINK). Keep in mind that there are several important differences: last year's documents were ordered by a retrieval status value rather than chronologically; this year's truth data is from the topic author rather than a minimum of two assessors; every topic has 25 relevant documents selected for the task; etc. Nevertheless, you may find the data useful for designing and/or training your system.

Format of results

Participants will return either one or two lists of doc id/sentence number pairs for each topic, one list corresponding to all the relevant sentences and the second list (a subset of the first) containing only those sentences that contain new information.

Only submit the sentences required for each task! For task 1, a run submission should have both relevant and novel sentences, but for task 2, a run should only contain novel sentences. Don't include any data given by NIST, only include the output your system is required to produce.

Results must be submitted in the following format. This format is a variation of the TREC ad hoc format, and is identical to last year's format without the sequence number field.

��N1 relevant FT924-286 46 nist1

��N1 relevant FT924-286 48 nist1

��N1 relevant FT924-286 49 nist1

��N1 relevant FT931-6554 7 nist1

��.

��.

��N1 relevant LA122990-0029 14 nist1

��N1 new FT924-286 46 nist1

��N1 new FT924-286 48 nist1

��N1 new FT924-286 49 nist1

��N1 new FT931-6554 7 nist1

��.

��.

��N1 new LA112190-0043 15 nist1

��N2 relevant LA122490-0040 1 nist1

��. . .

��
There should be one file per run, ordered by topic number, including both the relevant
��and new lists for each topic number.

��Field 1 -- topic number, an N followed by a number

��Field 2 -- "relevant" or "new"

��Field 3 -- document id (the docid field exactly as it appears in the tag)

��Field 4 -- sentence number (again exactly as it appears in the tag)

��Field 5 -- the run tag; this should be a maximum of 12 characters, letters and digits only; it should be unique to the group, the type of run, and the year

Evaluation

The sentences selected manually by the NIST assessors will be considered the truth data. To avoid confusion, this set of sentences are called RELEVANT in the discussion below. Agreement between these sentences and those found by the systems will be used as input for recall and precision.

Recall = #RELEVANT matched/#RELEVANT

Precision = #RELEVANT matched/#sentences submitted

Recall = #new-RELEVANT matched/#new-RELEVANT

Precision = #new-RELEVANT matched/#sentences submitted

Note that this differs from last year, when two assessors judged each topic, with the truth data taken from the assessor who judged the fewest sentences as novel. This year, the truth data is from the assessor who authored the topic. This means that the truth data this year probably contains more relevant and novel sentences on average than last year.

The official measure for the Novelty track will be the F measure (with beta=1, equal emphasis on recall and precision):

            2 * Precision * Recall
     F  =  -----------------------
              Precision + Recall

alternatively, this can be formulated

                2 * (No. relevant sentences retrieved)
     F  =  ---------------------------------------------------
           (No. retrieved sentences) + (No. relevant sentences)

(for novel sentence selection tasks, substitute "new" for "relevant")

Definition for new and relevant

You are trying to create a list of sentences that are:

relevant to the question or request made in the description section of the topic,

their relevance is independent of any surrounding sentences,

they provide new information that has not been found in any previously picked sentences.

Last updated:Tuesday, 13-May-03 11:11:00
Date created: Tuesday, 13-May-03
[email protected]