This directory contains the evaluation data from the TREC 2002 novelty track. In the TREC 2002 novelty track, systems were given a set of relevant documents in a particular order and were to return two sets of sentences. The first set was to be all of the RELEVANT sentences, that is, sentences that contained relevant information. The second set, a subset of the first, was to be the set of NEW sentences: sentences that contained information not already present in the relevant sentences that preceded them.

The data consists of the following files:

* the text of the topics

  These are the original texts of the topics as released in earlier TRECs, plus new "description" fields where the novelty assessor judged the topic differently.

* the document text files

  The document text for each topic contains up to 25 relevant (as determined by the original TREC assessor) documents, ordered by the retrieval rank assigned by a particular search engine. Since the document text files contain document text :-), they must be password protected. To obtain access to the document texts, send email to Lori Buckland, lori.buckland@nist.gov, requesting the password. We must have a signed TREC data use release form on file from you before we can give you the password. The document text is segmented at sentence boundaries such that each sentence is assigned a sentence number within its document.

* four different judgment ("qrels") files

  A judgment file for the novelty track contains, for each topic, a list of sentence identifiers of the form doc-id:sentence-num. If the name of the judgment file ends in ".relevant", the sentences are the set of RELEVANT sentences as determined by the human assessor. If the judgment file ends in ".new", the sentences are the set of NEW sentences as determined by the assessor. To check how often humans agree on the sets of RELEVANT and NEW sentences, we had two assessors independently judge each topic.
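As a rough illustration, qrels entries of this shape can be grouped by topic as sketched below. The one-pair-per-line layout, the whitespace separator, and the document IDs in the sample are assumptions for the example, not a specification of the real files; adjust the parsing if the actual files differ.

```python
from collections import defaultdict

def parse_qrels(lines):
    """Group doc-id:sentence-num identifiers by topic.

    Assumes one 'topic-num doc-id:sentence-num' pair per line; this layout
    is a guess for illustration, not taken from the track documentation.
    """
    qrels = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        topic, ident = line.split()
        qrels[topic].add(ident)
    return dict(qrels)

# Hypothetical sample data (doc IDs invented for the example).
sample = """\
303 FBIS3-1023:4
303 FBIS3-1023:7
310 FT921-3120:2
"""
qrels = parse_qrels(sample.splitlines())
```

Each topic then maps to a set of sentence identifiers, which makes the set comparisons used in evaluation straightforward.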
  The judge who selected the smaller number of RELEVANT sentences was called the minimum judge; the other judge was the maximum judge. Ties were broken arbitrarily. The minimum judgments were used for the official TREC 2002 novelty track results. The four judgment files result from the cross product of two judges (min, max) and two set types (relevant, new). Note that the minimum assessor could not find any relevant sentences for topic 310, so the min_qrels files contain no sentences for topic 310.

  IMPORTANT NOTE: This version of the min_qrels files should have been equivalent to the qrels files released during TREC 2002, but it is not. We found some errors in the data after TREC 2002 and corrected them in this version of the qrels. In the original TREC 2002 version of the qrels, the "new" and the "relevant" sentences for two topics (382 and 397) were switched such that the relevant set was a subset of the new set, instead of the reverse. Thus evaluation results will not be identical to those shown in the TREC 2002 results. If you have the qrels file released during TREC 2002, you should download this version to replace it.

* a script, eval_novelty_run.pl

  This Perl script takes the type of sentences to evaluate, a judgment file, and a result file as arguments, and prints an evaluation of the result file. The type of sentences is either "relevant" or "new". The judgment file is one of the files described above and should match the specified type. The format of the result file is the same as a TREC 2002 novelty track submission file:

      topic-num  relevant|new  sequence-no  doc-id  sentence-num  tag

  where the second field is either the literal "relevant" or the literal "new". The sequence-no must be present but is otherwise ignored, so it can be anything. The evaluation scores printed are per-topic precision, recall, precision*recall, and F, plus averages of each measure over the entire topic set.
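The measures above can be sketched for a single topic as follows. This is an illustration of the measures, not the script itself: F is taken to be the standard balanced F, 2PR/(P+R), and the zero-denominator handling here is a plausible choice that may differ from the official script's edge-case behavior.

```python
def novelty_scores(retrieved, judged):
    """Per-topic precision, recall, precision*recall, and balanced F.

    `retrieved` and `judged` are sets of doc-id:sentence-num identifiers.
    F is computed as 2PR/(P+R); zero-denominator cases score 0.0, which is
    an assumption for this sketch rather than the script's documented rule.
    """
    if not retrieved or not judged:
        return 0.0, 0.0, 0.0, 0.0
    hits = len(retrieved & judged)
    p = hits / len(retrieved)
    r = hits / len(judged)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, p * r, f

# Hypothetical identifiers: 3 sentences returned, 4 judged, 2 in common.
p, r, pr, f = novelty_scores(
    {"d1:1", "d1:2", "d2:5"},
    {"d1:2", "d2:5", "d2:6", "d3:1"},
)
# p = 2/3, r = 1/2, p*r = 1/3, f = 4/7
```

Averaging each of the four values over all evaluated topics gives the topic-set averages the script reports.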
  Since the minimum assessor had no sentences for topic 310, this script ignores topic 310 even when given the max_qrels file. To include topic 310, add 310 to the array of topic numbers at the top of the script.
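Reading a result file of the format described above and mirroring that default (keep only rows of the requested type, skip topic 310) might look like this sketch. The sample rows and document IDs are invented for illustration.

```python
SKIP_TOPICS = {"310"}  # the script ignores topic 310 by default

def load_results(lines, sent_type):
    """Collect doc-id:sentence-num identifiers per topic from result rows.

    Expects 'topic-num relevant|new sequence-no doc-id sentence-num tag'
    fields per line; keeps only rows whose second field matches sent_type
    and skips topics in SKIP_TOPICS. A sketch, not the official script.
    """
    results = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip malformed or blank lines
        topic, kind, _seq, doc, sent, _tag = fields[:6]
        if kind != sent_type or topic in SKIP_TOPICS:
            continue
        results.setdefault(topic, set()).add(f"{doc}:{sent}")
    return results

# Hypothetical submission rows (doc IDs and tag invented).
sample = [
    "303 relevant 1 FBIS3-1023 4 run1",
    "303 new 2 FBIS3-1023 4 run1",
    "310 relevant 3 FT921-3120 2 run1",
]
relevant = load_results(sample, "relevant")
# topic 310 is skipped and the "new" row is filtered out
```

The resulting per-topic sets can be compared directly against the qrels sets when computing the evaluation measures.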