This file summarizes the 2005 pilot evaluation of opinion-oriented
questions, performed by NIST and AQUAINT program contractors. The
first part of the summary gives an overview of the pilot, and the
second part describes the format of some of the data files resulting
from the pilot. The data files include:

1. instructions to the analysts for developing and assessing the
   questions ("instr_{dev,assess}.txt")
2. a set of 50 questions grouped into 20 topics ("questions.xml")
3. answers and notes for the questions of each topic as compiled by
   the topic's author ("topics/*")
4. a list of concepts that should be included in the answer to each
   question, as determined by an assessor ("nuggets.primary")
5. the judging of the systems' responses as determined by an assessor
   (".judged")
6. one or more document ids for each concept that was found in a
   system response ("autonuggetdocs.primary")
7. for a subset of the questions, a list of concepts that should be
   included in the answer to each question, as determined by a
   secondary assessor ("nuggets.secondary"). (NB: The concepts in
   "nuggets.secondary" were not actually used in any of the evaluated
   results included in these data files.)

Pilot Task Description
----------------------

The questions for the opinion pilot were developed by four analysts.
The questions involve 20 "topics" (people, issues, or events), with
2-3 questions per topic. The questions are primarily opinion-oriented,
although a few factual questions were included. Participants were
given the topics and the questions for each topic, but the questions
were NOT labeled as factual or opinion.

The system response was a set of information nuggets that was
evaluated as in the AQUAINT definition pilot (see the bottom of
http://trec.nist.gov/data/qa/add_qaresources.html for details). The
response did not have to be a synthesized answer, but it should be
information that the analyst would use to write the answer. For
yes/no questions (e.g., "Do other countries support Guatemala's claim
on Belize?"), the response should comprise evidence supporting the
yes/no answer.

The format of a response was the same as for the previous AQUAINT
definition and relationship pilots, namely a file containing lines of
the form

   question-number run-tag doc-id answer-string

with at least one line per question. There was no limit on the length
or number of answer strings that could be returned, but excessive
length in a response was (weakly) penalized in the scoring. We used
F(beta=3) as the official score for the pilot, with a length allowance
of 100 (non-white-space) characters per nugget retrieved. (A sketch of
this computation is given at the end of this file.)

NIST accepted up to two runs per group. Since one of the goals of this
pilot was to evaluate the utility of opinion annotation in QA,
participants were encouraged to submit two runs if possible -- one run
that had not been tuned for opinion questions, and one run that
incorporated some analysis/annotation of opinions.

Data Files
==========

"nuggets.primary"
-----------------

The nuggets that were used for assessing the runs are in the
"nuggets.primary" file. The format of each line is:

   topic-number nugnum vital|okay answer-string

where topic-number is the topic number, nugnum is the nugget number,
vital|okay indicates whether the nugget is "vital" or "okay", and
answer-string is the gloss of the answer.

"nuggets.secondary"
-------------------

For six of the topics (14 questions total), we had a second analyst
create their own lists of nuggets. These are in the file
"nuggets.secondary", in the same format as "nuggets.primary".
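As an illustration of the nugget-file format, the following Python
sketch reads "nuggets.primary" (or "nuggets.secondary") into a
per-topic list of nuggets. It assumes the first three fields are
whitespace-separated and that the answer-string is the remainder of
the line; the function name and return structure are illustrative
only, not part of the pilot's tooling.

# read_nuggets.py -- sketch of a nugget-file reader (assumed layout:
# "topic-number nugnum vital|okay answer-string", whitespace-separated,
# with the answer-string as the rest of the line).
from collections import defaultdict

def read_nuggets(path):
    """Return {topic-number: [(nugnum, vital|okay, answer-string), ...]}."""
    nuggets = defaultdict(list)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            topic, nugnum, label, answer = line.split(None, 3)
            nuggets[topic].append((nugnum, label, answer))
    return nuggets

if __name__ == "__main__":
    # Example: count vital vs. okay nuggets per topic.
    for topic, items in sorted(read_nuggets("nuggets.primary").items()):
        vital = sum(1 for _, label, _ in items if label == "vital")
        print(topic, vital, "vital,", len(items) - vital, "okay")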
".judged" ----------------- The assessment of the systems' runs are given in the files .judged, with one file for each run. A judged assessment file contains two parts per question. The first part repeats what was submitted to NIST with the exception that an "item number" is added. An item number is simply the count of the strings submitted to NIST. Thus the format of this part is question-number run-tag item-number doc-id answer-string where question-number is the question number; run-tag is the [anonymized] id of the run; doc-id is the id of the document from which the answer was drawn; and answer-string is the text snippet. The second part lists the nuggets that were assigned to each item: question-number run-tag item-number nugget-number where nugget-number is the number of the nugget as listed in the "nuggets.primary" file. For example, the line 14.1 Run-A 1 3 means that the third nugget of question 14.1 was found in the first item in the response submitted to NIST by Run-A. If a nugget appeared multiple times in a response, it was marked only once with the assessor picking the item he or she thought was the "best" match (for an unknown and arbitrary definition of best). A single item might match multiple nuggets, in which case the item is repeated in the second part. For example: 16.1 Run-A 1 2 16.1 Run-A 1 3 means that the first item in the response for question 16.1 contained nugget 2 and nugget 3. If no nuggets were found in the response, then the second part will be empty. "autonuggetdocs.primary" ------------------------ This file contains system responses and document ids for each nugget in the "nuggets.primary" file for which some system returned a matching response. Each line in the file is in the format: question_id nugget_id vital|okay doc_id string where each (question_id, nugget_id) pair is labeled as either "vital" or "okay", based on the "nuggets.primary" file. doc_id and string are the document id and answer string returned by the system that match this nugget_id for this question_id. Multiple (doc_id, string) pairs may be included, one per line, for each question_id and nugget_id pair, but duplicates are removed.