This file summarizes the 2005 pilot evaluation of opinion-oriented
questions, performed by NIST and AQUAINT program contractors. The
first part of the summary gives an overview of the pilot, and the
second part describes the format of some of the data files resulting
from the pilot. The data files include:

1. instructions to the analysts for developing and assessing the
   questions ("instr_{dev,assess}.txt")
2. a set of 50 questions grouped into 20 topics ("questions.xml")
3. answers and notes for the questions of each topic as compiled by
   the topic's author ("topics/*")
4. a list of concepts that should be included in the answer to each
   question, as determined by an assessor ("nuggets.primary")
5. the judging of the systems' responses as determined by an assessor
   (".judged")
6. one or more document ids for each concept that was found in a
   system response ("autonuggetdocs.primary")
7. for a subset of the questions, a list of concepts that should be
   included in the answer to each question, as determined by a
   secondary assessor ("nuggets.secondary"). (NB: The concepts in
   "nuggets.secondary" were not actually used in any of the evaluated
   results included in these data files.)

Pilot Task Description
----------------------

The questions for the opinion pilot were developed by four analysts.
The questions involve 20 "topics" (people, issues, or events), with
2-3 questions per topic. The questions are primarily opinion-oriented,
although a few factual questions were included. Participants were
given the topics and the questions for each topic, but the questions
were NOT labeled as factual or opinion.

The system response was a set of information nuggets that was
evaluated as in the AQUAINT definition pilot (see the bottom of
http://trec.nist.gov/data/qa/add_qaresources.html for details). The
response did not have to be a synthesized answer, but it should be
information that the analyst would use to write the answer. For
yes/no questions (e.g., "Do other countries support Guatemala's claim
on Belize?"), the response should comprise evidence supporting the
yes/no answer.

The format of a response was the same as for the previous AQUAINT
definition and relationship pilots, namely a file containing lines of
the form

   question-number run-tag doc-id answer-string

with at least one line per question. There was no limit on the length
or number of answer strings that could be returned, but excessive
length in a response was (weakly) penalized in the scoring. We used
F(beta=3) as the official score for the pilot, with a length allowance
of 100 (non-white-space) characters per nugget retrieved. (A sketch of
this computation is given at the end of this file.)

NIST accepted up to two runs per group. Since one of the goals of this
pilot was to evaluate the utility of opinion annotation in QA,
participants were encouraged to submit two runs if possible -- one run
that had not been tuned for opinion questions, and one run that
incorporated some analysis/annotation of opinions.

Data Files
==========

"nuggets.primary"
-----------------

The nuggets that were used for assessing the runs are in the
"nuggets.primary" file. The format of each line is:

   topic-number nugnum vital|okay answer-string

where topic-number is the topic number, nugnum is the nugget number,
vital|okay indicates whether the nugget is "vital" or "okay", and
answer-string is the gloss of the answer.

"nuggets.secondary"
-------------------

For six of the topics (14 questions total), we had a second analyst
create their own lists of nuggets. These are in the file
"nuggets.secondary", in the same format as "nuggets.primary".
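As an illustration of the nugget-file format, the following Python
sketch reads "nuggets.primary" (or "nuggets.secondary") into a
per-topic list of nuggets. It assumes the first three fields are
whitespace-separated and that the answer-string is the remainder of
the line; the function name and return structure are illustrative
only, not part of the pilot's tooling.

# read_nuggets.py -- sketch of a nugget-file reader (assumed layout:
# "topic-number nugnum vital|okay answer-string", whitespace-separated,
# with the answer-string as the rest of the line).
from collections import defaultdict

def read_nuggets(path):
    """Return {topic-number: [(nugnum, vital|okay, answer-string), ...]}."""
    nuggets = defaultdict(list)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            topic, nugnum, label, answer = line.split(None, 3)
            nuggets[topic].append((nugnum, label, answer))
    return nuggets

if __name__ == "__main__":
    # Example: count vital vs. okay nuggets per topic.
    for topic, items in sorted(read_nuggets("nuggets.primary").items()):
        vital = sum(1 for _, label, _ in items if label == "vital")
        print(topic, vital, "vital,", len(items) - vital, "okay")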
".judged" ----------------- The assessment of the systems' runs are given in the files .judged, with one file for each run. A judged assessment file contains two parts per question. The first part repeats what was submitted to NIST with the exception that an "item number" is added. An item number is simply the count of the strings submitted to NIST. Thus the format of this part is question-number run-tag item-number doc-id answer-string where question-number is the question number; run-tag is the [anonymized] id of the run; doc-id is the id of the document from which the answer was drawn; and answer-string is the text snippet. The second part lists the nuggets that were assigned to each item: question-number run-tag item-number nugget-number where nugget-number is the number of the nugget as listed in the "nuggets.primary" file. For example, the line 14.1 Run-A 1 3 means that the third nugget of question 14.1 was found in the first item in the response submitted to NIST by Run-A. If a nugget appeared multiple times in a response, it was marked only once with the assessor picking the item he or she thought was the "best" match (for an unknown and arbitrary definition of best). A single item might match multiple nuggets, in which case the item is repeated in the second part. For example: 16.1 Run-A 1 2 16.1 Run-A 1 3 means that the first item in the response for question 16.1 contained nugget 2 and nugget 3. If no nuggets were found in the response, then the second part will be empty. "autonuggetdocs.primary" ------------------------ This file contains system responses and document ids for each nugget in the "nuggets.primary" file for which some system returned a matching response. Each line in the file is in the format: question_id nugget_id vital|okay doc_id string where each (question_id, nugget_id) pair is labeled as either "vital" or "okay", based on the "nuggets.primary" file. doc_id and string are the document id and answer string returned by the system that match this nugget_id for this question_id. Multiple (doc_id, string) pairs may be included, one per line, for each question_id and nugget_id pair, but duplicates are removed.