QUESTION ANSWERING DATA

Karen Sparck Jones, December 2001

The data consist of batches of questions and candidate answer sentences. The data were originally compiled for an MPhil student project. The ideas behind them are that:

(a) questions may be underspecified, ambiguous etc, so that without further clarifying information amplifying the question, there may be alternative `correct' answers. This is what might be called the `Where is the Taj Mahal?' problem. (We assume there is no use of relative frequency of occurrence of answers to suggest which is the `best' correct one.)

(b) questions may be open to alternative interpretation in what might be described as a presuppositional way. This can be called the `Do cats swim?' problem, because there may or may not be a presumption of the normal or standard.

(c) questions may not have any `correct' or `likely' answers in the data file but there may still be sentences in the file which contribute pertinent information to the user. This can be called the `What's useful to the user' issue.

(d) there may be candidate answers including a correct or likely one but the system cannot be expected to identify and select the correct one, primarily because extensive pragmatic knowledge and inference, beyond the scope of current systems, are required for this.

(e) while clarification from the user about what they are really after would be handy, it may not be easily obtained before searching: users are more likely to appreciate the assumptions and constraints behind their question when they see the ways it may be answered.

The data were therefore constructed for a system which would operate on the basis that

a) candidate answer sentences have already been selected by some efficient search and filter process;

b) the file being searched offers *many* candidate answers;

c) instead of looking for *the* correct answer, the system should take it for granted that there will be many plausible or potentially useful answers, and should try to rank these in some order based on *general* criteria about matching conditions. Of course this is what TREC QA people try to do anyway, but the focus of the work for which these data were collected was different: one was not trying to optimise ranking to ensure that the correct answer appears at top or near-top ranks; rather, given that there is no certainty about correct answers existing or being identifiable, one should offer the user an ordered body of information in response to their question.

d) answers returned are always whole candidate answer sentences.

The data were deliberately designed to test these ideas thoroughly, both by varying the linguistic expression of the same information and by varying the information contained in candidate answers. There is of course no reason to suppose that even a large file would deliver quite such large batches of candidate answers, but having these provides useful system testing capability.

Note - several of the answer sets are identical - the challenge is how they are treated in response to the different questions. (Sorry, so much cardboard postcard!)

EXAMPLES.

Conventions:

Up to set 16 -

a) a data set consists of a question Q and a set of candidate answers A; the latter are chiefly `positive' in some sense, but the sets also include retrieved sentences that one would not normally regard as answers or even as contributing pertinent information. (These ought to come at the bottom of the ranking.)

b) the data sets are numbered and each set has its own internal numbering for its A sentences.

c) the sets have a brief motivating description attached to the set name.

From set 20 - there are several questions to be offered to the same data set.

SET 1 : BASIC SET
Q1 Are postcards made of cardboard?
A1.1 Postcards are sold of cardboard.
A1.2 Postcards were made of cardboard.
A1.3 Tables are made of cardboard.
A1.4 Postcards are made of wood.
A1.5 Postcards are made.
A1.6 Postcards are made of cardboard by us.
A1.7 Postcards are made of thin cardboard.
A1.8 Cheap postcards are made of cardboard.
A1.9 Postcards are made of cardboard.
A1.10 We make postcards of cardboard.
A1.11 Postcards are frequently made of thin cardboard.
A1.12 A postcard is normally made of thin cardboard.
A1.13 Companies normally make postcards of cardboard.
A1.14 Companies make postcards.
A1.15 Companies make postcards of thin cardboard for sale.
A1.16 Nobody makes postcards of cardboard.
A1.17 Postcards of cardboard are sold.
A1.18 Postcards are made of recycled boxes.
A1.19 Cardboard is made of old postcards.
A1.20 Postcards are never made of cardboard.
A1.21 Postcards are not made of cardboard.
A1.22 My grandfather makes postcards of cardboard.
A1.23 Somebody makes postcards of cardboard.
A1.24 Ecological postcards are made of cardboard.
A1.25 Old postcards are made of cardboard.
A1.26 New postcards are made of recycled cardboard boxes.
A1.27 Postcards are made of grey cardboard.
A1.28 Book covers are made of cardboard.
A1.29 Postcards are normally made of cardboard
A1.30 A postcard is normally made of cardboard.
A1.31 A postcard can be made of cork.
A1.32 Cardboard is what postcards are made of.
A1.33 Cardboard is the preferred material for companies making postcards.
A1.34 Cardboard is what companies making postcards prefer.
A1.35 The preferred material for making postcards is thin card.
A1.36 Making postcards is hard when there is no cardboard.
A1.37 Cardboard is in short supply so we cannot produce any postcards.
A1.38 We cut up cardboard boxes to make postcards.
A1.39 Our company supplies postcard views of seaside towns.
A1.40 We require steady supplies of cardboard for our postcards.
A1.41 Our postcards are made only of the very finest cardboard.
A1.42 We use recycled cardboard packing materials to make our postcards.
A1.43 White cardboard is better than grey cardboard for postcards.
A1.44 Comic postcards make pretty well everyone happy.
A1.45 Our postcards have cardboard cutout figures as themes.

SET 2 : ANSWERS WITH CONJUNCTIONS
Q2 Are postcards made of cardboard?
A2.1 We make birthday cards, postcards, and cardboard gift boxes.
A2.2 We make greetings cards and postcards.
A2.3 Postcards and gift cards are usually made of fine cardboard.
A2.4 Greetings cards and postcards are made of thick paper or thin cardboard.
A2.5 Old postcards are made of cardboard but modern ones are made of plastic.
A2.6 Greetings cards are made of paper and postcards of cardboard.

SET 3 : QUESTION IS FIRST VARIATION ON SET 1 QUESTION
Q3 Are cheap postcards made of cardboard?
A3.1 Postcards are sold of cardboard.
A3.2 Postcards were made of cardboard.
A3.3 Tables are made of cardboard.
A3.4 Postcards are made of wood.
A3.5 Postcards are made.
A3.6 Postcards are made of cardboard by us.
A3.7 Postcards are made of thin cardboard.
A3.8 Cheap postcards are made of cardboard.
A3.9 Postcards are made of cardboard.
A3.10 We make postcards of cardboard.
A3.11 Postcards are frequently made of thin cardboard.
A3.12 A postcard is normally made of thin cardboard.
A3.13 Companies normally make postcards of cardboard.
A3.14 Companies make postcards.
A3.15 Companies make postcards of thin cardboard for sale.
A3.16 Nobody makes postcards of cardboard.
A3.17 Postcards of cardboard are sold.
A3.18 Postcards are made of recycled boxes.
A3.19 Cardboard is made of old postcards.
A3.20 Postcards are never made of cardboard.
A3.21 Postcards are not made of cardboard.
A3.22 My grandfather makes postcards of cardboard.
A3.23 Somebody makes postcards of cardboard.
A3.24 Ecological postcards are made of cardboard.
A3.25 Old postcards are made of cardboard.
A3.26 New postcards are made of recycled cardboard boxes.
A3.27 Postcards are made of grey cardboard.
A3.28 Book covers are made of cardboard.
A3.29 Postcards are normally made of cardboard
A3.30 A postcard is normally made of cardboard.
A3.31 A postcard can be made of cork.
A3.32 Cardboard is what postcards are made of.
A3.33 Cardboard is the preferred material for companies making postcards.
A3.34 Cardboard is what companies making postcards prefer.
A3.35 The preferred material for making postcards is thin card.
A3.36 Making postcards is hard when there is no cardboard.
A3.37 Cardboard is in short supply so we cannot produce any postcards.
A3.38 We cut up cardboard boxes to make postcards.
A3.39 Our company supplies postcard views of seaside towns.
A3.40 We require steady supplies of cardboard for our postcards.
A3.41 Our postcards are made only of the very finest cardboard.
A3.42 We use recycled cardboard packing materials to make our postcards.
A3.43 White cardboard is better than grey cardboard for postcards.
A3.44 Comic postcards make pretty well everyone happy.
A3.45 Our postcards have cardboard cutout figures as themes.
A3.46 We make cheap postcards of cardboard.

SET 4 : QUESTION IS SECOND VARIATION ON SET 1 QUESTION
Q4 Are postcards made of thin cardboard?
A4.1 Postcards are sold of cardboard.
A4.2 Postcards were made of cardboard.
A4.3 Tables are made of cardboard.
A4.4 Postcards are made of wood.
A4.5 Postcards are made.
A4.6 Postcards are made of cardboard by us.
A4.7 Postcards are made of thin cardboard.
A4.8 Cheap postcards are made of cardboard.
A4.9 Postcards are made of cardboard.
A4.10 We make postcards of cardboard.
A4.11 Postcards are frequently made of thin cardboard.
A4.12 A postcard is normally made of thin cardboard.
A4.13 Companies normally make postcards of cardboard.
A4.14 Companies make postcards.
A4.15 Companies make postcards of thin cardboard for sale.
A4.16 Nobody makes postcards of cardboard.
A4.17 Postcards of cardboard are sold.
A4.18 Postcards are made of recycled boxes.
A4.19 Cardboard is made of old postcards.
A4.20 Postcards are never made of cardboard.
A4.21 Postcards are not made of cardboard.
A4.22 My grandfather makes postcards of cardboard.
A4.23 Somebody makes postcards of cardboard.
A4.24 Ecological postcards are made of cardboard.
A4.25 Old postcards are made of cardboard.
A4.26 New postcards are made of recycled cardboard boxes.
A4.27 Postcards are made of grey cardboard.
A4.28 Book covers are made of cardboard.
A4.29 Postcards are normally made of cardboard
A4.30 A postcard is normally made of cardboard.
A4.31 A postcard can be made of cork.
A4.32 Cardboard is what postcards are made of.
A4.33 Cardboard is the preferred material for companies making postcards.
A4.34 Cardboard is what companies making postcards prefer.
A4.35 The preferred material for making postcards is thin card.
A4.36 Making postcards is hard when there is no cardboard.
A4.37 Cardboard is in short supply so we cannot produce any postcards.
A4.38 We cut up cardboard boxes to make postcards.
A4.39 Our company supplies postcard views of seaside towns.
A4.40 We require steady supplies of cardboard for our postcards.
A4.41 Our postcards are made only of the very finest cardboard.
A4.42 We use recycled cardboard packing materials to make our postcards.
A4.43 White cardboard is better than grey cardboard for postcards.
A4.44 Comic postcards make pretty well everyone happy.
A4.45 Our postcards have cardboard cutout figures as themes.
A4.46 We make postcards of thin cardboard.

SET 5 : QUESTION IS THIRD VARIATION ON SET 1 QUESTION
Q5 Are postcards normally made of cardboard?
A5.1 Postcards are sold of cardboard.
A5.2 Postcards were made of cardboard.
A5.3 Tables are made of cardboard.
A5.4 Postcards are made of wood.
A5.5 Postcards are made.
A5.6 Postcards are made of cardboard by us.
A5.7 Postcards are made of thin cardboard.
A5.8 Cheap postcards are made of cardboard.
A5.9 Postcards are made of cardboard.
A5.10 We make postcards of cardboard.
A5.11 Postcards are frequently made of thin cardboard.
A5.12 A postcard is normally made of thin cardboard.
A5.13 Companies normally make postcards of cardboard.
A5.14 Companies make postcards.
A5.15 Companies make postcards of thin cardboard for sale.
A5.16 Nobody makes postcards of cardboard.
A5.17 Postcards of cardboard are sold.
A5.18 Postcards are made of recycled boxes.
A5.19 Cardboard is made of old postcards.
A5.20 Postcards are never made of cardboard.
A5.21 Postcards are not made of cardboard.
A5.22 My grandfather makes postcards of cardboard.
A5.23 Somebody makes postcards of cardboard.
A5.24 Ecological postcards are made of cardboard.
A5.25 Old postcards are made of cardboard.
A5.26 New postcards are made of recycled cardboard boxes.
A5.27 Postcards are made of grey cardboard.
A5.28 Book covers are made of cardboard.
A5.29 Postcards are normally made of cardboard
A5.30 A postcard is normally made of cardboard.
A5.31 A postcard can be made of cork.
A5.32 Cardboard is what postcards are made of.
A5.33 Cardboard is the preferred material for companies making postcards.
A5.34 Cardboard is what companies making postcards prefer.
A5.35 The preferred material for making postcards is thin card.
A5.36 Making postcards is hard when there is no cardboard.
A5.37 Cardboard is in short supply so we cannot produce any postcards.
A5.38 We cut up cardboard boxes to make postcards.
A5.39 Our company supplies postcard views of seaside towns.
A5.40 We require steady supplies of cardboard for our postcards.
A5.41 Our postcards are made only of the very finest cardboard.
A5.42 We use recycled cardboard packing materials to make our postcards.
A5.43 White cardboard is better than grey cardboard for postcards.
A5.44 Comic postcards make pretty well everyone happy.
A5.45 Our postcards have cardboard cutout figures as themes.
A5.46 We normally make postcards of cardboard.

SET 6 : QUESTION IS FOURTH VARIATION ON SET 1 QUESTION
Q6 Are postcards made of cardboard with gold borders?
A6.1 Postcards are sold of cardboard.
A6.2 Postcards were made of cardboard.
A6.3 Tables are made of cardboard.
A6.4 Postcards are made of wood.
A6.5 Postcards are made.
A6.6 Postcards are made of cardboard by us.
A6.7 Postcards are made of thin cardboard.
A6.8 Cheap postcards are made of cardboard.
A6.9 Postcards are made of cardboard.
A6.10 We make postcards of cardboard.
A6.11 Postcards are frequently made of thin cardboard.
A6.12 A postcard is normally made of thin cardboard.
A6.13 Companies normally make postcards of cardboard.
A6.14 Companies make postcards.
A6.15 Companies make postcards of thin cardboard for sale.
A6.16 Nobody makes postcards of cardboard.
A6.17 Postcards of cardboard are sold.
A6.18 Postcards are made of recycled boxes.
A6.19 Cardboard is made of old postcards.
A6.20 Postcards are never made of cardboard.
A6.21 Postcards are not made of cardboard.
A6.22 My grandfather makes postcards of cardboard.
A6.23 Somebody makes postcards of cardboard.
A6.24 Ecological postcards are made of cardboard.
A6.25 Old postcards are made of cardboard.
A6.26 New postcards are made of recycled cardboard boxes.
A6.27 Postcards are made of grey cardboard.
A6.28 Book covers are made of cardboard.
A6.29 Postcards are normally made of cardboard
A6.30 A postcard is normally made of cardboard.
A6.31 A postcard can be made of cork.
A6.32 Cardboard is what postcards are made of.
A6.33 Cardboard is the preferred material for companies making postcards.
A6.34 Cardboard is what companies making postcards prefer.
A6.35 The preferred material for making postcards is thin card.
A6.36 Making postcards is hard when there is no cardboard.
A6.37 Cardboard is in short supply so we cannot produce any postcards.
A6.38 We cut up cardboard boxes to make postcards.
A6.39 Our company supplies postcard views of seaside towns.
A6.40 We require steady supplies of cardboard for our postcards.
A6.41 Our postcards are made only of the very finest cardboard.
A6.42 We use recycled cardboard packing materials to make our postcards.
A6.43 White cardboard is better than grey cardboard for postcards.
A6.44 Comic postcards make pretty well everyone happy.
A6.45 Our postcards have cardboard cutout figures as themes.
A6.46 We make postcards of cardboard with gold borders.

SET 7 BIRDS EGGS
Q7 Do birds lay blue eggs?
A7.1 Birds lay eggs which are blue or white or brown.
A7.2 Blue eggs are laid by many birds.
A7.3 The birds eggs are blue and they lay them in March.
A7.4 The birds lay in March and their eggs are blue.
A7.5 Birds are always laying eggs.
A7.6 They are blue birds that lay brown eggs.
A7.7 The birds lay in the yard.
A7.8 I lay blue eggs in the window for the birds.
A7.9 Birds do lay blue eggs.

SET 8 PREPOSITIONAL PHRASES
Q8 Is the crown of the statue of Mary in the church of Utrecht made of gold?
A8.1 The golden crown of the Mary in the church of Utrecht was stolen this Friday.
A8.2 The statue of Mary in the Utrecht church has a golden crown.
A8.3 There is a statue of Mary in the church with a large crown.
A8.4 The crown of Mary in the church of Utrecht is made of gold.

SET 9 TIME PERIODS
Q9 Was Kennedy ever president?
A9.1 Kennedy was the president of the U.S.A.
A9.2 Kennedy was elected president the first time he attempted it.
A9.3 Kennedy was President.

SET 10 STATES
Q10 Is Greece interesting?
A10.1 Greece was beautiful and interesting.
A10.2 There are interesting things to see in Greece.
A10.3 We had a very interesting time when visiting Greece.
A10.4 Greece is interesting when it rains.

SET 11 APPLE PIE
Q11 Who knows how to make apple pie?
A11.1 Mary knows how to make apple pie.
A11.2 The pies made by Mary are better than anyone else's.
A11.3 Everyone knows how to make something as simple as apple pie.
A11.4 Apple pie is something all good mothers know how to make.
A11.5 We know how to make apple pie?
NB ? on this last is deliberate

SET 12 MUSHY ANSWERS
Q12 What has been done to reduce global warming?
A12.1 Noone knows what to do about global warming.
A12.2 Global warming is one of those things everyone talks about.
A12.3 The doomsayers are warning of global warming soon.
A12.4 Global warming could be reduced by less burning of fossil fuels.
A12.5 Something has been done to reduce global warming.

SET 13 SCIENCE PROBLEMS
Q13 What experiments have there been to measure the boiling point of water?
A13.1 There have been many experiments to measure the boiling point of water.
A13.2 It is very hard to measure the boiling point of water.
A13.3 Scientists have measured the boiling point of many liquids, including water.
A13.4 Smith measured the boiling point of water and found it was usually 100 degrees.
A13.5 It is not clear whether scientists have measured the boiling point of water correctly or not.
A13.6 A range of experiments have been done in different places to measure the boiling point of water which have all shown that it is 100 degrees at sea level.
A13.7 I was so furious my temperature certainly reached boiling point, so I went in search of water.
A13.8 Put the potatoes in a pan and measure in a pint of boiling water.
A13.9 The water was on the point of boiling, so I measured the rice carefully.
A13.10 Experiments have been made to measure the boiling point of water.

SET 14 DISCOVERY
Q14 Who is the discoverer of America ?
A14.1 Columbus was the discoverer of America.
A14.2 Columbus is the person who discovered America.
A14.3 It is generally believed that Columbus discovered America.
A14.4 Columbus is the discoverer of America.
A14.5 It was Columbus who discovered America.
A14.6 America was discovered by Columbus in 1492.
A14.7 The Vikings discovered America in the early Middle Ages.
A14.8 They say that it was really Leif Erikson who discovered America.
A14.9 Who it was who really discovered America is actually not known.
A14.10 Noone knows who was actually the first European person to set foot on American soil.
A14.11 All good Americans believe that Columbus discovered America in 1492.
A14.12 The discovery of America was in 1492.
A14.13 America was discovered in the fifteenth century.
A14.14 The discovery of America was made by Columbus.
A14.15 There are people in America who have discovered perpetual motion.
A14.16 Gold has been discovered in America.
A14.17 The American discoverer of photography was Bloggs.
A14.18 Someone is the discoverer of America.

SET 15 DEFINITIONS
Q15 What is an aardvark ?
A15.1 I do not know what an aardvark is.
A15.2 An aardvark is some sort of African animal.
A15.3 Aardvarks are nocturnal insectivores.
A15.4 An aardvark is a small disagreeable animal.
A15.5 People eat aardvarks at Christmas.
A15.6 Aardvarks are strange animals.
A15.7 There are aardvarks in the woods that come out at night.
A15.8 When you see an aardvark run a mile.
A15.9 There are big aardvarks and little aardvarks all over the place.
A15.10 There is an aardvark.

SET 16 : QUESTION VARIATION OF SET 14
Q16 Who discovered America ?
A16.1 Columbus was the discoverer of America.
A16.2 Columbus is the person who discovered America.
A16.3 It is generally believed that Columbus discovered America.
A16.4 Columbus is the discoverer of America.
A16.5 It was Columbus who discovered America.
A16.6 America was discovered by Columbus in 1492.
A16.7 The Vikings discovered America in the early Middle Ages.
A16.8 They say that it was really Leif Erikson who discovered America.
A16.9 Who it was who really discovered America is actually not known.
A16.10 Noone knows who was actually the first European person to set foot on American soil.
A16.11 All good Americans believe that Columbus discovered America in 1492.
A16.12 The discovery of America was in 1492.
A16.13 America was discovered in the fifteenth century.
A16.14 The discovery of America was made by Columbus.
A16.15 There are people in America who have discovered perpetual motion.
A16.16 Gold has been discovered in America.
A16.17 The American discoverer of photography was Bloggs.
A16.18 Someone is the discoverer of America.

SETS 20-25 : DIFFERENT QUESTIONS TO TRY ON THE SAME ANSWER SET
Q20 Where is honey produced ?
Q21 Who makes honey ?
Q22 Which countries produce honey ?
Q23 Does Mexico produce honey?
Q24 What is honey made from ?
Q25 Where do people produce honey ?
A1 Honey is produced in hives.
A2 Bees produce honey in hives.
A3 Bees in their hives produce honey.
A4 Honey is produced by bees in their hives.
A5 Honey is produced by bees from nectar.
A6 Bees produce honey from nectar from flowers in their hives.
A6 Bees produce honey in their hives using the nectar they have gathered from flowers.
A7 Beehives are the factories that produce honey.
A8 Looking at beehives, one finds they are amazing little factories, full of bees making honey.
A9 When one looks inside a beehive, one finds it is an astonishing factory where the bees produce honey.
A10 Bees produce honey, making it from the nectar they have stored in their hives.
A11 Honey is produced wherever there are beehives.
A12 Bees have their hives for producing honey in trees and walls as well as manmade wooden hives or basket skips.
A13 Honey is produced in many countries round the world.
A14 Mexico produces honey and so does Australia.
A15 Honey is produced in Australia.
A16 Bees make honey in Australia.
A17 The honey made in Australia is very good and production is large.
A18 Honey production in Mexico is a significant element in the rural economy.
A19 The production of honey is an important activity in the Mexican countryside.
A20 Village people throughout Mexico produce honey in large quantities.
A21 Honey is sold in very pretty glass jars.
A22 Tea with honey is sold in every tea shop in Grantchester.
A23 They produce tea with honey for every tourist in Cambridge.
A24 Many countries produce honey with different flavours according to their characteristic plant species.
A25 Mexico produces very fine strongly flavoured honey.
A26 The bees in Ireland work very hard to produce honey because the flowers are far apart.
A27 Mary produced a jar of honey from her basket.
A28 Supermarkets sell produce including honey in very large containers.
A29 Enthusiastic amateurs produce a lot of honey from a few hives in their gardens.
A30 Hives are placed in orchards in the spring to help the bees gather nectar for their honey.
A31 With just honey and biscuits you can produce a very genteel dessert.
A32 Making puddings with honey with Susie in the kitchen is always a big production.

SETS 30-34 ANOTHER SET OF QUESTIONS FOR ONE ANSWER SET
Q30 Who wins rowing races ?
Q31 Are there rowing races ?
Q32 Is rowing a sport with races for amateurs ?
Q33 Do the people who win races row as amateurs ?
Q34 How many people rowing regularly win races ?
A1 Every nation wins a rowing race sometimes.
A2 Rowing races are won by strong young men only.
A3 There are races for running and rowing that anyone can win.
A4 Rowing is a sport with races many famous amateurs have won.
A5 Professionals and amateurs regularly win horse races.
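
ILLUSTRATIVE SKETCH (not part of the original data)

The operating assumptions in the introduction (whole candidate sentences, ranked by general matching criteria rather than by a search for the one correct answer) can be illustrated with a very small baseline. What follows is only a sketch under stated assumptions: the function names (tokens, overlap_score, rank) are hypothetical, and plain word overlap (Jaccard similarity) is a stand-in chosen for brevity, not the method used in the project for which the data were compiled.

```python
import re


def tokens(text):
    """Lowercased word tokens; a deliberately crude stand-in for real analysis."""
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_score(question, answer):
    """Jaccard overlap between the word sets of question and answer (0.0 to 1.0)."""
    q, a = set(tokens(question)), set(tokens(answer))
    if not q or not a:
        return 0.0
    return len(q & a) / len(q | a)


def rank(question, candidates):
    """Return (score, label, sentence) triples, best first.

    candidates is a list of (label, sentence) pairs. sorted() is stable,
    so candidates with equal scores keep their input order.
    """
    scored = [(overlap_score(question, s), label, s) for label, s in candidates]
    return sorted(scored, key=lambda item: -item[0])


# A handful of sentences taken from SET 1 above.
question = "Are postcards made of cardboard?"
candidates = [
    ("A1.3", "Tables are made of cardboard."),
    ("A1.7", "Postcards are made of thin cardboard."),
    ("A1.9", "Postcards are made of cardboard."),
    ("A1.21", "Postcards are not made of cardboard."),
    ("A1.44", "Comic postcards make pretty well everyone happy."),
]

ranked = rank(question, candidates)
for score, label, sentence in ranked:
    print(f"{score:.3f}  {label}  {sentence}")
```

On this sample the baseline puts A1.9 first and A1.44 last, which seems reasonable, but it gives A1.7 and A1.21 exactly the same score: surface overlap cannot tell a refinement ("thin cardboard") from a negation ("not made of cardboard"). Differences of that kind are what the varied answer sets are meant to expose.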