TREC-8 Interactive Track Guidelines
Goal
----
The high-level goal of the Interactive Track in TREC-8 remains the
investigation of searching as an interactive task by examining the
process as well as the outcome. To this end an experimental framework
has been designed with the following common features:
- an interactive search task
- 6 topics
- a document collection to be searched
- a required set of searcher (demographics) questionnaires
- 6 classes of data to be collected at each site and submitted to NIST
- 3 summary measures to be calculated by NIST for use by participants
The framework will allow groups to estimate the effect of their
experimental manipulation free and clear of the main (additive)
effects of participant and topic, and it will reduce the effect of
interactions.
In TREC-8 the emphasis will be on each group's exploration of
different approaches to supporting the common searcher task and
understanding the reasons for the results they get. No formal
coordination of hypotheses or comparison of systems across sites is
planned for TREC-8, but groups are encouraged to seek out and exploit
synergies. As a first step, groups are strongly encouraged to make the
focus of their planned investigations known to other track
participants as soon as possible, preferably via the track listserv
at trec-int@ohsu.edu. Contact track chair Bill Hersh to join.
General Description
-------------------
A minimum of 12 participating searchers, one experimental system, and
one control system per site will be required. The control system can
be any IR system appropriate to the goals of the local experiment,
e.g. a variant of the local experimental system, some other baseline
system such as SMART, ZPRISE, etc. (See "2. Augmentation" in the
detailed experimental design for information about how to use more
than twelve searchers or more than one experimental system within this
design.)
Each searcher will perform six searches on the Financial Times of
London 1991-1994 collection (part of the TREC-8 adhoc collection),
using six topics especially chosen from the TREC-8 adhoc topics and
modified for use in the interactive track. Each searcher will perform
half of the total number of searches on the site's experimental system
and the other half on its control system. The experimental design
(see below) determines the order in which each searcher performs the
queries and uses the systems (experimental and control).
In resolving experimental design questions not covered here (e.g.,
scheduling of tutorials and searches, etc.), participating sites
should try to minimize the differences between the conditions under
which a given searcher uses the control and those under which s/he
uses the experimental system. For example, running all the control
searches for a participant on one day and the searches on the
experimental system on another invites unequal, confounding
conditions.
Topics
------
Each of the topics will describe a need for information of a
particular type. Contained within the documents of the collection
to be searched will be multiple distinct examples or instances of the
needed information. The interactive topics will be modified versions
of specially selected adhoc topics. Here is an example TREC-6 adhoc
topic:
Number: 303
Title: Hubble Telescope Achievements
Description:
Identify positive accomplishments of the Hubble telescope
since it was launched in 1991.
Narrative:
Documents are relevant that show the Hubble telescope has
produced new data, better quality data than previously
available, data that has increased human knowledge of the
universe, or data that has led to disproving previously
existing theories or hypotheses. Documents limited to the
shortcomings of the telescope would be irrelevant. Details
of repairs or modifications to the telescope without
reference to positive achievements would not be relevant.
Here is an example of the same topic as it would be modified for use
in the TREC-8 interactive track. Note the addition of the "Please
save" paragraph and the removal of the usual Narrative section with
its specific criteria for relevance or non-relevance:
Number: 303i
Title: Hubble Telescope Achievements
Description:
Identify positive accomplishments of the Hubble telescope
since it was launched in 1991.
Instances:
In the time allotted, please find as many DIFFERENT positive
accomplishments of the sort described above as you can.
Please save at least one document for EACH such DIFFERENT
accomplishment.
If one document discusses several such accomplishments, then
you need not save other documents that repeat those, since your
goal is to identify as many DIFFERENT accomplishments of the sort
described above as possible.
Here are the topics for TREC-8 in NUMERICAL order. See the section
"Experimental design for a site" below for their assignment to blocks
and the order of presentation within the experimental design.
Number: 408i
Title: tropical storms
Description:
What tropical storms (hurricanes and typhoons) have
caused property damage and/or loss of life?
Instances:
In the time allotted, please find as many DIFFERENT storms of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT storm.
If one document discusses several such storms, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT storms of the sort described
above as possible.
Number: 414i
Title: Cuba, sugar, imports
Description:
What countries import Cuban sugar?
Instances:
In the time allotted, please find as many DIFFERENT countries of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT country.
If one document discusses several such countries, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT countries of the sort described
above as possible.
Number: 428i
Title: declining birth rates
Description:
What countries other than the US and China have or have had
a declining birth rate?
Instances:
In the time allotted, please find as many DIFFERENT countries of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT country.
If one document discusses several such countries, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT countries of the sort described
above as possible.
Number: 431i
Title: robotic technology
Description:
What are the latest developments in robotic technology
and in its use?
Instances:
In the time allotted, please find as many DIFFERENT developments of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT development.
If one document discusses several such developments, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT developments of the sort described
above as possible.
Number: 438i
Title: tourism, increase
Description:
What countries have experienced an increase in tourism?
Instances:
In the time allotted, please find as many DIFFERENT countries of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT country.
If one document discusses several such countries, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT countries of the sort described
above as possible.
Number: 446i
Title: tourists, violence
Description:
In what countries have tourists been subject to
acts of violence causing bodily harm or death?
Instances:
In the time allotted, please find as many DIFFERENT countries of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT country.
If one document discusses several such countries, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT countries of the sort described
above as possible.
Searcher task
-------------
The task of the interactive searcher is to save documents which,
taken together, contain as many different instances as possible of
the type of information the topic expresses a need for, within
a 20-minute time limit.
Searchers will be encouraged to avoid saving documents that
contribute no instances beyond those in documents already saved, but
there will be no scoring penalty for saving such documents and
searchers will be told that.
Instructions to be given to searchers
-------------------------------------
The following introductory instructions are to be given once to each
searcher before the first search:
"Imagine that you have just returned from a visit to your doctor
during which it was discovered that you are suffering from high
blood pressure. The doctor suggests that you take a new experimental
drug, but you wonder what alternative treatments are currently
available. You decide to investigate the literature on your own
to satisfy your need for information about what different
alternatives are available to you for high blood pressure treatment.
You really need only one document for each of the different
treatments for high blood pressure.
You find and save a single document that lists four treatment drugs.
Then you find and save another two documents that each discusses a
separate alternative treatment: one that discusses the use of
calcium and one that talks about regular exercise. You've run out
of time and stop your search. In all, you have identified six
different instances of alternative treatments in three documents.
---
In this experiment, you will face a similar task. You will be
presented with several descriptions of needed information on a
number of topics. In each case there can be multiple examples or
instances of the type of information that's needed.
We would like you to identify as many different instances as you
can of the needed information for each topic that will be presented
to you - as many as you can in the 20 minutes you will be given
to search. Please save one document for EACH DIFFERENT instance
of the needed information that you identify. If you save one
document that contains several instances, try not to save additional
documents that contain ONLY those instances. However, you will not
be penalized if you save documents unnecessarily.
As you identify an instance of the needed information, please keep
track of which instances you have found: write down a word or short
phrase to identify the instance, or--if the system provides a
facility to keep track of instances--use it.
Carefully read each topic to understand the type of information
needed. This will vary from topic to topic. On one topic you may be
looking for instances of a certain kind of event. On another you may
be searching for examples of certain sorts of people, places, or
things.
Do you have any questions about
- what we mean by instances of needed information
- the way in which you are to save nonredundant documents for each
instance?"
Searcher questionnaires (minimum)
-----------------------
Provided by Rutgers (see track web site)
Data to be collected and submitted to NIST (emailed to over@nist.gov)
------------------------------------------
Several sorts of result data will be collected for evaluation/analysis (for
all searches unless otherwise specified):
===> Due at NIST by 30 August 1999:
1. sparse format data
===> Due at NIST by when the site reports for the conference are due:
2. rich format data
3. a full narrative description of one interactive session for
whichever topic is designated as T1
4. any further guidance or refinement of the task specification
given to the searchers
5. data from the common searcher questionnaires
Sparse format data for each search will comprise the list of documents
saved and the elapsed clock time of the search. The searcher's
selection (choice) of items for the final output list must be
identified in terms of each document's TREC document identifier
(DOCNO). The elapsed (clock) time in seconds taken for the search,
from the time the searcher first sees the topic until s/he declares
the search to be finished, should be recorded. It is assumed that the
interactive search takes place in one uninterrupted session. If a
session is unavoidably interrupted, it is recommended that it be
abandoned and the topic given to another searcher. Sparse format data
will be the basis for the summary evaluation at NIST, which will
produce a triple for each search: instance precision, instance
recall, and elapsed clock time.
Rich format data for each search will record:
- the word or phrase each searcher records to describe each
instance s/he identifies (no reference to the containing document(s))
- significant events in the course of the interaction and their
timing.
Rich format data are intended for analytical evaluation by the
experimenters.
All significant events and their timing in the course of the
interaction should be recorded. The events listed below are those
that seem to be fairly generally applicable to different systems
and interactive environments; however, the list may need extending
or modifying for specific systems and so should be taken as a
suggestion rather than a requirement:
o Intermediate search formulations: if appropriate to the
system, these should be recorded.
o Documents viewed: "viewing" is taken to mean the searcher
seeing a title or some other brief information about a
document; these events should be recorded.
o Documents seen: "seeing" is taken to mean the searcher
seeing the text of a document, or a substantial section of
text; these events should be recorded.
o Terms entered by the searcher: if appropriate to the
system, these should be recorded.
o Terms seen (offered by the system): if appropriate to the
system, these should be recorded.
o Selection/rejection: documents or terms selected by the
user for any further stage of the search (in addition to the
final selection of documents).
Format of sparse data to be submitted to NIST
---------------------------------------------
TWO files from each site
A. Search file
Here a "search" is the interaction of a searcher given a topic
and asked to carry out the interactive search task using a given
system against the collection - lasting at most 20 minutes.
One line for EACH SEARCH, each line containing the
following blank-delimited items from left to right:
1. Unique site ID
2. Search ID - site's choice (links search & document files)
3. Searcher ID - site's choice
4. System ID - site's choice
5. TREC topic number
6. Elapsed time - number of secs., fractions truncated
Clock time from the moment the searcher sees the
topic until the moment the searcher indicates the
search is complete or time is up.
B. Documents file
One line for each document in a given search result,
each line containing the following blank-delimited
items from left to right:
1. Chronological sequence number ("1", "2") within a search
Use number of last time saved if saved multiple times.
2. Search ID (from search file)
3. TREC document identifier (DOCNO)
NOTE: Reported data items listed within each line must NOT
contain whitespace.
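The constraints above (six blank-delimited items per search-file line, three per documents-file line, no internal whitespace, elapsed time truncated to whole seconds) can be sketched as a small emitter. This is an illustrative aid only, not an official tool; the function names and example IDs are invented.

```python
def _check(items):
    """Reject empty items or items containing whitespace, per the NOTE above."""
    for item in items:
        if item == "" or any(c.isspace() for c in item):
            raise ValueError("data items must not contain whitespace: %r" % (item,))
    return items

def search_line(site, search_id, searcher_id, system_id, topic, elapsed_secs):
    """Format one line of the search file (A): site ID, search ID,
    searcher ID, system ID, TREC topic number, elapsed seconds."""
    items = [site, search_id, searcher_id, system_id,
             str(topic), str(int(elapsed_secs))]  # fractions truncated
    return " ".join(_check(items))

def document_line(seq, search_id, docno):
    """Format one line of the documents file (B): chronological
    sequence number, search ID, TREC document identifier (DOCNO)."""
    return " ".join(_check([str(seq), search_id, docno]))
```

For example, search_line("site1", "s001", "u01", "expsys", 408, 1187.6) yields "site1 s001 u01 expsys 408 1187", with the fractional second dropped.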
Format of other data to be submitted to NIST
--------------------------------------------
Data other than that in sparse-format should be submitted as ASCII text
files.
The FA-1 score plus the questionnaire data for each searcher should be
submitted in a separate file with format close to the following example
but with the real responses to the right of the colons. The Tutorial
Worksheet and Experimenter Note need not be submitted.
S i t e:
S e a r c h e r I D:
FA-1 score: ?
P r e - s e a r c h : (1 per searcher)
Searcher: id
Condition: ?
Degrees: degree major date
Degrees: degree major date
Degrees: degree major date
Degrees: degree major date
Degrees: degree major date
Occupation: ...
Gender: M | F
Age: nn
Previous TREC: Y | N
Online searching: nn
Q1: 1-5
Q2: 1-5
Q3: 1-5
Q4: 1-5
Q5: 1-5
Q6: 1-5
Q7: 1-5
Q8: 1-5
S e a r c h : (6 per searcher)
Searcher: id
Condition: ?
Topic #: nnn
Q1: 1-5
Q2: 1-5
Q3: 1-5
Q4: 1-5
Q5: 1-5
Q6: 1-5
P o s t - s y s t e m : (2 per searcher)
Searcher: id
Condition: ?
Q1: 1-5
Q2: 1-5
Q3: 1-5
Comments: ...
S e a r c h e r   w o r k s h e e t : (6 per searcher)
Searcher: id
Condition: ?
Topic #: nnn
1. ...
2. ...
3. ...
.
.
.
E x i t : (1 per searcher)
Searcher: id
Q1: 1-5
Q2: 1-5
Q3: 1-5
Q4: one-system's-name rank
other-system's-name rank
Q5: one-system's-name rank
other-system's-name rank
Q6: one-system's-name rank
other-system's-name rank
Q7: ...
Q8: ...
Q9: ...
Evaluation of data submitted to NIST
------------------------------------
Evaluation by NIST of the sparse format data will proceed as follows.
For each topic, a pool will be formed containing the unique documents
saved by at least one searcher for that topic regardless of site.
For each topic, the NIST assessor, normally the topic author, will be asked
to:
- read the topic carefully
- read each of the documents from the pool for that topic and
gradually:
- create a list of instances of the topic's needed information
type found somewhere in the documents
- select and record a short phrase describing each instance found
- determine which documents contain which instances
- bracket each instance in the text of the document in which it
was found
For each search (by a given participant for a given topic at a given site),
NIST will use the submitted list of selected documents and the assessor's
instance-document mapping for the topic to calculate:
- the fraction of total instances (as determined by the assessor) for
the topic that are covered by the submitted documents (i.e.,
instance recall)
- the fraction of the submitted documents which contain one or more
instances (i.e., instance precision)
The third measure, elapsed clock time, will be taken directly from the
submitted results for each search.
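As a concrete illustration of the two pool-based measures, the following sketch assumes the assessor's judgments are available as a mapping from DOCNO to the set of instance labels found in that document; the data structures and names are assumptions for illustration, not NIST's actual format.

```python
def score_search(saved_docnos, instances_by_doc):
    """Return (instance_recall, instance_precision) for one search.

    saved_docnos     -- DOCNOs the searcher saved
    instances_by_doc -- assessor's mapping: DOCNO -> set of instance labels
    """
    # Total instances for the topic, as determined by the assessor
    all_instances = set()
    for found in instances_by_doc.values():
        all_instances |= found

    covered = set()        # instances covered by the saved documents
    docs_with_instances = 0
    for doc in saved_docnos:
        found = instances_by_doc.get(doc, set())
        if found:
            docs_with_instances += 1
        covered |= found

    recall = len(covered) / len(all_instances) if all_instances else 0.0
    precision = docs_with_instances / len(saved_docnos) if saved_docnos else 0.0
    return recall, precision
```

For instance, if the judged pool contains three documents with instance sets {a, b}, {b}, and {c}, and a searcher saved the first two plus one unjudged document, the search covers 2 of 3 instances (recall 2/3) and 2 of 3 saved documents contain an instance (precision 2/3).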
Experimental design for a site
------------------------------
1. Minimal experimental matrix as run
The design for this year's track departs from last year's. One limitation
of last year's balanced block design was the potential statistical
confounding of topic and its order. A design that controls for query order
leads to a simpler statistical analysis of results.
As such, this year's approach will ensure that each query is searched in
each position (first through sixth) by each system. This requires a
minimum of 12 searchers per site. In addition, the query orders for
each site will need to be generated in a pseudorandom fashion. To make
this process consistent, the query orders will be generated by the OHSU
group. Below is an example of system-query order for a site. (NOTE:
Please do not use this example, as new sets must be generated for each
12-searcher block.)
Subject Block #1 Block #2
1 System 1: 6-1-2 System 2: 3-4-5
2 System 2: 1-2-3 System 1: 4-5-6
3 System 2: 2-3-4 System 1: 5-6-1
4 System 2: 3-4-5 System 1: 6-1-2
5 System 1: 4-5-6 System 2: 1-2-3
6 System 1: 5-6-1 System 2: 2-3-4
7 System 2: 6-1-2 System 1: 3-4-5
8 System 1: 1-2-3 System 2: 4-5-6
9 System 1: 2-3-4 System 2: 5-6-1
10 System 1: 3-4-5 System 2: 6-1-2
11 System 2: 4-5-6 System 1: 1-2-3
12 System 2: 5-6-1 System 1: 2-3-4
Query blocks should be requested from Bill Hersh as early as possible.
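The rotation structure visible in the example can be sketched as follows. This illustrates only the balancing property (every query in every position on every system); official query orders must still be requested from the OHSU group, and the construction here (cyclic rotations of a shuffled topic list, randomized system order) is an assumption about how such blocks might be built.

```python
import random

def make_block(seed=None):
    """Generate one 12-searcher block for 6 topics and 2 systems so that
    every topic is searched in every position (1st-6th) on every system.
    Returns a list of (topic_order, (first_system, second_system)) rows;
    the first three searches use first_system, the last three the other."""
    rng = random.Random(seed)
    topics = list(range(1, 7))
    rng.shuffle(topics)
    rows = []
    for start in range(6):
        order = topics[start:] + topics[:start]   # one cyclic rotation
        first = rng.choice([1, 2])
        # Each rotation is run by two searchers, one starting on each system,
        # so every (topic, position) pair occurs once per system.
        rows.append((order, (first, 3 - first)))
        rows.append((order, (3 - first, first)))
    rng.shuffle(rows)  # pseudorandom assignment of rows to searchers
    return rows
```

Over the 12 rows, the 72 (topic, position, system) combinations are all distinct, which is the property the track design requires.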
2. Augmentation
The design for a given site can be augmented in two ways:
1. Participants can be added in groups of 6 using the design
above. Additional blocks should be requested from Bill
Hersh.
2. Systems can be added by adding additional groups of 6 users
with each new system. Additional blocks should be requested from
Bill Hersh.
Topics cannot be added/subtracted individually for each site.
All augmentations other than the two listed above, however interesting,
are outside the scope of this design. If sites plan such adjunct
experiments, they are encouraged to design them for maximal synergy
with the track design.
3. Analysis
Analysis is up to each group, but all groups are strongly encouraged
to take advantage of the experimental design and undertake:
1. exploratory data analysis
to examine the patterns of correlation, interaction, etc.
involving the major factors. Some example plots for the TREC-6
interactive data (recall or precision by searcher or topic)
are available on the Interactive Track web site
under "Interactive Track History".
2. analysis of variance (ANOVA), where appropriate,
to estimate the separate contributions of searcher, topic and
system as a first step in understanding why the results of one
search are different from those of another
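As one elementary form of the suggested analysis, the additive main effect of each factor can be approximated by comparing per-level means against the grand mean. This pure-Python sketch is illustrative only (the record layout is an assumption); a proper ANOVA with interaction terms would be done with a statistics package.

```python
def main_effects(results, factor):
    """Estimate additive main effects of one factor ('searcher', 'topic',
    or 'system') on instance recall: the mean score at each level of the
    factor minus the grand mean over all searches."""
    grand = sum(r["recall"] for r in results) / len(results)
    by_level = {}
    for r in results:
        by_level.setdefault(r[factor], []).append(r["recall"])
    return {level: sum(v) / len(v) - grand
            for level, v in by_level.items()}
```

For example, over four searches with recalls 0.6, 0.4, 0.8, 0.2 (grand mean 0.5), a system whose two searches averaged 0.4 has an estimated main effect of -0.1.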
Last updated: Tuesday, 22-Sep-2015 09:51:46 EDT
Date created: Monday, 31-Jul-00
For information about this webpage contact trec@nist.gov