TREC-7 Interactive Track Guidelines
Goal
----
The high-level goal of the Interactive Track in TREC-7 remains the
investigation of searching as an interactive task by examining the
process as well as the outcome. To this end an experimental framework
has been designed with the following common features:
- an interactive search task
- 8 topics
- a document collection to be searched
- a required set of searcher (demographics) questionnaires
- a required psychometric test for all searchers
- 6 classes of data to be collected at each site and submitted to NIST
- 3 summary measures to be calculated by NIST for use by participants
The framework will allow groups to estimate the effect of their
experimental manipulation free and clear of the main (additive)
effects of participant and topic and it will reduce the effect of
interactions.
In TREC-7 the emphasis will be on each group's exploration of
different approaches to supporting the common searcher task and
understanding the reasons for the results they get. No formal
coordination of hypotheses or comparison of systems across sites is
planned for TREC-7, but groups are encouraged to seek out and exploit
synergies. As a first step, groups are strongly encouraged to make the
focus of their planned investigations known to other track
participants as soon as possible.
General Description
-------------------
A minimum of eight participating searchers, one experimental system, and
one control system per site will be required. The control system can
be any IR system appropriate to the goals of the local experiment,
e.g., a variant of the local experimental system or some other baseline
system such as SMART or ZPRISE. (See "2. Augmentation" in the
detailed experimental design for information about how to use more
than eight searchers or more than one experimental system within this
design.)
Each searcher will perform eight searches on the Financial Times of
London 1991-1994 collection (part of the TREC-7 adhoc collection),
using eight topics especially chosen from the TREC-7 adhoc topics and
modified for use in the interactive track. Each searcher will perform
half of the total number of searches on the site's experimental system
and the other half on its control system. The detailed experimental
design (see below) determines the order in which each searcher uses
the systems (experimental and control).
In resolving experimental design questions not covered here (e.g.,
scheduling of tutorials and searches), participating sites
should try to minimize the differences between the conditions under
which a given searcher uses the control and those under which s/he
uses the experimental system. For example, running all the control
searches for a participant on one day and the searches on the
experimental system on another invites unequal, confounding
conditions.
Topics
------
Each of the topics will describe a need for information of a
particular type. Contained within the documents of the collection
to be searched will be multiple distinct examples or instances of the
needed information. The interactive topics will be modified versions
of specially selected adhoc topics. Here is an example TREC-6 adhoc
topic:
Number: 303i
Title: Hubble Telescope Achievements
Description:
Identify positive accomplishments of the Hubble telescope
since it was launched in 1991.
Narrative:
Documents are relevant that show the Hubble telescope has
produced new data, better quality data than previously
available, data that has increased human knowledge of the
universe, or data that has led to disproving previously
existing theories or hypotheses. Documents limited to the
shortcomings of the telescope would be irrelevant. Details
of repairs or modifications to the telescope without
reference to positive achievements would not be relevant.
Here is an example of the same topic as it would be modified for use
in the TREC-7 interactive track. Note the addition of the "Please
save" paragraph and the removal of the usual Narrative section with
its specific criteria for relevance or non-relevance:
Number: 303i
Title: Hubble Telescope Achievements
Description:
Identify positive accomplishments of the Hubble telescope
since it was launched in 1991.
Instances:
In the time allotted, please find as many DIFFERENT positive
accomplishments of the sort described above as you can.
Please save at least one document for EACH such DIFFERENT
accomplishment.
If one document discusses several such accomplishments, then
you need not save other documents that repeat those, since your
goal is to identify as many DIFFERENT accomplishments of the sort
described above as possible.
Here are the topics for TREC-7 in NUMERICAL order. See the section
"Experimental design for a site" below for their assignment to blocks
and the order of presentation within the experimental design.
-------------------------------------------------------------------------
Number:
352i
Title:
British Chunnel impacts
Description:
Impacts of the Chunnel - anticipated or actual - on the British
economy and/or the life style of the British
Instances:
In the time allotted, please find as many DIFFERENT impacts of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT impact.
If one document discusses several such impacts, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT impacts of the sort described
above as possible.
-------------------------------------------------------------------------
Number:
353i
Title:
Antarctic exploration
Description:
Identify systematic explorations and scientific investigations
of Antarctica, current or planned.
Instances:
In the time allotted, please find as many DIFFERENT explorations
or investigations of the sort described above as you can. Please
save at least one document for EACH such DIFFERENT exploration or
investigation.
If one document discusses several such investigations/explorations,
then you need not save other documents that repeat those, since your
goal is to identify as many DIFFERENT investigations or explorations
of the sort described above as possible.
-------------------------------------------------------------------------
Number:
357i
Title:
territorial waters dispute
Description:
Identify documents discussing international boundary
disputes relevant to the 200-mile special economic
zones or 12-mile territorial waters subsequent to
the passing of the "International Convention on the
Law of the Sea".
Instances:
In the time allotted, please find as many DIFFERENT disputes of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT dispute.
If one document discusses several such disputes, then you need
not save other documents that repeat those, since your goal is
to identify as many DIFFERENT disputes of the sort described
above as possible.
-------------------------------------------------------------------------
Number:
362i
Title:
human smuggling
Description:
Identify incidents of human smuggling.
Instances:
In the time allotted, please find as many DIFFERENT incidents of
the sort described above as you can. Please save at least one
document for EACH DIFFERENT incident of the sort described above.
If one document discusses several such incidents, then you
need not save other documents that repeat those, since your goal
is to identify DIFFERENT incidents of the sort described above.
-------------------------------------------------------------------------
Number:
365i
Title:
El Nino
Description:
What effects have been attributed to El Nino?
Instances:
In the time allotted, please find as many DIFFERENT effects of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT effect.
If one document discusses several such effects, then you need
not save other documents that repeat those, since your goal
is to identify as many DIFFERENT effects of the sort described
above as possible.
-------------------------------------------------------------------------
Number:
366i
Title:
commercial cyanide uses
Description:
What are the industrial or commercial uses of
cyanide or its derivatives?
Instances:
In the time allotted, please find as many DIFFERENT uses of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT use.
If one document discusses several such uses, then you need not
save other documents that repeat those, since your goal is to
identify as many DIFFERENT uses of the sort described above as
possible.
-------------------------------------------------------------------------
Number:
387i
Title:
radioactive waste
Description:
Identify documents that discuss effective and safe ways to
permanently handle long-lived radioactive wastes.
Instances:
In the time allotted, please find as many DIFFERENT ways of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT way.
If one document discusses several such ways, then you need not
save other documents that repeat those, since your goal is
to identify as many DIFFERENT ways of the sort described above
as possible.
-------------------------------------------------------------------------
Number:
392i
Title:
robotics
Description:
What are the applications of robotics in the world today?
Instances:
In the time allotted, please find as many DIFFERENT applications of
the sort described above as you can. Please save at least one
document for EACH such DIFFERENT application.
If one document discusses several such applications, then you
need not save other documents that repeat those, since your goal
is to identify as many DIFFERENT applications of the sort described
above as possible.
-------------------------------------------------------------------------
Searcher task
-------------
The task of the interactive searcher is to save documents that,
taken together, contain as many different instances as possible of
the type of information the topic expresses a need for, within
a 15-minute time limit.
Searchers will be encouraged to avoid saving documents which
contribute no instances beyond those in documents already saved, but
there will be no scoring penalty for saving such documents and
searchers will be told that.
Instructions to be given to searchers
-------------------------------------
The following introductory instructions are to be given once to each searcher
before the first search:
"Imagine that you have just returned from a visit to your doctor
during which it was discovered that you are suffering from high
blood pressure. The doctor suggests that you take a new experimental
drug, but you wonder what alternative treatments are currently
available. You decide to investigate the literature on your own
to satisfy your need for information about what different alternatives
are available to you for high blood pressure treatment. You really
need only one document for each of the different treatments for high
blood pressure.
You find and save a single document that lists four treatment drugs.
Then you find and save another two documents that each discuss a
separate alternative treatment: one that discusses the use of
calcium and one that talks about regular exercise. You've run out
of time and stop your search. In all, you have identified six different
instances of alternative treatments in three documents.
---
In this experiment, you will face a similar task. You will be
presented with several descriptions of needed information on a
number of topics. In each case there can be multiple examples or
instances of the type of information that's needed.
We would like you to identify as many different instances as you
can of the needed information for each topic that will be presented
to you - as many as you can in the 15 minutes you will be given
to search. Please save one document for EACH DIFFERENT instance
of the needed information that you identify. If you save one document
that contains several instances, try not to save additional documents
that contain ONLY those instances. However, you will not be penalized
if you save documents unnecessarily.
As you identify an instance of the needed information, please keep
track of which instances you have found: write down a word or short
phrase to identify the instance, or--if the system provides a facility
to keep track of instances--use it.
Carefully read each topic to understand the type of information
needed. This will vary from topic to topic. On one topic you may be
looking for instances of a certain kind of event. On another you may
be searching for examples of certain sorts of people, places, or
things.
Do you have any questions about
- what we mean by instances of needed information
- the way in which you are to save nonredundant documents for each
instance?"
Searcher questionnaires (minimum)
-----------------------
Provided by Rutgers (see track web site)
Psychometric test
-----------------
- FA-1 (Controlled Associations)
from ETS's "Kit of Reference Tests for Cognitive Factors" (1976 Edition)
Data to be collected and submitted to NIST (emailed to over@nist.gov)
------------------------------------------
Several sorts of result data will be collected for evaluation/analysis (for
all searches unless otherwise specified):
===> Due at NIST by 30. August 1998:
1. sparse format data
===> Due at NIST by end of the day (Washington, DC) on 27. October 1998:
2. rich format data
3. a full narrative description of one interactive session for
whichever topic is designated as T1
4. any further guidance or refinement of the task specification
given to the searchers
5. data from the common searcher questionnaires
6. results from the psychometric test (FA-1) given to all searchers
Sparse format data for each search will comprise the list of documents
saved and the elapsed clock time of the search. The searcher's
selection (choice) of items for the final output list must be
identified in terms of each document's TREC document identifier
(DOCNO). The elapsed (clock) time in seconds taken for the search,
from the time the searcher first sees the topic until s/he declares
the search to be finished, should be recorded. It is assumed that the
interactive search takes place in one uninterrupted session. If a
session is unavoidably interrupted, it is recommended that it be
abandoned and the topic given to another searcher. Sparse format data
will be the basis for the summary evaluation at NIST, which will
produce a triple for each search: instance precision, instance
recall, and elapsed clock time.
Rich format data for each search will record:
- the word or phrase each searcher records to describe each
instance s/he identifies (no reference to the containing document(s))
- significant events in the course of the interaction and their
timing.
Rich format data are intended for analytical evaluation by the
experimenters.
All significant events and their timing in the course of the
interaction should be recorded. The events listed below are those
that seem to be fairly generally applicable to different systems
and interactive environments; however, the list may need extending
or modifying for specific systems and so should be taken as a
suggestion rather than a requirement:
o Intermediate search formulations: if appropriate to the
system, these should be recorded.
o Documents viewed: "viewing" is taken to mean the searcher
seeing a title or some other brief information about a
document; these events should be recorded.
o Documents seen: "seeing" is taken to mean the searcher
seeing the text of a document, or a substantial section of
text; these events should be recorded.
o Terms entered by the searcher: if appropriate to the
system, these should be recorded.
o Terms seen (offered by the system): if appropriate to the
system, these should be recorded.
o Selection/rejection: documents or terms selected by the
user for any further stage of the search (in addition to the
final selection of documents).
Format of sparse data to be submitted to NIST
---------------------------------------------
TWO files from each site
A. Search file
Here a "search" is the interaction of a searcher given a topic
and asked to carry out the interactive search task using a given
system against the collection - lasting at most 15 minutes.
One line for EACH SEARCH, each line containing the
following blank-delimited items from left to right:
1. Unique site ID
2. Search ID - site's choice (links search & document files)
3. Searcher ID - site's choice
4. System ID - site's choice
5. TREC topic number
6. Elapsed time - number of secs., fractions truncated
Clock time from the moment the searcher sees the
topic until the moment the searcher indicates the
search is complete or time is up.
B. Documents file
One line for each document in a given search result,
each line containing the following blank-delimited
items from left to right:
1. Chronological sequence number ("1", "2", ...) within a search
Use number of last time saved if saved multiple times.
2. Search ID (from search file)
3. TREC document identifier (DOCNO)
NOTE: Reported data items listed within each line must NOT
contain whitespace.
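As an illustration only, the two sparse-format files described above could be
produced along the following lines. This is a minimal sketch, not a NIST tool:
the function names, the site ID "siteX", the search and searcher IDs, and the
DOCNO values are all made up for the example. It also assumes one reading of
the rule for documents saved more than once, namely that every save event in a
search is numbered chronologically and a document keeps the number of its last
save.

```python
def join_fields(fields):
    """Join items with single blanks, rejecting empty or whitespace-bearing items."""
    for field in fields:
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"item must be non-empty and contain no whitespace: {field!r}")
    return " ".join(fields)


def search_line(site_id, search_id, searcher_id, system_id, topic, elapsed_seconds):
    """One line of the search file: six blank-delimited items, one line per search."""
    # int() drops the fractional part, matching "fractions truncated".
    return join_fields([site_id, search_id, searcher_id, system_id, topic,
                        str(int(elapsed_seconds))])


def document_lines(search_id, saves):
    """Lines of the documents file for one search.

    `saves` lists DOCNOs in the order they were saved (assumed input shape).
    A document saved several times is reported once, with the number of its
    last save (assumed interpretation of the rule above).
    """
    last_save = {}
    for number, docno in enumerate(saves, start=1):
        last_save[docno] = number
    ordered = sorted(last_save.items(), key=lambda item: item[1])
    return [join_fields([str(number), search_id, docno]) for docno, number in ordered]


if __name__ == "__main__":
    print(search_line("siteX", "s01", "p1", "E", "352i", 731.9))
    for line in document_lines("s01", ["FT921-1", "FT933-2", "FT921-1"]):
        print(line)
```

Rejecting items that contain whitespace up front keeps each line parseable by
a simple split on blanks.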
Format of other data to be submitted to NIST
--------------------------------------------
Data other than that in sparse format should be submitted as ASCII text
files.
The FA-1 score plus the questionnaire data for each searcher should be
submitted in a separate file with format close to the following example
but with the real responses to the right of the colons. The Tutorial
Worksheet and Experimenter Note need not be submitted.
S i t e:
S e a r c h e r I D:
FA-1 score: ?
P r e - s e a r c h : (1 per searcher)
Searcher: id
Condition: ?
Degrees: degree major date
Degrees: degree major date
Degrees: degree major date
Degrees: degree major date
Degrees: degree major date
Occupation: ...
Gender: M | F
Age: nn
Previous TREC: Y | N
Online searching: nn
Q1: 1-5
Q2: 1-5
Q3: 1-5
Q4: 1-5
Q5: 1-5
Q6: 1-5
Q7: 1-5
Q8: 1-5
S e a r c h : (8 per searcher)
Searcher: id
Condition: ?
Topic #: nnn
Q1: 1-5
Q2: 1-5
Q3: 1-5
Q4: 1-5
Q5: 1-5
Q6: 1-5
P o s t - s y s t e m : (2 per searcher)
Searcher: id
Condition: ?
Q1: 1-5
Q2: 1-5
Q3: 1-5
Comments: ...
S e a r c h e r w o r k s h e e t : (8 per searcher)
Searcher: id
Condition: ?
Topic #: nnn
1. ...
2. ...
3. ...
.
.
.
E x i t : (1 per searcher)
Searcher: id
Q1: 1-5
Q2: 1-5
Q3: 1-5
Q4: one-system's-name rank
other-system's-name rank
Q5: one-system's-name rank
other-system's-name rank
Q6: one-system's-name rank
other-system's-name rank
Q7: ...
Q8: ...
Q9: ...
Evaluation of data submitted to NIST
------------------------------------
Evaluation by NIST of the sparse format data will proceed as follows.
For each topic, a pool will be formed containing the unique documents
saved by at least one searcher for that topic regardless of site.
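The pooling step can be pictured with a short sketch. It assumes the search
and documents files have already been parsed into tuples; the function name and
the tuple layout are illustrative assumptions, not a description of NIST's
software. Because Search IDs are chosen by each site, the sketch keys searches
by the pair (site ID, search ID).

```python
from collections import defaultdict


def build_pools(searches, documents):
    """Return {topic: set of unique DOCNOs saved for that topic at any site}.

    searches:  iterable of (site_id, search_id, topic)   (assumed layout)
    documents: iterable of (site_id, search_id, docno)   (assumed layout)
    """
    topic_of = {(site, search): topic for site, search, topic in searches}
    pools = defaultdict(set)
    for site, search, docno in documents:
        pools[topic_of[(site, search)]].add(docno)
    return dict(pools)


if __name__ == "__main__":
    searches = [("a", "s1", "352i"), ("b", "s1", "352i"), ("a", "s2", "365i")]
    documents = [("a", "s1", "D1"), ("b", "s1", "D1"),
                 ("b", "s1", "D2"), ("a", "s2", "D3")]
    print(build_pools(searches, documents))
```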
For each topic, the NIST assessor, normally the topic author, will be asked
to:
- read the topic carefully
- read each of the documents from the pool for that topic and
gradually:
- create a list of instances of the topic's needed information
type found somewhere in the documents
- select and record a short phrase describing each instance found
- determine which documents contain which instances
- bracket each instance in the text of the document in which it
was found
For each search (by a given participant for a given topic at a given site),
NIST will use the submitted list of selected documents and the assessor's
instance-document mapping for the topic to calculate:
- the fraction of total instances (as determined by the assessor) for
the topic that are covered by the submitted documents (i.e.,
instance recall)
- the fraction of the submitted documents which contain one or more
instances (i.e., instance precision)
The third measure, elapsed clock time, will be taken directly from the
submitted results for each search.
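The two fractions defined above can be sketched as follows. The function name
and the data layout are assumptions made for this example, and the numbers in
the usage are invented: three saved documents cover six instances (as in the
blood pressure example in the searcher instructions), a fourth saved document
contains no instance, and the assessor is assumed to have found eight
instances in total. Returning 0.0 when a denominator is zero is also an
assumption of the sketch.

```python
def instance_scores(saved_docnos, instances_in_doc, total_instances):
    """Return (instance recall, instance precision) for one search.

    saved_docnos:     DOCNOs the searcher saved
    instances_in_doc: assessor's mapping {docno: set of instance labels}
    total_instances:  number of instances the assessor found for the topic
    """
    saved = set(saved_docnos)
    covered = set()
    docs_with_instance = 0
    for docno in saved:
        found = instances_in_doc.get(docno, set())
        if found:
            docs_with_instance += 1
            covered |= found
    recall = len(covered) / total_instances if total_instances else 0.0
    precision = docs_with_instance / len(saved) if saved else 0.0
    return recall, precision


if __name__ == "__main__":
    mapping = {
        "D1": {"drug-1", "drug-2", "drug-3", "drug-4"},
        "D2": {"calcium"},
        "D3": {"exercise"},
    }
    # D4 was saved but contains no instance; 8 instances exist in the pool.
    print(instance_scores(["D1", "D2", "D3", "D4"], mapping, 8))
```

Saving a redundant document that repeats known instances leaves both measures
unchanged, which matches the statement that searchers are not penalized for
such saves.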
Experimental design for a site
------------------------------
1. Minimal experimental matrix as run
Define two blocks of four topics each, order of presentation fixed
within each block:
B1 = T1 -> T2 -> T3 -> T4
365i 357i 362i 352i
B2 = T5 -> T6 -> T7 -> T8
366i 392i 387i 353i
Participants | System,Topic
--------------+--------------------
P1 | E,B1 C,B2
P2 | C,B2 E,B1
P3 | E,B2 C,B1
P4 | C,B1 E,B2
P5 | E,B1 C,B2
P6 | C,B2 E,B1
P7 | E,B2 C,B1
P8 | C,B1 E,B2
or expanded to show the individual topics:
Participants | System,Topic combinations
--------------+---------------------------------------------------
P1 | E,T1 E,T2 E,T3 E,T4 C,T5 C,T6 C,T7 C,T8
P2 | C,T5 C,T6 C,T7 C,T8 E,T1 E,T2 E,T3 E,T4
P3 | E,T5 E,T6 E,T7 E,T8 C,T1 C,T2 C,T3 C,T4
P4 | C,T1 C,T2 C,T3 C,T4 E,T5 E,T6 E,T7 E,T8
P5 | E,T1 E,T2 E,T3 E,T4 C,T5 C,T6 C,T7 C,T8
P6 | C,T5 C,T6 C,T7 C,T8 E,T1 E,T2 E,T3 E,T4
P7 | E,T5 E,T6 E,T7 E,T8 C,T1 C,T2 C,T3 C,T4
P8 | C,T1 C,T2 C,T3 C,T4 E,T5 E,T6 E,T7 E,T8
- E = experimental system
- C = Control system - site's choice
- The participants (searchers) should be numbered sequentially, 1,
..., J. J must be at least 8 (see part 2, "Augmentation", below on how to add more)
Each site will randomly assign participants to the rows of its
design.
The order for presentation of topics to searchers at all participating
sites is defined by the above design. The assignment of actual topics
to T1, T2, ... T8 will be determined by NIST in collaboration with the
track shortly after the interactive topics are made available.
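A site might generate its search schedule from the matrix above with a small
script like the one below. This is a sketch under stated assumptions: the
function names are invented, the topic mapping copies the block definitions
given earlier in this section, and participants beyond P4 are assumed to reuse
the P1-P4 rows in order, which reproduces rows P5-P8 of the matrix. The random
assignment of people to rows uses a seeded shuffle purely for illustration.

```python
import random

# Topic assignment as listed in the block definitions above.
TOPIC_NUMBERS = {"T1": "365i", "T2": "357i", "T3": "362i", "T4": "352i",
                 "T5": "366i", "T6": "392i", "T7": "387i", "T8": "353i"}
BLOCKS = {"B1": ["T1", "T2", "T3", "T4"], "B2": ["T5", "T6", "T7", "T8"]}
# Rows P1..P4 of the minimal matrix: (system, block) pairs in order of use.
ROWS = [
    (("E", "B1"), ("C", "B2")),
    (("C", "B2"), ("E", "B1")),
    (("E", "B2"), ("C", "B1")),
    (("C", "B1"), ("E", "B2")),
]


def schedule(participant_number):
    """Ordered list of (system, topic label) pairs for participant P<n>, n >= 1."""
    row = ROWS[(participant_number - 1) % 4]
    return [(system, topic) for system, block in row for topic in BLOCKS[block]]


def assign_rows(searchers, seed=None):
    """Randomly assign named searchers to rows P1..PJ (hypothetical helper)."""
    shuffled = list(searchers)
    random.Random(seed).shuffle(shuffled)
    return {name: schedule(number) for number, name in enumerate(shuffled, start=1)}


if __name__ == "__main__":
    for p in range(1, 9):
        cells = " ".join(f"{system},{topic}" for system, topic in schedule(p))
        print(f"P{p} | {cells}")
    print([TOPIC_NUMBERS[topic] for _, topic in schedule(1)])
```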
For the purposes of analysis each 4-person-by-8-topic matrix
defined above will in effect be rearranged by permuting the
columns (topics) so E alternates with C as in the following:
Participants | System,Topic combinations
--------------+---------------------------------------------------
P1 | E,T1 C,T5 E,T2 C,T6 E,T3 C,T7 E,T4 C,T8
P2 | C,T5 E,T1 C,T6 E,T2 C,T7 E,T3 C,T8 E,T4
P3 | E,T5 C,T1 E,T6 C,T2 E,T7 C,T3 E,T8 C,T4
P4 | C,T1 E,T5 C,T2 E,T6 C,T3 E,T7 C,T4 E,T8
Note that this matrix consists of the following 2x2 subdesign:
E C
C E
This 2x2 design is a latin square design. It has the property
that the "treatment effect", here E-C, the control-adjusted response,
can be estimated free and clear of the main (additive) effects of
participant and topic. Here, participant and topic are treated
statistically as blocking factors. This means that even in the
presence of differences between participants and topics, which
clearly are anticipated, the design will provide estimates of E-C
that are not contaminated by these differences.
However, the estimate of E-C is contaminated by the presence
of an interaction between topic and participant. Therefore, we
replicate the 2x2 latin square 4x4 times to get the minimal 8x8
design for each site. The contaminating effect of the topic
by participant interaction is reduced by averaging the sixteen
estimates of E-C that are available, one for each 2x2 latin
square. This is analogous to averaging replicate measurements of
a single quantity in order to reduce the measurement uncertainty.
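The property claimed for the 2x2 square can be checked numerically. The sketch
below assumes a simple additive model (response = overall level + participant
effect + topic effect + system effect, plus an optional participant-by-topic
interaction term); all names and numbers are invented for the illustration and
are not TREC data. With no interaction the estimate equals E-C exactly, and a
single nonzero interaction term shifts it.

```python
def response(mu, participant, topic, system, interaction=0.0):
    """Additive model for one search outcome (assumed for this illustration)."""
    return mu + participant + topic + system + interaction


def treatment_estimate(y_p1_ta, y_p1_tb, y_p2_ta, y_p2_tb):
    """Estimate E-C from one 2x2 square in which
    P1 uses E on topic a and C on topic b, and
    P2 uses C on topic a and E on topic b."""
    return ((y_p1_ta + y_p2_tb) - (y_p1_tb + y_p2_ta)) / 2


def square(mu, p, t, e, c, g=((0.0, 0.0), (0.0, 0.0))):
    """Four responses of one square; g[i][j] is the interaction for (Pi, topic j)."""
    return (response(mu, p[0], t[0], e, g[0][0]),
            response(mu, p[0], t[1], c, g[0][1]),
            response(mu, p[1], t[0], c, g[1][0]),
            response(mu, p[1], t[1], e, g[1][1]))


if __name__ == "__main__":
    effects = dict(mu=10.0, p=(1.0, -3.0), t=(2.0, 5.0), e=4.0, c=1.0)
    # Purely additive effects: the estimate recovers e - c.
    print(treatment_estimate(*square(**effects)))
    # One interaction term of 0.5 contaminates the estimate by 0.25.
    print(treatment_estimate(*square(**effects, g=((0.5, 0.0), (0.0, 0.0)))))
```

The participant and topic effects cancel because each appears once with a plus
sign and once with a minus sign in the estimate; the interaction terms do not
cancel, which is why the design averages over many squares.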
2. Augmentation
The design for a given site can be augmented in two ways:
1. Participants can be added in groups of 4 using the design
for P1-4 (above).
2. Systems can be added by repeating the 8x8 design with at
least one new system.
Topics cannot be added/subtracted individually for each site.
All augmentations other than the two listed above, however interesting,
are outside the scope of this design. If sites plan such adjunct
experiments, they are encouraged to design them for maximal synergy
with the track design.
3. Analysis
The analysis is up to each group, but all are strongly encouraged to take advantage
of the experimental design and undertake:
1. exploratory data analysis
to examine the patterns of correlation, interaction, etc.
involving the major factors. Some example plots for the TREC-6
interactive data (recall or precision by searcher or topic)
are available on the Interactive Track web site at
www-nlpir.nist.gov/~over/t7i under "Interactive Track History".
2. analysis of variance (ANOVA), where appropriate,
to estimate the separate contributions of searcher, topic and
system as a first step in understanding why the results of one
search are different from those of another.
Last updated: Thursday, 23-Feb-2017 18:17:54 UTC
Date created: Monday, 31-Jul-00