TREC-7 Interactive Track Guidelines

Goal 
---- 

The high-level goal of the Interactive Track in TREC-7 remains the
investigation of searching as an interactive task by examining the
process as well as the outcome. To this end an experimental framework
has been designed with the following common features:

	- an interactive search task
	- 8 topics
	- a document collection to be searched
	- a required set of searcher (demographics) questionnaires
	- a required psychometric test for all searchers
	- 6 classes of data to be collected at each site and submitted to NIST
	- 3 summary measures to be calculated by NIST for use by participants

The framework will allow groups to estimate the effect of their
experimental manipulation free and clear of the main (additive)
effects of participant and topic and it will reduce the effect of
interactions.

In TREC-7 the emphasis will be on each group's exploration of
different approaches to supporting the common searcher task and
understanding the reasons for the results they get. No formal
coordination of hypotheses or comparison of systems across sites is
planned for TREC-7, but groups are encouraged to seek out and exploit
synergies. As a first step, groups are strongly encouraged to make the
focus of their planned investigations known to other track
participants as soon as possible.


General Description 
------------------- 

A minimum of eight participating searchers, one experimental system, and
one control system per site will be required.  The control system can
be any IR system appropriate to the goals of the local experiment,
e.g., a variant of the local experimental system or some other baseline
system such as SMART or ZPRISE.  (See "2. Augmentation" in the
detailed experimental design for information about how to use more
than eight searchers or more than one experimental system within this
design.)

Each searcher will perform eight searches on the Financial Times of
London 1991-1994 collection (part of the TREC-7 adhoc collection),
using eight topics especially chosen from the TREC-7 adhoc topics and
modified for use in the interactive track.  Each searcher will perform
half of the total number of searches on the site's experimental system
and the other half on its control system.  The detailed experimental
design (see below) determines the order in which each searcher uses
the systems (experimental and control).

In resolving experimental design questions not covered here (e.g.,
scheduling of tutorials and searches), participating sites
should try to minimize the differences between the conditions under
which a given searcher uses the control and those under which s/he
uses the experimental system. For example, running all the control
searches for a participant on one day and the searches on the
experimental system on another invites unequal, confounding
conditions.


Topics
------

Each of the topics will describe a need for information of a
particular type. Contained within the documents of the collection
to be searched will be multiple distinct examples or instances of the
needed information. The interactive topics will be modified versions
of specially selected adhoc topics. Here is an example TREC-6 adhoc
topic:

	Number: 303i 

	Title: Hubble Telescope Achievements 

	Description: 
	Identify positive accomplishments of the Hubble telescope 
	since it was launched in 1991.

	Narrative: 
	Documents are relevant that show the Hubble telescope has 
	produced new data, better quality data than previously 
	available, data that has increased human knowledge of the 
	universe, or data that has led to disproving previously 
	existing theories or hypotheses.  Documents limited to the 
	shortcomings of the telescope would be irrelevant.  Details 
	of repairs or modifications to the telescope without 
	reference to positive achievements would not be relevant.


Here is an example of the same topic as it would be modified for use
in the TREC-7 interactive track. Note the addition of the "Instances"
section, with its "Please save" request, and the removal of the usual
Narrative section with its specific criteria for relevance or
non-relevance:

	Number: 303i 

	Title: Hubble Telescope Achievements 

	Description: 
	Identify positive accomplishments of the Hubble telescope 
	since it was launched in 1991.

	Instances:
	In the time allotted, please find as many DIFFERENT positive 
	accomplishments of the sort described above as you can.
	Please save at least one document for EACH such DIFFERENT 
	accomplishment.
	If one document discusses several such accomplishments, then 
	you need not save other documents that repeat those, since your 
	goal is to identify as many DIFFERENT accomplishments of the sort 
	described above as possible.

Here are the topics for TREC-7 in NUMERICAL order. See the section
"Experimental design for a site" below for their assignment to blocks
and the order of presentation within the experimental design.

-------------------------------------------------------------------------
Number: 
 	352i
    
Title: 	
	British Chunnel impacts  

Description: 
	Impacts of the Chunnel - anticipated or actual - on the British 
 	economy and/or the life style of the British
 
Instances:
	In the time allotted, please find as many DIFFERENT impacts of 
	the sort described above as you can. Please save at least one 
	document for EACH such DIFFERENT impact.
	If one document discusses several such impacts, then you need
	not save other documents that repeat those, since your goal 
	is to identify as many DIFFERENT impacts of the sort described 
	above as possible.

-------------------------------------------------------------------------
Number: 
	353i  
 
Title: 	
	Antarctic exploration

Description: 
	Identify systematic explorations and scientific investigations
	of Antarctica, current or planned.

Instances:
	In the time allotted, please find as many DIFFERENT explorations
	or investigations of the sort described above as you can. Please 
	save at least one document for EACH such DIFFERENT exploration or
	investigation. 
	If one document discusses several such investigations/explorations, 
	then you need not save other documents that repeat those, since your 
	goal is to identify as many DIFFERENT investigations or explorations
	of the sort described above as possible.

-------------------------------------------------------------------------
Number: 
	357i 
  
Title: 
	territorial waters dispute 

Description: 
	Identify documents discussing international boundary 
	disputes relevant to the 200-mile special economic 
	zones or 12-mile territorial waters subsequent to 
	the passing of the "International Convention on the 
	Law of the Sea".

Instances:
	In the time allotted, please find as many DIFFERENT disputes of 
	the sort described above as you can. Please save at least one 
	document for EACH such DIFFERENT dispute. 
	If one document discusses several such disputes, then you need
	not save other documents that repeat those, since your goal is 
	to identify as many DIFFERENT disputes of the sort described
	above as possible.

-------------------------------------------------------------------------
Number: 
	362i   

Title: 
	human smuggling 

Description: 
	Identify incidents of human smuggling.

Instances:
	In the time allotted, please find as many DIFFERENT incidents of 
	the sort described above as you can. Please save at least one 
	document for EACH DIFFERENT incident of the sort described above. 
	If one document discusses several such incidents, then you 
	need not save other documents that repeat those, since your goal 
	is to identify DIFFERENT incidents of the sort described above.

-------------------------------------------------------------------------
Number: 
	365i   

Title: 
	El Nino   

Description: 
	What effects have been attributed to El Nino?
 
Instances:
	In the time allotted, please find as many DIFFERENT effects of 
	the sort described above as you can. Please save at least one 
	document for EACH such DIFFERENT effect. 
	If one document discusses several such effects, then you need
	not save other documents that repeat those, since your goal 
	is to identify as many DIFFERENT effects of the sort described 
	above as possible.

-------------------------------------------------------------------------
Number: 
	366i 
  
Title: 
	commercial cyanide uses 

Description: 
	What are the industrial or commercial uses of 
	cyanide or its derivatives? 
 
Instances:
	In the time allotted, please find as many DIFFERENT uses of 
	the sort described above as you can. Please save at least one 
	document for EACH such DIFFERENT use.
	If one document discusses several such uses, then you need not 
	save other documents that repeat those, since your goal is to
	identify as many DIFFERENT uses of the sort described above as
	possible.

-------------------------------------------------------------------------
Number: 
	387i  
 
Title: 
	radioactive waste 

Description: 
	Identify documents that discuss effective and safe ways to 
	permanently handle long-lived radioactive wastes.
 
Instances:
	In the time allotted, please find as many DIFFERENT ways of 
	the sort described above as you can. Please save at least one 
	document for EACH such DIFFERENT way.
	If one document discusses several such ways, then you need
	not save other documents that repeat those, since your goal is
	to identify as many DIFFERENT ways of the sort described above
	as possible.

-------------------------------------------------------------------------
Number: 
	392i  
 
Title: 
	robotics 

Description: 
	What are the applications of robotics in the world today?

Instances:
	In the time allotted, please find as many DIFFERENT applications of 
	the sort described above as you can. Please save at least one 
	document for EACH such DIFFERENT application.
	If one document discusses several such applications, then you 
	need not save other documents that repeat those, since your goal 
	is to identify as many DIFFERENT applications of the sort described 
	above as possible.

------------------------------------------------------------------------- 

Searcher task
-------------

The task of the interactive searcher is to save documents, which,
taken together, contain as many different instances as possible of 
the type of information the topic expresses a need for - within
a 15-minute time limit.

Searchers will be encouraged to avoid saving documents which
contribute no instances beyond those in documents already saved, but
there will be no scoring penalty for saving such documents and
searchers will be told that.

Instructions to be given to searchers
-------------------------------------

The following introductory instructions are to be given once to each searcher 
before the first search:

	"Imagine that you have just returned from a visit to your doctor 
	during which it was discovered that you are suffering from high
        blood pressure. The doctor suggests that you take a new experimental
        drug, but you wonder what alternative treatments are currently 
        available.  You decide to investigate the literature on your own
        to satisfy your need for information about what different alternatives
	are available to you for high blood pressure treatment. You really 
	need only one document for each of the different treatments for high 
	blood pressure. 

        You find and save a single document that lists four treatment drugs.
        Then you find and save another two documents that each discuss a
	separate alternative treatment: one that discusses the use of
        calcium and one that talks about regular exercise.  You've run out 
	of time and stop your search. In all, you have identified six different
	instances of alternative treatments in three documents. 

	---

	In this experiment, you will face a similar task. You will be 
	presented with several descriptions of needed information on a 
	number of topics. In each case there can be multiple examples or 
	instances of the type of information that's needed.

	We would like you to identify as many different instances as you
	can of the needed information for each topic that will be presented 
	to you -  as many as you can in the 15 minutes you will be given 
	to search.  Please save one document for EACH DIFFERENT instance 
	of the needed information that you identify. If you save one document 
	that contains several instances, try not to save additional documents 
	that contain ONLY those instances. However, you will not be penalized 
	if you save documents unnecessarily.  

	As you identify an instance of the needed information, please keep 
	track of which instances you have found: write down a word or short 
	phrase to identify the instance, or--if the system provides a facility
	to keep track of instances--use it.
	
        Carefully read each topic to understand the type of information 
	needed. This will vary from topic to topic. On one topic you may be 
	looking for instances of a certain kind of event. On another you may 
	be searching for examples of certain sorts of people, places, or 
	things.

	Do you have any questions about 
	- what we mean by instances of needed information 
	- the way in which you are to save nonredundant documents for each
	  instance?"

Searcher questionnaires (minimum)
---------------------------------

Provided by Rutgers (see track web site)


Psychometric test
-----------------

- FA-1 (Controlled Associations)

from ETS's "Kit of Reference Tests for Cognitive Factors" (1976 Edition)


Data to be collected and submitted to NIST (emailed to [email protected])
------------------------------------------

Several sorts of result data will be collected for evaluation/analysis (for
all searches unless otherwise specified):


   ===>  Due at NIST by 30. August 1998:

	1. sparse format data	


   ===>  Due at NIST by end of the day (Washington, DC) on 27. October 1998:

	2. rich format data

	3. a full narrative description of one interactive session for
           whichever topic is designated as T1

	4. any further guidance or refinement of the task specification
	   given to the searchers

	5. data from the common searcher questionnaires

	6. results from the psychometric test (FA-1) given to all searchers

Sparse format data for each search will comprise the list of documents
saved and the elapsed clock time of the search. The searcher's
selection (choice) of items for the final output list must be
identified in terms of each document's TREC document identifier
(DOCNO). The elapsed (clock) time in seconds taken for the search,
from the time the searcher first sees the topic until s/he declares
the search to be finished, should be recorded.  It is assumed that the
interactive search takes place in one uninterrupted session.  If a
session is unavoidably interrupted, it is recommended that it be
abandoned and the topic given to another searcher.  Sparse format data
will be the basis for the summary evaluation at NIST, which will
produce a triple for each search: instance precision, instance
recall, and elapsed clock time.

Rich format data for each search will record:

- the word or phrase each searcher records to describe each
  instance s/he identifies (no reference to the containing document(s))

- significant events in the course of the interaction and their 
  timing.  

          Rich format data are intended for analytical evaluation by the 
          experimenters.
 
          All significant events and their timing in the course of the 
          interaction should be recorded.  The events listed below are those 
          that seem to be fairly generally applicable to different systems 
          and interactive environments; however, the list may need extending 
          or modifying for specific systems and so should be taken as a 
          suggestion rather than a requirement:

	  o Intermediate search formulations:  if appropriate to the 
	    system, these should be recorded.

	  o Documents viewed:  "viewing" is taken to mean the searcher 
	    seeing a title or some other brief information about a 
	    document; these events should be recorded.

	  o Documents seen:  "seeing" is taken to mean the searcher 
	    seeing the text of a document, or a substantial section of 
	    text; these events should be recorded. 

	  o Terms entered by the searcher:  if appropriate to the 
	    system, these should be recorded.

	  o Terms seen (offered by the system):  if appropriate to the 
	    system, these should be recorded.

	  o Selection/rejection:  documents or terms selected by the 
	    user for any further stage of the search (in addition to the 
	    final selection of documents). 

Format of sparse data to be submitted to NIST
---------------------------------------------

TWO files from each site
	
  A. Search file

	Here a "search" is the interaction of a searcher given a topic
	and asked to carry out the interactive search task using a given 
	system against the collection - lasting at most 15 minutes.

	One line for EACH SEARCH, each line containing the 
	following blank-delimited items from left to right:

		1. Unique site ID

		2. Search ID  - site's choice (links search & document files)

		3. Searcher ID - site's choice

		4. System ID - site's choice

		5. TREC topic number
			
		6. Elapsed time - number of secs., fractions truncated

		   Clock time from the moment the searcher sees the 
		   topic until the moment the searcher indicates the 
		   search is complete or time is up.

  B. Documents file

	One line for each document in a given search result,
	each line containing the following blank-delimited
	items from left to right:

		1. Chronological sequence number ("1", "2", ...) within a search.
		   If a document was saved more than once, use the number of
		   the last time it was saved.
	
		2. Search ID (from search file)

		3. TREC document identifier (DOCNO)	


	NOTE: Reported data items listed within each line must NOT 
	contain whitespace.	
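
As an illustration only, the two sparse-format files described above
could be produced with a sketch like the following (Python). The
function names and all sample values (site ID, search ID, DOCNO) are
made up for this example and are not part of the track specification;
only the field order, the truncation of fractional seconds, and the
no-whitespace rule come from the text above.

```python
def join_fields(fields):
    """Join items with single blanks, rejecting items that contain whitespace."""
    items = [str(f) for f in fields]
    for item in items:
        if not item or any(ch.isspace() for ch in item):
            raise ValueError(f"item must be non-empty and whitespace-free: {item!r}")
    return " ".join(items)


def format_search_line(site_id, search_id, searcher_id, system_id,
                       topic, elapsed_seconds):
    """Build one Search file line: six blank-delimited items, left to right."""
    # int() drops the fractional part of a non-negative number of seconds.
    return join_fields([site_id, search_id, searcher_id, system_id,
                        topic, int(elapsed_seconds)])


def format_document_line(sequence_number, search_id, docno):
    """Build one Documents file line: three blank-delimited items."""
    return join_fields([sequence_number, search_id, docno])


# Hypothetical sample values, for illustration only.
search_line = format_search_line("siteX", "s01", "P1", "E", "365i", 612.9)
document_line = format_document_line(1, "s01", "FT911-1234")
```

The shared search ID ("s01" here) is what links a line in the
Documents file back to its line in the Search file.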


Format of other data to be submitted to NIST
--------------------------------------------

Data other than that in sparse-format should be submitted as ASCII text
files.

The FA-1 score plus the questionnaire data for each searcher should be 
submitted in a separate file with format close to the following example
but with the real responses to the right of the colons. The Tutorial
Worksheet and Experimenter Note need not be submitted.


	S i t e:

	S e a r c h e r  I D:

	FA-1 score:  ?

	P r e - s e a r c h : 			(1 per searcher)

	Searcher: 	id
	Condition: 	?
	Degrees:	degree major date
	Degrees:	degree major date
	Degrees:	degree major date
	Degrees:	degree major date
	Degrees:	degree major date
	Occupation:	...
	Gender:		M | F
	Age:   		nn
	Previous TREC:	Y | N
	Online searching: nn
	Q1: 		1-5
	Q2: 		1-5
	Q3: 		1-5
	Q4: 		1-5
	Q5: 		1-5
	Q6: 		1-5
	Q7: 		1-5
	Q8: 		1-5


	S e a r c h : 				(8 per searcher)

	Searcher: 	id
	Condition:	?
	Topic #:	nnn
	Q1: 		1-5
	Q2: 		1-5
	Q3: 		1-5
	Q4: 		1-5
	Q5: 		1-5
	Q6: 		1-5


	P o s t - s y s t e m :			(2 per searcher)

	Searcher:	id
	Condition:	?
	Q1: 		1-5
	Q2: 		1-5
	Q3: 		1-5
	Comments:	...

	S e a r c h e r   w o r k s h e e t : 	(8 per searcher)

	Searcher:	id
	Condition:	?
	Topic #:	nnn
	1. 		...
	2. 		...
	3. 		...
	.
	.
	.
	

	E x i t : 				(1 per searcher)

	Searcher:	id
	Q1: 		1-5
	Q2: 		1-5
	Q3: 		1-5
	Q4: 		one-system's-name 	rank
	 		other-system's-name 	rank
	Q5: 		one-system's-name 	rank
	  		other-system's-name 	rank
	Q6: 		one-system's-name 	rank
	   		other-system's-name	rank
	Q7: 		...
	Q8: 		...
	Q9: 		...

Evaluation of data submitted to NIST
------------------------------------

Evaluation by NIST of the sparse format data will proceed as follows.
For each topic, a pool will be formed containing the unique documents
saved by at least one searcher for that topic regardless of site.

For each topic, the NIST assessor, normally the topic author, will be asked 
to:
	- read the topic carefully 
        - read each of the documents from the pool for that topic and 
	  gradually:
	   - create a list of instances of the topic's needed information
	     type found somewhere in the documents
           - select and record a short phrase describing each instance found
           - determine which documents contain which instances
           - bracket each instance in the text of the document in which it 
             was found

For each search (by a given participant for a given topic at a given site), 
NIST will use the submitted list of selected documents and the assessor's
instance-document mapping for the topic to calculate:

        - the fraction of total instances (as determined by the assessor) for 
          the topic that are covered by the submitted documents (i.e., 
          instance recall)

	- the fraction of the submitted documents which contain one or more
	  instances (i.e., instance precision)	

The third measure, elapsed clock time, will be taken directly from the 
submitted results for each search.
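
The two computed measures can be sketched as follows (Python). This is
not NIST's evaluation code: the function name, the data layout, and the
handling of edge cases (an empty result list scores 0.0, a document
saved twice counts once) are assumptions made for the sketch. The
sample data reuse the high blood pressure example from the searcher
instructions, with a made-up total of eight instances and one extra
saved document that contains no instance.

```python
def instance_measures(saved_docnos, doc_instances, total_instances):
    """Return (instance_recall, instance_precision) for one search.

    saved_docnos    -- DOCNOs submitted for the search
    doc_instances   -- assessor's mapping: DOCNO -> set of instance labels
    total_instances -- number of distinct instances found by the assessor
    """
    saved = set(saved_docnos)          # assumption: duplicates count once
    covered = set()                    # distinct instances covered so far
    docs_with_instance = 0
    for docno in saved:
        instances = doc_instances.get(docno, set())
        if instances:
            docs_with_instance += 1
            covered |= instances
    recall = len(covered) / total_instances if total_instances else 0.0
    precision = docs_with_instance / len(saved) if saved else 0.0
    return recall, precision


# Hypothetical assessor mapping and search result.
mapping = {
    "DOC-A": {"drug1", "drug2", "drug3", "drug4"},
    "DOC-B": {"calcium"},
    "DOC-C": {"exercise"},
    "DOC-D": set(),                    # saved, but contains no instance
}
recall, precision = instance_measures(
    ["DOC-A", "DOC-B", "DOC-C", "DOC-D"], mapping, total_instances=8)
```

Here six of the eight instances are covered and three of the four
saved documents contain at least one instance, so both measures come
out at 0.75.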



Experimental design for a site
------------------------------

  1. Minimal experimental matrix as run

     Define two blocks of four topics each, order of presentation fixed 
     within each block:

     B1 = T1 -> T2 -> T3 -> T4
          365i  357i  362i  352i
 
     B2 = T5 -> T6 -> T7 -> T8
          366i  392i  387i  353i

             
     		Participants  |  System,Topic
     		--------------+--------------------
      	          	   P1 |    E,B1  C,B2
       	         	   P2 |    C,B2  E,B1 
  			   P3 |    E,B2  C,B1
        	           P4 |    C,B1  E,B2

      	          	   P5 |    E,B1  C,B2
       	         	   P6 |    C,B2  E,B1  
   			   P7 |    E,B2  C,B1
        	           P8 |    C,B1  E,B2

     or expanded to show the individual topics:
                     
     Participants  |    System,Topic combinations
     --------------+---------------------------------------------------
                P1 |    E,T1  E,T2  E,T3  E,T4    C,T5  C,T6  C,T7  C,T8
                P2 |    C,T5  C,T6  C,T7  C,T8    E,T1  E,T2  E,T3  E,T4
  		P3 |    E,T5  E,T6  E,T7  E,T8    C,T1  C,T2  C,T3  C,T4
                P4 |    C,T1  C,T2  C,T3  C,T4    E,T5  E,T6  E,T7  E,T8

                P5 |    E,T1  E,T2  E,T3  E,T4    C,T5  C,T6  C,T7  C,T8
                P6 |    C,T5  C,T6  C,T7  C,T8    E,T1  E,T2  E,T3  E,T4
  		P7 |    E,T5  E,T6  E,T7  E,T8    C,T1  C,T2  C,T3  C,T4
                P8 |    C,T1  C,T2  C,T3  C,T4    E,T5  E,T6  E,T7  E,T8

       -  E = experimental system
          
       -  C = Control system - site's choice
                 
       -  The participants (searchers) should be numbered sequentially, 1,
          ..., J. J must be at least 8 (see part 2 below on how to add more)
             
     Each site will randomly assign participants to the rows of its
     design.
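
A site could generate its search schedule from the matrix above with a
sketch like this one (Python). The block contents and row pattern are
taken from the design; the function names, the seed parameter, and the
searcher names are hypothetical, and the check on the number of
participants assumes participants are only added in groups of 4.

```python
import random

BLOCKS = {
    "B1": ["365i", "357i", "362i", "352i"],  # T1..T4, fixed order
    "B2": ["366i", "392i", "387i", "353i"],  # T5..T8, fixed order
}

# The 4-participant pattern: (system, block) for the first and second half.
BASE_ROWS = [
    [("E", "B1"), ("C", "B2")],
    [("C", "B2"), ("E", "B1")],
    [("E", "B2"), ("C", "B1")],
    [("C", "B1"), ("E", "B2")],
]


def build_design(num_participants=8):
    """Expand the block design into rows of (system, topic) pairs."""
    if num_participants < 8 or num_participants % 4 != 0:
        raise ValueError("need at least 8 participants, in groups of 4")
    design = []
    for row in range(num_participants):
        halves = BASE_ROWS[row % 4]    # the pattern repeats every 4 rows
        design.append([(system, topic)
                       for system, block in halves
                       for topic in BLOCKS[block]])
    return design


def assign_searchers(searcher_names, seed=None):
    """Randomly assign searchers to design rows: {name: list of searches}."""
    design = build_design(len(searcher_names))
    shuffled = list(searcher_names)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(shuffled, design))


design = build_design(8)
schedule = assign_searchers([f"searcher{i}" for i in range(1, 9)], seed=1998)
```

Fixing the seed makes the random assignment reproducible, which may be
convenient when documenting how a site ran its experiment.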
       
     The order for presentation of topics to searchers at all participating 
     sites is defined by the above design. The assignment of actual topics
     to T1, T2, ... T8 will be determined by NIST in collaboration with the 
     track shortly after the interactive topics are made available.
             
     For the purposes of analysis each 4-person-by-8-topic matrix
     defined above will in effect be rearranged by permuting the
     columns (topics) so E alternates with C as in the following:
          

     Participants  |    System,Topic combinations
     --------------+---------------------------------------------------
                P1 |    E,T1  C,T5  E,T2  C,T6  E,T3  C,T7  E,T4  C,T8
                P2 |    C,T5  E,T1  C,T6  E,T2  C,T7  E,T3  C,T8  E,T4
  		P3 |    E,T5  C,T1  E,T6  C,T2  E,T7  C,T3  E,T8  C,T4
                P4 |    C,T1  E,T5  C,T2  E,T6  C,T3  E,T7  C,T4  E,T8

     Note that this matrix consists of the following 2x2 subdesign:

               E  C  
               C  E  

     This 2x2 design is a latin square design.  It has the property
     that the "treatment effect", here E-C, the control-adjusted response, 
     can be estimated free and clear of the main (additive) effects of 
     participant and topic.  Here, participant and topic are treated 
     statistically as blocking factors.  This means that even in the
     presence of differences between participants and topics, which 
     clearly are anticipated, the design will provide estimates of E-C 
     that are not contaminated by these differences.  

     However, the estimate of E-C is contaminated by the presence
     of an interaction between topic and participant. Therefore, we
     replicate the 2x2 latin square 4x4 (i.e., sixteen) times to get the
     minimal 8x8 design for each site.  The contaminating effect of the topic
     by participant interaction is reduced by averaging the sixteen
     estimates of E-C that are available, one for each 2x2 latin
     square.  This is analogous to averaging replicate measurements of
     a single quantity in order to reduce the measurement uncertainty.
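
The property described above can be checked numerically with a small
sketch (Python). The additive model and every number in it are invented
for illustration; they are not track data. With purely additive
participant and topic effects the 2x2 estimate returns the true E-C
exactly, while a participant-by-topic interaction in a single cell
shifts it.

```python
def latin_square_estimate(y_p1_t1, y_p1_t2, y_p2_t1, y_p2_t2):
    """Estimate E-C from one 2x2 square laid out as:
            t1  t2
        p1   E   C
        p2   C   E
    """
    return ((y_p1_t1 + y_p2_t2) - (y_p1_t2 + y_p2_t1)) / 2


def response(participant, topic, system, interaction=0):
    """Hypothetical model: participant + topic + system (+ interaction)."""
    participant_effect = {"p1": 10, "p2": -6}
    topic_effect = {"t1": 3, "t2": -8}
    system_effect = {"E": 7, "C": 4}   # true E-C is 3
    return (participant_effect[participant] + topic_effect[topic]
            + system_effect[system] + interaction)


# Purely additive effects: participant and topic terms cancel out.
additive = latin_square_estimate(
    response("p1", "t1", "E"), response("p1", "t2", "C"),
    response("p2", "t1", "C"), response("p2", "t2", "E"))

# An interaction of +2 in the (p1, t1) cell contaminates the estimate.
contaminated = latin_square_estimate(
    response("p1", "t1", "E", interaction=2), response("p1", "t2", "C"),
    response("p2", "t1", "C"), response("p2", "t2", "E"))
```

In this made-up case the additive estimate is 3.0 and the contaminated
one is 4.0; averaging over many squares is what is meant to dilute such
shifts.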


  2. Augmentation

     The design for a given site can be augmented in two ways:

       1. Participants can be added in groups of 4 using the design
          for P1-4 (above).

       2. Systems can be added by repeating the 8x8 design with at
          least one new system. 

     Topics cannot be added/subtracted individually for each site. 

     All augmentations other than the two listed above, however interesting, 
     are outside the scope of this design. If sites plan such adjunct 
     experiments, they are encouraged to design them for maximal synergy 
     with the track design.

  3. Analysis

     The analysis is up to each group, but all are strongly encouraged to
     take advantage of the experimental design and undertake:

	1. exploratory data analysis

	   to examine the patterns of correlation, interaction, etc.
	   involving the major factors. Some example plots for the TREC-6
	   interactive data (recall or precision by searcher or topic)
	   are available on the Interactive Track web site at
	   www-nlpir.nist.gov/~over/t7i under "Interactive Track History".
	   
	2. analysis of variance (ANOVA), where appropriate,

           to estimate the separate contributions of searcher, topic and 
	   system as a first step in understanding why the results of one 
	   search are different from those of another.
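
As one possible starting point for the exploratory step, per-factor
means of a measure can be tabulated with a sketch like the following
(Python). This is not an ANOVA, only a simple grouping of averages; the
record layout, the function name, and the sample values are all
hypothetical.

```python
from collections import defaultdict


def mean_by_factor(searches, factor, measure):
    """Average `measure` over searches grouped by `factor` (e.g. "system")."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for search in searches:
        level = search[factor]
        totals[level] += search[measure]
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


# Made-up results for four searches, for illustration only.
searches = [
    {"searcher": "P1", "topic": "365i", "system": "E", "recall": 0.5},
    {"searcher": "P1", "topic": "366i", "system": "C", "recall": 0.25},
    {"searcher": "P2", "topic": "365i", "system": "E", "recall": 0.75},
    {"searcher": "P2", "topic": "366i", "system": "C", "recall": 0.5},
]
by_system = mean_by_factor(searches, "system", "recall")
by_searcher = mean_by_factor(searches, "searcher", "recall")
```

The same function can be called with "topic" as the factor, or with
precision or elapsed time as the measure, to look at the other
groupings mentioned above.
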

Date created: Monday, 31-Jul-00