System Summary and Timing
  Organization Name: IBM
  List of Run IDs: ibmgd1 ibmgd2 ibmge1 ibmge2

  Construction of Indices, Knowledge Bases, and other Data Structures

    Methods Used to build Data Structures

    - Length (in words) of the stopword list:  255
    - Controlled Vocabulary?:  no
    - Stemming Algorithm: none            
    - Phrase Discovery?:
    - Tokenizer?:

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID:  ibmgd1, ibmgd2, ibmge1, ibmge2
      - Total Storage (in MB):  3485
      - Total Computer Time to Build (in hours):  2
      - Automatic Process? (If not, number of manual hours):  yes
      - Use of Term Positions?: yes
      - Only Single Terms Used?:  yes
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases           
      - Use of Manual Labor                
    - Special Routing Structures           
    - Other Data Structures built from TREC text           

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used:  desc
    - Average Computer Time to Build Query (in cpu seconds):  0.0126
    - Method used in Query Construction         
      - Tokenizer?:              
        - Patterns which are Tokenized: Phrases common to many topics such as 
"To be relevant, a document must..." were removed.  The list of phrases to 
match was manually constructed from the desc fields of topics 51-250.  Also, 
each query term was automatically expanded into a list of terms using suffix 
expansion.  

      - Expansion of Queries using Previously-Constructed Data Structure?:              
      - Other:  The system automatically constructs additional query terms from
pairs of query words that occur within a window of five in the query.  
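
The automatic query construction described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not the
system's actual code: the phrase list, the stopword set, and the function names
are hypothetical placeholders, and "within a window of five" is read here as
"at most five token positions apart".

```python
import re

# Hypothetical placeholders: the real phrase list (built manually from the
# desc fields of topics 51-250) and the 255-word stopword list are not given.
BOILERPLATE_PHRASES = ["to be relevant, a document must"]
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or"}


def strip_boilerplate(desc):
    """Lowercase the desc field and blank out phrases common to many topics."""
    text = desc.lower()
    for phrase in BOILERPLATE_PHRASES:
        text = text.replace(phrase, " ")
    return text


def tokenize(text):
    """Split on anything that is not a lowercase letter or digit."""
    return re.findall(r"[a-z0-9]+", text)


def pair_terms(tokens, window=5):
    """Return unordered pairs of non-stopword tokens at most `window` apart."""
    pairs = set()
    for i, left in enumerate(tokens):
        if left in STOPWORDS:
            continue
        for right in tokens[i + 1 : i + window + 1]:
            if right in STOPWORDS or right == left:
                continue
            pairs.add(tuple(sorted((left, right))))
    return pairs


def build_query(desc, window=5):
    """Return (single terms, pair terms) for one topic description."""
    tokens = tokenize(strip_boilerplate(desc))
    singles = [t for t in tokens if t not in STOPWORDS]
    return singles, pair_terms(tokens, window)
```

Distances are measured on the original token positions, so stopwords still
count toward the window even though they never appear in a pair.  Whether the
actual system counts them is not stated in this form.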

    Manually Constructed Queries (Ad-Hoc)

    - Topic Fields Used:  title, desc, narr
    - Average Time to Build Query (in Minutes): 4
    - Type of Query Builder          
      - Computer System Expert: yes
    - Tools used to Build Query          
      - Knowledge Base Browser?:              
      - Other Lexical Tools?:              
    - Method used in Query Construction         
      - Addition of Terms not Included in Topic?:              

  Searching

    Search Times

      - Run ID:  ibmgd1, ibmge1
      - Computer Time to Search (Average per Query, in CPU seconds): 400
      - Run ID:  ibmgd2, ibmge2
      - Computer Time to Search (Average per Query, in CPU seconds): 1000

    Machine Searching Methods

      - Probabilistic Model?: yes  

    Factors in Ranking

      - Term Frequency?:  yes, both within the document and across the 
collection as a whole.  
      - Proximity of Terms?:  yes (runs ibmgd1 and ibmgd2 only).
      - Document Length?:  yes.
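
The form does not give the scoring formula.  As an illustration of how the
three listed factors (term frequency in the document, term frequency in the
collection, and document length) can combine in a probabilistic ranker, the
sketch below uses Okapi BM25 as a familiar stand-in.  It is an assumption made
for illustration only, not the formula used in these runs, and it does not
model term proximity.  All names and the default constants are hypothetical.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len,
               doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25 (a stand-in model).

    doc_tf   : term -> occurrences in this document
    doc_freq : term -> number of documents in the collection containing it
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # the term does not occur in this document
        df = doc_freq.get(term, 0)
        # Collection-level factor: rarer terms receive more weight.
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        # Document-length factor: longer documents are discounted.
        length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score
```

With the same term counts, a shorter document scores higher than a longer one,
and a term that occurs in fewer documents contributes more than a common one.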

    Machine Information

    - Machine Type for TREC Experiment:  IBM PowerPC RS/6000 42T
    - Was the Machine Dedicated or Shared:  Dedicated
    - Amount of Hard Disk Storage (in MB):  15028
    - Amount of RAM (in MB):  64
    - Clock Rate of CPU (in MHz):  120

    System Comparisons

    - Amount of "Software Engineering" which went into the Development of the 
System:  Several years, none of it TREC-specific. 
    - Given appropriate resources           
      - Could your system run faster?: yes 
      - By how much (estimate)?:  The system is I/O bound 
    - Features the System is Missing that would be beneficial: Ability to 
execute Boolean queries; Ability to handle phrases; Ability to handle fields 
within the document. 

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the 
system and are not answered by above questions:  The system includes Lexical 
Affinities: terms formed from pairs of words that occur within 5 words of 
each other.  The system performs morphological expansion at run time instead 
of stemming at indexing time.
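
Run-time morphological expansion, as opposed to stemming at indexing time, can
be pictured with the minimal sketch below.  The index keeps surface forms
unchanged, and each query term is expanded at search time into the variants
that actually occur in the index vocabulary.  The suffix list, the handling of
a final "e", and the function name are assumptions for illustration.  The
system's actual expansion rules are not described in this form.

```python
# Hypothetical suffix list; the real system's rules are not given.
SUFFIXES = ("", "s", "es", "ed", "ing", "er", "ers")


def expand_term(term, vocabulary):
    """Return the variants of `term` that occur in the index vocabulary."""
    candidates = {term + suffix for suffix in SUFFIXES}
    if term.endswith("e"):
        # Also try dropping the final "e" before vowel-initial suffixes,
        # e.g. "move" -> "moving".
        stem = term[:-1]
        candidates |= {stem + suffix for suffix in ("ed", "ing", "er", "ers")}
    return sorted(c for c in candidates if c in vocabulary)


vocabulary = {"spill", "spills", "spilled", "spilling", "spider",
              "move", "moving"}
expanded = expand_term("spill", vocabulary)
```

Because the index stores unstemmed words, the expansion rules can be changed
without rebuilding the index.  The trade-off is that each query term becomes
several index lookups at search time.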