System Summary and Timing
  Organization Name: ETH Zurich, Switzerland
  List of Run IDs: ETHI01 (interactive)

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list:  571 
    - Controlled Vocabulary? : no 
    - Stemming Algorithm: suffix stripping (Porter, 1980)      
      - Morphological Analysis:  no 
    - Term Weighting: so-called BM25(2.0, 0.0, infty, 0.75)
    -  Phrase Discovery? : no    
    -  Syntactic Parsing? :  no 
    -  Word Sense Disambiguation? : no 
    -  Heuristic Associations (including short definition)? : no 
    -  Spelling Checking (with manual correction)? :  no 
    -  Spelling Correction? :  no 
    -  Proper Noun Identification Algorithm? :  no 
    -  Tokenizer? : yes     
      - Patterns which are tokenized: words 
    -  Manually-Indexed Terms? :  no 
    -  Other Techniques for building Data Structures:  no 
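The term weighting entry above can be illustrated with a minimal sketch.
It assumes that the four numbers follow the usual Okapi order (k1, k2, k3, b),
that no relevance information enters the collection weight, and that the
function and argument names are hypothetical; it is not the system's code.

```python
import math


def bm25_weight(tf, qtf, df, n_docs, doc_len, avg_doc_len,
                k1=2.0, k3=math.inf, b=0.75):
    """Okapi-style BM25 weight of one term for one document (sketch)."""
    # Collection weight without relevance information (an assumption here).
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Document length normalisation controlled by b.
    K = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    doc_part = (k1 + 1.0) * tf / (K + tf)
    # With k3 = infinity the query factor reduces to the raw query
    # term frequency.
    if math.isinf(k3):
        query_part = float(qtf)
    else:
        query_part = (k3 + 1.0) * qtf / (k3 + qtf)
    return idf * doc_part * query_part


def bm25_score(query_tf, doc_tf, df, n_docs, doc_len, avg_doc_len, k2=0.0):
    """Sum the term weights over the query; k2 = 0.0 disables the
    document length correction term."""
    score = sum(
        bm25_weight(doc_tf.get(t, 0), qtf, df[t], n_docs, doc_len, avg_doc_len)
        for t, qtf in query_tf.items()
    )
    correction = k2 * len(query_tf) * (avg_doc_len - doc_len) / (avg_doc_len + doc_len)
    return score + correction
```

Under these assumptions, k2 = 0.0 removes the length correction and
k3 = infty makes the query side of the weight the plain query term frequency.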

    Statistics on Data Structures built from TREC Text

    - Inverted index   
      - Run ID : ETHI01 
      - Total Storage (in MB): 2000 
      - Total Computer Time to Build (in hours): 240 for inserting 
        all documents into the system  
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions? : no 
      - Only Single Terms Used? :  yes
    - Clusters   
      - Run ID : none 
    - N-grams, Suffix arrays, Signature Files   
      - Run ID : none 
    - Knowledge Bases   
      - Run ID : none 
      - Use of Manual Labor  
    - Special Routing Structures   
      - Run ID : none 
    - Other Data Structures built from TREC text   
      - Run ID :  ETHI01 
      - Type of Structure: non-inverted files 
      - Total Storage (in MB): 1500 
      - Total Computer Time to Build (in hours): see above * 
      - Automatic Process? (If not, number of manual hours):  yes 
      - Brief Description of Method: For fast processing of relevance
        feedback, the system uses a non-inverted index for updates.
    - Other Data Structures built from TREC text   
      - Run ID :  ETHI01 
      - Type of Structure: Document Info 
      - Total Storage (in MB): 1100 
      - Total Computer Time to Build (in hours):  see above * 
      - Automatic Process? (If not, number of manual hours):  yes 
      - Brief Description of Method: Titles of documents to show in 
        ranked list    
    - Other Data Structures built from TREC text   
      - Run ID :  ETHI01 
      - Type of Structure: Feature Numbering 
      - Total Storage (in MB): 14 
      - Total Computer Time to Build (in hours):  see above * 
      - Automatic Process? (If not, number of manual hours):  yes 
      - Brief Description of Method: Features are mapped to numbers    
    - Other Data Structures built from TREC text    
      - Run ID :   ETHI01 
      - Type of Structure: Hidden Markov Models 
      - Total Storage (in MB): 0.0005 
      - Total Computer Time to Build (in hours):  2 
      - Automatic Process? (If not, number of manual hours):  yes 
      - Brief Description of Method: Hidden Markov Models used for 
        passage retrieval    
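The relationship between the feature numbering, the non-inverted files, and
the inverted index listed above might look roughly like the following sketch.
All names are hypothetical and the in-memory dictionaries only stand in for
the on-disk structures, whose actual layout is not described here.

```python
from collections import defaultdict


class FeatureNumbering:
    """Maps each feature (e.g. a stemmed word) to a stable integer."""

    def __init__(self):
        self._ids = {}

    def number(self, feature):
        # A new feature receives the next unused number.
        return self._ids.setdefault(feature, len(self._ids))


def build_files(docs):
    """docs: {doc_id: [feature, ...]} -> (numbering, non_inverted, inverted)."""
    numbering = FeatureNumbering()
    non_inverted = {}              # doc_id -> {feature number: frequency}
    inverted = defaultdict(dict)   # feature number -> {doc_id: frequency}
    for doc_id, features in docs.items():
        counts = {}
        for feature in features:
            fid = numbering.number(feature)
            counts[fid] = counts.get(fid, 0) + 1
        non_inverted[doc_id] = counts
        for fid, tf in counts.items():
            inverted[fid][doc_id] = tf
    return numbering, non_inverted, dict(inverted)
```

The non-inverted file gives all features of one document in a single lookup,
which is the access pattern relevance feedback needs; the inverted file gives
all documents of one feature, which is the access pattern ranking needs.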

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used: all 
    - Average Computer Time to Build Query (in cpu seconds):  msec 
    - Method used in Query Construction  
      - Term Weighting (weights based on terms in topics)? : feature frequency 
      - Phrase Extraction from Topics? : no 
      - Syntactic Parsing of Topics? : no 
      - Word Sense Disambiguation? : no 
      - Proper Noun Identification Algorithm? : no 
      - Tokenizer? : yes      
        - Patterns which are Tokenized:  words 
      - Heuristic Associations to Add Terms? : no 
      - Expansion of Queries using Previously-Constructed Data Structure? : yes      
      - Automatic Addition of Boolean Connectors or Proximity Operators? : no 
      - Other: none 

    Automatically Built Queries (Routing)

    - Average Computer Time to Build Query (in cpu seconds):  
    - Method used in Query Construction  
      - Terms Selected From    
      - Term Weighting with Weights Based on terms in    
      - Phrase Extraction from    
      - Syntactic Parsing    
      - Word Sense Disambiguation using    
      - Proper Noun Identification Algorithm from    
      - Tokenizer    
        - Patterns which are tokenized (dates, phone numbers, common patterns, 
          etc): words and phrases 
      - Heuristic Associations to Add Terms from    
      - Expansion of Queries using Previously-Constructed Data Structure:      
      - Automatic Addition of Boolean connectors or Proximity Operators using 
        information from     


    Interactive Queries

    - Initial Query Built Automatically or Manually: Manually 
    - Type of Person doing Interaction    
      - Domain Expert: 2 out of 13 
      - System Expert: 2 out of 13 
    - Average Time to do Complete Interaction    
      - CPU Time (Total CPU Seconds for all Iterations): unknown
      - Clock Time from Initial Construction of Query to Completion of Final 
        Query (in minutes): 29 
    - Average Number of Iterations: ??? 
    - Minimum Number of Iterations: ??? 
    - Maximum Number of Iterations: ??? 
    - What Determines the End of an Iteration: user decision
    - Methods used in Interaction 
      - Automatic Term Reweighting from Relevant Documents? : no  
      - Automatic Query Expansion from Relevant Documents? : yes        
        - All Terms in Relevant Documents added: no 
        - Only Top X Terms Added (what is X): 20  
        - User Selected Terms Added:  user selected relevant passages
      - Other Automatic Methods:  none 
      - Manual Methods        
        - Using Individual Judgment (No Set Algorithm)? : yes
        - Following a Given Algorithm (Brief Description)? : no 
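The expansion step described above (top 20 terms taken from passages the
user marked as relevant) could be sketched as follows. The selection
criterion is not stated in this form, so the score used here, passage
frequency times a simple idf, is purely an assumption, as are all names.

```python
import math
from collections import Counter


def expansion_terms(passages, df, n_docs, query_terms, top_x=20):
    """Pick at most top_x new terms from user-selected relevant passages.

    passages:    list of token lists (already stopped and stemmed)
    df:          {term: document frequency in the collection}
    query_terms: terms already in the query; they are not added again
    """
    counts = Counter(term for passage in passages for term in passage)
    scored = []
    for term, freq in counts.items():
        n = df.get(term, 0)
        if term in query_terms or n == 0:
            continue
        # Assumed score: frequency in the passages times idf.
        scored.append((freq * math.log(n_docs / n), term))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:top_x]]
```

Because only passages chosen by the user are scanned, the candidate set stays
small even when the relevant documents themselves are long.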

  Searching

    -  Search Times     
      - Run ID : ETHI01  
      - Computer Time to Search (Average per Query, in CPU seconds): 1-2 
      - Component Times :     

    Machine Searching Methods

      - Vector Space Model? :  yes (basic method) 
      - Probabilistic Model? : yes (passage retrieval based on HMM) 
      - Cluster Searching? :  no 
      - N-gram Matching? :  no 
      - Boolean Matching? :  no 
      - Fuzzy Logic? :   no 
      - Free Text Scanning? :  no 
      - Neural Networks? :  no 
      - Conceptual Graph Matching? : no 
      - Other: no 

    Factors in Ranking

      - Term Frequency? :  yes 
      - Inverse Document Frequency? :  yes 
      - Other Term Weights? :  Query Term Frequency 
      - Semantic Closeness? :  no 
      - Position in Document? :  no 
      - Syntactic Clues? :  yes  
      - Proximity of Terms? : yes, for passage retrieval  
      - Information Theoretic Weights? : no 
      - Document Length? :  yes 
      - Percentage of Query Terms which match? : no 
      - N-gram Frequency? : no 
      - Word Specificity? :  no 
      - Word Sense Frequency? : no  
      - Cluster Distance? :  no
      - Other:  no 

    Machine Information

    - Machine Type for TREC Experiment: SPARC Center 1000
    - Was the Machine Dedicated or Shared: shared 
    - Amount of Hard Disk Storage (in MB): 10000 
    - Amount of RAM (in MB):  384 
    - Clock Rate of CPU (in MHz): 4x50 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of 
      the System: research prototype, not developed exclusively for TREC;
      redesign of retrieval component: 0.5 person years 
    - Given appropriate resources    
      - Could your system run faster? :  yes 
      - By how much (estimate)? : inserting and building the index: 10 times faster 
    - Features the System is Missing that would be beneficial:  

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the system 
      and are not answered by above questions: none