System Summary and Timing
  Organization Name: U. of California, Berkeley
  List of Run IDs: Brkly13, Brkly14, Brkly15, Brkly16, Brkly17, Brkly18,
  BrklySP5, BrklySP6, BrklyCH1, BrklyCH2

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: 572 (English), 444 (Chinese)
    - Stemming Algorithm: English: SMART system (version 10) stemmer.
      Spanish: the stemmer removes standard endings from words (gender,
      tense, etc.). It also maps irregular verb forms to their standard
      (infinitive) spelling.
    - Term Weighting: weights determined from frequency statistics by
      logistic regression.
    - Phrase Discovery?: no
    - Syntactic Parsing?: no
    - Word Sense Disambiguation?: no
    - Heuristic Associations (including short definition)?: no
    - Proper Noun Identification Algorithm?: no
    - Tokenizer?: no
    - Manually-Indexed Terms?: /* Chinese? */
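The Spanish stemmer is described only in outline: it strips standard endings and maps irregular verb forms to the infinitive. The Python sketch below illustrates that two-step idea under stated assumptions: the irregular-verb table IRREGULAR, the suffix list SUFFIXES and the minimum stem length MIN_STEM are hypothetical, not the tables used in the Berkeley runs.

```python
# Hypothetical irregular-verb table: inflected form -> infinitive.
IRREGULAR = {"tengo": "tener", "hizo": "hacer", "dijo": "decir"}

# Hypothetical Spanish endings (gender, number, tense, adverbial), tried longest first.
SUFFIXES = sorted(
    ["amente", "mente", "iendo", "ando", "ados", "adas", "ado", "ada",
     "os", "as", "es", "o", "a", "s"],
    key=len, reverse=True)

MIN_STEM = 3  # assumed floor so short words are never stripped to almost nothing


def stem(word: str) -> str:
    """Map irregular verb forms to the infinitive, else strip the longest known ending."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


print(stem("casas"), stem("casa"), stem("hablando"), stem("tengo"))
```

Trying the longest ending first keeps "hablando" from losing only its final "o"; the table lookup runs before stripping because irregular forms share no regular ending with their infinitive.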

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID: Brkly13, Brkly14 
      - Total Storage (in MB): 156 
      - Total Computer Time to Build (in hours): 3 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
    - Inverted index           
      - Run ID: Brkly15, Brkly16, Brkly17, Brkly18 
      - Total Storage (in MB): 621 
      - Total Computer Time to Build (in hours): 10 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
    - Inverted index           
      - Run ID: BrklySP5, BrklySP6 
      - Total Storage (in MB): 129 
      - Total Computer Time to Build (in hours): 20 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
    - Inverted index           
      - Run ID: BrklyCH1, BrklyCH2 
      - Total Storage (in MB): 170 
      - Total Computer Time to Build (in hours): 5 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
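All four indexes record single terms only and no term positions. A minimal sketch of such an index, assuming documents are already tokenized: each term maps to a postings list of (document id, term frequency) pairs, and document lengths are kept separately because document length is one of the ranking factors listed under Searching. The function name build_index and the in-memory dict layout are illustrative assumptions; the on-disk format of the actual indexes is not described.

```python
from collections import Counter, defaultdict


def build_index(docs: dict[str, list[str]]):
    """Build single-term postings (doc_id, tf) without positions, plus document lengths."""
    postings: dict[str, list[tuple[str, int]]] = defaultdict(list)
    doc_len: dict[str, int] = {}
    for doc_id in sorted(docs):               # postings end up sorted by doc id
        terms = docs[doc_id]
        doc_len[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            postings[term].append((doc_id, tf))  # frequency only, no offsets
    return dict(postings), doc_len


index, lengths = build_index({"d1": ["oil", "price", "oil"], "d2": ["price", "war"]})
print(index["price"], lengths)
```

Dropping positions keeps each posting to a document id and a count, which is all a term-frequency based ranking needs, at the cost of ruling out phrase or proximity matching.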
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
    - Other Data Structures built from TREC text           

    Data Built from Sources Other than the Input Text

    -  Internally-built Auxiliary File            
      - Use of Manual Labor                   
    -  Externally-built Auxiliary File            
      - Type of File (Treebank, WordNet, etc.): dictionary (Chinese)  
      - Total Storage (in MB): 0.8 
      - Number of Concepts Represented: 90,000 
      - Type of Representation: list of words 
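The form does not say how the 90,000-word Chinese dictionary was applied. If it served as a segmentation lexicon (an assumption), one common way to use a plain word list is forward maximum matching: at each position, take the longest dictionary word that starts there, falling back to a single character. In the sketch below, fmm_segment and its max_len limit are hypothetical.

```python
def fmm_segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right longest-match segmentation against a word list.

    Characters that begin no dictionary word are emitted as one-character tokens.
    """
    words: list[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


print(fmm_segment("信息检索系统", {"信息", "检索", "系统"}))
```

Greedy matching needs nothing beyond the word list, but it never revisits an earlier choice, so ambiguous strings can be mis-segmented.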

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Run ID: Brkly13
    - Topic Fields Used: desc 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?: yes 
      - Phrase Extraction from Topics?: no 
      - Syntactic Parsing of Topics?: no 
      - Word Sense Disambiguation?: no  
      - Proper Noun Identification Algorithm?: no 
      - Tokenizer?:  no                
      - Heuristic Associations to Add Terms?: no 
      - Expansion of Queries using Previously-Constructed Data Structure?: no              
      - Automatic Addition of Boolean Connectors or Proximity Operators?: no 

    Automatically Built Queries (Ad-Hoc)

    - Run ID: Brkly14
    - Topic Fields Used: title, desc, narr 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?: yes 
      - Phrase Extraction from Topics?: no 
      - Syntactic Parsing of Topics?: no 
      - Word Sense Disambiguation?: no  
      - Proper Noun Identification Algorithm?: no 
      - Tokenizer?:  no                
      - Heuristic Associations to Add Terms?: no 
      - Expansion of Queries using Previously-Constructed Data Structure?: no              
      - Automatic Addition of Boolean Connectors or Proximity Operators?: no 
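Brkly13 builds its query from the desc field alone, while Brkly14 uses title, desc and narr; in both, term weights are based on the terms in the topic. A minimal sketch of the counting step, assuming weights start from within-topic term frequency: query_terms and the small STOPWORDS set are hypothetical (the actual English stopword list has 572 entries), and stemming is omitted.

```python
import re
from collections import Counter

# Hypothetical stopword subset, including typical topic boilerplate words.
STOPWORDS = {"a", "the", "of", "to", "in", "on", "and", "or", "be", "that",
             "which", "will", "must", "identify", "document", "documents", "relevant"}


def query_terms(*fields: str) -> Counter:
    """Count non-stopword terms over the chosen topic fields (desc only, or several)."""
    counts: Counter = Counter()
    for field in fields:
        for token in re.findall(r"[a-z]+", field.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


desc = "Identify documents that discuss oil prices."
title = "oil prices"
print(query_terms(desc))          # desc only, as in Brkly13
print(query_terms(title, desc))   # several fields, as in Brkly14
```

Passing more fields simply accumulates counts, so a term repeated across title and desc gets a higher query weight.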

    Automatically Built Queries (Ad-Hoc)

    - Run ID: BrklySP5
    - Topic Fields Used: desc 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?: yes  
      - Phrase Extraction from Topics?: no 
      - Syntactic Parsing of Topics?: no 
      - Word Sense Disambiguation?: no  
      - Proper Noun Identification Algorithm?: no 
      - Tokenizer?:  no                
      - Heuristic Associations to Add Terms?: no 
      - Expansion of Queries using Previously-Constructed Data Structure?: no              
      - Automatic Addition of Boolean Connectors or Proximity Operators?: no 

    Automatically Built Queries (Ad-Hoc)

    - Run ID: BrklyCH1
    - Topic Fields Used: C-title, C-desc, and C-narr 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?: yes 
      - Phrase Extraction from Topics?: no 
      - Syntactic Parsing of Topics?: no 
      - Word Sense Disambiguation?: no  
      - Proper Noun Identification Algorithm?: no 
      - Tokenizer?:  no                
      - Heuristic Associations to Add Terms?: no 
      - Expansion of Queries using Previously-Constructed Data Structure?: no              
      - Automatic Addition of Boolean Connectors or Proximity Operators?: no 

    Automatically Built Queries (Routing)

    - Topic Fields Used: dom, title, desc, narr, con, def, nat, time  
    - Average Computer Time to Build Query (in CPU seconds): 100
    - Method used in Query Construction          
      - Terms Selected From            
        - Topics: yes 
        - Only Documents with Relevance Judgments: yes 
      - Term Weighting with Weights Based on terms in            
        - Topics: yes 
        - Documents with Relevance Judgments: yes 
      - Phrase Extraction from            
      - Syntactic Parsing            
      - Word Sense Disambiguation using            
      - Proper Noun Identification Algorithm from            
      - Tokenizer             
      - Heuristic Associations to Add Terms from            
      - Expansion of Queries using Previously-Constructed Data Structure:
      - Automatic Addition of Boolean Connectors or Proximity Operators using information from
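For routing, terms are selected from the topics and from documents with relevance judgments, and weighted using both. The form does not name the selection statistic. As a stand-in, the sketch below ranks candidate terms by the Robertson/Sparck Jones relevance weight, a standard probabilistic term weight computed from judged documents; rsj_weight, select_terms and the (r, n) count layout are illustrative, not the Berkeley procedure.

```python
import math


def rsj_weight(r: int, R: int, n: int, N: int) -> float:
    """Robertson/Sparck Jones relevance weight with the usual 0.5 smoothing.

    r: relevant docs containing the term, R: relevant docs in total,
    n: docs containing the term,         N: docs in the training collection.
    """
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))


def select_terms(stats: dict[str, tuple[int, int]], R: int, N: int, k: int):
    """stats maps term -> (r, n); return the k highest-weighted (term, weight) pairs."""
    scored = [(term, rsj_weight(r, R, n, N)) for term, (r, n) in stats.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


stats = {"opec": (8, 20), "price": (9, 300), "the": (10, 990)}
print(select_terms(stats, R=10, N=1000, k=2))
```

A term that occurs in most relevant documents but few others ("opec") outranks one that is frequent everywhere ("the"), which receives a negative weight.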

    Manually Constructed Queries (Ad-Hoc)

    - Run ID: Brkly17, Brkly18
    - Topic Fields Used: title, desc, narr 
    - Average Time to Build Query (in Minutes): 90 
    - Type of Query Builder          
      - Computer System Expert: yes 
    - Tools used to Build Query          
      - Knowledge Base Browser?:                 
      - Other Lexical Tools?:                
    - Method used in Query Construction          
      - Term Weighting?: yes 
      - Addition of Terms not Included in Topic?: yes              
        - Source of Terms: test collection & parallel collections 

    Manually Constructed Queries (Ad-Hoc)

    - Run ID: BrklyCH2
    - Topic Fields Used: C-title, C-desc, C-narr 
    - Average Time to Build Query (in Minutes): 160 
    - Type of Query Builder          
      - Computer System Expert: yes 
    - Tools used to Build Query          
      - Word Frequency List?: yes 
      - Knowledge Base Browser?:                 
      - Other Lexical Tools?:                
    - Method used in Query Construction          
      - Term Weighting?: yes  
      - Addition of Terms not Included in Topic?: yes              
        - Source of Terms: test collection 

  Searching

    Search Times

      - Run ID: Brkly13, Brkly14 
      - Computer Time to Search (Average per Query, in CPU seconds): 9.382 
      - Component Times: User time: 7.331 seconds/query   
                         System time: 2.050 seconds/query 

    Search Times

      - Run ID: Brkly15, Brkly16
      - Computer Time to Search (Average per Query, in CPU seconds): 39
      - Component Times: User time: 38.1 seconds/query
                         System time: 0.9 seconds/query

    Search Times

      - Run ID: BrklyCH1, BrklyCH2 
      - Computer Time to Search (Average per Query, in CPU seconds): 20 
      - Component Times: User time: 18.56 seconds/query
                         System time: 1.43 seconds/query 
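Each reported per-query search time is the sum of its user and system components, up to rounding. A quick Python check over the reported figures (the helper total_cpu is illustrative):

```python
# Per-query (user, system, reported total) CPU seconds, copied from the Search Times entries.
SEARCH_TIMES = {
    "Brkly13, Brkly14": (7.331, 2.050, 9.382),
    "Brkly15": (38.1, 0.9, 39.0),
    "Brkly16": (38.1, 0.9, 39.0),
    "BrklyCH1, BrklyCH2": (18.56, 1.43, 20.0),
}


def total_cpu(user: float, system: float) -> float:
    """Total CPU time per query is user time plus system time."""
    return user + system


for runs, (user, system, reported) in SEARCH_TIMES.items():
    total = total_cpu(user, system)
    print(f"{runs}: {user} + {system} = {total:.3f} s (reported {reported} s)")
```

The sums agree with the reported totals to within 0.01 s, consistent with rounding of the component times.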

    Machine Searching Methods

      - Probabilistic Model?: yes 

    Factors in Ranking

      - Term Frequency?: yes 
      - Inverse Document Frequency?: yes 
      - Document Length?: yes 
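Ranking uses a probabilistic model with term frequency, inverse document frequency and document length as factors, and weights are determined by logistic regression. The sketch below shows the general shape of such a scorer: clue features averaged over the matching query terms are combined linearly into log-odds of relevance, and the logistic function turns that into a probability. The feature set, the function relevance_probability and every coefficient in COEF are hypothetical placeholders, not the fitted Berkeley values.

```python
import math

# Hypothetical coefficients; a real system fits these by logistic regression
# on relevance-judged training data.
COEF = {"intercept": -3.5, "mean_log_tf": 1.2, "mean_idf": 0.4,
        "log_doc_len": -0.3, "n_matches": 0.6}


def relevance_probability(query: list[str], doc_tf: dict[str, int], doc_len: int,
                          df: dict[str, int], n_docs: int) -> float:
    """Estimate P(relevant | query, doc) from TF, IDF and document-length features."""
    matches = [t for t in set(query) if doc_tf.get(t, 0) > 0]
    if not matches:
        return 0.0  # no query term occurs in the document: not retrieved
    m = len(matches)
    mean_log_tf = sum(math.log(1 + doc_tf[t]) for t in matches) / m
    mean_idf = sum(math.log(n_docs / df[t]) for t in matches) / m
    z = (COEF["intercept"]
         + COEF["mean_log_tf"] * mean_log_tf
         + COEF["mean_idf"] * mean_idf
         + COEF["log_doc_len"] * math.log(doc_len)
         + COEF["n_matches"] * m)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function: log-odds -> probability


df = {"oil": 10, "price": 100}
print(relevance_probability(["oil", "price"], {"oil": 3, "price": 1}, 100, df, 1000))
print(relevance_probability(["oil", "price"], {"price": 1}, 100, df, 1000))
```

Because the logistic function is monotonic, sorting documents by this probability gives the same order as sorting by the log-odds z; the probability form matters mainly when scores are compared across queries or thresholded.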

    Machine Information

    - Machine Type for TREC Experiment: UltraSPARC
    - Was the Machine Dedicated or Shared: shared 
    - Amount of Hard Disk Storage (in MB): 2,000 
    - Amount of RAM (in MB): 128 
    - Clock Rate of CPU (in MHz): 166