System Summary and Timing
  Organization Name: University of California, San Diego
  List of Run IDs: sdmix1, sdmix2, sdmix3

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: 572 
    - Stemming Algorithm:  SMART triestem             
    - Term Weighting: Term-count and log-normalized tf-idf 
    -  Phrase Discovery?:              
    -  Tokenizer?:              
    -  Other Techniques for building Data Structures: Latent Semantic 
             Indexing; Optimization of a ranking function based on relevance 
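
The two term weightings listed above can be illustrated with a minimal sketch.
The form does not give the exact formula, so the log-normalized variant below
is an assumption: (1 + ln tf) * ln(N / df) followed by cosine normalization.
The function names and the toy collection are hypothetical.

```python
import math
from collections import Counter

def term_count_vector(tokens):
    """Raw term-count weighting: weight = number of occurrences."""
    return dict(Counter(tokens))

def log_tfidf_vector(tokens, doc_freq, n_docs):
    """Assumed log-normalized tf-idf: (1 + ln tf) * ln(N / df), cosine-normalized."""
    weights = {}
    for term, tf in Counter(tokens).items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term unseen in the collection: no idf available
        weights[term] = (1.0 + math.log(tf)) * math.log(n_docs / df)
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return weights  # every term occurs in every document
    return {term: w / norm for term, w in weights.items()}

# Toy collection (already stopped and stemmed, for illustration only).
docs = [
    ["latent", "semantic", "indexing"],
    ["semantic", "ranking", "ranking"],
    ["vector", "space", "ranking"],
]
doc_freq = Counter(term for d in docs for term in set(d))
counts = term_count_vector(docs[1])
vec = log_tfidf_vector(docs[1], doc_freq, len(docs))
```

In this toy collection "ranking" and "semantic" have the same document
frequency, so the repeated term "ranking" ends up with the larger weight.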

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID: sdmix1, sdmix2 
      - Total Storage (in MB): 438 
      - Total Computer Time to Build (in hours): 55  
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
    - Inverted index           
      - Run ID: sdmix3 
      - Total Storage (in MB): 372 
      - Total Computer Time to Build (in hours): 14.5  
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
    - Other Data Structures built from TREC text           
      - Run ID: sdmix1, sdmix2, sdmix3 
      - Type of Structure: LSI projection matrix
      - Total Storage (in MB): 82 
      - Total Computer Time to Build (in hours): 66 
      - Automatic Process? (If not, number of manual hours): yes 
      - Brief Description of Method: Used SVDPACK to generate the first 300 singular
values/vectors for all documents in the training set
    - Other Data Structures built from TREC text           
      - Run ID: sdmix1, sdmix2, sdmix3 
      - Type of Structure: Weights on linear mixture of three experts
      - Total Storage (in MB): 0.000012 
      - Total Computer Time to Build (in hours): 48 
      - Automatic Process? (If not, number of manual hours): yes 
      - Brief Description of Method: Optimized a rank-order statistic objective
function using relevance feedback, with a linear model combining two
vector-space experts and one LSI expert.
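
As a rough illustration of the LSI projection matrix described above, the
sketch below finds the leading left singular vectors of a tiny term-by-document
matrix and projects a query and a document onto them. It is not the SVDPACK
procedure: plain power iteration with deflation stands in for SVDPACK, k = 2
stands in for 300, and the matrix, the function names, and the unscaled
projection are all assumptions made for the example.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [dot(row, x) for row in A]

def top_singular_vectors(A, k, iters=200, seed=0):
    """Leading k left singular vectors and singular values of A.

    Power iteration on A A^T with deflation; k must not exceed rank(A).
    """
    rng = random.Random(seed)
    At = [list(col) for col in zip(*A)]
    basis, sigmas = [], []
    for _ in range(k):
        u = [rng.random() for _ in A]
        for _ in range(iters):
            for p in basis:  # remove directions already found
                c = dot(u, p)
                u = [a - c * b for a, b in zip(u, p)]
            u = matvec(A, matvec(At, u))  # apply A A^T
            norm = math.sqrt(dot(u, u))
            u = [a / norm for a in u]
        basis.append(u)
        v = matvec(At, u)
        sigmas.append(math.sqrt(dot(v, v)))
    return basis, sigmas

def project(basis, vec):
    """Map a term-space vector into the reduced space (unscaled projection)."""
    return [dot(u, vec) for u in basis]

def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0

# Rows: car, auto, engine, flower. Columns: three toy documents.
A = [[1, 0, 0],
     [0, 1, 0],
     [1, 1, 0],
     [0, 0, 2]]
basis, sigmas = top_singular_vectors(A, k=2)
query = [1, 0, 0, 0]  # "car"
doc = [0, 1, 1, 0]    # "auto engine"
raw_sim = cosine(query, doc)
lsi_sim = cosine(project(basis, query), project(basis, doc))
```

The query and the document share no terms, so their term-space cosine is zero,
yet they become nearly parallel in the reduced space because "car" and "auto"
both co-occur with "engine".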

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used: all or DESC only
    - Average Computer Time to Build Query (in cpu seconds): 0.2 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?: yes 
      - Tokenizer?:                 
      - Expansion of Queries using Previously-Constructed Data Structure?: 
      - Other: ignored negated phrases; LSI projection to 300 dimensions

    Automatically Built Queries (Routing)

    - Topic Fields Used: all 
    - Average Computer Time to Build Query (in cpu seconds): 0.2 
    - Method used in Query Construction          
      - Terms Selected From            
        - All Training Documents: yes 
      - Term Weighting with Weights Based on terms in            
        - Topics: yes 
      - Phrase Extraction from            
      - Syntactic Parsing            
      - Word Sense Disambiguation using            
      - Proper Noun Identification Algorithm from            
      - Tokenizer             
      - Heuristic Associations to Add Terms from            
      - Expansion of Queries using Previously-Constructed Data Structure:
      - Automatic Addition of Boolean connectors or Proximity Operators using 
information from             
      - Other:  ignored negated phrases; LSI projection to 300 dimensions

  Searching

    Search Times

      - Run ID: sdmix1, sdmix2, sdmix3 
      - Computer Time to Search (Average per Query, in CPU seconds): 480

    Machine Searching Methods

      - Vector Space Model?: yes 
      - Other: Latent Semantic Indexing; linear mixture of experts 

    Factors in Ranking

      - Term Frequency?: yes 
      - Inverse Document Frequency?: yes 
      - Other Term Weights?: yes (boolean) 
      - Other: Relative weighting of three different experts based on relevance 
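
The relative weighting of the three experts can be sketched as a linear
combination of per-expert scores, tuned against relevance judgments. The form
does not specify the rank-order statistic or the optimizer, so both are
assumptions here: the objective is the fraction of (relevant, non-relevant)
document pairs ranked in the right order, and a coarse grid search stands in
for whatever optimization was actually used. All names and scores are
hypothetical.

```python
from itertools import product

def mixture_score(weights, expert_scores):
    """Linear mixture: weighted sum of the per-expert similarity scores."""
    return sum(w * s for w, s in zip(weights, expert_scores))

def pairwise_rank_agreement(weights, judged_docs):
    """Fraction of (relevant, non-relevant) pairs ranked in the right order."""
    rel = [mixture_score(weights, s) for s, is_rel in judged_docs if is_rel]
    non = [mixture_score(weights, s) for s, is_rel in judged_docs if not is_rel]
    pairs = [(r, n) for r in rel for n in non]
    if not pairs:
        return 0.0
    return sum(1 for r, n in pairs if r > n) / len(pairs)

def grid_search_weights(judged_docs, steps=4):
    """Try every weight triple on a coarse grid and keep the best one."""
    grid = [i / steps for i in range(steps + 1)]
    best_w, best_val = None, -1.0
    for w in product(grid, repeat=3):
        if sum(w) == 0:
            continue  # the all-zero mixture ranks nothing
        val = pairwise_rank_agreement(w, judged_docs)
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val

# (scores from two vector-space experts and one LSI expert, relevance judgment)
judged_docs = [
    ((0.9, 0.1, 0.2), True),
    ((0.2, 0.1, 0.9), True),
    ((0.6, 0.9, 0.1), False),
    ((0.1, 0.8, 0.6), False),
]
first_expert_only = pairwise_rank_agreement((1, 0, 0), judged_docs)
best_w, best_val = grid_search_weights(judged_docs)
```

On this toy data no single expert orders every pair correctly, while a mixture
of the first and third experts does, which is the motivation for combining them.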

    Machine Information

    - Machine Type for TREC Experiment: Sun Sparc 10 
    - Was the Machine Dedicated or Shared: Shared 
    - Amount of Hard Disk Storage (in MB): 1360 
    - Amount of RAM (in MB): 83 
    - Clock Rate of CPU (in MHz): unknown 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the 
System: minimal 
    - Given appropriate resources            
      - Could your system run faster?: yes 
      - By how much (estimate)?: 2-10 times 
    - Features the System is Missing that would be beneficial: system is 
currently only experimental and not easily usable 

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the 
system and are not answered by above questions: Reported times are generally 
real (wall-clock) time, not CPU time.  Furthermore, many operations were slowed 
by up to a factor of ten by network traffic and a slow disk.