System Summary and Timing
  Organization Name:  CITRI, Royal Melbourne Institute of Technology
  List of Run IDs:  citri1, citri2, citri-sp1, citri-sp2

  Construction of Indices, Knowledge Bases, and other Data Structures

    Methods Used to build Data Structures

    - Length (in words) of the stopword list: 601 
    - Controlled Vocabulary? : no  
    - Stemming Algorithm: Lovins            
      - Morphological Analysis: none 
    - Term Weighting: standard 
    - Phrase Discovery? : no
    - Syntactic Parsing? : no
    - Word Sense Disambiguation? : no
    - Heuristic Associations (including short definition)? : no
    - Spelling Checking (with manual correction)? : no
    - Spelling Correction? : no
    - Proper Noun Identification Algorithm? : no
    - Tokenizer? : words are alphanumeric strings
    - Manually-Indexed Terms? : no
    - Other Techniques for building Data Structures: no
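
    The tokenizing and stopping steps listed above can be sketched as
    follows. This is a minimal illustration, not the CITRI code: the
    stopword set is a small hypothetical stand-in for the 601-word list,
    case folding is an assumption, and Lovins stemming is omitted.

```python
import re

# Hypothetical stand-in for the 601-word stopword list (not given in the summary).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is"}

# A word is a maximal run of alphanumeric characters.
_WORD = re.compile(r"[A-Za-z0-9]+")

def tokenize(text):
    """Return alphanumeric tokens, lowercased (an assumption), minus stopwords."""
    tokens = (w.lower() for w in _WORD.findall(text))
    return [w for w in tokens if w not in STOPWORDS]

if __name__ == "__main__":
    print(tokenize("The cost of oil-price shocks in 1993"))
```

    In a full pipeline each surviving token would then be passed through
    the stemmer before being added to the inverted index.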

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID : citri1, citri2  
      - Total Storage (in MB): about 140 MB
      - Total Computer Time to Build (in hours): 4 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions? : no 
      - Only Single Terms Used? : yes 
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases           
      - Use of Manual Labor                
    - Special Routing Structures           
    - Other Data Structures built from TREC text           

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used: all (i.e., description)
    - Average Computer Time to Build Query (in cpu seconds): 0 
    - Method used in Query Construction         
      - Term Weighting (weights based on terms in topics)? : no 
      - Phrase Extraction from Topics? : no
      - Syntactic Parsing of Topics? : no
      - Word Sense Disambiguation? : no
      - Proper Noun Identification Algorithm? : no
      - Tokenizer? : words are alphanumeric strings              
      - Heuristic Associations to Add Terms? : no 
      - Expansion of Queries using Previously-Constructed Data Structure? : no              
      - Automatic Addition of Boolean Connectors or Proximity Operators? : no 
      - Other: none  



    Manually Constructed Queries (Ad-Hoc)

    - Topic Fields Used: description  
    - Average Time to Build Query (in Minutes): q 
    - Type of Query Builder          
      - Domain Expert: no  
      - Computer System Expert: yes 
    - Tools used to Build Query          
      - Word Frequency List? : no 
      - Knowledge Base Browser? : no              
      - Other Lexical Tools? : no              
    - Method used in Query Construction         
      - Term Weighting? : no  
      - Boolean Connectors (AND, OR, NOT)? : no  
      - Proximity Operators? : no 
      - Addition of Terms not Included in Topic? : no              
      - Other: none  

  Searching

    Search Times

      - Run ID : citri1/2  
      - Computer Time to Search (Average per Query, in CPU seconds): less than 1 

    Machine Searching Methods

      - Vector Space Model? : yes 

    Factors in Ranking

      - Term Frequency? : yes 
      - Inverse Document Frequency? : yes 
      - Other Term Weights? : no 
      - Semantic Closeness? : no 
      - Position in Document? : no 
      - Syntactic Clues? : no 
      - Proximity of Terms? : no
      - Information Theoretic Weights? : no
      - Document Length? : yes  
      - Percentage of Query Terms which match? : no  
      - N-gram Frequency? : no 
      - Word Specificity? : no  
      - Word Sense Frequency? : no
      - Cluster Distance? : no
      - Other: none  
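
    The ranking factors marked "yes" above (term frequency, inverse
    document frequency, document length) combine naturally in a
    cosine-style vector space measure. The sketch below shows one such
    formulation; the exact weighting formula used in the CITRI runs is
    not given in this summary, so the log-based tf and idf functions and
    the names build_index and rank are assumptions for illustration.

```python
import math
from collections import Counter

def build_index(docs):
    """docs maps doc_id -> list of terms. Returns (inverted index, vector lengths)."""
    index = {}
    for doc_id, terms in docs.items():
        for term, tf in Counter(terms).items():
            index.setdefault(term, []).append((doc_id, tf))
    n_docs = len(docs)
    squared = {}
    for postings in index.values():
        idf = math.log(1 + n_docs / len(postings))   # assumed idf form
        for doc_id, tf in postings:
            w = (1 + math.log(tf)) * idf             # assumed tf form
            squared[doc_id] = squared.get(doc_id, 0.0) + w * w
    lengths = {d: math.sqrt(v) for d, v in squared.items()}
    return index, lengths

def rank(query_terms, index, lengths, n_docs, k=10):
    """Return the top-k (doc_id, score) pairs, normalised by document vector length."""
    scores = {}
    for term, qtf in Counter(query_terms).items():
        postings = index.get(term)
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        wq = (1 + math.log(qtf)) * idf
        for doc_id, tf in postings:
            wd = (1 + math.log(tf)) * idf
            scores[doc_id] = scores.get(doc_id, 0.0) + wq * wd
    # Query length is constant per query, so it is left out of the normalisation.
    ranked = sorted(((s / lengths[d], d) for d, s in scores.items()), reverse=True)
    return [(d, s) for s, d in ranked[:k]]
```

    Only single terms and within-document frequencies are needed, which
    is consistent with an index that stores no term positions.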

    Machine Information

    - Machine Type for TREC Experiment: Sparc 10 
    - Was the Machine Dedicated or Shared: shared 
    - Amount of Hard Disk Storage (in MB): 20 GB
    - Amount of RAM (in MB): 256 
    - Clock Rate of CPU (in MHz): 4 x 50

    System Comparisons

    - Amount of "Software Engineering" which went into the Development 
      of the System: some  
    - Given appropriate resources           
      - Could your system run faster? : not much  
    - Features the System is Missing that would be beneficial: sophistication;
      query processing is fairly basic

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the 
      system and are not answered by above questions: Compression is used 
      throughout, to save space and improve performance.  Total database size 
      including indexes and compressed data is about 750 MB.
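
    The summary does not say which compression methods were used. As an
    illustration only, the sketch below shows one widely used way to
    shrink an inverted list: store the gaps between successive document
    numbers and encode each gap with a variable-byte code. The scheme and
    all function names here are assumptions, not a description of the
    CITRI system.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low-order bits first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)  # high bit marks the final byte of a number
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers

def compress_postings(doc_ids):
    """Compress a sorted list of document numbers as variable-byte coded gaps."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    """Rebuild the document numbers by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

    Small gaps fit in a single byte, so frequent terms with dense
    inverted lists compress best; shorter lists also mean less data to
    read from disk at query time.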