System Summary and Timing

Organization Name: CITRI, Royal Melbourne Institute of Technology
List of Run ID's: citri1, citri2, citri-sp1, citri-sp2

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 601
- Controlled Vocabulary? : no
- Stemming Algorithm: Lovins
- Morphological Analysis: none
- Term Weighting: standard
- Phrase Discovery? : no
- Syntactic Parsing? : no
- Word Sense Disambiguation? : no
- Heuristic Associations (including short definition)? : no
- Spelling Checking (with manual correction)? : no
- Spelling Correction? : no
- Proper Noun Identification Algorithm? : no
- Tokenizer? : words are alphanumeric strings
- Manually-Indexed Terms? : no
- Other Techniques for building Data Structures: no

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID : citri1, citri2
  - Total Storage (in MB): about 140 Mb
  - Total Computer Time to Build (in hours): 4
  - Automatic Process? (If not, number of manual hours): yes
  - Use of Term Positions? : no
  - Only Single Terms Used? : yes
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Special Routing Structures
- Other Data Structures built from TREC text

Query construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: all (ie, description)
- Average Computer Time to Build Query (in cpu seconds): 0
- Method used in Query Construction
  - Term Weighting (weights based on terms in topics)? : no
  - Phrase Extraction from Topics? : no
  - Syntactic Parsing of Topics? : no
  - Word Sense Disambiguation? : no
  - Proper Noun Identification Algorithm? : no
  - Tokenizer? : words are alphanumeric strings
  - Heuristic Associations to Add Terms? : no
  - Expansion of Queries using Previously-Constructed Data Structure? : no
  - Automatic Addition of Boolean Connectors or Proximity Operators? : no
  - Other: none

Manually Constructed Queries (Ad-Hoc)
- Topic Fields Used: description
- Average Time to Build Query (in Minutes): q
- Type of Query Builder
  - Domain Expert: no
  - Computer System Expert: yes
- Tools used to Build Query
  - Word Frequency List? : no
  - Knowledge Base Browser? : no
  - Other Lexical Tools? : no
- Method used in Query Construction
  - Term Weighting? : no
  - Boolean Connectors (AND, OR, NOT)? : no
  - Proximity Operators? : no
  - Addition of Terms not Included in Topic? : no
  - Other: none

Searching

Search Times
- Run ID : citri1/2
- Computer Time to Search (Average per Query, in CPU seconds): less than 1

Machine Searching Methods
- Vector Space Model? : yes

Factors in Ranking
- Term Frequency? : yes
- Inverse Document Frequency? : yes
- Other Term Weights? : no
- Semantic Closeness? : no
- Position in Document? : no
- Syntactic Clues? : no
- Proximity of Terms? : no
- Information Theoretic Weights? : no
- Document Length? : yes
- Percentage of Query Terms which match? : no
- N-gram Frequency? : no
- Word Specificity? : no
- Word Sense Frequency? : no
- Cluster Distance? : no
- Other: none

Machine Information
- Machine Type for TREC Experiment: Sparc 10
- Was the Machine Dedicated or Shared: shared
- Amount of Hard Disk Storage (in MB): 20 Gb
- Amount of RAM (in MB): 256
- Clock Rate of CPU (in MHz): 4 times 50

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: some
- Given appropriate resources
  - Could your system run faster? : not much
- Features the System is Missing that would be beneficial: sophistication! query processing is fairly basic

Significant Areas of System
- Brief Description of features in your system which you feel impact the system and are not answered by above questions: Compression is used throughout, to save space and improve performance. Total database size including indexes and compressed data is about 750 Mb.
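
The answers under Searching describe a vector space model ranked by term frequency, inverse document frequency, and document length, over an inverted index of single terms tokenized as alphanumeric strings. A minimal Python sketch of that general scheme follows. It is an illustration under assumptions, not the CITRI implementation: the form only says the term weighting is "standard", so the log-scaled weights, the cosine-style length normalization, the case folding, and the function names (tokenize, build_index, rank) are all hypothetical. The 601-word stopword list, Lovins stemming, and index compression are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into alphanumeric strings (lowercasing is an assumption)."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def build_index(docs):
    """Build a single-term inverted index without term positions.

    docs maps a document id to its text. Returns (postings, idf, norms):
    postings maps term -> {doc_id: term frequency}, idf maps term -> weight,
    norms maps doc_id -> Euclidean length of the document's weight vector.
    """
    postings = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings.setdefault(term, {})[doc_id] = tf

    n_docs = len(docs)
    # Assumed IDF form: log(1 + N / document frequency).
    idf = {t: math.log(1 + n_docs / len(p)) for t, p in postings.items()}

    squared = Counter()
    for term, plist in postings.items():
        for doc_id, tf in plist.items():
            # Assumed document term weight: (1 + ln tf) * idf.
            squared[doc_id] += ((1 + math.log(tf)) * idf[term]) ** 2
    norms = {doc_id: math.sqrt(v) for doc_id, v in squared.items()}
    return postings, idf, norms


def rank(query, postings, idf, norms):
    """Score documents by a length-normalized inner product with the query."""
    scores = Counter()
    for term in set(tokenize(query)):
        for doc_id, tf in postings.get(term, {}).items():
            doc_weight = (1 + math.log(tf)) * idf[term]
            scores[doc_id] += idf[term] * doc_weight  # query weight = idf
    ranked = [(doc_id, s / norms[doc_id]) for doc_id, s in scores.items()]
    # Highest score first; ties broken by document id for determinism.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "d1": "compression saves space",
        "d2": "compression improves performance performance",
        "d3": "query processing is basic",
    }
    postings, idf, norms = build_index(docs)
    for doc_id, score in rank("performance compression", postings, idf, norms):
        print(doc_id, round(score, 3))
```

Dividing by the document norm is one common way to account for document length; because the query norm is the same for every document, leaving it out does not change the ordering.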