System Summary and Timing

Organization Name: Australian National University
List of Run ID's: anu5man4, anu5man6, anu5aut1, anu5aut2, anu5mrg0, anu5mrg1, anu5mrg7, anu5con0, anu5con1, anu5vlc2, anu5vlc3

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 0
- Controlled Vocabulary?: No
- Stemming Algorithm: No, except in auto query gen.
- Morphological Analysis: No, except in auto query gen.
- Term Weighting: Optional
- Phrase Discovery?: No, except in auto query gen.
  - Kind of Phrase: Various
  - Method Used (statistical, syntactic, other): Statistical
- Syntactic Parsing?: No
- Word Sense Disambiguation?: No, except by proximity
- Heuristic Associations (including short definition)?: No, except in augmentation of manual queries
- Spelling Checking (with manual correction)?: No
- Spelling Correction?: No
- Proper Noun Identification Algorithm?: No
- Tokenizer?:
- Manually-Indexed Terms?: No

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: anu5man4, anu5man6, anu5aut1, anu5aut2, anu5mrg0, anu5mrg1, anu5mrg7, anu5con0, anu5con1, anu5vlc2
  - Total Storage (in MB): 2973, but size is exaggerated due to 128-way parallel file system. Raw text of CD2/4 comes out at 3649!
  - Total Computer Time to Build (in hours): 0.60
  - Automatic Process? (If not, number of manual hours): Yes
  - Use of Term Positions?: Yes
  - Only Single Terms Used?: Yes
- Clusters
  - Run ID: N/A
- N-grams, Suffix arrays, Signature Files
  - Run ID: N/A
- Knowledge Bases
  - Run ID: N/A
  - Use of Manual Labor
- Special Routing Structures
  - Run ID: N/A
- Other Data Structures built from TREC text
  - Run ID: N/A

Query construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: short form for both Automatic
- Average Computer Time to Build Query (in cpu seconds): 10 approx.
- Method used in Query Construction
  - Term Weighting (weights based on terms in topics)?: Yes
  - Phrase Extraction from Topics?: Yes
  - Syntactic Parsing of Topics?: No
  - Word Sense Disambiguation?: No
  - Proper Noun Identification Algorithm?: No
  - Tokenizer?: No
  - Heuristic Associations to Add Terms?: No
  - Expansion of Queries using Previously-Constructed Data Structure?: No
  - Automatic Addition of Boolean Connectors or Proximity Operators?: Yes
  - Other: No

Manually Constructed Queries (Ad-Hoc)
- Topic Fields Used: Full topic
- Average Time to Build Query (in Minutes): 20, (25 for anu5man6)
- Type of Query Builder
  - Domain Expert: No
  - Computer System Expert: Yes
- Tools used to Build Query
  - Word Frequency List?: No
  - Knowledge Base Browser?: No
  - Other Lexical Tools?: Term-Term Implication
- Method used in Query Construction
  - Term Weighting?: No
  - Boolean Connectors (AND, OR, NOT)?: No
  - Proximity Operators?: Yes
  - Addition of Terms not Included in Topic?: Yes
    - Source of Terms: Query Builder's Head, Term-Term Implications

Searching

Search Times
- Run ID: anu5man4
- Computer Time to Search (Average per Query, in CPU seconds): 39 (ELAPSED)
- Component Times: For topic 253 (250 docs found) Term location: 36.89, relevance scoring: 6.31, ranking 1.13, total 43.20

Machine Searching Methods
- Vector Space Model?: No
- Probabilistic Model?: Yes, in automatic adhoc
- Cluster Searching?: No
- N-gram Matching?: No
- Boolean Matching?: No
- Fuzzy Logic?: No
- Free Text Scanning?: Yes, only in anu5man6
- Neural Networks?: No
- Conceptual Graph Matching?: No
- Other: Distance Model, in manual adhoc

Factors in Ranking
- Term Frequency?: Yes, in automatic
- Inverse Document Frequency?: Yes, in automatic
- Other Term Weights?: Yes, Probability Differentials
- Semantic Closeness?: No
- Position in Document?: No
- Syntactic Clues?: No
- Proximity of Terms?: Yes
- Information Theoretic Weights?: No
- Document Length?: No
- Percentage of Query Terms which match?: No
- N-gram Frequency?: No
- Word Specificity?: No
- Word Sense Frequency?: No
- Cluster Distance?: No
- Other: Span Distance

Machine Information
- Machine Type for TREC Experiment: Fujitsu AP1000
- Was the Machine Dedicated or Shared: dedicated only during runs
- Amount of Hard Disk Storage (in MB): 16 x 4000 = 64000
- Amount of RAM (in MB): 128 x 16 = 2048
- Clock Rate of CPU (in MHz): 128 CPUs, each 25 MHz

System Comparisons
- Given appropriate resources
  - Could your system run faster?: Yes
  - By how much (estimate)?: Factor of 125
- Features the System is Missing that would be beneficial: Phrase finding
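
The figures reported in this summary can be cross-checked with a little arithmetic. The sketch below is only a sanity check on numbers copied from the form; the variable names are illustrative and are not part of the TREC questionnaire or of the ANU system. It recomputes the disk and RAM totals, converts the index build time to minutes, and compares the inverted index size with the raw text size. It also shows that the reported total of 43.20 for topic 253 equals term location plus relevance scoring, so the ranking time of 1.13 is apparently not counted in that total.

```python
from decimal import Decimal

# Values copied from the summary; all names here are hypothetical.
disk_mb = 16 * 4000          # "16 x 4000 = 64000"
ram_mb = 128 * 16            # "128 x 16 = 2048"

# Component times reported for topic 253 (Decimal avoids float rounding).
term_location = Decimal("36.89")
relevance_scoring = Decimal("6.31")
ranking = Decimal("1.13")
reported_total = Decimal("43.20")

sum_without_ranking = term_location + relevance_scoring
sum_with_ranking = sum_without_ranking + ranking

# Index build time (0.60 hours) in minutes, and index size versus raw text.
index_build_minutes = Decimal("0.60") * 60
index_to_text_ratio = Decimal(2973) / Decimal(3649)

print("disk MB:", disk_mb, "RAM MB:", ram_mb)
print("location + scoring:", sum_without_ranking)
print("location + scoring + ranking:", sum_with_ranking)
print("reported total matches location + scoring:",
      sum_without_ranking == reported_total)
print("index build time (minutes):", index_build_minutes)
print("index size / raw text size:", round(index_to_text_ratio, 2))
```

Decimal is used instead of float so that the comparison against the reported total is exact and does not depend on binary rounding.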