System Summary and Timing

Organization Name: University of Toronto
List of Run ID's: uofto1

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 24
- Controlled Vocabulary? : no
- Stemming Algorithm: no
- Morphological Analysis: no
- Term Weighting: no
- Phrase Discovery? : yes
  - Method Used (statistical, syntactic, other): tree
- Syntactic Parsing? : no
- Word Sense Disambiguation? : no
- Heuristic Associations (including short definition)? : no
- Spelling Checking (with manual correction)? : no
- Spelling Correction? : no
- Proper Noun Identification Algorithm? : no
- Tokenizer? : no
  - Patterns which are tokenized: no
- Manually-Indexed Terms? : no
- Other Techniques for building Data Structures: tree

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID : uofto1
  - Total Storage (in MB): 900
  - Total Computer Time to Build (in hours): 70
  - Automatic Process? (If not, number of manual hours): yes
  - Use of Term Positions? : no
  - Only Single Terms Used? : no
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
  - Use of Manual Labor
- Special Routing Structures
- Other Data Structures built from TREC text

Query construction

Interactive Queries
- Initial Query Built Automatically or Manually: manual
- Type of Person doing Interaction
  - Domain Expert: no
  - System Expert: no
- Average Time to do Complete Interaction
  - CPU Time (Total CPU Seconds for all Iterations): 20 seconds/query
  - Clock Time from Initial Construction of Query to Completion of Final Query (in minutes): no final query for searchers
- Average Number of Iterations: 6
- Average Number of Documents Examined per Iteration: 5
- Minimum Number of Iterations: 2
- Maximum Number of Iterations: 16
- What Determines the End of an Iteration: start of another query
- Methods used in Interaction
  - Automatic Term Reweighting from Relevant Documents? : no
  - Automatic Query Expansion from Relevant Documents? : no
    - All Terms in Relevant Documents added: no
    - Only Top X Terms Added (what is X): no
    - User Selected Terms Added: yes
  - Manual Methods
    - Using Individual Judgment (No Set Algorithm)? : yes
    - Following a Given Algorithm (Brief Description)? : no

Searching

Search Times
- Run ID : uofto1
- Computer Time to Search (Average per Query, in CPU seconds): 20
- Component Times : search 8 files

Machine Searching Methods
- Vector Space Model? : no
- Probabilistic Model? : no
- Cluster Searching? : no
- N-gram Matching? : no
- Boolean Matching? : yes
- Fuzzy Logic? : no
- Free Text Scanning? : no
- Neural Networks? : no
- Conceptual Graph Matching? : no

Machine Information
- Machine Type for TREC Experiment: 2 SUN SPARC 20/50
- Was the Machine Dedicated or Shared: 1 dedicated and 1 shared
- Amount of Hard Disk Storage (in MB): 4500 MB (dedicated)
- Amount of RAM (in MB): 32 (dedicated), and 128 (shared)
- Clock Rate of CPU (in MHz): 50

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: approximately 240 hours