System Summary and Timing

Organization Name: Information Technology Institute
List of Run ID's: itidp1, itidp2, itift1, iticn1, iticn2

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 431
- Controlled Vocabulary?: No
- Stemming Algorithm: Porter
- Term Weighting: tf.idf
- Phrase Discovery?:
- Tokenizer?:

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: itidp1, itidp2, itift1
  - Total Storage (in MB): 30
  - Total Computer Time to Build (in hours): 12
  - Automatic Process? (If not, number of manual hours): yes
  - Use of Term Positions?: yes
  - Only Single Terms Used?: yes
- Inverted index
  - Run ID: iticn1, iticn2
  - Total Storage (in MB): 2000
  - Total Computer Time to Build (in hours): 96
  - Automatic Process? (If not, number of manual hours): yes
  - Use of Term Positions?: yes
  - Only Single Terms Used?: yes
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Special Routing Structures
- Other Data Structures built from TREC text

Query construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: no
- Average Computer Time to Build Query (in cpu seconds): 1
- Method used in Query Construction
  - Tokenizer?:
  - Expansion of Queries using Previously-Constructed Data Structure?:

Automatically Built Queries (Routing)
- Topic Fields Used: no
- Average Computer Time to Build Query (in cpu seconds): 3600
- Method used in Query Construction
  - Terms Selected From
    - Only Documents with Relevance Judgments: yes
  - Term Weighting with Weights Based on terms in
    - Documents with Relevance Judgments: yes
  - Phrase Extraction from
  - Syntactic Parsing
  - Word Sense Disambiguation using
  - Proper Noun Identification Algorithm from
  - Tokenizer
  - Heuristic Associations to Add Terms from
  - Expansion of Queries using Previously-Constructed Data Structure:
  - Automatic Addition of Boolean connectors or Proximity Operators using information from

Manually Constructed Queries (Ad-Hoc)
- Topic Fields Used: no
- Average Time to Build Query (in Minutes): 10
- Type of Query Builder
- Tools used to Build Query
  - Knowledge Base Browser?:
  - Other Lexical Tools?:
- Method used in Query Construction
  - Boolean Connectors (AND, OR, NOT)?: yes
  - Proximity Operators?: yes
  - Addition of Terms not Included in Topic?:

Searching

Search Times
- Run ID: iticn1, iticn2
- Computer Time to Search (Average per Query, in CPU seconds): 3600
- Component Times: Disk IO, 50%; Proximity Check, 50%

Factors in Ranking
- Term Frequency?: yes
- Inverse Document Frequency?: yes
- Proximity of Terms?: yes
- Document Length?: yes

Machine Information
- Machine Type for TREC Experiment: Ultra Sparc
- Was the Machine Dedicated or Shared: shared
- Amount of Hard Disk Storage (in MB): 9000
- Amount of RAM (in MB): 64
- Clock Rate of CPU (in MHz): 167

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: 50%
- Given appropriate resources
  - Could your system run faster?: yes
  - By how much (estimate)?: 100
- Features the System is Missing that would be beneficial: Chinese phrase-based indexing
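
Illustrative sketch (not the system's actual code): the form reports a single-term inverted index with term positions, tf.idf weighting, and document length as a ranking factor. The Python below shows one minimal way those pieces could fit together. The function names, the tiny stopword set (a stand-in for the 431-word list), and the exact weighting formula (length-normalized tf times log(N/df)) are all assumptions made for illustration. Porter stemming and proximity scoring are deliberately left out.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for the 431-word stopword list named in the form.
STOPWORDS = {"the", "of", "a", "and", "in"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no stemming here)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [positions]}.

    Positions count tokens after stopword removal.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def rank(query, index, docs):
    """Score documents by summed tf.idf, with tf normalized by document length.

    The formula is an assumed textbook variant, not the one used in the runs.
    """
    n_docs = len(docs)
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, positions in postings.items():
            doc_len = len(tokenize(docs[doc_id]))
            scores[doc_id] += (len(positions) / doc_len) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    docs = {
        "d1": "information retrieval systems",
        "d2": "chinese text retrieval retrieval",
        "d3": "weather report",
    }
    index = build_index(docs)
    for doc_id, score in rank("chinese retrieval", index, docs):
        print(doc_id, round(score, 3))
```

Because the index keeps positions, a proximity check (reported above as half of the search time) could be layered on by comparing the position lists of two query terms within the same document.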