System Summary and Timing

Organization Name: Institute of Systems Science
List of Run IDs: issah1, issah2

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 495
- Controlled Vocabulary? : NO
- Stemming Algorithm: Porter
- Morphological Analysis: NO
- Term Weighting: YES
- Phrase Discovery? :
- Tokenizer? : YES
  - Patterns which are tokenized: WORDS
- Other Techniques for building Data Structures: YES, sentence masking

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID : issah1, issah2
  - Total Storage (in MB): 1385
  - Total Computer Time to Build (in hours): 17.5
  - Automatic Process? (If not, number of manual hours): YES
  - Use of Term Positions? : NO
  - Only Single Terms Used? : YES
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
  - Use of Manual Labor
- Special Routing Structures
- Other Data Structures built from TREC text

Data Built from Sources Other than the Input Text
- Internally-built Auxiliary File
  - Use of Manual Labor
- Externally-built Auxiliary File

Query construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: title, description
- Average Computer Time to Build Query (in cpu seconds): 0.0867
- Method used in Query Construction
  - Term Weighting (weights based on terms in topics)? : YES
  - Tokenizer? :
  - Expansion of Queries using Previously-Constructed Data Structure? :

Automatically Built Queries (Routing)
- Method used in Query Construction
  - Terms Selected From
  - Term Weighting with Weights Based on terms in
  - Phrase Extraction from
  - Syntactic Parsing
  - Word Sense Disambiguation using
  - Proper Noun Identification Algorithm from
  - Tokenizer
  - Heuristic Associations to Add Terms from
  - Expansion of Queries using Previously-Constructed Data Structure:
  - Automatic Addition of Boolean connectors or Proximity Operators using information from

Manually Constructed Queries (Ad-Hoc)
- Topic Fields Used: title, description
- Average Time to Build Query (in Minutes): 0.5
- Type of Query Builder
  - Computer System Expert: YES
- Tools used to Build Query
  - Knowledge Base Browser? :
  - Other Lexical Tools? :
- Method used in Query Construction
  - Term Weighting? : YES
  - Addition of Terms not Included in Topic? :
  - Other: PRUNING, to reduce search times

Manually Constructed Queries (Routing)
- Type of Query Builder
- Tools used to Build Query
  - Knowledge Base Browser? :
  - Other Lexical Tools? :
- Data Used for Building Query from
- Method used in Query Construction
  - Addition of Terms not Included in Topic? :

Interactive Queries
- Type of Person doing Interaction
- Average Time to do Complete Interaction
- Methods used in Interaction
  - Automatic Query Expansion from Relevant Documents? :
  - Manual Methods

Searching

Search Times
- Run ID : issah1
  - Computer Time to Search (Average per Query, in CPU seconds): 78.6
- Run ID : issah2
  - Computer Time to Search (Average per Query, in CPU seconds): 32.4

Machine Searching Methods
- Probabilistic Model? : YES

Factors in Ranking
- Term Frequency? : YES
- Inverse Document Frequency? : YES
- Other Term Weights? : YES, based on query field
- Proximity of Terms? : YES
- Document Length? : YES

Machine Information
- Machine Type for TREC Experiment: Sun SPARC20 workstation
- Was the Machine Dedicated or Shared: shared
- Amount of Hard Disk Storage (in MB): 10,000
- Amount of RAM (in MB): 96
- Clock Rate of CPU (in MHz): 50

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: 13 man-months
- Given appropriate resources
  - Could your system run faster? : NO
- Features the System is Missing that would be beneficial: sophisticated tokenizers, thesaurus expansion

Significant Areas of System
- Brief Description of features in your system which you feel impact the system and are not answered by above questions: full multilingual architecture, maximum memory usage of 2MB per query, updates to corpus reflected within 5 minutes
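
Illustrative sketch (not part of the submitted form). The form reports a single-term inverted index, a stopword list, ranking by term frequency, inverse document frequency, query-field weights and document length, and query pruning to reduce search times. It does not give the formulas used, so the Python below is only a minimal sketch under stated assumptions: a plain tf-idf score with length normalization stands in for the system's probabilistic model, stemming and term proximity are omitted, and the stopword set, the field weights in FIELD_WEIGHTS, the pruning rule, and every function name are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in: the form reports a 495-word stopword list but does not list it.
STOPWORDS = {"the", "of", "a", "in", "and", "to"}

# Assumed field weights; the form only says weights are "based on query field".
FIELD_WEIGHTS = {"title": 2.0, "description": 1.0}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no stemming in this sketch)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def build_index(docs):
    """Build a single-term inverted index (term -> {doc_id: tf}) and document lengths."""
    index, lengths = {}, {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = tf
    return index, lengths


def idf(term, index, n_docs):
    """Inverse document frequency; 0.0 for terms absent from the index."""
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0


def build_query(fields):
    """Combine topic fields into term -> weight, summing the assumed field weights."""
    query = {}
    for field, text in fields.items():
        for term in tokenize(text):
            query[term] = query.get(term, 0.0) + FIELD_WEIGHTS[field]
    return query


def prune_query(query, index, n_docs, keep):
    """Keep the `keep` terms with the highest weight * idf (assumed pruning rule)."""
    ranked = sorted(query, key=lambda t: (-query[t] * idf(t, index, n_docs), t))
    return {t: query[t] for t in ranked[:keep]}


def search(query, index, lengths, n_docs):
    """Score documents by summed weight * idf * tf / document length, best first."""
    scores = {}
    for term, weight in query.items():
        term_idf = idf(term, index, n_docs)
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + weight * term_idf * tf / lengths[doc_id]
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {
        "d1": "information retrieval with inverted index",
        "d2": "retrieval of images",
        "d3": "cooking pasta in the kitchen",
    }
    index, lengths = build_index(docs)
    query = build_query({"title": "inverted index", "description": "retrieval of information"})
    pruned = prune_query(query, index, len(docs), keep=2)
    print(search(query, index, lengths, len(docs)))
    print(search(pruned, index, lengths, len(docs)))
```

Pruning shortens the query so fewer postings lists are read per search, which is one plausible reading of the "PRUNING, to reduce search times" entry. The form does not say which run used pruning or how terms were chosen.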