System Summary and Timing

Organization Name: HNC Software Inc.
List of Run ID's: HNC11, HNC21

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 375
- Stemming Algorithm: Lovins
- Morphological Analysis: No
- Term Weighting: Yes
- Phrase Discovery? :
  - Kind of Phrase: Yes
  - Method Used (statistical, syntactic, other): statistical
- Syntactic Parsing? : No
- Word Sense Disambiguation? : No
- Heuristic Associations (including short definition)? : No
- Spelling Checking (with manual correction)? : No
- Spelling Correction? : No
- Proper Noun Identification Algorithm? : No
- Tokenizer? : Yes
  - Patterns which are tokenized: No
- Manually-Indexed Terms? : No

Statistics on Data Structures built from TREC Text
- Inverted index
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Other Data Structures built from TREC text
  - Run ID : HNC11
  - Type of Structure: Word Context Vectors
  - Total Storage (in MB): 300
  - Total Computer Time to Build (in hours): 130
  - Automatic Process? (If not, number of manual hours): Yes
- Other Data Structures built from TREC text
  - Run ID : HNC21
  - Type of Structure: Word Context Vectors
  - Total Storage (in MB): 300
  - Total Computer Time to Build (in hours): 130
  - Automatic Process? (If not, number of manual hours): Yes

Data Built from Sources Other than the Input Text
- Internally-built Auxiliary File
- Use of Manual Labor
- Externally-built Auxiliary File

Query construction

Automatically Built Queries (Ad-Hoc)
- Method used in Query Construction
  - Tokenizer? :
  - Expansion of Queries using Previously-Constructed Data Structure? :

Automatically Built Queries (Routing)
- Average Computer Time to Build Query (in cpu seconds): 10-20
- Method used in Query Construction
  - Terms Selected From
    - Only Documents with Relevance Judgments: Yes
  - Term Weighting with Weights Based on terms in
  - Phrase Extraction from
  - Syntactic Parsing
  - Word Sense Disambiguation using
  - Proper Noun Identification Algorithm from
  - Tokenizer
  - Heuristic Associations to Add Terms from
  - Expansion of Queries using Previously-Constructed Data Structure:
  - Automatic Addition of Boolean connectors or Proximity Operators using information from

Manually Constructed Queries (Ad-Hoc)
- Type of Query Builder
- Tools used to Build Query
  - Knowledge Base Browser? :
  - Other Lexical Tools? :
- Method used in Query Construction
  - Addition of Terms not Included in Topic? :

Manually Constructed Queries (Routing)
- Type of Query Builder
- Tools used to Build Query
  - Knowledge Base Browser? :
  - Other Lexical Tools? :
- Data Used for Building Query from
- Method used in Query Construction
  - Addition of Terms not Included in Topic? :

Interactive Queries
- Type of Person doing Interaction
- Average Time to do Complete Interaction
- Methods used in Interaction
  - Automatic Query Expansion from Relevant Documents? :
  - Manual Methods

Searching

Search Times
- Run ID : HNC11
- Computer Time to Search (Average per Query, in CPU seconds): 20
- Component Times : Context Vector dot product sorting

Machine Searching Methods
- Machine Searching Methods
  - Vector Space Model? : Yes

Machine Information
- Machine Type for TREC Experiment: Sun Sparc 10
- Was the Machine Dedicated or Shared: Shared
- Amount of Hard Disk Storage (in MB): 2 GB
- Amount of RAM (in MB): 512
- Clock Rate of CPU (in MHz): 45

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: 3-4 years
- Given appropriate resources
  - Could your system run faster? : Yes
  - By how much (estimate)? : 2 or 3 times on faster hardware

Significant Areas of System
- Brief Description of features in your system which you feel impact the system and are not answered by above questions: Routing method uses an LVQ (neural network) algorithm given the Context Vectors of judged documents to create Query Context Vector(s)
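
Illustrative sketch (not part of the original form): the routing description above can be pictured with a small LVQ1 example. The form does not state which LVQ variant was used, how many prototypes were kept, how they were initialized, or whether vectors were normalized, so everything below is an assumption: one "relevant" and one "non-relevant" prototype, initialized to class means, unit-length vectors, and a linearly decaying learning rate. The names lvq1_query_vector and rank are hypothetical helpers. The relevant prototype plays the role of the Query Context Vector, and documents are then scored by dot product and sorted, as in the Component Times entry.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lvq1_query_vector(doc_vectors, relevant, epochs=20, lr=0.1):
    """Learn one query vector from judged documents with LVQ1.

    Assumption: two prototypes (relevant / non-relevant), unit-length
    vectors, winner chosen by largest dot product.
    """
    rel = [normalize(v) for v, r in zip(doc_vectors, relevant) if r]
    non = [normalize(v) for v, r in zip(doc_vectors, relevant) if not r]
    if not rel or not non:
        raise ValueError("need at least one relevant and one non-relevant document")
    # Prototype 0 stands for the relevant class, prototype 1 for the other.
    protos = [normalize(mean(rel)), normalize(mean(non))]
    labels = [True, False]
    data = [(v, True) for v in rel] + [(v, False) for v in non]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # assumed linear decay
        for vec, label in data:
            win = max(range(len(protos)), key=lambda i: dot(protos[i], vec))
            # LVQ1 rule: pull the winner toward a same-class example,
            # push it away from an example of the other class.
            sign = 1.0 if labels[win] == label else -1.0
            moved = [p + sign * rate * (x - p) for p, x in zip(protos[win], vec)]
            protos[win] = normalize(moved)
    return protos[0]

def rank(query, doc_vectors):
    """Return document indices sorted by descending dot product with query."""
    scores = [(dot(query, normalize(v)), i) for i, v in enumerate(doc_vectors)]
    return [i for _, i in sorted(scores, key=lambda s: -s[0])]
```

For example, with four two-dimensional toy vectors of which the first two are judged relevant, rank(lvq1_query_vector(docs, [True, True, False, False]), docs) places the two relevant documents ahead of the others. Real context vectors would have far more dimensions, and a production system would likely keep several prototypes per topic, which the "Vector(s)" wording in the form hints at but does not specify.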