System Summary and Timing

Organization Name: The University of Kansas
List of Run IDs: KUSG2, KUSG3

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 580
- Controlled Vocabulary?: no
- Stemming Algorithm: no
- Morphological Analysis: no
- Term Weighting: yes
- Phrase Discovery?:
- Heuristic Associations (including short definition)?: yes
- Tokenizer?:

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: KUSG2, KUSG3
  - Total Storage (in MB): 1020
  - Total Computer Time to Build (in hours): 5
  - Automatic Process? (If not, number of manual hours): yes
  - Use of Term Positions?: no
  - Only Single Terms Used?: yes
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
  - Run ID: KUSG2, KUSG3
  - Total Storage (in MB): 316
  - Total Computer Time to Build (in hours): 129
  - Automatic Process? (If not, number of manual hours): yes
  - Use of Manual Labor
  - Number of Concepts Represented: 42944
  - Type of Representation: similarity matrix
  - Auxiliary Files Needed: none
- Special Routing Structures
- Other Data Structures built from TREC text

Query construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: ap: HEAD, TEXT; fr: TEXT; wsj: LP, TEXT; ziff: TITLE, TEXT; cr: TEXT; fr: TEXT; ft: HEADLINE, TEXT
- Average Computer Time to Build Query (in CPU seconds): 14
- Method used in Query Construction
  - Term Weighting (weights based on terms in topics)?: yes
  - Phrase Extraction from Topics?: no
  - Syntactic Parsing of Topics?: no
  - Word Sense Disambiguation?: no
  - Proper Noun Identification Algorithm?: no
  - Tokenizer?: no
  - Heuristic Associations to Add Terms?: yes
  - Expansion of Queries using Previously-Constructed Data Structure?: yes
    - Structure Used: similarity matrix
  - Automatic Addition of Boolean Connectors or Proximity Operators?: no

Searching

Search Times
- Run ID: KUSG2, KUSG3
- Computer Time to Search (Average per Query, in CPU seconds): 26
- Component Times: query expansion 4; document retrieval 22

Factors in Ranking
- Term Frequency?: yes
- Inverse Document Frequency?: yes
- Other Term Weights?: yes
- Semantic Closeness?: no
- Position in Document?: no
- Syntactic Clues?: no
- Proximity of Terms?: no
- Information Theoretic Weights?: no
- Document Length?: yes
- Percentage of Query Terms which match?: yes
- N-gram Frequency?: no
- Word Specificity?: no
- Word Sense Frequency?: no
- Cluster Distance?: no
- Other: term similarity between each original query term and the terms added to the query from the similarity matrix by automatic expansion

Machine Information
- Machine Type for TREC Experiment: Sun SPARCcenter 2000
- Was the Machine Dedicated or Shared: Shared
- Amount of Hard Disk Storage: 9 GB
- Amount of RAM (in MB): 512
- Clock Rate of CPU (in MHz): 67

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: modest
- Given appropriate resources
  - Could your system run faster?: yes
  - By how much (estimate)?: 20%
- Features the System is Missing that would be beneficial: disambiguation; a browser for viewing the term similarity matrix

Significant Areas of System
- Brief Description of features in your system which you feel impact the system and are not answered by above questions: automatic calculation of term similarity based on the contexts in the corpus in which the term instances appear.
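The form states only that term similarity is computed automatically from the corpus contexts in which terms appear, stored as a similarity matrix, and used to expand queries, with the similarity between an original term and each added term used as a ranking factor. The following is a minimal sketch of one way such a pipeline could work, not the Kansas system's method. The function names, the fixed-size context window, cosine similarity over context counts, the top-k cutoff `k`, the threshold `min_sim`, and the rule that an added term's weight is the original weight times the similarity are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
import math


def context_vectors(docs, window=2, stopwords=frozenset()):
    """Map each term to a Counter of the terms seen within +/- `window` positions.

    Assumption: whitespace tokenization and a fixed-size window stand in for
    whatever notion of "context" the original system used.
    """
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t not in stopwords]
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[t] for t, count in a.items() if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(vectors, vocab):
    """Sparse symmetric term-term matrix: sims[s][t] = cosine of context vectors.

    Only non-zero entries are stored; the O(V^2) loop is fine for a toy vocabulary.
    """
    sims = defaultdict(dict)
    terms = sorted(vocab)
    for i, s in enumerate(terms):
        for t in terms[i + 1:]:
            score = cosine(vectors[s], vectors[t])
            if score > 0:
                sims[s][t] = sims[t][s] = score
    return sims


def expand_query(query_weights, sims, k=2, min_sim=0.3):
    """Add up to k neighbours per query term (k and min_sim are hypothetical).

    Assumption: an added term's weight is the original term's weight scaled by
    their similarity; if several query terms add the same term, keep the max.
    """
    expanded = dict(query_weights)
    for term, weight in query_weights.items():
        neighbours = sorted(sims.get(term, {}).items(), key=lambda kv: -kv[1])[:k]
        for other, score in neighbours:
            if score >= min_sim and other not in query_weights:
                expanded[other] = max(expanded.get(other, 0.0), weight * score)
    return expanded


if __name__ == "__main__":
    docs = [
        "the car engine runs fast",
        "the automobile engine runs fast",
        "the banana tastes sweet",
    ]
    vecs = context_vectors(docs, window=2, stopwords={"the"})
    sims = similarity_matrix(vecs, vecs.keys())
    expanded = expand_query({"car": 1.0}, sims, k=2)
    print(sorted(expanded))  # ['automobile', 'car', 'fast']
```

In this toy run "automobile" is added because it occurs in exactly the same contexts as "car", but "fast" is added too: it shares context words with "car" without being a synonym. Context similarity captures association rather than meaning, which is one reason cutoffs such as `k` and `min_sim` matter in practice.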