System Summary and Timing
Organization Name: U. of California, Berkeley
List of Run ID's: Brkly9, Brkly10, Brkly11, Brkly12

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 592
- Controlled Vocabulary?: NO
- Stemming Algorithm: SMART STEMMER
- Morphological Analysis: NO
- Term Weighting: YES
- Phrase Discovery?: NO
- Syntactic Parsing?: NO
- Word Sense Disambiguation?: NO
- Heuristic Associations (including short definition)?: NO
- Spelling Checking (with manual correction)?: NO
- Spelling Correction?: NO
- Proper Noun Identification Algorithm?: NO
- Tokenizer?: NO
- Manually-Indexed Terms?: NO
- Other Techniques for building Data Structures: NONE

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: Brkly9, Brkly10
  - Total Storage (in MB): 550
  - Total Computer Time to Build (in hours): APPROX. 50
  - Automatic Process? (If not, number of manual hours): YES
  - Use of Term Positions?: NO
  - Only Single Terms Used?: YES
- Inverted index
  - Run ID: Brkly11, Brkly12
  - Total Storage (in MB): 250
  - Total Computer Time to Build (in hours): APPROX. 25
  - Automatic Process? (If not, number of manual hours): YES
  - Use of Term Positions?: NO
  - Only Single Terms Used?: YES
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Special Routing Structures
- Other Data Structures built from TREC text

Query construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: DESCRIPTION
- Average Computer Time to Build Query (in cpu seconds): APPROX. 3
- Method used in Query Construction
  - Term Weighting (weights based on terms in topics)?: YES
  - Tokenizer?:
  - Expansion of Queries using Previously-Constructed Data Structure?:

Automatically Built Queries (Routing)
- Topic Fields Used: DOM, TITLE, DESC, NARR, CON, DEF, NAT TIME
- Average Computer Time to Build Query (in cpu seconds): APPROX. 50
- Method used in Query Construction
  - Terms Selected From
    - Topics: YES
    - Only Documents with Relevance Judgments: YES
  - Term Weighting with Weights Based on terms in
  - Phrase Extraction from
  - Syntactic Parsing
  - Word Sense Disambiguation using
  - Proper Noun Identification Algorithm from
  - Tokenizer
  - Heuristic Associations to Add Terms from
  - Expansion of Queries using Previously-Constructed Data Structure:
  - Automatic Addition of Boolean connectors or Proximity Operators using information from

Manually Constructed Queries (Ad-Hoc)
- Topic Fields Used: DESCRIPTION
- Average Time to Build Query (in Minutes): 25
- Type of Query Builder
- Tools used to Build Query
  - Knowledge Base Browser?:
  - Other Lexical Tools?:
- Method used in Query Construction
  - Addition of Terms not Included in Topic?:
    - Source of Terms: Boolean lookups in parallel collections

Searching

Search Times
- Run ID: Brkly9, Brkly10
- Computer Time to Search (Average per Query, in CPU seconds): APPROX. 30

Search Times
- Run ID: Brkly11, Brkly12
- Computer Time to Search (Average per Query, in CPU seconds): APPROX. 50

Machine Searching Methods
- Probabilistic Model?: YES

Factors in Ranking
- Term Frequency?: YES
- Inverse Document Frequency?: YES

Machine Information
- Machine Type for TREC Experiment: SPARC 10, SPARC 20
- Was the Machine Dedicated or Shared: SHARED
- Amount of Hard Disk Storage (in MB): 5,000
- Amount of RAM (in MB): 128
- Clock Rate of CPU (in MHz): 50 for SPARC 10, 90 for SPARC 20
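This summary reports that searching used a probabilistic model, with term frequency and inverse document frequency as ranking factors, and that indexing used a stopword list and term weighting. It does not give the actual ranking formula, so the following is only an illustrative sketch. It substitutes Okapi BM25, a well-known probabilistic model that combines TF and IDF, in place of the Berkeley formula. The stopword set, the parameters `k1` and `b`, and the function names are hypothetical. No stemming is applied, because the SMART stemmer is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical tiny stopword set; the runs above used a 592-word list not reproduced here.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords (no stemming)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    BM25 is a stand-in probabilistic model combining term frequency (TF)
    and inverse document frequency (IDF); it is not the formula used in
    the Brkly runs. Assumes `docs` is non-empty.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    query_terms = set(tokenize(query))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Robertson/Sparck Jones-style IDF; the +1 keeps it positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # TF saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the stemmer reduces words",
    "probabilistic retrieval of documents",
    "retrieval retrieval model",
]
print(bm25_scores("probabilistic retrieval", docs))
```

In the usage example, the first document shares no query terms and scores 0.0. The second document matches both terms, including the rarer "probabilistic" with its higher IDF, so it outranks the third document. The third document repeats "retrieval", but TF saturation limits how much that repetition adds.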