System Summary and Timing

Organization Name: CRL/NMSU
List of Run ID's: nmsu_1

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 30
- Controlled Vocabulary?: No
- Stemming Algorithm: No
- Morphological Analysis: No
- Term Weighting: Yes
- Phrase Discovery?: Yes
  - Kind of Phrase: bi-gram
  - Method Used (statistical, syntactic, other): statistical
- Syntactic Parsing?: No
- Word Sense Disambiguation?: No
- Heuristic Associations (including short definition)?: No
- Spelling Checking (with manual correction)?: No
- Spelling Correction?: No
- Proper Noun Identification Algorithm?: No
- Tokenizer?: No
- Manually-Indexed Terms?: No
- Other Techniques for building Data Structures: Word Frequency Differences

Statistics on Data Structures built from TREC Text
- Inverted index
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
  - Total Storage (in MB): 0.765
  - Total Computer Time to Build (in hours): 24
  - Use of Manual Labor
- Special Routing Structures
  - Run ID: nmsu_1
  - Type of Structure: word and bi-gram frequency lists
  - Total Storage (in MB): 0.765
  - Total Computer Time to Build (in hours): 24
  - Automatic Process? (If not, number of manual hours): yes
  - Brief Description of Method: statistical using relevance judgements
- Other Data Structures built from TREC text

Query construction

Automatically Built Queries (Routing)
- Method used in Query Construction
  - Terms Selected From
    - Only Documents with Relevance Judgments: yes
  - Term Weighting with Weights Based on terms in
    - Documents with Relevance Judgments: yes
  - Phrase Extraction from
    - Documents with Relevance Judgments: yes
  - Syntactic Parsing
  - Word Sense Disambiguation using
  - Proper Noun Identification Algorithm from
  - Tokenizer
  - Heuristic Associations to Add Terms from
  - Expansion of Queries using Previously-Constructed Data Structure:

Searching

Search Times
- Run ID: nmsu_1
- Computer Time to Search (Average per Query, in CPU seconds): 10 elapsed seconds per document per query

Factors in Ranking
- Term Frequency?: yes

Machine Information
- Machine Type for TREC Experiment: SPARCstation 5
- Was the Machine Dedicated or Shared: shared
- Amount of Hard Disk Storage (in MB): 2
- Amount of RAM (in MB): 64

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: 2 man months
- Given appropriate resources
  - Could your system run faster?: yes
  - By how much (estimate)?: 100
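
The form records only that the routing structures are word and bi-gram frequency lists, built statistically from relevance judgements using word frequency differences, and that term frequency is a ranking factor. As an illustration of how such pieces might fit together, the following is a minimal sketch and not the CRL/NMSU implementation: the function names (`terms`, `relative_freqs`, `build_profile`, `score`), the whitespace splitting, and the choice to keep only terms that are more frequent in relevant documents are all assumptions made for the example.

```python
from collections import Counter


def terms(text):
    """Return lowercased words plus adjacent-word bi-grams (assumed: whitespace split)."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def relative_freqs(docs):
    """Relative frequency of each word and bi-gram over a set of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(terms(doc))
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {t: c / total for t, c in counts.items()}


def build_profile(relevant, nonrelevant):
    """Weight each term by its frequency difference between the two sets.

    Assumption for this sketch: only terms that are more frequent in the
    judged-relevant documents are kept.
    """
    rel = relative_freqs(relevant)
    non = relative_freqs(nonrelevant)
    return {t: f - non.get(t, 0.0) for t, f in rel.items() if f > non.get(t, 0.0)}


def score(profile, doc):
    """Score a document as the sum of term weight times term frequency."""
    counts = Counter(terms(doc))
    return sum(profile.get(t, 0.0) * c for t, c in counts.items())


if __name__ == "__main__":
    relevant = ["oil prices rise", "oil prices fall"]
    nonrelevant = ["stock prices rise"]
    profile = build_profile(relevant, nonrelevant)
    for doc in ["oil prices climb", "stock prices climb"]:
        print(doc, score(profile, doc))
```

In this sketch a routing query is simply the weighted term list for one topic, and incoming documents are ranked by `score`. A real system would also apply the stopword list reported above before counting.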