System Summary and Timing

Organization Name: Xerox PARC
List of Run ID's: xerox1 xerox2

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Stemming Algorithm:
- Term Weighting: mixed: no weighting / LSI
- Phrase Discovery?:
  - Kind of Phrase: two-word phrases
  - Method Used (statistical, syntactic, other): statistical
- Syntactic Parsing?: no
- Word Sense Disambiguation?: no
- Heuristic Associations (including short definition)?: no
- Spelling Checking (with manual correction)?: no
- Spelling Correction?: no
- Proper Noun Identification Algorithm?: no
- Tokenizer?:
- Manually-Indexed Terms?: no
- Other Techniques for building Data Structures: none

Statistics on Data Structures built from TREC Text
- Inverted index
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Special Routing Structures
  - Run ID: xerox1 xerox2
  - Type of Structure: lsi
  - Total Storage (in MB): 40
  - Total Computer Time to Build (in hours): 5
  - Automatic Process? (If not, number of manual hours): yes
  - Brief Description of Method: local LSI, one for each topic on 2000 chisquare selected terms
- Other Data Structures built from TREC text

Data Built from Sources Other than the Input Text
- Internally-built Auxiliary File
- Use of Manual Labor
- Externally-built Auxiliary File

Query construction

Automatically Built Queries (Ad-Hoc)
- Method used in Query Construction
  - Tokenizer?:
  - Expansion of Queries using Previously-Constructed Data Structure?:

Automatically Built Queries (Routing)
- Topic Fields Used: all fields
- Average Computer Time to Build Query (in cpu seconds): less than 5
- Method used in Query Construction
  - Terms Selected From
    - Topics: yes
    - All Training Documents: yes
    - Only Documents with Relevance Judgments: yes
  - Term Weighting with Weights Based on terms in
    - Topics: no weights or lsi weights
    - All Training Documents: no weights or lsi weights
    - Documents with Relevance Judgments: no weights or lsi weights
  - Phrase Extraction from
    - Topics: yes
    - All Training Documents: yes
    - Documents with Relevance Judgments: yes
  - Syntactic Parsing
    - Topics: no
    - All Training Documents: no
    - Documents with Relevance Judgments: no
  - Word Sense Disambiguation using
    - Topics: no
    - All Training Documents: no
    - Documents with Relevance Judgments: no
  - Proper Noun Identification Algorithm from
    - Topics: no
    - All Training Documents: no
    - Documents with Relevance Judgments: no
  - Tokenizer
  - Heuristic Associations to Add Terms from
    - Topics: no
    - All Training Documents: no
    - Documents with Relevance Judgments: no
  - Expansion of Queries using Previously-Constructed Data Structure:
  - Automatic Addition of Boolean connectors or Proximity Operators using information from

Searching

Machine Searching Methods
- Vector Space Model?: yes
- Probabilistic Model?: yes
- Neural Networks?: yes

Factors in Ranking
- Term Frequency?: yes
- Inverse Document Frequency?: yes
- Other Term Weights?: lsi
- Document Length?: yes

Machine Information

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: little
- Given appropriate resources
  - Could your system run faster?: yes
  - By how much (estimate)?: factor of 10
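
Illustrative note on the "Brief Description of Method" entry (local LSI, one for each topic on 2000 chisquare selected terms): the form gives no further detail, so the following is only a minimal sketch of what such a pipeline might look like, not the submitted system. It assumes a dense document-by-term count matrix, a 2x2 chi-square statistic of term presence against relevance to one topic, and a truncated SVD as the LSI step. The function names, the `n_factors` value, and the choice of which documents go into the matrix are all hypothetical; only the figure of 2000 selected terms comes from the form.

```python
import numpy as np

def chi_square_scores(X, y):
    """Chi-square of term presence vs. relevance for every term.

    X: (n_docs, n_terms) term-count matrix (assumed layout).
    y: boolean relevance labels for one topic, length n_docs.
    """
    present = np.asarray(X) > 0
    y = np.asarray(y, dtype=bool)
    n = present.shape[0]
    a = present[y].sum(axis=0).astype(float)   # relevant docs containing the term
    b = present[~y].sum(axis=0).astype(float)  # non-relevant docs containing the term
    c = y.sum() - a                            # relevant docs lacking the term
    d = (~y).sum() - b                         # non-relevant docs lacking the term
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    # Terms that occur in every document or in none carry no signal: score 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def local_lsi(X, y, n_terms=2000, n_factors=100):
    """Build one topic-specific ("local") LSI projection.

    n_terms=2000 follows the form; n_factors is an assumed value.
    Returns the selected term indices and a (k, r) projection matrix.
    """
    X = np.asarray(X, dtype=float)
    scores = chi_square_scores(X, y)
    k = min(n_terms, X.shape[1])
    selected = np.argsort(scores)[::-1][:k]    # highest chi-square first
    _, s, vt = np.linalg.svd(X[:, selected], full_matrices=False)
    r = min(n_factors, len(s))
    return selected, vt[:r].T

def project(X, selected, projection):
    """Map documents into the reduced LSI space of one topic."""
    return np.asarray(X, dtype=float)[:, selected] @ projection
```

Under these assumptions, one projection would be built per routing topic and incoming documents would be mapped with `project` before being scored by whatever classifier or similarity measure is used.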