System Summary and Timing
Organization Name: NEC
List of Run IDs: virtu3, virtu4

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 430
- Controlled Vocabulary?: Yes
- Stemming Algorithm: Yes
- Morphological Analysis: Yes
- Term Weighting: Yes
- Phrase Discovery?: No
- Syntactic Parsing?: No
- Word Sense Disambiguation?: No
- Heuristic Associations (including short definition)?: No
- Spelling Checking (with manual correction)?: No
- Spelling Correction?: No
- Proper Noun Identification Algorithm?: No
- Tokenizer?: Yes
  - Patterns which are tokenized: common patterns
- Manually-Indexed Terms?: No
- Other Techniques for building Data Structures: No

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: virtu3, virtu4
  - Total Storage (in MB): 3000
  - Total Computer Time to Build (in hours): 200
  - Automatic Process? (If not, number of manual hours): Yes
  - Use of Term Positions?: Yes
  - Only Single Terms Used?: Yes
- Clusters
  - Run ID: No
- N-grams, Suffix arrays, Signature Files
  - Run ID: No
- Knowledge Bases
  - Run ID: No
  - Use of Manual Labor:
- Special Routing Structures
  - Run ID: No
- Other Data Structures built from TREC text
  - Run ID: virtu3
  - Type of Structure: word co-occurrence
  - Total Storage (in MB): 630
  - Total Computer Time to Build (in hours): 120
  - Automatic Process? (If not, number of manual hours): Yes
  - Brief Description of Method: Frequency with which two words occur in the same paragraph.

Query Construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: All
- Average Computer Time to Build Query (in CPU seconds): 1200 (20 min per query)
- Method used in Query Construction
  - Term Weighting (weights based on terms in topics)?: Yes
  - Phrase Extraction from Topics?: Yes
  - Syntactic Parsing of Topics?: Yes
  - Word Sense Disambiguation?: No
  - Proper Noun Identification Algorithm?: Yes
  - Tokenizer?: Yes
    - Patterns which are Tokenized: part of noun phrase identification
  - Heuristic Associations to Add Terms?: No
  - Expansion of Queries using Previously-Constructed Data Structure?: Yes
    - Structure Used: thesaurus (WordNet)
  - Automatic Addition of Boolean Connectors or Proximity Operators?: No
  - Other: No

Automatically Built Queries (Routing)
- Topic Fields Used: All
- Average Computer Time to Build Query (in CPU seconds): 1800 (30 min per query)
- Method used in Query Construction
  - Terms Selected From
    - Topics: All
    - All Training Documents: No
    - Only Documents with Relevance Judgments: No
  - Term Weighting with Weights Based on terms in
    - Topics: Yes
    - All Training Documents: No
    - Documents with Relevance Judgments: Yes
  - Phrase Extraction from
    - Topics: Yes
    - All Training Documents: No
    - Documents with Relevance Judgments: No
  - Syntactic Parsing
    - Topics: Yes
    - All Training Documents: No
    - Documents with Relevance Judgments: No
  - Word Sense Disambiguation using
    - Topics: No
    - All Training Documents: No
    - Documents with Relevance Judgments: No
  - Proper Noun Identification Algorithm from
    - Topics: Yes
    - All Training Documents: No
    - Documents with Relevance Judgments: No
  - Tokenizer
    - Patterns which are tokenized (dates, phone numbers, common patterns, etc.): part of noun phrase identification
    - from Topics: Yes
    - from All Training Documents: No
    - from Documents with Relevance Judgments: No
  - Heuristic Associations to Add Terms from
    - Topics: No
    - All Training Documents: No
    - Documents with Relevance Judgments: No
  - Expansion of Queries using Previously-Constructed Data Structure
    - Structure Used: word co-occurrence
  - Automatic Addition of Boolean Connectors or Proximity Operators using information from
    - Topics: No
    - All Training Documents: No
    - Documents with Relevance Judgments: No

Searching

Search Times
- Run ID: virtu3, virtu4
- Computer Time to Search (Average per Query, in CPU seconds): 1200

Machine Searching Methods
- Vector Space Model?: Yes
- Probabilistic Model?: No
- Cluster Searching?: No
- N-gram Matching?: No
- Boolean Matching?: No
- Fuzzy Logic?: No
- Free Text Scanning?: No
- Neural Networks?: No
- Conceptual Graph Matching?: No
- Other: No

Factors in Ranking
- Term Frequency?: Yes
- Inverse Document Frequency?: No
- Other Term Weights?: No
- Semantic Closeness?: No
- Position in Document?: No
- Syntactic Clues?: No
- Proximity of Terms?: No
- Information Theoretic Weights?: No
- Document Length?: Yes
- Percentage of Query Terms which match?: No
- N-gram Frequency?: No
- Word Specificity?: No
- Word Sense Frequency?: No
- Cluster Distance?: No
- Other: No

Machine Information
- Machine Type for TREC Experiment: sparc10
- Was the Machine Dedicated or Shared: Shared
- Amount of Hard Disk Storage (in MB): 10000
- Amount of RAM (in MB): 128
- Clock Rate of CPU (in MHz): 40

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: Three people for two months
- Given appropriate resources
  - Could your system run faster?: Yes
  - By how much (estimate)?: 50%
- Features the System is Missing that would be beneficial: The combined use of thesaurus and word co-occurrence information
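The form names the ingredients of the routing pipeline (a paragraph-level word co-occurrence structure, query expansion from that structure, and vector-space ranking using term frequency and document length but no IDF) without giving formulas. The Python sketch below shows one plausible way these pieces fit together; it is not NEC's documented method. The function names, the whitespace tokenizer, the `top_k` expansion policy, and the Euclidean (cosine-style) length normalization are all hypothetical assumptions, and stopword removal, stemming, and WordNet expansion are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(paragraphs):
    """Count, for each unordered word pair, the number of paragraphs containing both words."""
    counts = Counter()
    for paragraph in paragraphs:
        # Assumption: plain lowercase whitespace tokenization, no stopwords or stemming.
        words = sorted(set(paragraph.lower().split()))
        for a, b in combinations(words, 2):  # pairs are stored in sorted (a < b) order
            counts[(a, b)] += 1
    return counts


def expand_query(query_terms, counts, top_k=2):
    """Append the top_k words that co-occur most often with the query terms.

    Hypothetical policy: a candidate's strength is its summed paragraph
    co-occurrence count with all query terms.
    """
    query = set(query_terms)
    candidates = Counter()
    for (a, b), n in counts.items():
        if a in query and b not in query:
            candidates[b] += n
        elif b in query and a not in query:
            candidates[a] += n
    return list(query_terms) + [word for word, _ in candidates.most_common(top_k)]


def score(query_terms, document):
    """Vector-space score: raw term frequency divided by the document vector's length.

    No IDF factor, matching the ranking factors listed in the form;
    the Euclidean normalization itself is an assumption.
    """
    tf = Counter(document.lower().split())
    norm = math.sqrt(sum(n * n for n in tf.values())) or 1.0  # guard empty documents
    return sum(tf[term] for term in query_terms) / norm


if __name__ == "__main__":
    paragraphs = [
        "oil prices rose",
        "oil prices fell",
        "oil prices and exports",
        "exports grew",
    ]
    counts = cooccurrence_counts(paragraphs)
    query = expand_query(["oil"], counts, top_k=1)
    print(query)  # ['oil', 'prices']
    for doc in ["oil prices rose", "exports grew"]:
        print(doc, round(score(query, doc), 3))
```

At TREC scale the pair counts would not fit in an in-memory `Counter`; the form reports 630 MB for this structure, so a real implementation would keep it on disk and prune rare pairs.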