System Summary and Timing

Organization Name: InTEXT Systems R and D Labs
List of Run ID's: INTXM, INTXA

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Stemming Algorithm:
- Term Weighting: Automatic
- Phrase Discovery?:
- Tokenizer?:

Statistics on Data Structures built from TREC Text
- Inverted index
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Special Routing Structures
  - Run ID: INTXA, INTXM
  - Type of Structure: Query index inverted on all words in all queries
  - Total Storage (in MB): 0.005
  - Total Computer Time to Build (in hours): 0.0005
  - Automatic Process? (If not, number of manual hours): YES
  - Brief Description of Method: All queries are indexed in memory. The documents are passed through all queries in one pass.
- Other Data Structures built from TREC text

Query construction

Automatically Built Queries (Routing)
- Average Computer Time to Build Query (in cpu seconds): 1200
- Method used in Query Construction
  - Terms Selected From
    - Only Documents with Relevance Judgments: Yes
  - Term Weighting with Weights Based on terms in
    - Documents with Relevance Judgments: No
  - Phrase Extraction from
    - Documents with Relevance Judgments: Yes
  - Syntactic Parsing
    - Documents with Relevance Judgments: Yes
  - Word Sense Disambiguation using
  - Proper Noun Identification Algorithm from
    - Documents with Relevance Judgments: Yes
  - Tokenizer
  - Heuristic Associations to Add Terms from
  - Expansion of Queries using Previously-Constructed Data Structure:
  - Automatic Addition of Boolean connectors or Proximity Operators using information from
  - Other: To create queries for INTXA, documents from the training data that had relevance judgments were passed through the queries written manually for INTXM.
    The top 15 relevant documents for each query were concatenated, and the paragraphs containing search terms were extracted and run as a single document through the InTEXT Precision software (see TREC-4) to generate weighted keywords and phrases. The INTXM queries were transformed into the INTXA filters by a set of rules to insert new alternatives, add new terms and delete original ones.

Manually Constructed Queries (Routing)
- Average Time to Build Query (in Minutes): 15
- Type of Query Builder
  - Computer System Expert: YES
- Tools used to Build Query
  - Knowledge Base Browser?:
  - Other Lexical Tools?: See previous TREC queries
- Data Used for Building Query from
- Method used in Query Construction
  - Boolean Connectors (AND, OR, NOT)?: YES
  - Proximity Operators?: YES
  - Addition of Terms not Included in Topic?:
  - Other: This was the method used to build the INTXA query set

Searching

Search Times
- Run ID: INTXA, INTXM
- Computer Time to Search (Average per Query, in CPU seconds): 360

Machine Searching Methods
- Boolean Matching?: YES
- Free Text Scanning?: YES

Factors in Ranking
- Term Frequency?: YES
- Other Term Weights?: Computed dynamically at run time from the document set to identify good discriminating terms
- Position in Document?: YES
- Proximity of Terms?: YES
- Document Length?: YES
- Percentage of Query Terms which match?: YES

Machine Information
- Machine Type for TREC Experiment: 486
- Was the Machine Dedicated or Shared: Dedicated
- Amount of Hard Disk Storage (in MB): 2000
- Amount of RAM (in MB): 16
- Clock Rate of CPU (in MHz): 66

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: 25 days to put commercial components together
- Given appropriate resources
  - Could your system run faster?: YES
  - By how much (estimate)?: 2

Significant Areas of System
- Brief Description of features in your system which you feel impact the system and are not answered by above questions: The documents were run through all 100 queries simultaneously. The mean time per query quoted is the total CPU time divided by 100.
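
The routing structure described in this form (a query index inverted on all words in all queries, with each document passed through every query in one pass) can be illustrated with a minimal Python sketch. This is not the InTEXT implementation: the function names, the sample queries, and the scoring rule (fraction of a query's terms found in the document) are assumptions made for illustration, and Boolean connectors, proximity operators and term weights are ignored. The last helper only restates the arithmetic behind the quoted mean time per query; a total of 36000 CPU seconds is the figure implied by 360 seconds averaged over 100 queries.

```python
from collections import defaultdict


def build_query_index(queries):
    """Map each word to the set of query IDs that contain it (hypothetical helper)."""
    index = defaultdict(set)
    for query_id, terms in queries.items():
        for term in terms:
            index[term.lower()].add(query_id)
    return index


def route_document(text, index, queries):
    """Score one document against every query in a single scan of its words."""
    matched = defaultdict(set)  # query_id -> distinct query terms seen in the document
    for word in text.lower().split():
        # Look the word up once; every query containing it is credited together.
        for query_id in index.get(word, ()):
            matched[query_id].add(word)
    # Assumed scoring rule: fraction of each query's terms found in the document.
    return {qid: len(seen) / len(queries[qid]) for qid, seen in matched.items()}


def mean_time_per_query(total_cpu_seconds, num_queries):
    """Mean time per query as defined in the form: total CPU time / query count."""
    return total_cpu_seconds / num_queries


# Illustrative queries only; the real runs used 100 routing queries.
queries = {
    "Q1": {"oil", "spill", "tanker"},
    "Q2": {"tanker", "insurance"},
}
index = build_query_index(queries)
scores = route_document("Tanker runs aground causing oil spill", index, queries)
```

Because the index is keyed on query words and not on document words, the cost of scanning a document is shared across all queries, which is why the form reports the per-query time as a total divided by the number of queries.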