System Summary and Timing

Organization Name: MultiText Project, Department of Computer Science, University of Waterloo
List of Run ID's: uwgcr0 (routing); uwgcx0, uwgcx1 (ad-hoc)

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: no stopword list was used
- Controlled Vocabulary?: no
- Stemming Algorithm: none
- Term Weighting: none
- Phrase Discovery?:
- Tokenizer?: yes
  - Patterns which are tokenized: SGML tags and sequences of alphanumeric characters ("words") were recognized and indexed. All words were mapped to lower case before indexing. Words with length greater than one consisting entirely of upper case alphabetic characters were doubly indexed in upper and lower case.

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: uwgcr0, uwgcx0, uwgcx1
  - Total Storage (in MB): approximately 60% of text size
  - Total Computer Time to Build (in hours): approximately 30
  - Automatic Process? (If not, number of manual hours): yes
  - Use of Term Positions?: yes
  - Only Single Terms Used?: yes
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Special Routing Structures
- Other Data Structures built from TREC text

Query construction

Manually Constructed Queries (Ad-Hoc)
- Topic Fields Used: title, desc, narr
- Average Time to Build Query (in Minutes): 45 minutes
- Type of Query Builder
  - Domain Expert: no
  - Computer System Expert: yes
- Tools used to Build Query
  - Knowledge Base Browser?:
  - Other Lexical Tools?:
- Method used in Query Construction
  - Term Weighting?: no
  - Boolean Connectors (AND, OR, NOT)?: yes
  - Proximity Operators?: yes
  - Addition of Terms not Included in Topic?: yes
    - Source of Terms: personal knowledge, interaction with data

Manually Constructed Queries (Routing)
- Topic Fields Used: all (title, dom, desc, narr, con, fac, def...)
- Average Time to Build Query (in Minutes): 45 minutes
- Type of Query Builder
  - Domain Expert: no
  - Computer System Expert: yes
- Tools used to Build Query
  - Knowledge Base Browser?:
  - Other Lexical Tools?:
- Data Used for Building Query from
  - Documents with Relevance Judgments: yes
- Method used in Query Construction
  - Term Weighting?: no
  - Boolean Connectors (AND, OR, NOT)?: yes
  - Proximity Operators?: yes
  - Addition of Terms not Included in Topic?: yes
  - Other: GCL containment and ordering operators

Searching

Search Times
- Run ID: uwgcr0 (routing)
  - Computer Time to Search (Average per Query, in CPU seconds): 14 seconds (elapsed)
  - Component Times: Each query was composed of several sub-queries, each of which was run separately. An average of 1.8 sub-queries per query for the routing run gives an average search time of 8 seconds per sub-query.
- Run ID: uwgcx0, uwgcx1 (ad-hoc)
  - Computer Time to Search (Average per Query, in CPU seconds): 32 seconds (elapsed)
  - Component Times: Each query was composed of several sub-queries, each of which was run separately. An average of 1.8 sub-queries per query for the ad-hoc runs gives an average search time of 18 seconds per sub-query.

Machine Searching Methods
- Boolean Matching?: yes
- Other: GCL query matching (see main paper)

Factors in Ranking
- Term Frequency?: yes
- Proximity of Terms?: yes
- Other: density of solutions within document

Machine Information
- Machine Type for TREC Experiment: DEC Alpha 2000/300
- Was the Machine Dedicated or Shared: Dedicated
- Amount of Hard Disk Storage (in MB): 10GB
- Amount of RAM (in MB): 64MB
- Clock Rate of CPU (in MHz): 150MHz

System Comparisons
- Amount of "Software Engineering" which went into the Development of the System: Base retrieval system is a research prototype. Approximately two weeks of software development was specific to TREC-4.
- Given appropriate resources
  - Could your system run faster?: yes
  - By how much (estimate)?: factor of two
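
Illustrative sketch of the tokenization rules: the rules listed under "Patterns which are tokenized" can be expressed in a few lines of Python. This is only a sketch under stated assumptions, not the MultiText implementation. The function name index_terms and the regular expression are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and SGML tags are assumed to be indexed unchanged (the summary does not say how tag case was handled). Term positions are emitted because the inverted index is reported to use them.

```python
import re

# Assumption: a token is either an SGML tag (<...>) or a run of ASCII
# letters and digits. Everything else is treated as a separator.
TOKEN_RE = re.compile(r"<[^<>]+>|[A-Za-z0-9]+")


def index_terms(text):
    """Yield (position, term) pairs for a hypothetical positional index."""
    for position, match in enumerate(TOKEN_RE.finditer(text)):
        token = match.group()
        if token.startswith("<"):
            # Assumption: tags are indexed exactly as they appear.
            yield position, token
            continue
        # All words are mapped to lower case before indexing.
        yield position, token.lower()
        # Words longer than one character made up entirely of upper-case
        # letters are indexed a second time in their original form.
        if len(token) > 1 and token.isalpha() and token.isupper():
            yield position, token


if __name__ == "__main__":
    for pos, term in index_terms("<DOC>The IBM PC, A 486</DOC>"):
        print(pos, term)
```

In this sketch "IBM" and "PC" each produce two entries at the same position (lower case and upper case), while the single letter "A" and the mixed token "486" produce one entry each, which matches the length and all-alphabetic conditions in the description.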