System Summary and Timing

Organization Name: Dublin City University
List of Run IDs: DCU961, DCU962, DCU963, DCU964, DCU965, DCU966, DCU967, DCU968, DCU969, DCU96C, DCU96D

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: 410
- Controlled Vocabulary?: No
- Stemming Algorithm: Porter's
- Morphological Analysis: No
- Term Weighting: Yes
- Phrase Discovery?: Yes
- Method Used (statistical, syntactic, other): Statistical
- Syntactic Parsing?: No
- Word Sense Disambiguation?: No
- Heuristic Associations (including short definition)?: No
- Spelling Checking (with manual correction)?: No
- Spelling Correction?: No
- Proper Noun Identification Algorithm?: No
- Tokenizer?: No
- Patterns which are tokenized: No
- Manually-Indexed Terms?: No
- Other Techniques for building Data Structures: No

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: DCU968, DCU969, DCU96C, DCU96D
  - Total Storage (in MB): 98
  - Total Computer Time to Build (in hours): 7
  - Automatic Process? (If not, number of manual hours): Yes
  - Use of Term Positions?: No
  - Only Single Terms Used?: No
- Inverted index
  - Run ID: DCU961, DCU962, DCU963, DCU964
  - Total Storage (in MB): 412
  - Total Computer Time to Build (in hours): 28
  - Automatic Process? (If not, number of manual hours): Yes
  - Use of Term Positions?: No
  - Only Single Terms Used?: No
- Inverted index
  - Run ID: DCU965, DCU966, DCU967
  - Total Storage (in MB): 78
  - Total Computer Time to Build (in hours): 6.5
  - Automatic Process? (If not, number of manual hours): Yes
  - Use of Term Positions?: No
  - Only Single Terms Used?: No
- Clusters
- N-grams, Suffix arrays, Signature Files
- Knowledge Bases
- Use of Manual Labor
- Special Routing Structures
- Other Data Structures built from TREC text

Query construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: Description
- Average Computer Time to Build Query (in cpu seconds): < 1
- Method used in Query Construction
  - Term Weighting (weights based on terms in topics)?: Yes
  - Phrase Extraction from Topics?: Yes
  - Syntactic Parsing of Topics?: No
  - Word Sense Disambiguation?: No
  - Proper Noun Identification Algorithm?: No
  - Tokenizer?: No
  - Heuristic Associations to Add Terms?: No
  - Expansion of Queries using Previously-Constructed Data Structure?: No
  - Automatic Addition of Boolean Connectors or Proximity Operators?: No

Manually Constructed Queries (Ad-Hoc)
- Topic Fields Used: Title, Description, Narrative
- Average Time to Build Query (in Minutes): < 1
- Type of Query Builder
- Tools used to Build Query
  - Knowledge Base Browser?:
  - Other Lexical Tools?:
- Method used in Query Construction
  - Addition of Terms not Included in Topic?:

Searching

Search Times
- Run ID: DCU968, DCU969, DCU96C, DCU96D
  - Computer Time to Search (Average per Query, in CPU seconds): 15
- Run ID: DCU961, DCU962
  - Computer Time to Search (Average per Query, in CPU seconds): 11
- Run ID: DCU963, DCU964
  - Computer Time to Search (Average per Query, in CPU seconds): 4
- Run ID: DCU965
  - Computer Time to Search (Average per Query, in CPU seconds): 5
- Run ID: DCU966
  - Computer Time to Search (Average per Query, in CPU seconds): 3
- Run ID: DCU967
  - Computer Time to Search (Average per Query, in CPU seconds): 8

Machine Searching Methods
- Vector Space Model?: Yes

Factors in Ranking
- Term Frequency?: Yes
- Inverse Document Frequency?: Yes
- Document Length?: Yes

Machine Information
- Machine Type for TREC Experiment: Sparc Station 5
- Was the Machine Dedicated or Shared: Dedicated
- Amount of Hard Disk Storage (in MB): 6000
- Amount of RAM (in MB): 64

System Comparisons
- Given appropriate resources
  - Could your system run faster?: Yes
  - By how much (estimate)?: 100+%
- Features the System is Missing that would be beneficial: More sophisticated query-document matching algorithms.

Significant Areas of System
- Brief Description of features in your system which you feel impact the system and are not answered by above questions: Query and document accumulator thresholding techniques, coupled with a modified inverted index structure, which together allow effective and efficient retrieval.
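
Illustrative note: the summary names a vector space model ranked on term frequency, inverse document frequency, and document length, combined with accumulator thresholding over an inverted index, but it does not give the formulas or the thresholding rule. The Python sketch below is therefore only one plausible reading, not the DCU implementation. The log-based TF and IDF weights, cosine length normalization, the `max_accumulators` cap, the rarest-term-first processing order, and the function names `build_index` and `search` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build a single-term inverted index from {doc_id: [token, ...]}.

    Returns (postings, idf, norms). The weighting scheme is an assumption:
    tf weight = 1 + ln(tf), idf = ln(1 + N / df), cosine document norm.
    """
    postings = defaultdict(list)  # term -> [(doc_id, tf), ...]
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            postings[term].append((doc_id, tf))

    n_docs = len(docs)
    idf = {t: math.log(1 + n_docs / len(p)) for t, p in postings.items()}

    # Document length factor: Euclidean norm of each document's weight vector.
    squared = defaultdict(float)
    for term, plist in postings.items():
        for doc_id, tf in plist:
            w = (1 + math.log(tf)) * idf[term]
            squared[doc_id] += w * w
    norms = {d: math.sqrt(v) for d, v in squared.items()}
    return dict(postings), idf, norms


def search(query_tokens, postings, idf, norms, max_accumulators=1000, top_k=10):
    """Rank documents with a capped number of score accumulators.

    Hypothetical thresholding rule: query terms are processed rarest first,
    and once max_accumulators documents have an accumulator, postings may
    only update existing accumulators, never create new ones.
    """
    q_tf = Counter(t for t in query_tokens if t in postings)
    terms = sorted(q_tf, key=lambda t: idf[t], reverse=True)

    acc = {}  # doc_id -> partial dot product
    for term in terms:
        q_w = (1 + math.log(q_tf[term])) * idf[term]
        for doc_id, tf in postings[term]:
            if doc_id not in acc:
                if len(acc) >= max_accumulators:
                    continue  # threshold reached: skip new candidates
                acc[doc_id] = 0.0
            acc[doc_id] += q_w * (1 + math.log(tf)) * idf[term]

    # Normalize by document length, then keep the best top_k.
    ranked = sorted(((s / norms[d], d) for d, s in acc.items()), reverse=True)
    return [(d, s) for s, d in ranked[:top_k]]
```

A small usage example: with `docs = {"d1": ["index", "retrieval", "index"], "d2": ["retrieval", "query"]}`, calling `search(["index", "retrieval"], *build_index(docs))` ranks `d1` ahead of `d2`. Lowering `max_accumulators` trades some recall for less memory and less scoring work, which is the general motivation for accumulator limits on a machine with 64 MB of RAM.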