Text REtrieval Conference (TREC)
|
Organization Name: University of Wales Bangor | Run ID: uwbqitekat03 |
Section 1.0 System Summary and Timing |
---|
Section 1.1 System Information |
Hardware Model Used for TREC Experiment: PC Cluster
System Use: SHARED
Total Amount of Hard Disk Storage: 4 Gb
Total Amount of RAM: 1024 MB
Clock Rate of CPU: 500*8 MHz |
Section 1.2 System Comparisons |
Amount of developmental "Software Engineering": SOME
List of features that are not present in the system, but would have been beneficial to have: Synonym expansion, word sense disambiguation, inference on knowledge based relations
List of features that are present in the system, and impacted its performance, but are not detailed within this form: Distributed knowledge base. PPM named entity extraction. Automated external answer corroboration. |
Section 2.0 Construction of Indices, Knowledge Bases, and Other Data Structures |
---|
Length of the stopword list: 250 words
Type of Stemming: NONE
Controlled Vocabulary: YES
Term weighting: NO
Phrase discovery: YES
Type of Spelling Correction: NONE
Manually-Indexed Terms: YES
Proper Noun Identification: YES
Syntactic Parsing: YES
Tokenizer: YES
Word Sense Disambiguation: NO
Other technique: NO
Additional comments: Hybrid approach combining rule-based, statistical, and PPM methods in a cascading finite-state architecture. |
Section 3.0 Statistics on Data Structures Built from TREC Text |
---|
Section 3.1 First Data Structure |
Structure Type: KNOWLEDGE BASE
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 2 Gb
Total computer time to build: 72 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: NO
Only single terms used: NO
Concepts (vs. single terms) represented: YES
Type of representation: logical tuple relation
Auxiliary files used: NO
Additional comments: Question-oriented 'knows' and 'knows about' relations specifying high-level abstractions of knowledge. Used to support an agent-based approach. |
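The 'knows' and 'knows about' relations above can be pictured as logical tuples held in a small store. The following is only a minimal sketch: the form does not define the tuple layout, so the `Relation` fields, the predicate spellings, the agent names, and the `agents_for` helper are all hypothetical, as is the reading that 'knows' refers to a specific fact while 'knows about' refers to a topic.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One logical tuple. The three-field layout is an assumption."""
    agent: str       # hypothetical agent identifier
    predicate: str   # "knows" (a specific fact) or "knows_about" (a topic)
    subject: str     # the fact or topic the relation refers to


# A toy knowledge base; every entry is invented for illustration.
KB = {
    Relation("agent_a", "knows_about", "geography"),
    Relation("agent_a", "knows", "capital_of(France)"),
    Relation("agent_b", "knows_about", "biography"),
}


def agents_for(predicate, subject, kb=KB):
    """Return the sorted names of agents holding the given relation."""
    return sorted(
        r.agent for r in kb
        if r.predicate == predicate and r.subject == subject
    )
```

Storing the relations as plain tuples keeps lookups declarative: asking which agents "know about" a topic is a filter over the set rather than a call into agent-specific code.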
Section 3.2 Second Data Structure |
Structure Type: Type of other data structure used: Brief description of method using other data structure: Total storage used:Gb Total computer time to build:hours Automatic process: Manual hours required:hours Type of manual labor:NONE Term positions used: Only single terms used: Concepts (vs. single terms) represented:
Type of representation: Auxiliary files used:
Additional comments: |
Section 3.3 Third Data Structure |
Structure Type: Type of other data structure used: Brief description of method using other data structure: Total storage used:Gb Total computer time to build:hours Automatic process: Manual hours required:hours Type of manual labor:NONE Term positions used: Only single terms used: Concepts (vs. single terms) represented:
Type of representation: Auxiliary files used:
Additional comments: |
Section 4.0 Data Built from Sources Other than the Input Text |
---|
File type: OTHER
Domain type: DOMAIN INDEPENDENT
Total Storage: 0.2 Gb
Number of Concepts Represented: 1000 concepts
Type of representation: RULES
Automatic or Manual: MANUAL
Type of Manual Labor used: MOSTLY MANUALLY BUILT USING SPECIAL INTERFACES
Additional comments: Syntactic and semantic rule-based systems combined with regular expressions, using back-substitution of named entities to auto-generate document-specific rules. External text data used for PPM training. |
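One way to read "back-substitution of named entities" is that generic rule templates contain entity-type slots, and the entities found in a given document are substituted into those slots to produce rules specific to that document. The sketch below illustrates that reading only; the `<PERSON>` slot syntax, the `instantiate_rules` function, and the sample template are assumptions, not the system's actual rule format.

```python
import re


def instantiate_rules(templates, entities):
    """Substitute extracted entities into slotted regex templates.

    templates: regex strings with slots such as "<PERSON>" (hypothetical syntax)
    entities:  mapping of entity type to names found in one document
    Simplification: each generated rule fills a single slot type.
    """
    rules = []
    for template in templates:
        for entity_type, names in entities.items():
            slot = f"<{entity_type}>"
            if slot not in template:
                continue
            for name in names:
                # re.escape keeps punctuation in names from acting as regex syntax.
                rules.append(re.compile(template.replace(slot, re.escape(name))))
    return rules


# Invented example: one generic template, one entity found in a document.
templates = [r"<PERSON> was born in (?P<answer>\d{4})"]
entities = {"PERSON": ["Alan Turing"]}
rules = instantiate_rules(templates, entities)
match = rules[0].search("Alan Turing was born in 1912.")
```

Here `match.group("answer")` holds the captured year, so a single hand-written template yields a rule tailored to whichever people the document mentions.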
File is:NONE Total Storage:Gb Number of Concepts Represented:concepts Type of representation:NONE Additional comments: |
Section 5.0 Computer Searching |
---|
Average computer time to search (per query): <1 CPU seconds |
Times broken down by component(s): |
Section 5.1 Searching Methods |
Vector space model:
Probabilistic model:
Cluster searching:
N-gram matching:
Boolean matching:
Fuzzy logic:
Free text scanning:
Neural networks:
Conceptual graphic matching:
Other: YES
Additional comments: Grid-based cluster of agents in direct communication, able to query for available knowledge (domain and context) and route questions to the appropriate knowledge base. |
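The routing behaviour described in the comment above, where agents are asked what knowledge they hold and a question is passed to one that can handle it, might look roughly like the following. This is a single-process sketch under stated assumptions: the `Agent` class, its `knows_about` and `answer` methods, and the `route` function are hypothetical, and the real system ran as a distributed cluster whose protocol the form does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A toy agent that advertises domains and holds a few facts."""
    name: str
    domains: set
    facts: dict = field(default_factory=dict)

    def knows_about(self, domain):
        # Capability query: does this agent cover the domain?
        return domain in self.domains

    def answer(self, question):
        # Returns None when the agent has no answer for the question.
        return self.facts.get(question)


def route(question, domain, agents):
    """Send the question to the first capable agent that can answer it."""
    for agent in agents:
        if agent.knows_about(domain):
            result = agent.answer(question)
            if result is not None:
                return agent.name, result
    return None


# Invented agents and facts, for illustration only.
agents = [
    Agent("bio", {"biography"}, {"When was Alan Turing born?": "1912"}),
    Agent("geo", {"geography"}, {"What is the capital of France?": "Paris"}),
]
```

Separating the capability check from the answer call means a router only needs to know what each agent claims to cover, not what it actually contains.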
Section 5.2 Factors in Ranking |
Term frequency: YES
Inverse document frequency: NO
Other term weights: YES
Semantic closeness: NO
Position in document: NO
Syntactic clues: YES
Proximity of terms: NO
Information theoretic weights: NO
Document length: NO
Percentage of query terms which match: YES
N-gram frequency: NO
Word specificity:
Word sense frequency:
Cluster distance:
Other:
Additional comments: |
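Two of the ranking factors marked YES above, term frequency and the percentage of query terms that match, are simple to compute for a candidate passage. The sketch below shows both; how the system combined them with its other term weights and syntactic clues is not stated in this form, so the product used in `score` is purely an illustrative assumption, and all function names are hypothetical.

```python
from collections import Counter


def rank_features(query_terms, passage_tokens):
    """Return (term frequency total, fraction of query terms matched)."""
    counts = Counter(token.lower() for token in passage_tokens)
    query = {term.lower() for term in query_terms}
    if not query:
        return 0, 0.0
    # Term frequency: total occurrences of any query term in the passage.
    tf = sum(counts[term] for term in query)
    # Coverage: share of distinct query terms that appear at least once.
    coverage = sum(1 for term in query if counts[term] > 0) / len(query)
    return tf, coverage


def score(query_terms, passage_tokens):
    """Hypothetical combination: frequency scaled by query coverage."""
    tf, coverage = rank_features(query_terms, passage_tokens)
    return tf * coverage


# Invented example passage and query.
passage = "Turing was born in 1912 ; Turing studied at Cambridge".split()
tf, coverage = rank_features(["turing", "born", "year"], passage)
```

Scaling by coverage penalises passages that repeat one query term many times while missing the rest, which is one plausible motivation for tracking both factors.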
Section 6.0 Query Construction |
---|
Section 6.1 Automatically Built Queries for Ad-hoc Tasks |
Topic fields used: Average computer time to build query CPU seconds Term weighting (weights based on terms in topics): Phrase extraction from topics: Syntactic parsing of topics: Word sense disambiguation: Proper noun identification algorithm: Tokenizer: Expansion of queries using previously constructed data structures: Automatic addition of: |
Section 6.2 Manually Constructed Queries for Ad-hoc Tasks |
Topic fields used: Average time to build query? minutes Type of query builder: Tool used to build query: Method used in initial query construction? Total CPU time for all iterations: seconds Clock time from initial construction of query to completion of final query: minutes Average number of iterations: Average number of documents examined per iteration: Minimum number of iterations: Maximum number of iterations: The end of an iteration is determined by: Automatic term reweighting from relevant documents: Automatic query expansion from relevant documents: Other automatic methods: Manual methods used: |
Disclaimer: Contents of this online document are not necessarily the official views of, nor endorsed by the U.S. Government, the Department of Commerce, or NIST. |