Text REtrieval Conference (TREC) |
Organization Name: NSA Speech Group | Run ID: nsasgsr1 |
Section 1.0 System Summary and Timing |
---|
Section 1.1 System Information |
Hardware Model Used for TREC Experiment: Sun Enterprise 5000
System Use: SHARED
Total Amount of Hard Disk Storage: 80 Gb
Total Amount of RAM: 1024 MB
Clock Rate of CPU: 275 MHz |
Section 1.2 System Comparisons |
Amount of developmental "Software Engineering": ALL
List of features that are not present in the system, but would have been beneficial to have: automatic relevance feedback; Boolean logic; word sense disambiguation
List of features that are present in the system, and impacted its performance, but are not detailed within this form: |
Section 2.0 Construction of Indices, Knowledge Bases, and Other Data Structures |
---|
Length of the stopword list: * words
Type of Stemming: OTHER
Controlled Vocabulary:
Term weighting: YES
Phrase discovery: YES
Type of Spelling Correction: NONE
Manually-Indexed Terms: NO
Proper Noun Identification: YES
Syntactic Parsing: NO
Tokenizer: NO
Word Sense Disambiguation: NO
Other technique: YES
Additional comments: Re. stop words: eliminated high-frequency words and non-topical parts of speech. Re. phrase discovery: also inserted phrases by hand. |
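The stop-word comment above (drop high-frequency words and non-topical parts of speech) could be approximated as in the following minimal sketch. It is illustrative only and not the submitted system's code: the tag set `TOPICAL_POS`, the `max_doc_fraction` threshold, and both function names are assumptions, and tokens are assumed to arrive already tagged as `(word, pos)` pairs.

```python
from collections import Counter

# Assumption: which parts of speech count as "topical" is not stated in the form.
TOPICAL_POS = {"NOUN", "PROPN", "ADJ"}


def high_frequency_words(documents, max_doc_fraction=0.5):
    """Return words that occur in more than max_doc_fraction of the documents.

    Each document is a list of (word, pos) pairs. The threshold is hypothetical.
    """
    doc_freq = Counter()
    for tokens in documents:
        # Count each word once per document (document frequency).
        doc_freq.update({word.lower() for word, _pos in tokens})
    limit = max_doc_fraction * len(documents)
    return {word for word, count in doc_freq.items() if count > limit}


def filter_tokens(tokens, stop_words):
    """Keep only topical-POS words that are not in the stop-word set."""
    return [
        word.lower()
        for word, pos in tokens
        if pos in TOPICAL_POS and word.lower() not in stop_words
    ]


docs = [
    [("the", "DET"), ("radio", "NOUN"), ("news", "NOUN")],
    [("the", "DET"), ("news", "NOUN"), ("budget", "NOUN")],
    [("the", "DET"), ("news", "NOUN"), ("quickly", "ADV"), ("storm", "NOUN")],
]
stop_words = high_frequency_words(docs)
kept = filter_tokens(docs[2], stop_words)
```

In this toy input, "the" and "news" appear in every document and are treated as stop words, while "quickly" is dropped because adverbs are not in the assumed topical tag set.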
Section 3.0 Statistics on Data Structures Built from TREC Text |
---|
Section 3.1 First Data Structure |
Structure Type: OTHER DATA STRUCTURE
Type of other data structure used: topic list
Brief description of method using other data structure:
Total storage used: 0.004 Gb
Total computer time to build: 0.13 hours
Automatic process: YES
Manual hours required:
Type of manual labor: NONE
Term positions used: NO
Only single terms used: NO
Concepts (vs. single terms) represented: YES
Type of representation: multi-word unit
Auxiliary files used: YES
Additional comments: RAM required is ~7 MB |
Section 3.2 Second Data Structure |
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 0.003 Gb
Total computer time to build: 14 secs
Automatic process: YES
Manual hours required:
Type of manual labor: NONE
Term positions used: YES
Only single terms used: NO
Concepts (vs. single terms) represented: YES
Type of representation: multi-word unit
Auxiliary files used: NO
Additional comments: The inverted index is built over the topic lists. Term position refers to the rank of a word in its topic list. |
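To illustrate the comment above, here is a minimal sketch of an inverted index built over per-message topic lists, where the stored "position" is the term's rank within its list. The function name, the message IDs, and the example terms are hypothetical; the form does not describe the actual on-disk layout.

```python
from collections import defaultdict


def build_inverted_index(topic_lists):
    """Map each topic term to a list of (message_id, rank) postings.

    topic_lists maps a message ID to its ranked topic terms; rank 1 is the
    top of the list. Terms may be multi-word units.
    """
    index = defaultdict(list)
    for message_id, terms in topic_lists.items():
        for rank, term in enumerate(terms, start=1):
            index[term].append((message_id, rank))
    return dict(index)


# Hypothetical topic lists for two messages.
topic_lists = {
    "msg1": ["stock market", "interest rate"],
    "msg2": ["interest rate", "election"],
}
index = build_inverted_index(topic_lists)
```

Here `index["interest rate"]` holds one posting per message, with rank 2 for `msg1` and rank 1 for `msg2`.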
Section 3.3 Third Data Structure |
Structure Type: NONE
Type of other data structure used:
Brief description of method using other data structure:
Total storage used:
Total computer time to build:
Automatic process:
Manual hours required:
Type of manual labor: NONE
Term positions used:
Only single terms used:
Concepts (vs. single terms) represented:
Type of representation:
Auxiliary files used:
Additional comments: |
Section 4.0 Data Built from Sources Other than the Input Text |
---|
File type: LEXICON
Domain type: DOMAIN INDEPENDENT
Total Storage: 0.003 Gb
Number of Concepts Represented: 38,000 base concepts
Type of representation: OTHER
Automatic or Manual: MANUAL
Type of Manual Labor used: OTHER
Additional comments: Lexicon included head word, POS, frequency, and definitions. Manual modification included adding new head words and definitions. New words were added automatically, without definitions. |
File is: NONE
Total Storage:
Number of Concepts Represented:
Type of representation: NONE
Additional comments: |
Section 5.0 Computer Searching |
---|
Average computer time to search (per query): ~0.09 CPU seconds |
Times broken down by component(s): |
Section 5.1 Searching Methods |
Vector space model: NO
Probabilistic model: YES
Cluster searching: NO
N-gram matching: NO
Boolean matching: NO
Fuzzy logic: NO
Free text scanning:
Neural networks: NO
Conceptual graphic matching:
Other: YES
Additional comments: Look up the query's topic words in a linked list of message topic words. |
Section 5.2 Factors in Ranking |
Term frequency: YES
Inverse document frequency: YES
Other term weights: YES
Semantic closeness: YES
Position in document: NO
Syntactic clues: NO
Proximity of terms: NO
Information theoretic weights:
Document length: NO
Percentage of query terms which match: YES
N-gram frequency: NO
Word specificity: NO
Word sense frequency: NO
Cluster distance: NO
Other: YES
Additional comments: Used both scores and ranks derived from semantic closeness. Queries took about 0.07 sec to build. |
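As a rough illustration of how some of the factors marked YES above might combine, the sketch below scores messages from rank-annotated postings using an inverse-document-frequency weight, a rank-based term weight, and the fraction of query terms matched. The formula, the function name, and the example data are assumptions made for illustration; the form does not give the system's actual scoring function, and semantic closeness is not modeled here.

```python
import math
from collections import defaultdict


def rank_messages(query_terms, index, n_messages):
    """Return (message_id, score) pairs sorted by descending score.

    index maps a topic term to (message_id, rank) postings, rank 1 being the
    most prominent. The scoring formula is hypothetical.
    """
    unique_terms = set(query_terms)
    scores = defaultdict(float)
    matched = defaultdict(int)
    for term in unique_terms:
        postings = index.get(term, [])
        if not postings:
            continue
        # Rarer terms get a larger weight (a smoothed IDF).
        idf = math.log(1 + n_messages / len(postings))
        for message_id, rank in postings:
            # Terms higher in a message's topic list contribute more.
            scores[message_id] += idf / rank
            matched[message_id] += 1
    # Scale by the fraction of query terms found in each message.
    results = [
        (message_id, score * matched[message_id] / len(unique_terms))
        for message_id, score in scores.items()
    ]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical postings for two messages.
index = {
    "interest rate": [("msg1", 2), ("msg2", 1)],
    "election": [("msg2", 2)],
}
ranking = rank_messages(["interest rate", "election"], index, n_messages=2)
```

With this toy data, `msg2` matches both query terms and ranks ahead of `msg1`, which matches only one.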
Disclaimer: Contents of this online document are not necessarily the official views of, nor endorsed by the U.S. Government, the Department of Commerce, or NIST. |