Text REtrieval Conference (TREC)
System Description

Organization Name: NSA Speech Group
Run ID: nsasgsr1
Section 1.0 System Summary and Timing
Section 1.1 System Information
Hardware Model Used for TREC Experiment: Sun Enterprise 5000
System Use: SHARED
Total Amount of Hard Disk Storage: 80 Gb
Total Amount of RAM: 1024 MB
Clock Rate of CPU: 275 MHz
Section 1.2 System Comparisons
Amount of developmental "Software Engineering": ALL
List of features that are not present in the system, but would have been beneficial to have: automatic relevance feedback; Boolean logic; word sense disambiguation
List of features that are present in the system, and impacted its performance, but are not detailed within this form:
Section 2.0 Construction of Indices, Knowledge Bases, and Other Data Structures
Length of the stopword list: * words
Type of Stemming: OTHER
Controlled Vocabulary:
Term weighting: YES
  • Additional Comments on term weighting: topically, see formulas
Phrase discovery: YES
  • Kind of phrase: multi-word unit
  • Method used: STATISTICAL
Type of Spelling Correction: NONE
Manually-Indexed Terms: NO
Proper Noun Identification: YES
Syntactic Parsing: NO
Tokenizer: NO
Word Sense Disambiguation: NO
Other technique: YES
Additional comments: Re. stop words: eliminated high-frequency words and non-topical parts of speech. Re. phrase discovery: also inserted phrases by hand.
Section 3.0 Statistics on Data Structures Built from TREC Text
Section 3.1 First Data Structure
Structure Type: OTHER DATA STRUCTURE
Type of other data structure used: topic list
Brief description of method using other data structure:
Total storage used: 0.004 Gb
Total computer time to build: 0.13 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: NO
Only single terms used: NO
Concepts (vs. single terms) represented: YES
  • Number of concepts represented: 2143
Type of representation: multi-word unit
Auxiliary files used: YES
  • Type of auxiliary files used: electronic dictionary
Additional comments: RAM required is ~7MB
Section 3.2 Second Data Structure
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 0.003 Gb
Total computer time to build: 14 seconds
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: YES
Only single terms used: NO
Concepts (vs. single terms) represented: YES
  • Number of concepts represented: 2143
Type of representation: multi-word unit
Auxiliary files used: NO
  • Type of auxiliary files used:
Additional comments: Inverted index is of topic lists. Term position refers to rank of word in topic list.
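The structure described in the comment above can be illustrated with a minimal sketch. It assumes each message has a topic list ordered from most to least topical, and that the stored "term position" is simply the 1-based rank in that list. The function name, message IDs, and sample terms are hypothetical and are not taken from the system.

```python
from collections import defaultdict


def build_inverted_index(topic_lists):
    """Map each topic term to a list of (message_id, rank) postings.

    topic_lists: dict of message_id -> list of topic terms, ordered by rank
    (rank 1 = most topical). Terms may be multi-word units.
    This layout is an assumption made for illustration only.
    """
    index = defaultdict(list)
    for message_id, terms in topic_lists.items():
        # The "term position" stored here is the rank within the topic list.
        for rank, term in enumerate(terms, start=1):
            index[term].append((message_id, rank))
    return dict(index)


# Hypothetical topic lists for two messages.
topic_lists = {
    "msg1": ["interest rate", "federal reserve", "inflation"],
    "msg2": ["inflation", "consumer price"],
}
index = build_inverted_index(topic_lists)
```

In this sketch, `index["inflation"]` holds one posting per message, each carrying the rank at which the term appeared in that message's topic list.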
Section 3.3 Third Data Structure
Structure Type: NONE
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: Gb
Total computer time to build: hours
Automatic process:
Manual hours required: hours
Type of manual labor: NONE
Term positions used:
Only single terms used:
Concepts (vs. single terms) represented:
  • Number of concepts represented:
Type of representation:
Auxiliary files used:
  • Type of auxiliary files used:
Additional comments:
Section 4.0 Data Built from Sources Other than the Input Text
Internally-built Auxiliary File

File type: LEXICON
Domain type: DOMAIN INDEPENDENT
Total Storage: 0.003 Gb
Number of Concepts Represented: 38,000 base concepts
Type of representation: OTHER
Automatic or Manual: MANUAL
  • Total Time to Build: hours
  • Total Time to Modify (if already built): 100 hours
Type of Manual Labor used: OTHER
Additional comments: Lexicon entries included head word, part of speech (POS), frequency, and definitions. Manual modification included adding new head words and definitions. New words were added automatically, without definitions.
Externally-built Auxiliary File

File is: NONE
Total Storage: Gb
Number of Concepts Represented: concepts
Type of representation: NONE
Additional comments:
Section 5.0 Computer Searching
Average computer time to search (per query): ~0.09 CPU seconds
Times broken down by component(s):
Section 5.1 Searching Methods
Vector space model: NO
Probabilistic model: YES
Cluster searching: NO
N-gram matching: NO
Boolean matching: NO
Fuzzy logic: NO
Free text scanning:
Neural networks: NO
Conceptual graphic matching:
Other: YES
Additional comments: Look up query's topic words in linked list of message topic words.
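The lookup described in the comment above can be sketched roughly as follows. The form mentions a linked list of message topic words. This sketch substitutes a plain dictionary of postings for simplicity, and it scores each message by the fraction of query topic words it matches (one of the ranking factors marked YES in Section 5.2). The function name, data, and scoring rule are illustrative assumptions, not the system's actual formulas.

```python
from collections import defaultdict


def match_messages(query_terms, index):
    """Return message_id -> fraction of distinct query topic words matched.

    index: dict of topic term -> list of (message_id, rank) postings.
    The scoring rule is a hypothetical stand-in for the system's formulas.
    """
    hits = defaultdict(set)
    for term in query_terms:
        # Look up each query topic word among the message topic words.
        for message_id, _rank in index.get(term, []):
            hits[message_id].add(term)
    n_terms = len(set(query_terms))
    if n_terms == 0:
        return {}
    return {m: len(matched) / n_terms for m, matched in hits.items()}


# Hypothetical postings: term -> [(message_id, rank), ...]
index = {
    "inflation": [("msg1", 3), ("msg2", 1)],
    "interest rate": [("msg1", 1)],
}
scores = match_messages(["inflation", "interest rate"], index)
```

The rank stored in each posting is ignored here. A fuller version could weight a match by that rank, but how the original system combined scores and ranks is not specified in this form.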
Section 5.2 Factors in Ranking
Term frequency: YES
Inverse document frequency: YES
Other term weights: YES
Semantic closeness: YES
Position in document: NO
Syntactic clues: NO
Proximity of terms: NO
Information theoretic weights:
Document length: NO
Percentage of query terms which match: YES
N-gram frequency: NO
Word specificity: NO
Word sense frequency: NO
Cluster distance: NO
Other: YES
Additional comments: Used both scores and ranks derived from semantic closeness. Queries took about 0.07 sec to build.

Disclaimer: Contents of this online document are not necessarily the official views of, nor endorsed by the U.S. Government, the Department of Commerce, or NIST.