Text REtrieval Conference (TREC)
System Description

Organization Name: University of Maryland (Filtering Track)
Run ID: umr-lsi
Section 1.0 System Summary and Timing
Section 1.1 System Information
Hardware Model Used for TREC Experiment: Intel x86 (P2-300)
System Use: DEDICATED
Total Amount of Hard Disk Storage: 36 Gb
Total Amount of RAM: 512 MB
Clock Rate of CPU: 300 MHz
Section 1.2 System Comparisons
Amount of developmental "Software Engineering": SOME
List of features that are not present in the system, but would have been beneficial to have:
List of features that are present in the system, and impacted its performance, but are not detailed within this form:
Section 2.0 Construction of Indices, Knowledge Bases, and Other Data Structures
Length of the stopword list: 571 words
Type of Stemming: SMART
Controlled Vocabulary: NO
Term weighting: YES
  • Additional Comments on term weighting: log-tf, idf, pivoted unique-term length normalization
Phrase discovery: NO
  • Kind of phrase:
  • Method used: OTHER
Type of Spelling Correction: NONE
Manually-Indexed Terms: NO
Proper Noun Identification: NO
Syntactic Parsing: NO
Tokenizer: NO
Word Sense Disambiguation: NO
Other technique: YES
Additional comments: Builds a latent semantic index from the collection of routing queries. Initial routing queries built from a query-zone computed from the topic statements (title and description only).
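
The term weighting named above (log-tf, idf, pivoted unique-term length normalization) can be sketched as follows. This is a minimal illustration of one common formulation, not the SMART code used for the run: the function name `pivoted_weights`, the natural-log idf, the average-tf dampening in the log-tf factor, and the default `slope` of 0.2 are all assumptions that the form does not state.

```python
import math

def pivoted_weights(tf, df, n_docs, pivot, slope=0.2):
    """Weight one document's terms (hypothetical helper, not SMART's API).

    tf     : {term: raw count in this document}
    df     : {term: number of documents containing the term}
    n_docs : collection size, used for idf
    pivot  : average number of unique terms per document (assumed choice)
    slope  : pivoted-normalization slope (assumed value)
    """
    if not tf:
        return {}
    n_unique = len(tf)
    avg_tf = sum(tf.values()) / n_unique
    # Pivoted unique-term normalization: documents with more unique
    # terms than the pivot are penalized less than under plain scaling.
    norm = (1.0 - slope) * pivot + slope * n_unique
    weights = {}
    for term, count in tf.items():
        log_tf = (1.0 + math.log(count)) / (1.0 + math.log(avg_tf))
        idf = math.log(n_docs / df[term])
        weights[term] = log_tf * idf / norm
    return weights

# Toy example: two terms, one rare and one present in every document.
example = pivoted_weights({"filter": 1, "the": 1},
                          {"filter": 1, "the": 10},
                          n_docs=10, pivot=2.0)
```

In the toy example the term that occurs in every document receives weight zero from the idf factor, while the rare term carries all of the weight.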
Section 3.0 Statistics on Data Structures Built from TREC Text
Section 3.1 First Data Structure
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 1.8 Gb
Total computer time to build: 0.73 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: NO
Only single terms used: YES
Concepts (vs. single terms) represented: NO
  • Number of concepts represented:
Type of representation:
Auxiliary files used: YES
  • Type of auxiliary files used: table of collection statistics for idf weighting
Additional comments: This includes not only plain-old inverted files but also a lot of SMART baggage that comes along with them, such as non-inverted files, collection statistics, qrels databases, textloc databases, and the like. It also includes the query collections: the regular filtering topics (used for query-zoning) and the automated feedback queries used for routing and for building the SVD. The breakdown is roughly 60% non-inverted files, 35% inverted files, and 5% associated databases.
Section 3.2 Second Data Structure
Structure Type: OTHER DATA STRUCTURE
Type of other data structure used: sparse matrix and SVD
Brief description of method using other data structure: latent semantic indexing
Total storage used: 0.207 Gb
Total computer time to build: 0.046 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: NO
Only single terms used: YES
Concepts (vs. single terms) represented: NO
  • Number of concepts represented:
Type of representation:
Auxiliary files used:
  • Type of auxiliary files used:
Additional comments: This includes the sparse matrix representation of a collection of routing queries, and the sparse matrix representations of the three SVD matrices.
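
To make the structure described above concrete, here is a small sketch of a sparse term-by-query matrix and a truncated SVD of it. It is an illustration under stated assumptions, not the software used for the run: the row-of-dicts sparse layout, the power-iteration solver, and the names `truncated_svd` and `project` are all hypothetical, and a real system would use a dedicated sparse SVD package.

```python
import math

def matvec(rows, x):
    """Sparse matrix (list of {column: value} rows) times a dense vector."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def rmatvec(rows, y, n_cols):
    """Transpose of the sparse matrix times a dense vector."""
    out = [0.0] * n_cols
    for yi, row in zip(y, rows):
        for j, v in row.items():
            out[j] += v * yi
    return out

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def deflate(v, triplets):
    """Remove components along right singular vectors already found."""
    for _, _, pv in triplets:
        dot = sum(a * b for a, b in zip(v, pv))
        v = [a - dot * b for a, b in zip(v, pv)]
    return v

def truncated_svd(rows, n_cols, k, iters=200):
    """Top-k (sigma, u, v) triplets via power iteration on A^T A."""
    triplets = []
    for idx in range(k):
        # Deterministic, non-uniform starting vector.
        v = [1.0 / (j + idx + 1) for j in range(n_cols)]
        for _ in range(iters):
            w = rmatvec(rows, matvec(rows, deflate(v, triplets)), n_cols)
            n = norm(w)
            if n == 0.0:
                break
            v = [a / n for a in w]
        v = deflate(v, triplets)
        n = norm(v)
        if n == 0.0:
            break  # no further non-zero singular direction found
        v = [a / n for a in v]
        u_raw = matvec(rows, v)
        sigma = norm(u_raw)
        if sigma == 0.0:
            break
        triplets.append((sigma, [a / sigma for a in u_raw], v))
    return triplets

def project(doc, triplets):
    """Map a sparse document {term_index: weight} into the k-dim space."""
    return [sum(u[t] * w for t, w in doc.items()) for _, u, _ in triplets]

# Toy matrix: 4 terms (rows) by 3 routing queries (columns).
rows = [{0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.0}, {2: 2.0}, {2: 1.0}]
triplets = truncated_svd(rows, n_cols=3, k=2)
doc_coords = project({0: 1.0, 1: 1.0}, triplets)
```

Here an incoming document is projected onto the left singular vectors of the query matrix, so that documents and routing queries can be compared in the reduced space. Whether the run scaled the projection by the singular values is not stated in the form, so that step is left out.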
Section 3.3 Third Data Structure
Structure Type: NONE
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: Gb
Total computer time to build: hours
Automatic process:
Manual hours required: hours
Type of manual labor: NONE
Term positions used:
Only single terms used:
Concepts (vs. single terms) represented:
  • Number of concepts represented:
Type of representation:
Auxiliary files used:
  • Type of auxiliary files used:
Additional comments:
Section 4.0 Data Built from Sources Other than the Input Text
Internally-built Auxiliary File

File type: NONE
Domain type: DOMAIN INDEPENDENT
Total Storage: Gb
Number of Concepts Represented: concepts
Type of representation: NONE
Automatic or Manual:
  • Total Time to Build: hours
  • Total Time to Modify (if already built): hours
Type of Manual Labor used: NONE
Additional comments:
Externally-built Auxiliary File

File is: NONE
Total Storage: Gb
Number of Concepts Represented: concepts
Type of representation: NONE
Additional comments:
Section 5.0 Computer Searching
Average computer time to search (per query): CPU seconds
Times broken down by component(s):
Section 5.1 Searching Methods
Vector space model:
Probabilistic model:
Cluster searching:
N-gram matching:
Boolean matching:
Fuzzy logic:
Free text scanning:
Neural networks:
Conceptual graphic matching:
Other:
Additional comments:
Section 5.2 Factors in Ranking
Term frequency:
Inverse document frequency:
Other term weights:
Semantic closeness:
Position in document:
Syntactic clues:
Proximity of terms:
Information theoretic weights:
Document length:
Percentage of query terms which match:
N-gram frequency:
Word specificity:
Word sense frequency:
Cluster distance:
Other:
Additional comments: