System Summary and Timing

Organization Name: NMSU/CRL
List of Run IDs: nmsuc1, nmsuc2, nmsuc3

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to Build Data Structures
- Length (in words) of the stopword list: English 571, Spanish 379
- Controlled Vocabulary?: No
- Stemming Algorithm: Spanish implementation of the Porter algorithm
- Morphological Analysis: No
- Term Weighting: TF-IDF style
- Phrase Discovery?: No
  - Kind of Phrase: Not applicable
  - Method Used (statistical, syntactic, other): Not applicable
- Syntactic Parsing?: No
- Word Sense Disambiguation?: Yes; for nmsuc3, corpus disambiguation was applied to translation equivalents
- Heuristic Associations (including short definition)?: No
- Spelling Checking (with manual correction)?: No
- Spelling Correction?: Fuzzy term expansion may serve as spelling correction
- Proper Noun Identification Algorithm?: No
- Tokenizer?: English/Spanish word tokenizer
  - Patterns which are tokenized: Words
- Manually-Indexed Terms?: No
- Other Techniques for Building Data Structures: None

Statistics on Data Structures Built from TREC Text
- Inverted index
  - Run ID: nmsuc1, nmsuc2, nmsuc3
  - Total Storage (in MB): 146.7
  - Total Computer Time to Build (in hours): 1.3
  - Automatic Process? (If not, number of manual hours): Fully automatic
  - Use of Term Positions?: No
  - Only Single Terms Used?: Yes
- Clusters
- N-grams, Suffix Arrays, Signature Files
  - Run ID: nmsuc1, nmsuc2, nmsuc3
  - Total Storage (in MB): 42.8
  - Total Computer Time to Build (in hours): 1.3 (simultaneous with inverted index)
  - Automatic Process?
    (If not, number of manual hours): Fully automatic
  - Brief Description of Method: Compressed bit vectors of document-term contents; not actually used by the system for the reported results
- Knowledge Bases
  - Use of Manual Labor
- Special Routing Structures
- Other Data Structures Built from TREC Text
  - Run ID: nmsuc1, nmsuc2, nmsuc3
  - Type of Structure: B-tree words table
  - Total Storage (in MB): 6
  - Total Computer Time to Build (in hours): 1.3 (simultaneous with inverted index)
  - Automatic Process? (If not, number of manual hours): Yes
  - Brief Description of Method: Standard B-tree mapping words to word numbers

Data Built from Sources Other than the Input Text
- Internally-Built Auxiliary File
  - Use of Manual Labor
- Externally-Built Auxiliary File
  - Type of File (Treebank, WordNet, etc.): Collins Bilingual Dictionary
  - Total Storage (in MB): 1.2
  - Number of Concepts Represented: 24,000
  - Type of Representation: Lexical entries

Query Construction

Automatically Built Queries (Ad-Hoc)
- Topic Fields Used: nmsuc1 used S-DESC only; nmsuc2 and nmsuc3 used E-DESC only
- Average Computer Time to Build Query (in CPU seconds): 1.2
- Method Used in Query Construction
  - Term Weighting (weights based on terms in topics)?: Yes
  - Phrase Extraction from Topics?: No
  - Syntactic Parsing of Topics?: No
  - Word Sense Disambiguation?: Yes
  - Proper Noun Identification Algorithm?: No
  - Tokenizer?: Yes, words
  - Heuristic Associations to Add Terms?: Fuzzy expansion for nmsuc2 and nmsuc3, for terms not in the bilingual dictionary
  - Expansion of Queries Using Previously-Constructed Data Structure?: nmsuc2 and nmsuc3 used the bilingual dictionary for translation
    - Structure Used: Bilingual dictionary
  - Automatic Addition of Boolean Connectors or Proximity Operators?: No
  - Other: None

Searching

Search Times
- Run ID: nmsuc1
- Computer Time to Search (Average per Query, in CPU seconds): 0.08 on an UltraSPARC machine

Searching Methods
- Vector Space Model?: Yes
- Probabilistic Model?: No
- Cluster Searching?: No
- N-gram Matching?: No
- Boolean Matching?: No
- Fuzzy Logic?: No
- Free Text Scanning?: No
- Neural Networks?: No
- Conceptual Graph Matching?: No
- Other: None

Factors in Ranking
- Term Frequency?: Yes
- Inverse Document Frequency?: Yes
- Other Term Weights?: No
- Semantic Closeness?: No
- Position in Document?: No
- Syntactic Clues?: No
- Proximity of Terms?: No
- Information Theoretic Weights?: No
- Document Length?: No
- Percentage of Query Terms which Match?: No
- N-gram Frequency?: No
- Word Specificity?: No
- Word Sense Frequency?: No
- Cluster Distance?: No
- Other: None

Machine Information
- Machine Type for TREC Experiment: Sun SPARCstation 5 and Sun UltraSPARC
- Was the Machine Dedicated or Shared: Shared
- Amount of Hard Disk Storage (in MB): 4096 (4 GB)
- Amount of RAM (in MB): 154
- Clock Rate of CPU (in MHz): 190 (??) for the UltraSPARC

System Comparisons
- Amount of "Software Engineering" which Went into the Development of the System: 1 person-month
- Given appropriate resources, could your system run faster?: Yes
  - By how much (estimate)?: 500%
- Features the System is Missing that Would Be Beneficial: A faster document-score sort algorithm; heavier reliance on system memory
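The indexing pipeline answered above (word tokenization, stopword removal, single-term inverted index with TF-IDF-style weighting, no positions) can be sketched as follows. This is a toy illustration, not the CRL code: the corpus, stopword list, and function names are invented, and the real system also applied a Spanish Porter-style stemmer, which is omitted here.

```python
import math
from collections import Counter, defaultdict

# Tiny stand-in stopword list; the actual lists had 571 English
# and 379 Spanish entries.
STOPWORDS = {"the", "a", "of", "and"}

def tokenize(text):
    """Lowercase word tokenizer (stand-in for the English/Spanish tokenizer)."""
    return [w for w in text.lower().split()
            if w.isalpha() and w not in STOPWORDS]

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; single terms only,
    no term positions, matching the form's answers."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def tf_idf(index, term, doc_id, n_docs):
    """TF-IDF-style weight: tf * log(N / df)."""
    postings = index.get(term, {})
    if doc_id not in postings:
        return 0.0
    return postings[doc_id] * math.log(n_docs / len(postings))

docs = {1: "trade agreement between mexico and spain",
        2: "spain signs trade pact"}
index = build_inverted_index(docs)
```

A term appearing in every document (here "trade") gets weight zero under this scheme, while rarer terms ("mexico") score positively.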
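For the cross-language runs, the form reports dictionary-based query translation (Collins bilingual dictionary) with fuzzy expansion as a fallback for out-of-dictionary terms. A minimal sketch of that idea, with an invented two-entry dictionary and Python's `difflib` standing in for whatever fuzzy matcher the system actually used:

```python
import difflib

# Toy stand-in for the Collins bilingual dictionary (Spanish -> English);
# the real resource held about 24,000 lexical entries.
BILINGUAL = {"comercio": ["trade", "commerce"],
             "acuerdo": ["agreement", "accord"]}

def fuzzy_expand(term, vocabulary, cutoff=0.8):
    """Fallback for terms missing from the dictionary: return close
    headword matches, which also catches minor misspellings."""
    return difflib.get_close_matches(term, list(vocabulary), n=3, cutoff=cutoff)

def translate_query(terms):
    """Translate each source-language term via the dictionary; fall back
    to fuzzy expansion against the headwords when a term is missing."""
    translated = []
    for t in terms:
        if t in BILINGUAL:
            translated.extend(BILINGUAL[t])
        else:
            for match in fuzzy_expand(t, BILINGUAL):
                translated.extend(BILINGUAL[match])
    return translated
```

Here a query term like "acuerdos" (absent from the toy dictionary) fuzzy-matches the headword "acuerdo" and still contributes its translations, while a term with no close headword contributes nothing.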
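The search side reduces, per the answers above, to a vector-space match where term frequency and inverse document frequency are the only ranking factors (no length normalization, proximity, or positions). A sketch of such a scorer, with an invented toy index:

```python
import math
from collections import Counter

def rank(query_terms, index, n_docs):
    """Score documents by summed TF-IDF over query terms and return them
    best-first; tf and idf are the only factors, per the form above."""
    scores = Counter()
    for term in set(query_terms):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores.most_common()

# Toy index: term -> {doc_id: term frequency}, assuming 10 documents total.
index = {"trade": {1: 2, 2: 1}, "spain": {2: 3}}
ranking = rank(["trade", "spain"], index, n_docs=10)
```

Document 2 outranks document 1 here because it matches both query terms and "spain" is the rarer, higher-idf term.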