System Summary and Timing
  Organization Name: APLab, SCILS, Rutgers
  List of Run ID's: rutscn20

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: n/a (no stopword list used)
    - Controlled Vocabulary? :  no 
    - Stemming Algorithm:               
      - Morphological Analysis: no 
    - Term Weighting:  no 
    -  Phrase Discovery? :              
      - Kind of Phrase: no 
      - Method Used (statistical, syntactic, other): no 
    -  Syntactic Parsing? : no  
    -  Word Sense Disambiguation? : no 
    -  Heuristic Associations (including short definition)? : no  
    -  Spelling Checking (with manual correction)? : no 
    -  Spelling Correction? :  no 
    -  Proper Noun Identification Algorithm? : no 
    -  Tokenizer? :              
      - Patterns which are tokenized: no 
    -  Manually-Indexed Terms? : no 
    -  Other Techniques for building Data Structures: character 5-grams
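Because partial matching is defined below in terms of "any 4 of 5 characters correct", the 5-grams are character 5-grams. The form does not say how text is cut into 5-grams. The sketch below assumes the text is lowercased, split on whitespace, and that each word contributes its overlapping 5-character substrings. Keeping words shorter than 5 characters whole is an assumption, and the helper name word_five_grams is hypothetical.

```python
def word_five_grams(text, n=5):
    """Return the overlapping character n-grams (default n=5) of each word.

    Assumptions (not stated in the original): text is lowercased and split
    on whitespace, and words shorter than n characters are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) < n:
            # Hypothetical choice: a short word is used as a single gram.
            grams.append(word)
        else:
            # Slide a window of width n across the word.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(word_five_grams("Trade deficit"))  # ['trade', 'defic', 'efici', 'ficit']
```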

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID : no 
    - Clusters           
      - Run ID : no 
    - N-grams, Suffix arrays, Signature Files           
      - Run ID : rutscn20 
      - Total Storage (in MB): none (no database built)
      - Total Computer Time to Build (in hours): no database built 
      - Automatic Process? (If not, number of manual hours): n/a 
      - Brief Description of Method: sequential scanning of the raw text (no index)
    - Knowledge Bases            
      - Run ID : no 
      - Use of Manual Labor                  
    - Special Routing Structures           
      - Run ID : no 
    - Other Data Structures built from TREC text           
      - Run ID : no 

    Data Built from Sources Other than the Input Text

    -  Internally-built Auxiliary File            
      - Domain (independent or specific): no 
      - Use of Manual Labor                   
    -  Externally-built Auxiliary File            
      - Type of File (Treebank, WordNet, etc.):  no 

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used:  no 
    - Method used in Query Construction          
      - Tokenizer? :                 
      - Expansion of Queries using Previously-Constructed Data Structure? :              
    Automatically Built Queries (Routing)

    - Topic Fields Used:  no 
    - Method used in Query Construction          
      - Terms Selected From            
      - Term Weighting with Weights Based on terms in            
      - Phrase Extraction from            
      - Syntactic Parsing            
      - Word Sense Disambiguation using            
      - Proper Noun Identification Algorithm from            
      - Tokenizer             
      - Heuristic Associations to Add Terms from            
      - Expansion of Queries using Previously-Constructed Data Structure: 
      - Automatic Addition of Boolean connectors or Proximity Operators using 
        information from             

    Manually Constructed Queries (Ad-Hoc)

    - Topic Fields Used:  all 
    - Average Time to Build Query (in Minutes): 2 
    - Type of Query Builder          
    - Tools used to Build Query          
      - Knowledge Base Browser? :                 
      - Other Lexical Tools? :                
    - Method used in Query Construction          
      - Addition of Terms not Included in Topic? :               
      - Other:  manual elimination of all stop words and non-content words

  Searching

    Search Times

      - Run ID : rutscn20 
      - Computer Time to Search (Average per Query, in CPU seconds): 18,000
        (5 hours), using nawk
      - Component Times :  automated construction of scanning script 0%,
        scanning 100% 
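The form says the scanning script was built automatically, but it does not describe what that script contains. One plausible way to express "any 4 of 5 characters correct" as a regular expression is to generate five variants of each 5-gram, each with one position replaced by the wildcard `.`, and join them with `|`. The sketch below shows only that idea. The function name partial_match_pattern is hypothetical, and this is not claimed to be how the original script was built.

```python
import re


def partial_match_pattern(gram):
    """Build a regex matching `gram` with any single character position wildcarded.

    For a 5-gram this accepts any 5-character string that agrees with it in
    at least 4 of its 5 positions (substitutions only, no insertions or
    deletions).
    """
    variants = []
    for i in range(len(gram)):
        # Keep every character except position i, which becomes '.'.
        variants.append(re.escape(gram[:i]) + "." + re.escape(gram[i + 1:]))
    return "|".join(variants)


pattern = partial_match_pattern("trade")
print(pattern)  # .rade|t.ade|tr.de|tra.e|trad.
print(bool(re.search(pattern, "the tr4de gap")))  # True
```

For grams made only of letters, the generated pattern uses nothing but `.` and `|`, which POSIX extended regular expressions (and therefore nawk) also accept. A hypothetical generated scanning command could be `nawk '/.rade|t.ade|tr.de|tra.e|trad./' file`, which prints every line containing a near-match.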

    Machine Searching Methods

      - N-gram Matching? : yes 
      - Free Text Scanning? : yes 
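With no index, n-gram matching and free-text scanning amount to sliding a 5-character window over each document and testing every window. The sketch below reads "any 4 of 5 characters correct" as a position-by-position comparison that allows at most one mismatch. That reading is an assumption, and count_partial_matches is a hypothetical name.

```python
def count_partial_matches(gram, text, max_mismatches=1):
    """Count windows of `text` that match `gram` in all but `max_mismatches` positions.

    Brute-force free-text scan: every len(gram)-character window of the text
    is compared position by position; no index is used. Comparison is
    case-sensitive, so text is assumed to be lowercased beforehand.
    """
    n = len(gram)
    count = 0
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        mismatches = sum(1 for a, b in zip(gram, window) if a != b)
        if mismatches <= max_mismatches:
            count += 1
    return count


print(count_partial_matches("trade", "trade talks and tr4de data"))  # 2
```

Under this position-wise reading, a single substituted character (as in "tr4de") is tolerated. An inserted or deleted character shifts the window, however. For example, "tra-de" produces no match for "trade".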

    Factors in Ranking

      - Term Frequency? :  yes
      - Inverse Document Frequency? : no  
      - Document Length? : yes  
      - N-gram Frequency? : yes
      - Other: Partial-match 5-grams (a match requires any 4 of the 5 characters
        to be correct).
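The ranking factors above are term frequency, n-gram frequency and document length, with no inverse document frequency. The form does not say how these factors are combined. The sketch below assumes one simple combination: the total number of partial 5-gram matches in a document, divided by the document's length in characters. Both the formula and the name score_document are hypothetical.

```python
def score_document(query_grams, text, max_mismatches=1):
    """Hypothetical score: partial-match n-gram frequency divided by document length.

    Frequency and document length are used, as listed above; no inverse
    document frequency is applied. The exact formula is an assumption.
    """
    if not text:
        return 0.0
    total = 0
    for gram in query_grams:
        n = len(gram)
        for start in range(len(text) - n + 1):
            window = text[start:start + n]
            # Count the window if it differs from the gram in at most one position.
            if sum(a != b for a, b in zip(gram, window)) <= max_mismatches:
                total += 1
    return total / len(text)


docs = {"d1": "trade deficit widens", "d2": "weather report"}
query = ["trade", "defic"]
ranking = sorted(docs, key=lambda d: score_document(query, docs[d]), reverse=True)
print(ranking)  # ['d1', 'd2']
```

Dividing by length keeps long documents from winning simply because they contain more windows. Whether the original system normalized in this way is not stated.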

    Machine Information

    - Machine Type for TREC Experiment: Sun SparcStation 20 
    - Was the Machine Dedicated or Shared: mostly dedicated 
    - Amount of Hard Disk Storage (in MB):  9,000
    - Amount of RAM (in MB): 48 
    - Clock Rate of CPU (in MHz): 50 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the 
      System: a few hours 
    - Given appropriate resources            
      - Could your system run faster? : yes  
      - By how much (estimate)? : probably about 100 times
    - Features the System is Missing that would be beneficial: The list is 
      endless.  
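As a rough check on the speed estimate, the reported average search time and the estimated 100-fold speedup can be combined. This assumes the speedup applies to the scanning phase, which the form reports as essentially all of the search time.

```latex
t_{\text{query}} = 18{,}000\ \text{s} = \frac{18{,}000}{3{,}600}\ \text{h} = 5\ \text{h},
\qquad
\frac{t_{\text{query}}}{100} = 180\ \text{s} = 3\ \text{min}.
```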

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the 
      system and are not answered by above questions: Partial-match 5-grams,
      used to provide some robustness against data corruption.