System Summary and Timing
  Organization Name: OIT/GMU/NCR
  List of Run IDs: English: gmu96au1, gmu96au2, gmu96ma1, gmu96ma2
                    Spanish: gmu96sp1, gmu96sp2 
                    Chinese: gmu96ca1, gmu96ca2, gmu96cm1, gmu96cm2 
                    Confusion: gmu96v00, gmu96v10, gmu96v20, gmu96v01,
                               gmu96v11, gmu96v21
                    Large: gmu96lg4

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list:  144 
    - Controlled Vocabulary?:  no 
    - Stemming Algorithm:               
      - Morphological Analysis: No
    - Term Weighting:  tf-idf 
    -  Phrase Discovery?:  Yes
      - Kind of Phrase:  Any two adjacent terms that are not separated by a
        punctuation mark or a stopword.
    -  Syntactic Parsing?:  No 
    -  Word Sense Disambiguation?: No 
    -  Heuristic Associations (including short definition)?:  No 
    -  Spelling Checking (with manual correction)?:  No 
    -  Spelling Correction?:  No 
    -  Proper Noun Identification Algorithm?:  No 
    -  Tokenizer?:              
      - Patterns which are tokenized:  No 
    -  Manually-Indexed Terms?:  No 
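The phrase rule above (two adjacent terms with no intervening punctuation mark or stopword) can be sketched in Python as follows. This is a minimal illustration, not the system's code: the function name `candidate_phrases` and the short `STOPWORDS` set are hypothetical stand-ins, since the actual 144-word stopword list is not reproduced here, and treating any non-word, non-space character as a "punctuation mark" is an assumed approximation.

```python
import re

# Hypothetical stand-in: the system's 144-word stopword list is not reproduced here.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on"}


def candidate_phrases(text):
    """Return adjacent two-term phrases not separated by punctuation or a stopword."""
    phrases = []
    # Split on punctuation (approximated as non-word, non-space characters)
    # so that no phrase can cross a punctuation mark.
    for segment in re.split(r"[^\w\s]+", text.lower()):
        prev = None
        for token in segment.split():
            if token in STOPWORDS:
                prev = None  # a stopword breaks adjacency
                continue
            if prev is not None:
                phrases.append((prev, token))
            prev = token
    return phrases
```

For "Oil spills in coastal waters, and cleanup costs." this yields (oil, spills), (coastal, waters), and (cleanup, costs): the stopword "in" blocks "spills coastal", and the comma and "and" block "waters cleanup".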

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID:   gmu96au2 
      - Total Storage (in MB):  500 
      - Total Computer Time to Build (in hours): 2.4 
      - Automatic Process? (If not, number of manual hours): Yes
      - Use of Term Positions?: No 
      - Only Single Terms Used?: No 
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
      - Run ID: gmu96v00, gmu96v10, gmu96v20, gmu96v01, gmu96v11, gmu96v21
      - Brief Description of Method: 4-grams that span word boundaries (a
        4-gram may cross from one word into the next)
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
    - Other Data Structures built from TREC text           
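The 4-gram method used for the confusion runs can be illustrated with a short sketch. It assumes character-level 4-grams taken over the text after whitespace is collapsed to single spaces, so that grams cross word boundaries. `char_ngrams` and `ngram_overlap` are hypothetical names, and the overlap count is only one simple way to compare gram profiles, not necessarily the scoring these runs used.

```python
import re
from collections import Counter


def char_ngrams(text, n=4):
    """Count character n-grams over the whole string, so grams span word boundaries."""
    # Assumption: lowercase and collapse runs of whitespace to a single space.
    s = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_overlap(a, b, n=4):
    """Number of n-grams shared by two strings (multiset intersection)."""
    return sum((char_ngrams(a, n) & char_ngrams(b, n)).values())
```

Comparing "oil spill" with an OCR-style corruption "oil spi11" still gives 4 shared grams out of 6. This illustrates why n-grams tolerate character-level errors that would defeat exact word matching.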

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used:  DESC and NARR
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?:  Yes 
      - Phrase Extraction from Topics?: Yes 
      - Syntactic Parsing of Topics?: No 
      - Word Sense Disambiguation?: No 
      - Proper Noun Identification Algorithm?: No
      - Tokenizer?:                 
        - Patterns which are Tokenized:  No 
      - Heuristic Associations to Add Terms?: No 
      - Expansion of Queries using Previously-Constructed Data Structure?:
      - Other:  Yes, automatic relevance feedback was used for the English,
        Spanish, and corrupted (confusion track) data.
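This form does not say which relevance feedback formula was used. As an illustration only, here is a Rocchio-style pseudo-relevance feedback step with positive feedback only: the top-ranked documents from an initial search are assumed relevant, and their centroid is blended into the query. The function name `expand_query` and the parameters `alpha`, `beta`, and `max_new_terms` are assumptions for this sketch.

```python
from collections import defaultdict


def expand_query(query, top_docs, alpha=1.0, beta=0.75, max_new_terms=10):
    """Rocchio-style pseudo-relevance feedback (positive feedback only).

    query and each element of top_docs are dicts mapping term -> weight;
    top_docs are assumed to be the top-ranked documents of an initial search.
    """
    # Centroid of the assumed-relevant documents.
    centroid = defaultdict(float)
    for doc in top_docs:
        for term, w in doc.items():
            centroid[term] += w / len(top_docs)

    # Keep the original terms, boosted by any feedback evidence for them.
    expanded = {t: alpha * w for t, w in query.items()}
    for t in expanded:
        expanded[t] += beta * centroid.get(t, 0.0)

    # Add the strongest new terms found in the feedback documents.
    new_terms = sorted((t for t in centroid if t not in query),
                       key=centroid.get, reverse=True)
    for t in new_terms[:max_new_terms]:
        expanded[t] = beta * centroid[t]
    return expanded
```

Capping the number of added terms keeps the expanded query from drifting toward whatever topics the feedback documents happen to share.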

    Manually Constructed Queries (Ad-Hoc)

    - Topic Fields Used:  DESC and NARR used for gmu96ma1 and gmu96ma2
    - Average Time to Build Query (in Minutes): 10
    - Type of Query Builder          
      - Computer System Expert: Yes 
    - Tools used to Build Query          
      - Word Frequency List?: Yes 
      - Knowledge Base Browser?:                 
      - Other Lexical Tools?:                
    - Method used in Query Construction          
      - Term Weighting?:  Yes 
      - Boolean Connectors (AND, OR, NOT)?:  Yes 
      - Proximity Operators?: No 
      - Addition of Terms not Included in Topic?: Yes               
        - Source of Terms:  Manual sources, thesaurus, etc. 
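Manual queries used Boolean connectors. A minimal sketch of evaluating AND, OR, and NOT over an inverted index (each term mapped to the set of document ids containing it) is shown below. The nested-tuple query syntax and the names `build_index` and `evaluate` are hypothetical, and term weighting is omitted.

```python
def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def evaluate(query, index, all_ids):
    """Evaluate a Boolean query written as nested tuples,
    e.g. ("AND", "oil", ("NOT", "gas"))."""
    if isinstance(query, str):
        return set(index.get(query, ()))  # copy so callers cannot mutate the index
    op, *args = query
    if op == "NOT":
        return all_ids - evaluate(args[0], index, all_ids)
    sets = [evaluate(a, index, all_ids) for a in args]
    if op == "AND":
        return set.intersection(*sets)
    if op == "OR":
        return set.union(*sets)
    raise ValueError(f"unknown operator {op!r}")
```

Because NOT is taken relative to the full set of document ids, `all_ids` must be supplied by the caller.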

  Searching

    Search Times

      - Run ID:  gmu96au2 
      - Computer Time to Search (Average per Query, in CPU seconds): 265.28 

    Machine Searching Methods

      - Vector Space Model?:  Yes
      - N-gram Matching?: Yes 

    Factors in Ranking

      - Term Frequency?:  Yes 
      - Inverse Document Frequency?: Yes   
      - Document Length?:  Yes 
      - N-gram Frequency?: Yes 
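The ranking factors listed above (term frequency, inverse document frequency, and document length) combine naturally in a tf-idf vector space score. The sketch below assumes weights of tf * log(N/df) and normalizes by the length of each document vector; the exact formula used in these runs is not stated in this form, and `rank` is a hypothetical name.

```python
import math
from collections import Counter


def rank(query_terms, docs):
    """Rank docs (id -> list of terms) by tf-idf dot product,
    normalized by the length of each document vector."""
    n_docs = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    idf = {t: math.log(n_docs / df[t]) for t in df}
    q_tf = Counter(query_terms)

    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        weights = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            continue  # every term occurs in all documents; nothing to rank on
        dot = sum(q_tf[t] * idf.get(t, 0.0) * weights.get(t, 0.0) for t in q_tf)
        scores[doc_id] = dot / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Dividing by the query vector's length as well would give the full cosine measure but would not change the ranking for a single query.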

    Machine Information

    - Machine Type for TREC Experiment:  The English, Chinese, and confusion
      track runs were done on a single-processor Intel Pentium machine. A second
      English run and the Spanish runs were done on a 4-processor DBC-1012.
    - Was the Machine Dedicated or Shared:  Dedicated 
    - Amount of Hard Disk Storage (in MB):  approximately 4,000 (4 GB)
    - Amount of RAM (in MB):  62

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the
      System:  1 person-year for each prototype
    - Given appropriate resources            
      - Could your system run faster?: Yes  
      - By how much (estimate)?:  The IR prototype could run 10-20 percent
        faster; the relational prototype could be sped up by several orders
        of magnitude by adding processors. Initial results indicate that the
        system is scalable.
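One way to see why a relational implementation can benefit from more processors is to write retrieval as ordinary relational operations. As a hedged illustration, assuming a hypothetical schema (`doc_term`, `query`) and using Python's built-in sqlite3 in place of the DBC-1012, unnormalized tf-idf scoring reduces to joins plus a GROUP BY aggregation. A parallel DBMS can partition this kind of work across processors. The function name `relational_rank` is also hypothetical.

```python
import math
import sqlite3


def relational_rank(docs, query_terms):
    """Sketch: unnormalized tf-idf scoring expressed as SQL joins and aggregation."""
    conn = sqlite3.connect(":memory:")
    # SQLite's math functions are not guaranteed to be compiled in, so register one.
    conn.create_function("py_log", 1, math.log)
    conn.execute("CREATE TABLE doc_term (doc_id TEXT, term TEXT, tf INTEGER)")
    conn.execute("CREATE TABLE query (term TEXT)")
    for doc_id, terms in docs.items():
        for term in set(terms):
            conn.execute("INSERT INTO doc_term VALUES (?, ?, ?)",
                         (doc_id, term, terms.count(term)))
    conn.executemany("INSERT INTO query VALUES (?)",
                     [(t,) for t in set(query_terms)])
    rows = conn.execute("""
        WITH idf AS (
            SELECT term, py_log(CAST(? AS REAL) / COUNT(*)) AS idf
            FROM doc_term GROUP BY term
        )
        SELECT d.doc_id, SUM(d.tf * i.idf * i.idf) AS score
        FROM doc_term d
        JOIN query q ON q.term = d.term
        JOIN idf i ON i.term = d.term
        GROUP BY d.doc_id
        ORDER BY score DESC
    """, (len(docs),)).fetchall()
    conn.close()
    return rows
```

Each query term is given weight idf and each document term weight tf * idf, so the SUM is their dot product; documents sharing no term with the query simply do not appear in the join.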

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the
      system and are not answered by above questions:  Ability to run on
      multiple processors. Also, it was not easy to include information here
      about our different relevance feedback variations.