System Summary and Timing
  Organization Name: Xerox PARC
  List of Run IDs: xerox1 xerox2

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Stemming Algorithm:               
    - Term Weighting:  mixed: no weighting / LSI 
    -  Phrase Discovery?:  yes
      - Kind of Phrase:  two-word phrases
      - Method Used (statistical, syntactic, other): statistical 
    -  Syntactic Parsing?:  no 
    -  Word Sense Disambiguation?: no 
    -  Heuristic Associations (including short definition)?: no  
    -  Spelling Checking (with manual correction)?: no 
    -  Spelling Correction?: no 
    -  Proper Noun Identification Algorithm?: no  
    -  Tokenizer?:              
    -  Manually-Indexed Terms?:  no 
    -  Other Techniques for building Data Structures: none 

    Statistics on Data Structures built from TREC Text

    - Inverted index           
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
      - Run ID:  xerox1 xerox2
      - Type of Structure:  LSI
      - Total Storage (in MB): 40 
      - Total Computer Time to Build (in hours): 5 
      - Automatic Process? (If not, number of manual hours): yes 
      - Brief Description of Method: local LSI, one model per topic, built
        on 2000 terms selected by chi-square
    - Other Data Structures built from TREC text           
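
    The chi-square term selection named in the method description can be
    sketched as follows. This is only an illustration under stated
    assumptions, not the Xerox PARC implementation: the function names
    (chi_square, select_terms), the use of document-level term presence, and
    the tie-breaking rule are all hypothetical. The LSI step itself (a
    truncated SVD over the selected terms) is not shown.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table (assumed formulation).

    a = relevant docs containing the term
    b = non-relevant docs containing the term
    c = relevant docs without the term
    d = non-relevant docs without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term occurs in all docs or none: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def select_terms(docs, labels, k=2000):
    """Return the k terms with the highest chi-square score for one topic.

    docs   : list of token lists (training documents for the topic)
    labels : list of bools, True where the document is judged relevant
    Note: chi-square is two-sided, so terms strongly associated with
    non-relevance also score highly.
    """
    n_rel = sum(labels)
    n_non = len(labels) - n_rel
    rel_df, non_df = Counter(), Counter()
    for tokens, rel in zip(docs, labels):
        # count each term once per document (presence, not frequency)
        (rel_df if rel else non_df).update(set(tokens))
    scores = {}
    for term in set(rel_df) | set(non_df):
        a, b = rel_df[term], non_df[term]
        scores[term] = chi_square(a, b, n_rel - a, n_non - b)
    # highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [["oil", "price"], ["oil", "spill"], ["tax", "price"], ["tax", "law"]]
labels = [True, True, False, False]
top_terms = select_terms(docs, labels, k=2)
```

    With one such term list per topic, a separate ("local") LSI space could
    then be computed for each topic from the documents restricted to those
    terms.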

    Data Built from Sources Other than the Input Text

    -  Internally-built Auxiliary File            
      - Use of Manual Labor                   
    -  Externally-built Auxiliary File            

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Method used in Query Construction          
      - Tokenizer?:                 

      - Expansion of Queries using Previously-Constructed Data Structure?:

    Automatically Built Queries (Routing)

    - Topic Fields Used:  all fields 

    - Average Computer Time to Build Query (in cpu seconds): less than 5
    - Method used in Query Construction          
      - Terms Selected From            
        - Topics:  yes
        - All Training Documents:  yes 
        - Only Documents with Relevance Judgments: yes 
      - Term Weighting with Weights Based on terms in            
        - Topics:  no weights or LSI weights 
        - All Training Documents: no weights or LSI weights  
        - Documents with Relevance Judgments: no weights or LSI weights  
      - Phrase Extraction from            
        - Topics:  yes
        - All Training Documents: yes 
        - Documents with Relevance Judgments: yes 
      - Syntactic Parsing            
        - Topics:  no
        - All Training Documents:  no
        - Documents with Relevance Judgments:  no
      - Word Sense Disambiguation using            
        - Topics:  no
        - All Training Documents: no 
        - Documents with Relevance Judgments: no 
      - Proper Noun Identification Algorithm from            
        - Topics:  no
        - All Training Documents: no 
        - Documents with Relevance Judgments: no 
      - Tokenizer             
      - Heuristic Associations to Add Terms from            
        - Topics:  no
        - All Training Documents: no 
        - Documents with Relevance Judgments:  no
      - Expansion of Queries using Previously-Constructed Data Structure:
      - Automatic Addition of Boolean connectors or Proximity Operators
        using information from             

  Searching

    Machine Searching Methods

      - Vector Space Model? :  yes
      - Probabilistic Model? : yes 
      - Neural Networks? :  yes

    Factors in Ranking

      - Term Frequency? : yes 
      - Inverse Document Frequency? : yes 
      - Other Term Weights? :  LSI 
      - Document Length? :  yes
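
    As a generic illustration of how the ranking factors listed above (term
    frequency, inverse document frequency, document length) are commonly
    combined in a vector space model, here is a minimal sketch. The exact
    weighting formula used for the xerox runs is not given in this summary,
    so the tf * log(N/df) weights, the cosine length normalization, and the
    function names (tfidf_vectors, cosine) are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build length-normalized tf-idf vectors, one sparse dict per document."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: one count per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = {t: f * math.log(n / df[t]) for t, f in tf.items()}
        # document-length factor: scale each vector to unit Euclidean length
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors (cosine similarity)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


docs = [["oil", "oil", "price"], ["tax", "price"], ["oil", "spill"]]
vecs = tfidf_vectors(docs)
sim_price = cosine(vecs[0], vecs[1])  # documents share only "price"
sim_oil = cosine(vecs[0], vecs[2])    # documents share "oil", repeated in doc 0
```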

    Machine Information

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of
      the System:  little 
    - Given appropriate resources            
      - Could your system run faster?: yes 
      - By how much (estimate)?:  factor of 10