System Summary and Timing
  Organization Name: Open Text Corporation
  List of Run IDs: colm1, colm1A, colm2, colm4, colm5

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: configured to 0 
    - Controlled Vocabulary?: No 
    - Stemming Algorithm: divided between build time and run time
      - Morphological Analysis: disabled
    - Term Weighting: No 
    - Phrase Discovery?: No
    - Syntactic Parsing?: No
    - Word Sense Disambiguation?: No
    - Heuristic Associations (including short definition)?: No
    - Spelling Checking (with manual correction)?: No
    - Spelling Correction?: No
    - Proper Noun Identification Algorithm?: No
    - Tokenizer?: Yes
      - Patterns which are tokenized: general Mealy machine specified in a
configuration file; single token class; user-replaceable
    - Manually-Indexed Terms?: No
    - Other Techniques for building Data Structures: a multi-tree data model
constructed from configuration information and text parsing. Content and
structure schemas are applied when the trees are constructed.

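The tokenizer answer above describes a general Mealy machine read from a configuration file, with a single token class. The following is a minimal sketch of that idea, not Open Text's implementation: the transition table `TRANSITIONS` and the classifier `char_class` are hypothetical stand-ins for whatever the configuration file actually specifies. In a Mealy machine each transition emits an output that depends on both the current state and the input symbol; here the outputs are tokenizer actions.

```python
from typing import Dict, List, Tuple

# Hypothetical character classifier; the real system reads its machine from a
# configuration file. A single "word" class covers letters and digits.
def char_class(ch: str) -> str:
    return "word" if ch.isalnum() else "other"

# Mealy machine: (state, input class) -> (next state, output action).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("out", "word"): ("in", "start"),   # begin a new token
    ("out", "other"): ("out", "skip"),  # ignore separators
    ("in", "word"): ("in", "append"),   # extend the current token
    ("in", "other"): ("out", "emit"),   # close the current token
}

def tokenize(text: str) -> List[str]:
    state = "out"
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        state, action = TRANSITIONS[(state, char_class(ch))]
        if action == "start":
            current = [ch]
        elif action == "append":
            current.append(ch)
        elif action == "emit":
            tokens.append("".join(current))
            current = []
    if state == "in":  # flush a token that runs to the end of the input
        tokens.append("".join(current))
    return tokens
```

Because the behaviour lives entirely in the table and the classifier, replacing them changes what is tokenized without touching `tokenize`, which is one plausible reading of "user replaceable".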
    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID: colm1, colm1A, colm2, colm4, colm5
      - Total Storage (in MB): 4,000 
      - Total Computer Time to Build (in hours): 6 
      - Automatic Process? (If not, number of manual hours): Yes 
      - Use of Term Positions?: Yes 
      - Only Single Terms Used?: No; fast search for arbitrary phrases is supported
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
    - Other Data Structures built from TREC text           
      - Run ID: Same as for Inverted Index 
      - Type of Structure: other data structures that allow for efficient
implementation of hybrid tree operations
      - Total Storage (in MB): Included under inverted index 
      - Total Computer Time to Build (in hours): Included under inverted index 
      - Automatic Process? (If not, number of manual hours): Yes 
      - Brief Description of Method: No data 
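The inverted index stores term positions and supports arbitrary phrases, and the stopword list is empty. The sketch below shows, under those assumptions only, how a generic positional index answers a phrase query; it is a textbook technique, not a description of Open Text's on-disk structures, and `build_index` and `phrase_search` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_index(docs: Dict[int, str]) -> Dict[str, Dict[int, List[int]]]:
    """Map each term to {doc_id: [positions]}; no stopwords are removed."""
    index: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index: Dict[str, Dict[int, List[int]]], phrase: str) -> Set[int]:
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits: Set[int] = set()
    for doc in candidates:
        starts = set(index[terms[0]][doc])
        # Keep start positions p such that the i-th phrase term occurs at p + i.
        for i, t in enumerate(terms[1:], start=1):
            starts &= {p - i for p in index[t][doc]}
        if starts:
            hits.add(doc)
    return hits
```

With a zero-length stopword list, phrases made of common words such as "the cat" remain searchable, since every word keeps its position.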

    Data Built from Sources Other than the Input Text

    -  Internally-built Auxiliary File            
      - Use of Manual Labor                   
    -  Externally-built Auxiliary File            
      - Type of File (Treebank, WordNet, etc.): thesaurus, lexicon 
      - Total Storage (in MB): 2 

  Query construction

    Manually Constructed Queries (Ad-Hoc)

    - Topic Fields Used: No 
    - Average Time to Build Query (in Minutes): 0.5-1 
    - Type of Query Builder          
      - Domain Expert: No  
      - Computer System Expert: No 
    - Tools used to Build Query          
      - Word Frequency List?: No 
      - Knowledge Base Browser?: No              
        - Structure Used: None 
      - Other Lexical Tools?: No               
    - Method used in Query Construction          
      - Term Weighting?: not enabled 
      - Boolean Connectors (AND, OR, NOT)?: Yes 
      - Proximity Operators?: Yes 
      - Addition of Terms not Included in Topic?: Yes              
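Queries were built with Boolean connectors (AND, OR, NOT) and proximity operators. A minimal sketch of how such operators can be evaluated over a positional index follows, assuming that AND, OR, and NOT map to set intersection, union, and difference and that a proximity operator tests word distance. The `near` function and its window parameter `k` are hypothetical; the form does not state Open Text's query syntax.

```python
from typing import Dict, List, Set

# term -> {doc_id: [word positions]}
Index = Dict[str, Dict[int, List[int]]]

def build_index(docs: Dict[int, str]) -> Index:
    index: Index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index

def term(index: Index, word: str) -> Set[int]:
    """Documents containing a single term."""
    return set(index.get(word.lower(), {}))

def near(index: Index, a: str, b: str, k: int) -> Set[int]:
    """Documents where a and b occur within k word positions, in either order."""
    pa = index.get(a.lower(), {})
    pb = index.get(b.lower(), {})
    return {d for d in pa.keys() & pb.keys()
            if any(abs(i - j) <= k for i in pa[d] for j in pb[d])}

docs = {1: "open text search engine",
        2: "full text retrieval engine",
        3: "open source search"}
idx = build_index(docs)
universe = set(docs)

# (text AND engine) NOT open: intersection, then difference
q_bool = (term(idx, "text") & term(idx, "engine")) - term(idx, "open")
# text OR source: union
q_or = term(idx, "text") | term(idx, "source")
# NOT search: complement against the whole collection
q_not = universe - term(idx, "search")
# open NEAR/2 search: proximity test over stored positions
q_near = near(idx, "open", "search", 2)
```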

  Searching

    Search Times

      - Run ID: colm1, colm1A 
      - Computer Time to Search (Average per Query, in CPU seconds): Average 
elapsed time is less than one second
      - Run ID: colm2, colm4, colm5 
      - Computer Time to Search (Average per Query, in CPU seconds): Average
elapsed time is under 5-10 seconds

    Machine Searching Methods

      - Probabilistic Model?: Yes 
      - Boolean Matching?: Yes 
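The system reports both Boolean matching and a probabilistic model, but does not say which probabilistic model or how the two are combined. The sketch below assumes one common arrangement: a Boolean filter selects candidate documents, which are then ranked by a probabilistic score. BM25 is used purely as an illustrative stand-in, and `boolean_then_bm25` with its default `k1` and `b` values is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def boolean_then_bm25(docs: Dict[int, str], required: List[str],
                      query: List[str], k1: float = 1.2,
                      b: float = 0.75) -> List[Tuple[int, float]]:
    tokens = {d: text.lower().split() for d, text in docs.items()}
    n = len(tokens)
    avgdl = sum(len(t) for t in tokens.values()) / n
    # Document frequency of each term across the whole collection.
    df = Counter(w for toks in tokens.values() for w in set(toks))

    # Stage 1 (Boolean matching): keep documents containing every required term.
    matched = [d for d, toks in tokens.items()
               if all(r.lower() in set(toks) for r in required)]

    # Stage 2 (probabilistic ranking): BM25 score of each surviving document.
    results = []
    for d in matched:
        tf = Counter(tokens[d])
        dl = len(tokens[d])
        score = 0.0
        for q in (w.lower() for w in query):
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            score += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * dl / avgdl))
        results.append((d, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Filtering first keeps the Boolean semantics exact; the probabilistic score then only orders documents that already satisfy the query.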

    Machine Information

    - Machine Type for TREC Experiment: DEC AlphaServer 2100
    - Was the Machine Dedicated or Shared: shared  
    - Amount of Hard Disk Storage (in MB): 75,000  
    - Amount of RAM (in MB): 128  
    - Clock Rate of CPU (in MHz): 200 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the
System: 300 man-years
    - Given appropriate resources            
      - Could your system run faster?: Yes 
      - By how much (estimate)?: factor of 50

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the
system and are not answered by above questions: The data model and data
structures of this system allow the concept of a document to be defined as
part of the query, so the unit of retrieval is chosen at search time rather
than fixed when the index is built. This ability to project text views
dynamically makes it possible to deploy systems that would be impractical or
impossible with other technologies.
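To make the idea of a query-defined document concrete, here is a minimal sketch assuming text is stored as a tree of tagged elements (loosely echoing the multi-tree data model above) and that the query names which element type counts as a "document". The `Node` class, the `units` and `search` functions, and the tag names are all hypothetical; they illustrate the concept, not Open Text's data model or query language.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def all_text(self) -> str:
        """Text of this element and all of its descendants."""
        return " ".join([self.text] + [c.all_text() for c in self.children]).strip()

def units(node: Node, tag: str) -> List[Node]:
    """All subtrees whose tag matches the query-chosen document unit."""
    found = [node] if node.tag == tag else []
    for c in node.children:
        found.extend(units(c, tag))
    return found

def search(root: Node, unit_tag: str, terms: List[str]) -> List[Node]:
    """Return every unit of type unit_tag that contains all query terms."""
    hits = []
    for u in units(root, unit_tag):
        words = set(u.all_text().lower().split())
        if all(t.lower() in words for t in terms):
            hits.append(u)
    return hits

root = Node("collection", children=[
    Node("article", children=[
        Node("section", "inverted index structures"),
        Node("section", "query time phrase search")]),
    Node("article", children=[
        Node("section", "phrase search over an inverted index")])])

by_article = search(root, "article", ["inverted", "phrase"])
by_section = search(root, "section", ["inverted", "phrase"])
```

The same text and the same terms give different answers depending on the unit chosen in the query: both articles match, because the first article's terms are spread over two sections, but only one section contains both terms.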