System Summary and Timing
  Organization Name: InTEXT Systems R and D Labs
  List of Run IDs: INTXM, INTXA

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Stemming Algorithm:               
    - Term Weighting: Automatic  
    - Phrase Discovery?:
    - Tokenizer?:

    Statistics on Data Structures built from TREC Text

    - Inverted index           
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
      - Run ID: INTXA, INTXM  
      - Type of Structure: Query index inverted on all words in all queries  
      - Total Storage (in MB): 0.005  
      - Total Computer Time to Build (in hours): 0.0005  
      - Automatic Process? (If not, number of manual hours): YES  
      - Brief Description of Method: All queries are indexed in memory. The 
documents are passed through all queries in one pass. 
    - Other Data Structures built from TREC text           
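
The routing structure described above (an in-memory index inverted on all
words in all queries, with documents streamed past it once) can be sketched
roughly as follows. This is a minimal illustration, not the InTEXT
implementation: the names tokenize, build_query_index and route are
hypothetical, and the simple shared-word count stands in for the Boolean,
proximity and weighting logic the real system applies.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query_index(queries):
    """Invert the queries: map each word to the set of query IDs that contain it."""
    index = defaultdict(set)
    for query_id, query_text in queries.items():
        for word in tokenize(query_text):
            index[word].add(query_id)
    return index


def route(documents, index):
    """Pass every document through all queries in a single pass.

    Returns {query_id: {doc_id: number of distinct query words found in the doc}}.
    The count is only a placeholder score for this sketch.
    """
    hits = defaultdict(dict)
    for doc_id, doc_text in documents:
        matched = defaultdict(int)
        for word in set(tokenize(doc_text)):
            for query_id in index.get(word, ()):
                matched[query_id] += 1
        for query_id, count in matched.items():
            hits[query_id][doc_id] = count
    return hits


if __name__ == "__main__":
    queries = {"q1": "oil spill", "q2": "oil price"}
    documents = [("d1", "Oil spill near coast"), ("d2", "price of gold")]
    print(dict(route(documents, build_query_index(queries))))
```

Because only the small query index is held in memory, each document is read
once regardless of how many queries are loaded.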

  Query construction

    Automatically Built Queries (Routing)

    - Average Computer Time to Build Query (in cpu seconds): 1200  
    - Method used in Query Construction          
      - Terms Selected From            
        - Only Documents with Relevance Judgments: Yes  
      - Term Weighting with Weights Based on terms in            
        - Documents with Relevance Judgments: No  
      - Phrase Extraction from            
        - Documents with Relevance Judgments: Yes  
      - Syntactic Parsing            
        - Documents with Relevance Judgments: Yes  
      - Word Sense Disambiguation using            
      - Proper Noun Identification Algorithm from            
        - Documents with Relevance Judgments: Yes  
      - Tokenizer             
      - Heuristic Associations to Add Terms from            
      - Expansion of Queries using Previously-Constructed Data Structure:              
      - Automatic Addition of Boolean connectors or Proximity Operators using 
information from             
      - Other: To create queries for INTXA, documents from the training data that 
had relevance judgments were passed through the manually written INTXM queries. 
The top 15 relevant documents for each query were concatenated, and the 
paragraphs containing search terms were extracted and run as a single document 
through the InTEXT Precision software (see TREC-4) to generate weighted keywords 
and phrases. The INTXM queries were then transformed into the INTXA filters by a 
set of rules that insert new alternatives, add new terms, and delete original ones. 
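
The paragraph-extraction step described above can be sketched as follows.
This covers only the selection of paragraphs from the top-ranked relevant
documents. The keyword and phrase weighting done by the InTEXT Precision
software and the rule set that rewrites INTXM queries into INTXA filters are
not specified here, so they are not modeled. The function name, the
blank-line paragraph convention and the word-level matching are all
assumptions made for illustration.

```python
import re


def extract_term_paragraphs(ranked_docs, search_terms, top_n=15):
    """Build one pseudo-document from the top-ranked relevant documents.

    ranked_docs  -- document texts, best first (assumed already ranked)
    search_terms -- words taken from the manual query
    top_n        -- how many top documents to use (15 in the description above)
    """
    terms = {term.lower() for term in search_terms}
    kept = []
    for text in ranked_docs[:top_n]:
        # Assumption: paragraphs are separated by blank lines.
        for paragraph in re.split(r"\n\s*\n", text):
            words = set(re.findall(r"[a-z0-9]+", paragraph.lower()))
            if words & terms:
                kept.append(paragraph.strip())
    # The concatenation is what would be handed to the term-weighting step.
    return "\n\n".join(kept)
```

For example, calling extract_term_paragraphs(docs, ["oil"]) keeps only the
paragraphs in the first 15 entries of docs that contain the word "oil".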

    Manually Constructed Queries (Routing)

    - Average Time to Build Query (in Minutes): 15 
    - Type of Query Builder          
      - Computer System Expert: YES 
    - Tools used to Build Query          
      - Knowledge Base Browser?:                 
      - Other Lexical Tools?: See previous TREC queries              
    - Data Used for Building Query from           
    - Method used in Query Construction          
      - Boolean Connectors (AND, OR, NOT)?: YES  
      - Proximity Operators?: YES 
      - Addition of Terms not Included in Topic?:               
      - Other: This was the method used to build the INTXM query set, from which the INTXA queries were derived

  Searching

    Search Times

      - Run ID: INTXA, INTXM  
      - Computer Time to Search (Average per Query, in CPU seconds): 360 

    Machine Searching Methods

      - Boolean Matching?: YES  
      - Free Text Scanning?: YES  

    Factors in Ranking

      - Term Frequency?: YES  
      - Other Term Weights?: Computed dynamically at run time from the document 
set to identify good discriminating terms  
      - Position in Document?: YES  
      - Proximity of Terms?: YES  
      - Document Length?: YES  
      - Percentage of Query Terms which match?: YES  

    Machine Information

    - Machine Type for TREC Experiment: 486 
    - Was the Machine Dedicated or Shared: Dedicated  
    - Amount of Hard Disk Storage (in MB): 2000  
    - Amount of RAM (in MB): 16  
    - Clock Rate of CPU (in MHz): 66  

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the 
System: 25 days to put commercial components together  
    - Given appropriate resources            
      - Could your system run faster?: YES  
      - By how much (estimate)?: 2  

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the 
system and are not answered by above questions: The documents were run through 
all 100 queries simultaneously. The mean time per query quoted is the total 
CPU time divided by 100.
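
Since the per-query search time is an average over one simultaneous run, the
total CPU time can be recovered from the figures reported above (360 CPU
seconds per query, 100 queries). The short calculation below is only a worked
check of that arithmetic. The function name is hypothetical.

```python
def total_cpu_seconds(mean_seconds_per_query, num_queries=100):
    """Invert 'mean = total CPU time / number of queries' to get the total."""
    return mean_seconds_per_query * num_queries


if __name__ == "__main__":
    total = total_cpu_seconds(360)   # 360 s mean per query, as reported
    print(total, "CPU seconds")      # 36000 CPU seconds
    print(total / 3600, "CPU hours") # 10.0 CPU hours
```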