System Summary and Timing
  Organization Name:  University of Kansas
  List of Run ID's:  KU1

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: 23 
    - Controlled Vocabulary? : no 
    - Stemming Algorithm: no              
      - Morphological Analysis: no 
    - Term Weighting:  yes 
    -  Phrase Discovery? :              
    -  Heuristic Associations (including short definition)? : yes 
    -  Tokenizer? :              

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID : KU1 
      - Total Storage (in MB): 325 
      - Total Computer Time to Build (in hours): 11
      - Automatic Process? (If not, number of manual hours): yes  
      - Use of Term Positions? : no
      - Only Single Terms Used? : yes 
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Run ID : KU1 
      - Total Storage (in MB): 196 
      - Total Computer Time to Build (in hours): 43
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Manual Labor                  
      - Number of Concepts Represented: 10022 
      - Type of Representation:  similarity matrix
      - Auxiliary Files Needed: none 
    - Special Routing Structures           
    - Other Data Structures built from TREC text           
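
    For illustration only, a single-term inverted index without term
    positions, of the kind the entries above describe, might be built along
    the following lines. The stopword set, the tokenization rule, and the
    function names are assumptions made for this sketch; the actual 23-word
    stopword list and the KU1 index format are not given in this summary.

```python
from collections import Counter, defaultdict

# Hypothetical stopword set; the real 23-word list is not reproduced here.
STOPWORDS = {"the", "of", "and", "a", "to", "in"}

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation (assumed rule)."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; no positions, no stemming."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)

docs = {
    "d1": "The price of oil and the price of gas",
    "d2": "Oil exports rose",
}
index = build_inverted_index(docs)
```

    Storing only per-document term frequencies is consistent with the
    "Use of Term Positions? : no" and "Only Single Terms Used? : yes" answers.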

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used: wsj:  LP, TEXT; sjm: LEADPARA, TEXT 
    - Average Computer Time to Build Query (in cpu seconds):  CPU time
      unavailable; 2.4 minutes (144 seconds) elapsed time
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)? : yes 
      - Phrase Extraction from Topics? : no
      - Syntactic Parsing of Topics? : no
      - Word Sense Disambiguation? : no 
      - Proper Noun Identification Algorithm? : no
      - Tokenizer? :  no               
      - Heuristic Associations to Add Terms? : yes
      - Expansion of Queries using Previously-Constructed Data Structure? : yes              
        -  Structure Used: similarity matrix 
      - Automatic Addition of Boolean Connectors or Proximity Operators? : no
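
    As a rough sketch of query expansion from a term similarity matrix, the
    code below adds the most similar terms for each query term and weights
    each added term by its similarity to the original term. The function
    name, the `top_k` and `min_sim` cut-offs, and the weighting rule are all
    assumptions; the summary does not state how KU1 selects or weights
    expansion terms.

```python
def expand_query(query_weights, similarity, top_k=2, min_sim=0.3):
    """Return a copy of query_weights with similar terms added.

    similarity maps term -> {other_term: similarity in [0, 1]}.
    An added term gets (original term weight * similarity); if several
    query terms suggest the same term, the largest weight is kept.
    """
    expanded = dict(query_weights)
    for term, weight in query_weights.items():
        neighbours = sorted(
            similarity.get(term, {}).items(),
            key=lambda kv: kv[1],
            reverse=True,
        )
        for other, sim in neighbours[:top_k]:
            # Skip weak associations and terms already in the original query.
            if sim < min_sim or other in query_weights:
                continue
            expanded[other] = max(expanded.get(other, 0.0), weight * sim)
    return expanded

# Toy similarity rows (hypothetical values).
similarity = {"oil": {"petroleum": 0.8, "crude": 0.6, "gas": 0.2}}
expanded = expand_query({"oil": 1.0, "price": 0.5}, similarity)
```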

  Searching

    Search Times

      - Run ID : KU1 
      - Computer Time to Search (Average per Query, in CPU seconds): 144
      - Component Times :  query expansion 10, document retrieval 134

    Factors in Ranking

    - Factors in Ranking       
      - Term Frequency? :  yes 
      - Inverse Document Frequency? : yes  
      - Other Term Weights? : yes 
      - Semantic Closeness? :  no 
      - Position in Document? :  no
      - Syntactic Clues? : no 
      - Proximity of Terms? : no 
      - Information Theoretic Weights? : no 
      - Document Length? : yes 
      - Percentage of Query Terms which match? : yes 
      - N-gram Frequency? : no 
      - Word Specificity? : no  
      - Word Sense Frequency? :  no 
      - Cluster Distance? : no  
      - Other: Term similarity between original term and terms added from 
        similarity matrix by automatic expansion 
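
    One way the factors marked "yes" above could be combined is sketched
    below: a TF-IDF sum, scaled by the query term weight (which for an
    expansion term would carry its similarity to the original term),
    normalized by document length, and multiplied by the fraction of query
    terms matched. This particular formula and every name in it are
    assumptions for illustration; the summary does not give the KU1 ranking
    function.

```python
import math

def score(doc_tf, doc_len, query_weights, doc_freq, n_docs):
    """Score one document against a weighted query (hypothetical formula).

    doc_tf: {term: frequency in this document}
    doc_freq: {term: number of documents containing the term}
    """
    matched = 0
    total = 0.0
    for term, q_weight in query_weights.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        matched += 1
        idf = math.log(n_docs / doc_freq[term])
        total += q_weight * tf * idf
    if matched == 0:
        return 0.0
    coverage = matched / len(query_weights)  # share of query terms matched
    return coverage * total / doc_len        # document length normalization

query = {"oil": 1.0, "petroleum": 0.8, "price": 0.5}
doc_freq = {"oil": 2, "petroleum": 1, "price": 3}
s = score({"oil": 2}, doc_len=10, query_weights=query, doc_freq=doc_freq, n_docs=4)
```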


    Machine Information

    - Machine Type for TREC Experiment: Sun SPARC 10 
    - Was the Machine Dedicated or Shared: Shared 
    - Amount of Hard Disk Storage (in MB): approximately 9000 (9 GB) 
    - Amount of RAM (in MB): 128 
    - Clock Rate of CPU (in MHz):  50 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development 
      of the System:  modest 
    - Given appropriate resources            
      - Could your system run faster? :  yes 
      - By how much (estimate)? :  20% 
    - Features the System is Missing that would be beneficial: disambiguation, 
      browser for viewing term similarity matrix 

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the 
      system and are not answered by above questions:  automatic calculation 
      of term similarity based on the contexts in the corpus in which 
      instances of the terms appear.
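
    A minimal sketch of context-based term similarity follows. It represents
    each term by counts of the words that occur near it and compares terms
    by the cosine of those count vectors. The window size, the use of cosine
    similarity, and the function names are assumptions; the summary says
    only that similarity is derived from the contexts in which term
    instances appear, not how.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Count, for each term, the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

# Toy corpus (hypothetical): "oil" and "petroleum" share contexts, "court" does not.
sentences = [
    ["oil", "price", "rose"],
    ["petroleum", "price", "rose"],
    ["court", "ruling", "appealed"],
]
vecs = context_vectors(sentences)
sim_oil_petroleum = cosine(vecs["oil"], vecs["petroleum"])
sim_oil_court = cosine(vecs["oil"], vecs["court"])
```

    Computing such a value for every pair of terms would yield a similarity
    matrix of the kind listed under Knowledge Bases.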