System Summary and Timing
  Organization Name: The University of Kansas
  List of Run IDs: KUSG2, KUSG3

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: 580 
    - Controlled Vocabulary?: no 
    - Stemming Algorithm: no              
      - Morphological Analysis: no 
    - Term Weighting:  yes 
    -  Phrase Discovery?:              
    -  Heuristic Associations (including short definition)?: yes 
    -  Tokenizer?:              

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID: KUSG2, KUSG3 
      - Total Storage (in MB): 1020 
      - Total Computer Time to Build (in hours): 5 
      - Automatic Process? (If not, number of manual hours): yes  
      - Use of Term Positions?: no
      - Only Single Terms Used?: yes 
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Run ID: KUSG2, KUSG3 
      - Total Storage (in MB): 316 
      - Total Computer Time to Build (in hours): 129
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Manual Labor                  
      - Number of Concepts Represented: 42944 
      - Type of Representation:  similarity matrix
      - Auxiliary Files Needed: none 
    - Special Routing Structures           
    - Other Data Structures built from TREC text           

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used: ap: HEAD, TEXT; fr: TEXT; wsj: LP, TEXT;
      ziff: TITLE, TEXT; cr: TEXT; ft: HEADLINE, TEXT
    - Average Computer Time to Build Query (in cpu seconds): 14 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?: yes 
      - Phrase Extraction from Topics?: no 
      - Syntactic Parsing of Topics?: no 
      - Word Sense Disambiguation?: no 
      - Proper Noun Identification Algorithm?: no 
      - Tokenizer?:  no               
      - Heuristic Associations to Add Terms?: yes 
      - Expansion of Queries using Previously-Constructed Data Structure?: yes
        -  Structure Used: similarity matrix 
      - Automatic Addition of Boolean Connectors or Proximity Operators?: no 
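
The expansion step described above (adding terms from a previously built
similarity matrix) can be sketched as follows. This is a minimal illustration
only: the function name, the `top_k` and `min_sim` parameters, and the rule
that an added term gets the original weight scaled by its similarity are all
assumptions, not details reported in this form.

```python
def expand_query(query_weights, similarity, top_k=2, min_sim=0.1):
    """Add up to top_k similar terms for each original query term.

    query_weights: dict mapping term -> weight derived from the topic
    similarity:    dict mapping term -> {neighbour: similarity in [0, 1]}
    Both structures and both thresholds are hypothetical.
    """
    expanded = dict(query_weights)
    for term, weight in query_weights.items():
        neighbours = similarity.get(term, {})
        # Most similar neighbours first; ties broken alphabetically.
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        for neighbour, sim in ranked[:top_k]:
            if sim < min_sim or neighbour in query_weights:
                continue
            # Assumed rule: an added term inherits the original weight scaled
            # by similarity; keep the strongest suggestion if several apply.
            expanded[neighbour] = max(expanded.get(neighbour, 0.0), weight * sim)
    return expanded


query = {"oil": 1.0, "spill": 0.5}
matrix = {
    "oil": {"petroleum": 0.8, "crude": 0.6, "gas": 0.3},
    "spill": {"leak": 0.4, "oil": 0.9},
}
expanded = expand_query(query, matrix)
```

Original query terms keep their weights, so expansion can only add evidence
and never dilutes what the topic itself supplied.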

  Searching

    Search Times

      - Run ID: KUSG2, KUSG3 
      - Computer Time to Search (Average per Query, in CPU seconds): 26 
      - Component Times:  query expansion 4; document retrieval 22 

    Factors in Ranking

      - Term Frequency?:  yes 
      - Inverse Document Frequency?: yes  
      - Other Term Weights?: yes 
      - Semantic Closeness?: no 
      - Position in Document?: no 
      - Syntactic Clues?: no 
      - Proximity of Terms?: no 
      - Information Theoretic Weights?: no 
      - Document Length?: yes 
      - Percentage of Query Terms which match?: yes 
      - N-gram Frequency?: no 
      - Word Specificity?: no  
      - Word Sense Frequency?:  no 
      - Cluster Distance?: no 
      - Other: Term similarity between the original term and the terms added
        from the similarity matrix by automatic expansion 
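
The factors marked "yes" above (term frequency, inverse document frequency,
other term weights, document length, and the share of query terms that match)
could be combined as in the sketch below. The form lists the factors but not
the formula, so the specific combination here is an assumption, as are all
names in the code.

```python
import math


def score(doc_tf, doc_len, avg_len, query, df, n_docs):
    """Score one document against a weighted (possibly expanded) query.

    doc_tf: term -> frequency in the document
    query:  term -> weight (for expanded terms, already scaled by similarity)
    df:     term -> number of documents containing the term
    The formula is illustrative, not the system's actual ranking function.
    """
    matched = 0
    total = 0.0
    for term, q_weight in query.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        matched += 1
        idf = math.log(n_docs / df[term])
        norm_tf = tf / (doc_len / avg_len)  # crude document-length normalization
        total += q_weight * norm_tf * idf
    # Fraction of query terms that occur in the document.
    coverage = matched / len(query) if query else 0.0
    return total * coverage


query = {"oil": 1.0, "petroleum": 0.8}
df = {"oil": 10, "petroleum": 5}
s_avg = score({"oil": 2}, 100, 100, query, df, 1000)
s_long = score({"oil": 2}, 200, 100, query, df, 1000)
s_none = score({"gas": 3}, 100, 100, query, df, 1000)
```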

    Machine Information

    - Machine Type for TREC Experiment: Sun SPARCcenter 2000 
    - Was the Machine Dedicated or Shared: Shared 
    - Amount of Hard Disk Storage (in MB): approximately 9000 (9 GB) 
    - Amount of RAM (in MB): 512 
    - Clock Rate of CPU (in MHz):  67 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the
      System: modest 
    - Given appropriate resources            
      - Could your system run faster?:  yes 
      - By how much (estimate)?:  20% 
    - Features the System is Missing that would be beneficial: disambiguation,
      browser for viewing the term similarity matrix 

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the
      system and are not answered by above questions: automatic calculation of
      term similarity based on the contexts in which term instances appear in
      the corpus.
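
One common way to derive term similarity from corpus contexts is to build a
co-occurrence vector for each term and compare vectors by cosine. The sketch
below illustrates that general idea only; the window size, the cosine measure,
and all function names are assumptions, and the form does not say which
context definition or similarity measure the system actually uses.

```python
import math
from collections import Counter, defaultdict


def context_vectors(documents, window=2):
    """Count, for every term, the terms seen within `window` positions of it.

    documents: iterable of token lists (stopwords assumed already removed).
    """
    vectors = defaultdict(Counter)
    for tokens in documents:
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[t] for t, count in a.items() if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    ["crude", "oil", "prices", "rose"],
    ["crude", "petroleum", "prices", "rose"],
]
vectors = context_vectors(docs, window=1)
sim_oil_petroleum = cosine(vectors["oil"], vectors["petroleum"])
sim_crude_rose = cosine(vectors["crude"], vectors["rose"])
```

In this toy corpus "oil" and "petroleum" occur in identical contexts and so
come out as near-identical, while "crude" and "rose" share no context terms.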