System Summary and Timing
  Organization Name: University of North Carolina
  List of Run ID's: uncis1, uncis2 (both are Category B manual ad hoc runs)

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: 571 
    - Controlled Vocabulary?:  no 
    - Stemming Algorithm: Modified Lovins (SMART v. 11.0)              
      - Morphological Analysis: no 
    - Term Weighting: SMART "lnc" weights for document terms; SMART "ltc" weights
for query terms in the initial ranking.
    -  Phrase Discovery?:              
    -  Tokenizer?:              
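
For reference, the "lnc" and "ltc" schemes named under Term Weighting can be sketched as below. In SMART notation, "l" is a logarithmic term frequency (1 + ln tf), "n" means no collection weighting, "t" is an inverse document frequency factor (ln N/df), and "c" is cosine normalization. This is a minimal illustration, not the SMART v. 11.0 code; the function names and the dictionary-based data layout are assumptions made for the example.

```python
import math
from collections import Counter


def _cosine_normalize(raw):
    """Scale a term->weight dict to unit Euclidean length (the "c" step)."""
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in raw.items()}


def lnc_weights(tokens):
    """Document weights: log tf ("l"), no idf ("n"), cosine norm ("c")."""
    tf = Counter(tokens)
    raw = {term: 1.0 + math.log(freq) for term, freq in tf.items()}
    return _cosine_normalize(raw)


def ltc_weights(tokens, doc_freq, num_docs):
    """Query weights: log tf ("l"), idf ("t"), cosine norm ("c").

    doc_freq maps a term to the number of documents containing it
    (hypothetical input; terms absent from the collection are skipped).
    """
    tf = Counter(tokens)
    raw = {
        term: (1.0 + math.log(freq)) * math.log(num_docs / doc_freq[term])
        for term, freq in tf.items()
        if doc_freq.get(term, 0) > 0
    }
    return _cosine_normalize(raw)
```

Because both vectors are cosine-normalized, the inner product of a document vector and a query vector is their cosine similarity.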

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID:  uncis1, uncis2 
      - Total Storage (in MB):  184 
      - Total Computer Time to Build (in hours): approx. 1.3 hours total 
(An estimated 53 min. to pre-process documents for SMART on the SPARCcenter 1000, 
plus 25.5 min. for SMART to index documents and topics. Indexing was run on the 
SPARCcenter, but the 25.5 min. figure was measured on the Sun Ultra.)
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
    - Other Data Structures built from TREC text           
      - Run ID: uncis1, uncis2 
      - Type of Structure: Sequential document index for each topic 
      - Total Storage (in MB):  Average of 13.6 MB per topic 
      - Total Computer Time to Build (in hours):  Average of 11 minutes per 
topic. (This time does not include some pre-processing. Total time: ??, but 
greater than an hour on the SPARCcenter 1000.) 
      - Automatic Process? (If not, number of manual hours): yes 
      - Brief Description of Method: The inverted index created by SMART is used 
only for the initial ranking of documents. A sequential document index is then 
built for each topic from the top 5000 documents of the initial ranking. 
Documents in the index are sorted in increasing order of rank (i.e., 1, 2, 3, ...). 
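
The per-topic structure described above can be illustrated with a short sketch. It assumes the initial ranking is available as a mapping from document ID to similarity score; the function name, the tuple layout, and the in-memory list are hypothetical and are not taken from the system itself.

```python
import heapq


def build_topic_index(scores, depth=5000):
    """Return the top `depth` documents for one topic in rank order.

    scores: dict mapping document ID -> initial-ranking score
            (assumed input format).
    Result: list of (rank, doc_id, score), rank 1 first, so the
            structure can be scanned sequentially from best to worst.
    """
    top = heapq.nlargest(depth, scores.items(), key=lambda item: item[1])
    return [
        (rank, doc_id, score)
        for rank, (doc_id, score) in enumerate(top, start=1)
    ]
```

Keeping only the top-ranked documents per topic means later feedback iterations can rescore a small sequential file rather than the full collection.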

  Query construction

    Interactive Queries

    - Initial Query Built Automatically or Manually: automatically (DESC field 
used), using SMART v. 11.0 
    - Type of Person doing Interaction            
      - Domain Expert: no 
      - System Expert: yes 
    - Average Time to do Complete Interaction            
      - CPU Time (Total CPU Seconds for all Iterations): 
uncis1 - estimated at approx. 125 minutes (involves reading from and writing to 
files); uncis2 - real time no more than 5 seconds 
      - Clock Time from Initial Construction of Query to Completion of Final 
Query (in minutes): approx. 50 minutes, but varied radically
    - Average Number of Iterations: An "iteration" is the examination of one set 
of retrieved documents, so the number of iterations is the number of sets 
examined. The documents were ranked once more after the last set was examined. 
A search with 3 "iterations," for example, therefore has 4 separate "rankings." 
uncis1 - 2.28; uncis2 - 2.32 
    - Average Number of Documents Examined per Iteration: Documents retrieved 
in a previous iteration were not examined again and are therefore excluded from 
the average. uncis1 - 23.42, but varied radically; 
uncis2 - approx. 30, but varied radically 
    - Minimum Number of Iterations: (no relevant documents found in initial 
ranking) 
    - Maximum Number of Iterations: uncis1 - 4; uncis2 - 5 
    - What Determines the End of an Iteration: No further benefit anticipated. 
    - Methods used in Interaction         
      - Automatic Term Reweighting from Relevant Documents?: yes 
      - Automatic Query Expansion from Relevant Documents?: yes                
        - All Terms in Relevant Documents added: yes (uncis1 - all terms from 
selected non-relevant documents were also added) 
      - Manual Methods: Only manual intervention is relevance assessments.               
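
As an illustration of the vector-space feedback used in uncis1 (reweighting from relevant documents, expansion with all of their terms, and use of selected non-relevant documents), a Rocchio-style update is sketched below. The summary does not state the exact formula used, so the Rocchio form, the coefficients alpha, beta and gamma, and the dropping of non-positive weights are all assumptions made for this example.

```python
def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=1.0, gamma=0.5):
    """Rocchio-style query update (assumed form, not the system's own).

    query:       dict term -> weight for the current query
    relevant:    list of dicts (term -> weight), judged relevant
    nonrelevant: list of dicts (term -> weight), selected non-relevant
    Every term of every judged document enters the new query, which is
    what "all terms added" amounts to in this sketch.
    """
    updated = {term: alpha * w for term, w in query.items()}
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in doc.items():
                # Add the averaged contribution of this judged document.
                updated[term] = updated.get(term, 0.0) + coef * w / len(docs)
    # Assumption: terms whose weight ends up non-positive are dropped.
    return {term: w for term, w in updated.items() if w > 0.0}
```

The relevance assessments supplied by the searcher are the only manual input; the update itself is automatic.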

  Searching

    Search Times

      - Run ID:  uncis1, uncis2 
      - Computer Time to Search (Average per Query, in CPU seconds): 24 seconds
(for SMART initial ranking); time includes writing to files 
      - Component Times: ?? 

    Machine Searching Methods

      - Vector Space Model?: yes - initial ranking; uncis1 - yes for feedback 
iterations 
      - Probabilistic Model?: uncis2 - yes for feedback iterations 

    Factors in Ranking

      - Term Frequency?: yes (within-document frequency in "lnc" weights; 
within-query frequency in "ltc" weights) 
      - Inverse Document Frequency?: yes (in "ltc" weights) 
      - Other Term Weights?: uncis2 - relevance term weights. Also see the 
"Interactive Queries" section. 
      - Document Length?:  yes (cosine normalization) 
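
The way these factors combine in the initial ranking can be sketched as an inner product between a query vector and each document vector. The sketch assumes both vectors are already weighted and cosine-normalized (for example, "ltc" for the query and "lnc" for documents); the function names and the dictionary layout are hypothetical.

```python
def inner_product(query_vec, doc_vec):
    """Sum of weight products over the terms shared by both vectors."""
    return sum(
        (w * doc_vec[term] for term, w in query_vec.items() if term in doc_vec),
        0.0,
    )


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs, best score first.

    doc_vecs: dict mapping doc ID -> dict of term -> weight
              (assumed layout). Ties are broken by document ID so the
              output order is deterministic.
    """
    scored = [
        (doc_id, inner_product(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored
```

With unit-length vectors the score is the cosine similarity, which is how document length enters the ranking.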

    Machine Information

    - Machine Type for TREC Experiment: Sun Ultra running Solaris 2.5. (Some 
initial work was done on a SPARCcenter 1000 running Solaris 2.5. All times, 
however, are from the Sun Ultra unless otherwise stated. All feedback iterations 
were done on the Sun Ultra. The Ultra was substantially faster than the 
SPARCcenter 1000.) 
    - Was the Machine Dedicated or Shared:  Shared (but we were primary users) 
    - Amount of Hard Disk Storage (in MB):  Ultra - 14 GB (partitioned). 
    - Amount of RAM (in MB): Ultra - 128 MB (SPARCcenter - 512 MB) 
    - Clock Rate of CPU (in MHz): Ultra - 173 MHz 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the 
System: about 9 months 
    - Given appropriate resources            
      - Could your system run faster?: Yes. 
      - By how much (estimate)?: ??