System Summary and Timing
  Organization Name: NEC
  List of Run ID's: virtu3, virtu4

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: 430 
    - Controlled Vocabulary? : Yes 
    - Stemming Algorithm: Yes              
      - Morphological Analysis: Yes 
    - Term Weighting: Yes 
    -  Phrase Discovery? : No             
    -  Syntactic Parsing? : No 
    -  Word Sense Disambiguation? : No 
    -  Heuristic Associations (including short definition)? : No 
    -  Spelling Checking (with manual correction)? : No 
    -  Spelling Correction? : No 
    -  Proper Noun Identification Algorithm? : No 
    -  Tokenizer? : Yes             
      - Patterns which are tokenized: common patterns 
    -  Manually-Indexed Terms? : No 
    -  Other Techniques for building Data Structures: No 

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID : virtu3, virtu4 
      - Total Storage (in MB): 3000 
      - Total Computer Time to Build (in hours): 200 
      - Automatic Process? (If not, number of manual hours): Yes 
      - Use of Term Positions? : Yes 
      - Only Single Terms Used? : Yes 
    - Clusters           
      - Run ID : No 
    - N-grams, Suffix arrays, Signature Files           
      - Run ID : No 
    - Knowledge Bases            
      - Run ID : No 
      - Use of Manual Labor                  
    - Special Routing Structures           
      - Run ID : No 
    - Other Data Structures built from TREC text           
      - Run ID : virtu3 
      - Type of Structure: word co-occurrence 
      - Total Storage (in MB): 630 
      - Total Computer Time to Build (in hours): 120 
      - Automatic Process? (If not, number of manual hours): Yes 
      - Brief Description of Method: Frequency of two words occurring in the 
        same paragraph. 
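
    The co-occurrence structure described above can be sketched as follows.
    This is only an illustration, not the system's actual code: it assumes
    that paragraphs are separated by blank lines, that words are lowercase
    alphabetic tokens, and that "frequency" means the number of paragraphs
    containing both words. The name cooccurrence_counts is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(text, stopwords=frozenset()):
    """Count, for each unordered word pair, the paragraphs containing both.

    Assumptions (not from the original description): paragraphs are split
    on blank lines and words are lowercase alphabetic tokens.
    """
    counts = Counter()
    for paragraph in re.split(r"\n\s*\n", text):
        # Use a set so a pair is counted at most once per paragraph.
        words = {
            w for w in re.findall(r"[a-z]+", paragraph.lower())
            if w not in stopwords
        }
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts
```

    Keys are alphabetically ordered word pairs, so a lookup such as
    counts[("market", "stock")] does not depend on the order in which the
    words appeared in the text.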

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used: All 
    - Average Computer Time to Build Query (in cpu seconds): 1200 (20 min per query) 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)? : Yes 
      - Phrase Extraction from Topics? : Yes 
      - Syntactic Parsing of Topics? : Yes 
      - Word Sense Disambiguation? : No 
      - Proper Noun Identification Algorithm? : Yes 
      - Tokenizer? : Yes                
        - Patterns which are Tokenized: part of noun phrase identification 
      - Heuristic Associations to Add Terms? : No 
      - Expansion of Queries using Previously-Constructed Data Structure? : Yes              
        -  Structure Used: thesaurus (WordNet) 
      - Automatic Addition of Boolean Connectors or Proximity Operators? : No 
      - Other: No 

    Automatically Built Queries (Routing)

    - Topic Fields Used: All 
    - Average Computer Time to Build Query (in cpu seconds): 1800 (30 min per query) 
    - Method used in Query Construction          
      - Terms Selected From            
        - Topics: All 
        - All Training Documents: No 
        - Only Documents with Relevance Judgments: No 
      - Term Weighting with Weights Based on terms in            
        - Topics: Yes 
        - All Training Documents: No 
        - Documents with Relevance Judgments: Yes 
      - Phrase Extraction from            
        - Topics: Yes 
        - All Training Documents: No 
        - Documents with Relevance Judgments: No 
      - Syntactic Parsing            
        - Topics: Yes 
        - All Training Documents: No 
        - Documents with Relevance Judgments: No 
      - Word Sense Disambiguation using            
        - Topics: No 
        - All Training Documents: No 
        - Documents with Relevance Judgments: No 
      - Proper Noun Identification Algorithm from            
        - Topics: Yes 
        - All Training Documents: No 
        - Documents with Relevance Judgments: No 
      - Tokenizer             
        - Patterns which are tokenized (dates, phone numbers, common patterns, 
          etc): part of noun phrase identification 
        - from Topics: Yes 
        - from All Training Documents: No 
        - from Documents with Relevance Judgments: No 
      - Heuristic Associations to Add Terms from            
        - Topics: No 
        - All Training Documents: No 
        - Documents with Relevance Judgments: No 
      - Expansion of Queries using Previously-Constructed Data Structure:              
        -  Structure Used: word co-occurrence 
      - Automatic Addition of Boolean connectors or Proximity Operators using 
        information from             
        - Topics: No 
        - All Training Documents: No 
        - Documents with Relevance Judgments: No 
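
    Query expansion from a word co-occurrence structure, as listed for the
    routing runs, might look like the sketch below. The form does not say
    how expansion terms were chosen, so the selection rule here (add the k
    words with the highest summed co-occurrence count with the query terms)
    is an assumption, as are the names expand_query, pair_counts, and k.

```python
def expand_query(query_terms, pair_counts, k=2):
    """Append up to k words that co-occur most often with the query terms.

    pair_counts maps (word_a, word_b) tuples to co-occurrence counts.
    The ranking rule is an illustrative assumption, not the system's.
    """
    query = set(query_terms)
    scores = {}
    for (a, b), n in pair_counts.items():
        # Credit the non-query word of any pair that touches the query.
        if a in query and b not in query:
            scores[b] = scores.get(b, 0) + n
        elif b in query and a not in query:
            scores[a] = scores.get(a, 0) + n
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return list(query_terms) + ranked[:k]
```

    Original query terms are kept in their given order, and expansion
    terms are appended after them.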

  Searching

    Search Times

      - Run ID : virtu3, virtu4 
      - Computer Time to Search (Average per Query, in CPU seconds): 1200 

    Machine Searching Methods

      - Vector Space Model? : Yes 
      - Probabilistic Model? : No 
      - Cluster Searching? : No 
      - N-gram Matching? : No 
      - Boolean Matching? : No 
      - Fuzzy Logic? : No  
      - Free Text Scanning? : No 
      - Neural Networks? : No 
      - Conceptual Graph Matching? : No 
      - Other: No 

    Factors in Ranking

      - Term Frequency? : Yes 
      - Inverse Document Frequency? : No 
      - Other Term Weights? : No 
      - Semantic Closeness? : No 
      - Position in Document? : No 
      - Syntactic Clues? : No 
      - Proximity of Terms? : No 
      - Information Theoretic Weights? : No 
      - Document Length? : Yes 
      - Percentage of Query Terms which match? : No 
      - N-gram Frequency? : No 
      - Word Specificity? : No 
      - Word Sense Frequency? : No 
      - Cluster Distance? : No 
      - Other: No 
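
    The answers above (vector space model, term frequency and document
    length as ranking factors, no inverse document frequency) are
    consistent with a length-normalized term-frequency score. The sketch
    below uses cosine similarity over raw term counts as one such scheme.
    The form does not state the exact weighting formula, so this is an
    assumed illustration, and the name tf_cosine is hypothetical.

```python
import math
from collections import Counter


def tf_cosine(query_terms, doc_terms):
    """Cosine similarity between raw term-frequency vectors (no IDF).

    Dividing by the document vector's Euclidean norm is one way to account
    for document length; the real system's normalization is not specified.
    """
    q = Counter(query_terms)
    d = Counter(doc_terms)
    # Only terms present in the query can contribute to the dot product.
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0
```

    Under this scheme, a long document that mentions a query term once
    scores lower than a short document that mentions it once, because the
    longer document has a larger norm.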

    Machine Information

    - Machine Type for TREC Experiment: sparc10 
    - Was the Machine Dedicated or Shared: shared 
    - Amount of Hard Disk Storage (in MB): 10000 
    - Amount of RAM (in MB): 128 
    - Clock Rate of CPU (in MHz): 40 

    System Comparisons 

    - Amount of "Software Engineering" which went into the Development of the 
      System: Three people for two months 
    - Given appropriate resources            
      - Could your system run faster? : Yes 
      - By how much (estimate)? : 50% 
    - Features the System is Missing that would be beneficial: The combined 
      use of thesaurus and word co-occurrence information