Text REtrieval Conference (TREC)
|
Organization Name: Weill Cornell Medical College | Run ID: icbdoc |
Section 1.0 System Summary and Timing |
---|
Section 1.1 System Information |
Hardware Model Used for TREC Experiment: Dell
System Use: SHARED
Total Amount of Hard Disk Storage: 450 Gb
Total Amount of RAM: 4096 MB
Clock Rate of CPU: 3000 MHz |
Section 1.2 System Comparisons |
Amount of developmental "Software Engineering": ALL
List of features that are not present in the system, but would have been beneficial to have:
List of features that are present in the system, and impacted its performance, but are not detailed within this form: |
Section 2.0 Construction of Indices, Knowledge Bases, and Other Data Structures |
---|
Length of the stopword list: 0 words
Type of Stemming: MORPHOLOGICAL
Controlled Vocabulary: NO
Term weighting: YES
Phrase discovery: NO
Type of Spelling Correction: NONE
Manually-Indexed Terms: NO
Proper Noun Identification: NO
Syntactic Parsing: NO
Tokenizer: YES
Word Sense Disambiguation: NO
Other technique: YES
Additional comments: With the following runs, one per line:
0 tag=F-M-V-MIS-000,scorerName=vigna,runType=manual,shortfn=true,passages=false
80 tag=F-M-V-MIS-080,scorerName=vigna,runType=manual,shortfn=true,passages=false
200 tag=F-M-V-MIS-200,scorerName=vigna,runType=manual,shortfn=true,passages=false
80 tag=F-A-B-P-080-TFE-5-10-W300-TS,scorerName=bm25ec,runType=automatic,passageScorer=twease.passages.TransitionSumOfWeightsPassageScorer,passageTokenizer=twease.passages.Window300WithBoundariesPassageTokenizer,shortfn=true,passages=true,k1=1.89D,b=0.05D,freqType=average twease.query.blindexpansion.DisjunctiveQueryDistributor maxWordKeep=8 twease.query.blindexpansion.TfIdfPseudoRelQueryExpander maxNewTerms=5,documentsToInspect=10
80 tag=F-M-V-P-080,scorerName=vigna,runType=manual,shortfn=true,passages=true
80 tag=F-A-B-P-080,scorerName=bm25ec,runType=automatic,shortfn=true,passages=true,k1=1.89D,b=0.05D,freqType=average twease.query.blindexpansion.DisjunctiveQueryDistributor maxWordKeep=8
0 tag=F-A-B-P-000,scorerName=bm25ec,runType=automatic,shortfn=true,passages=true,k1=1.75D,b=0.05D,freqType=max twease.query.blindexpansion.DisjunctiveQueryDistributor maxWordKeep=8
RankFused with the following weights:
5 runs/Submission-2007/trec-gen-twease,tag=F-M-V-MIS-000.txt.legal
4 runs/Submission-2007/trec-gen-twease,tag=F-M-V-MIS-080.txt.legal
1 runs/Submission-2007/trec-gen-twease,tag=F-M-V-MIS-200.txt.legal
2 runs/Submission-2007/trec-gen-twease,tag=F-A-B-P-080-TFE-5-10-W300-TS.txt.legal
4 runs/Submission-2007/trec-gen-twease,tag=F-M-V-P-080.txt.legal
3 runs/Submission-2007/trec-gen-twease,tag=F-A-B-P-080.txt.legal
4 runs/Submission-2007/trec-gen-twease,tag=F-A-B-P-000.txt.legal
Keeping one passage per document. The RankFusion item ranker was ItemRankingByGradientInterleaveByRank:max=5. |
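The run specifications in the additional comments above follow a regular shape, so they can be read mechanically. The sketch below is illustrative only and is not part of the Twease software. The function names (`parse_kv`, `java_double`, `parse_run_spec`, `parse_fusion_weight`) are hypothetical. It assumes each run line consists of a leading integer, a comma-separated option list, and then zero or more pairs of a class name followed by that class's options. The form does not say what the leading integer means, so the code stores it under a neutral name.

```python
def parse_kv(text):
    """Split 'a=1,b=2' into {'a': '1', 'b': '2'}; values stay strings."""
    return dict(item.split("=", 1) for item in text.split(","))


def java_double(value):
    """Convert a Java-style double literal such as '1.89D' to a float."""
    return float(value.rstrip("Dd"))


def parse_run_spec(line):
    """Parse one run line (assumed layout: integer, options, class/options pairs)."""
    tokens = line.split()
    rest = tokens[2:]
    if len(rest) % 2 != 0:
        raise ValueError("expected class-name/options pairs after the run options")
    return {
        # Meaning of this number is not stated in the form.
        "leading_value": int(tokens[0]),
        "options": parse_kv(tokens[1]),
        "components": [
            (rest[i], parse_kv(rest[i + 1])) for i in range(0, len(rest), 2)
        ],
    }


def parse_fusion_weight(line):
    """Parse a fusion line such as '5 runs/.../file.txt.legal' into (weight, path)."""
    weight, path = line.split(maxsplit=1)
    return int(weight), path


spec = parse_run_spec(
    "80 tag=F-A-B-P-080,scorerName=bm25ec,runType=automatic,shortfn=true,"
    "passages=true,k1=1.89D,b=0.05D,freqType=average "
    "twease.query.blindexpansion.DisjunctiveQueryDistributor maxWordKeep=8"
)
print(spec["options"]["tag"], java_double(spec["options"]["k1"]))
print(spec["components"])
print(parse_fusion_weight(
    "3 runs/Submission-2007/trec-gen-twease,tag=F-A-B-P-080.txt.legal"
))
```

Option values are kept as strings on purpose, because the form mixes booleans, integers, Java double literals, and class names. Callers can convert individual values as needed, for example with `java_double` for `k1` and `b`.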
Section 3.0 Statistics on Data Structures Built from TREC Text |
---|
Section 3.1 First Data Structure |
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 7.49 Gb
Total computer time to build: 2 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: YES
Only single terms used: YES
Concepts (vs. single terms) represented: NO
Type of representation:
Auxiliary files used: YES
Additional comments: |
Section 3.2 Second Data Structure |
Structure Type:NONE Type of other data structure used: Brief description of method using other data structure: Total storage used:Gb Total computer time to build:hours Automatic process: Manual hours required:hours Type of manual labor:NONE Term positions used: Only single terms used: Concepts (vs. single terms) represented:
Type of representation: Auxiliary files used:
Additional comments: |
Section 3.3 Third Data Structure |
Structure Type:NONE Type of other data structure used: Brief description of method using other data structure: Total storage used:Gb Total computer time to build:hours Automatic process: Manual hours required:hours Type of manual labor:NONE Term positions used: Only single terms used: Concepts (vs. single terms) represented:
Type of representation: Auxiliary files used:
Additional comments: |
Section 4.0 Data Built from Sources Other than the Input Text |
---|
File type:THESAURUS Domain type:DOMAIN SHARED Total Storage:1 Gb Number of Concepts Represented:190347 concepts Type of representation:NONE Automatic or Manual:AUTOMATIC
Type of Manual Labor used:NONE Additional comments: |
File is:NONE Total Storage:Gb Number of Concepts Represented:concepts Type of representation:NONE Additional comments: |
Section 5.0 Computer Searching |
---|
Average computer time to search (per query): 58 CPU seconds |
Times broken down by component(s): |
Section 5.1 Searching Methods |
Vector space model: NO
Probabilistic model: YES
Cluster searching: NO
N-gram matching: NO
Boolean matching: NO
Fuzzy logic: NO
Free text scanning: NO
Neural networks: NO
Conceptual graphic matching: NO
Other: NO
Additional comments: |
Section 5.2 Factors in Ranking |
Term frequency: YES
Inverse document frequency: YES
Other term weights: NO
Semantic closeness: YES
Position in document: YES
Syntactic clues: NO
Proximity of terms: YES
Information theoretic weights: NO
Document length: YES
Percentage of query terms which match: NO
N-gram frequency: NO
Word specificity: NO
Word sense frequency: NO
Cluster distance: NO
Other: YES
Additional comments: Minimal interval semantics, BM25EC |
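The ranking factors above (term frequency, inverse document frequency, document length) are the ingredients of BM25-style scoring. This form does not define the BM25EC variant it names, so the sketch below shows only textbook Okapi BM25 and should not be read as the submitted system's scorer. The function names and the smoothed IDF formula are assumptions made for illustration. The default values `k1=1.89` and `b=0.05` are taken from the automatic run settings listed in Section 2.0.

```python
import math


def bm25_term_score(tf, df, doc_len, avg_doc_len, num_docs, k1=1.89, b=0.05):
    """Textbook BM25 contribution of one query term to one document.

    tf: term frequency in the document; df: number of documents with the term.
    """
    if tf == 0:
        return 0.0
    # Smoothed, always-positive IDF (an assumed choice, not taken from the form).
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # Document length normalization; b close to 0 makes length matter little.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, num_docs, doc_freq,
               k1=1.89, b=0.05):
    """Sum the per-term BM25 contributions over all query terms."""
    return sum(
        bm25_term_score(doc_tf.get(term, 0), doc_freq.get(term, 0),
                        doc_len, avg_doc_len, num_docs, k1, b)
        for term in query_terms
    )


# Toy statistics, invented purely for the example.
score = bm25_score(
    query_terms=["protein", "kinase"],
    doc_tf={"protein": 3, "kinase": 1},
    doc_len=250,
    avg_doc_len=200,
    num_docs=10_000,
    doc_freq={"protein": 1_200, "kinase": 90},
)
print(round(score, 3))
```

With `b` as small as 0.05, the length normalization factor stays close to 1, so under this formula a document twice the average length is penalized only slightly compared with the common default of `b=0.75`.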
Section 6.0 Query Construction |
---|
Section 6.1 Automatically Built Queries for Ad-hoc Tasks |
Topic fields used: NARRATIVE
Average computer time to build query: 0.01 CPU seconds
Term weighting (weights based on terms in topics): YES
Phrase extraction from topics: NO
Syntactic parsing of topics: NO
Word sense disambiguation: NO
Proper noun identification algorithm: NO
Tokenizer: YES
Expansion of queries using previously constructed data structures: NO
Automatic addition of: PROXIMITY OPERATORS |
Section 6.2 Manually Constructed Queries for Ad-hoc Tasks |
Topic fields used: NARRATIVE
Average time to build query? 1 minute
Type of query builder: DOMAIN EXPERT
Tool used to build query: NONE
Method used in initial query construction? BOOLEAN CONNECTORS
Total CPU time for all iterations: 0 seconds
Clock time from initial construction of query to completion of final query: 0 minutes
Average number of iterations: 0
Average number of documents examined per iteration: 0
Minimum number of iterations: 0
Maximum number of iterations: 0
The end of an iteration is determined by: NA
Automatic term reweighting from relevant documents: NO
Automatic query expansion from relevant documents: NO
Other automatic methods: NO
Manual methods used: YES |
Disclaimer: Contents of this online document are not necessarily the official views of, nor endorsed by the U.S. Government, the Department of Commerce, or NIST. |