CLARIT TREC Design, Experiments, and Results

David A. Evans, Robert G. Lefferts, Gregory Grefenstette, Steven K. Handerson, William R. Hersh, Armar A. Archbold
Laboratory for Computational Linguistics
Department of Philosophy
Carnegie Mellon University

1 Introduction

This report presents an abbreviated description¹ of the approach and the results of the CLARIT team in completing the tasks of the "Text Retrieval Conference" (TREC) organized by the National Institute of Standards and Technology (NIST) and the Defense Advanced Research Projects Agency (DARPA) in 1992.²

1.1 A Characterization of the TREC Tasks

TREC activities required participants to 'retrieve' 200 documents for each of 100 different 'topics' from a large database of full-text documents. Each topic was given as a one-page description of an item of interest. This feature of the TREC tasks was somewhat unusual, at least compared to many traditional 'bibliographic' retrieval evaluations, in which the topic or 'query' is a minimal, often telegraphic, single-phrase statement of a 'subject' or 'an interest'.³ However, the principal distinguishing features of the TREC tasks were (1) their scale, involving a total of approximately 2 gigabytes of text, representing approximately 750,000 full-text documents of varying length, and (2) the careful attention of the organizers in evaluating the results submitted by each participating group.

More specifically, TREC tasks were designed to simulate two general types of information retrieval situations, "routing" and "ad-hoc" querying. "Routing" corresponds to situations in which a topic is possibly well documented (e.g., with examples) and the user desires to find more similar documents. In the case of TREC tasks, 50 topics were designated as "routing" topics; each was accompanied by a set of documents judged to be "relevant" to the topic.⁴ The first installment of the full set of documents, representing approximately 1.1 gigabytes of text, was available to each team for use in identifying possible other relevant documents for each routing topic.

¹A more complete and detailed description of the CLARIT-TREC activities and results is available as a technical report [Evans et al., in preparation].
²The TREC activities were organized at the end of 1991. Data was made available in the Spring of 1992. All processing results were submitted by September 1, 1992, to NIST. The "Conference" itself, a Workshop involving the approximately two dozen groups that submitted partial or full processing results, took place on November 4-6, 1992, in Rockville, MD.
³The longer statements of 'topics' in TREC were arguably more interesting as a test of systems and more representative of many contemporary information-seeking situations. See Figure 9 for a sample topic statement.
⁴The number of sample relevant documents varied greatly from topic to topic. Some topics had almost 100 sample relevants; others had only about ten.

The rules of the exercise required each group to submit 'models' for each routing topic (e.g., a set of procedures or a 'query vector'), which then were 'on record' and had to be used in the final evaluation of the routing task. That evaluation required that each group retrieve documents from the second installment of documents, approximately 0.9 gigabytes of text.

"Ad-hoc" querying corresponds to situations in which a topic is presented to a system and appropriate documents must be found; no example documents are available. In TREC, the second 50 topics were designated as "ad-hoc-query" topics.
The rules required each group to use the full 2-gigabyte database as the search space for ad-hoc queries. All results were reported as a ranked list of the 200 'top' documents in response to each topic, whether a routing topic or an ad-hoc-query topic.

1.2 Notes on CLARIT Team Participation

The CLARIT team submitted results, labeled "A" and "B",⁵ representing the top 200 documents at the end of each of two sequential steps in the processing of topics. Since the actual processing of topics was designed to give 'best' results only after both stages of processing were completed, the "A" results are known to be suboptimal; the "B" results represent the true test of the CLARIT-TREC design.

The large scale of the tasks challenged the resources that were available to the CLARIT team. Storage for the source data and topics alone required 2 gigabytes of space. The research-prototype version of the CLARIT system, which was used in the task, generates various secondary and intermediate resources in the course of processing. Such intermediate files also require temporary storage. In all, approximately 8 gigabytes of disk space was used for the process. The system-engineering work required to manage the data represented a significant effort for the team; more than 75% of the team effort was devoted to (a) re-implementing critical CLARIT processes to deal with larger volumes of data and limited space and (b) monitoring and directing the use of resources and the sequence of processes when making actual 'runs' over the data.

Data for the final tests was made available from NIST only after preliminary processing results were submitted. The CLARIT team submitted its preliminary results (the 'frozen' forms of the routing queries) on Friday, August 21, 1992. NIST express-mailed the new test data to Carnegie Mellon on the same day, but the package was misaddressed and did not arrive. A second mailing finally did arrive on Tuesday, August 25, one week before the deadline for final results. Thus, all final processing took place in seven days.

The CLARIT team utilized, variously, six machines (including a DECsystem 5820 and DECstation 5000s and 3100s) and the approximately 8 gigabytes of dedicated storage for TREC-processing tasks. Actual processing occurred in batch mode over several machines and across a network (as some storage was remote).

2 Background Description of Basic CLARIT Processing in TREC

Basic CLARIT processing is described elsewhere.⁶ A schematic representation of the 'standard' CLARIT process for document indexing is given in Figure 1. A representation of the simplified CLARIT process that was employed in the case of CLARIT-TREC document indexing is given in Figure 2.

⁵The Conference provided a special category ("Category B") for groups that intended to work only with a subset (100 megabytes) of the TREC data. This should not be confused with what we call the CLARIT "A" and "B" results: all CLARIT processing involved the full set of TREC data.
⁶Cf. [Evans 1990], [Evans et al. 1991a,b,c].
[Figure 1: 'Standard' CLARIT Indexing Overview. Schematic: text formatting and preparation; NLP (morphology, parsing, candidate NPs) using the 'core' lexicon (100,000 items), an optional sub-domain lexicon, and a 'heuristic' grammar producing 'simplex' NPs (optionally "complex" NPs or "full sentence" constituents); NP scoring via term/document statistics; matching, scoring, and filtering against a '1st-order' thesaurus (a flat list of terms with implicit compositional, hierarchical structure); output of a general set of terms ("exact", "novel", "general") and a specific set of terms.]

[Figure 2: Modified CLARIT Indexing in TREC. Schematic: text formatting and preparation; NLP (morphology, parsing, candidate NPs) using the 'core' lexicon (100,000 items) and 'heuristic' grammar producing 'simplex' NPs; term/document statistics over NPs and words; output of a general set of terms.]

2.1 Selective NLP to Nominate Information Units

In brief, the CLARIT indexing process as shown in Figure 1 involves several steps, one of which utilizes selective natural-language processing (NLP) to identify noun phrases (NPs) in texts, which are taken as the relevant information units in all further processing. Subsequent steps take advantage of several statistical measures of 'importance' to evaluate NPs as potential index terms. One special feature of CLARIT processing is the use of an automatically generated 'first-order' thesaurus for a domain to support the selection of appropriate terms. The standard CLARIT process returns three categories of index terms: (1) terms that occur in the document and exactly match terms in the thesaurus, (2) terms that are in the thesaurus and are more general than near-matching terms in the document, and (3) terms that are 'novel' to the document and not found in the thesaurus. In addition to being categorized as an exact, general, or novel index term, each term is given a numerical relevance weight deemed to reflect its relative value in characterizing the contents of the document.

2.2 'Thesaurus Discovery' to Nominate Sets of Terms for Collections

First-order thesauri are 'discovered' via another CLARIT process, distinct from indexing. The process requires a sample of documents representing a 'domain'. The sample must be moderately large (e.g., minimally 2 megabytes of text) and must be composed of documents that are more or less 'about' the topic of the domain.⁷

In general, CLARIT 'thesaurus discovery' comprises algorithms and techniques for clustering phrases in collections of documents to construct first-order thesauri that optimally 'cover' an arbitrary percentage of all the terminology in the domain represented by the document collection. 'Normal' thesaurus discovery involves (1) decomposition of candidate NPs from the documents to build a term lattice in which nodes are organized hierarchically from words to phrases based on the number of phrases subsumed by the term associated with each node and (2) selection of nodes that have high subsumption scores and that also satisfy certain structural and statistical characteristics (such as being legitimate NPs, well distributed in the corpus, and relatively uncommon in general English). Terms thus selected represent a subset of vocabulary that accurately characterizes the domain.
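To make the subsumption idea concrete, the following Python sketch (our own minimal illustration, not the CLARIT implementation; the function names, thresholds, and selection fraction are assumptions) decomposes candidate NPs into sub-phrases, scores each sub-phrase by the number of distinct phrases it subsumes, requires a minimum spread across documents, and keeps the highest-scoring terms as a flat first-order thesaurus.

    from collections import defaultdict

    def subterms(np_words):
        # All contiguous sub-phrases (including single words) of a simplex NP.
        n = len(np_words)
        return {" ".join(np_words[i:j]) for i in range(n) for j in range(i + 1, n + 1)}

    def discover_thesaurus(docs, min_docs=3, top_fraction=0.05):
        # docs: list of documents, each given as a list of candidate-NP strings.
        # Returns high-subsumption, well-distributed terms as a flat term list.
        subsumed = defaultdict(set)   # sub-phrase -> set of full NPs it subsumes
        doc_freq = defaultdict(set)   # sub-phrase -> set of documents it occurs in
        for doc_id, nps in enumerate(docs):
            for np in nps:
                for t in subterms(np.split()):
                    subsumed[t].add(np)
                    doc_freq[t].add(doc_id)
        candidates = [t for t in subsumed if len(doc_freq[t]) >= min_docs]
        candidates.sort(key=lambda t: len(subsumed[t]), reverse=True)
        keep = max(1, int(top_fraction * len(candidates)))
        return candidates[:keep]

A further filter against a general-English word list (omitted here) would stand in for the 'relatively uncommon in general English' criterion mentioned above.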
Thesaurus discovery is quite fast⁸ and typically yields a subset of terminology that represents less than 5% of all the available terms in the corpus.⁹

Since the TREC experiments involved a heterogeneous collection of documents and since it was not possible to identify specific subsets of documents in the database as 'about' one or another topic, it was not possible to discover and use relevant thesauri in TREC tasks. Thus, as shown in Figure 2, the simplified CLARIT indexing process in TREC tasks did not involve matching of terms against a first-order thesaurus and did not result in three-way-categorized index terms.

⁷An example of an appropriate sample might be 50 full-text articles involving "AIDS Research"; or 2,000 abstracts about "Silicon Engraving"; or even one's personal file of recent e-mail correspondence, provided it is sufficiently large and topically coherent.
⁸At present, using the CLARIT research system, a thesaurus can be found for a 3-megabyte corpus in less than 10 minutes on a DECstation 5000/200.
⁹In fact, the number of terms returned will vary depending on parameters the user selects when generating the thesaurus.

2.3 Vector-Space 'Similarity' Measures

The principal method used by the CLARIT system in comparing 'information objects' (e.g., in retrieval, in routing) is vector-space distance.¹⁰ The basic metric is that of 'similarity' of terms. 'Similarity' is determined by different procedures in different contexts. Partial or 'fuzzy' matching of terms is facilitated by noting whether terms share words or attested subphrases. For example, in vector-space modeling of documents, the contained words of all terms (in the document vector as well as the query vector) are broken out, giving, in effect, the possibility of matching parts of terms, though, technically, the individual words are realized as independent dimensions of the term space.¹¹

2.4 Notes on the Limited Version of CLARIT Processing in TREC

Because of the time and space limitations in the task, the CLARIT team did not utilize several features of CLARIT processing that normally produce enhanced results. One of the features, the automatic 'tokenization' or identification of proper names, would certainly have assisted processing of some topics. Another feature, the identification of equivalence classes of terms, also would have aided the task. In addition, no attempt was made to establish 'uniform-length' documents or sub-documents (e.g., by setting a maximum word count or sentence length for such units). Though CLARIT processing supports the treatment of documents as sub-document collections, that feature of CLARIT processing was not utilized in the experiments. All topic statements were treated uniformly and simply: no attempt was made to handle implicit or explicit quantification, time intervals, satisfaction conditions, etc., except as literally encoded in the topics. Though CLARIT NLP modules can produce full-sentence analyses or complex-NP analyses, neither of these features was utilized in TREC processing. All documents were processed only for simplex NPs; inevitably, some non-NP information was lost.

In indexing TREC documents, term weights were based on a general IDF-TF score¹² for topic 'domains'. In the case of multi-word terms (the norm), the full term was assigned an independent IDF-TF score, and each word in the term was broken out and assigned an independent IDF-TF score.
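As an illustration of the word 'breakout' and IDF-TF weighting just described, the following Python sketch (a simplification under our own assumptions, not the CLARIT code or its exact weighting) treats each full NP term and each of its contained words as an independent dimension and compares two such vectors with a cosine measure.

    import math
    from collections import Counter

    def idf_tf_vector(nps, doc_freq, n_docs):
        # nps: the NP terms of one document (or query).
        # doc_freq: term/word -> number of documents containing it; n_docs: corpus size.
        counts = Counter()
        for term in nps:
            counts[term] += 1
            for word in term.split():     # 'break out' the contained words
                counts[word] += 1
        vec = {}
        for unit, tf in counts.items():
            idf = math.log(n_docs / (1 + doc_freq.get(unit, 0)))
            vec[unit] = tf * max(idf, 0.0)
        return vec

    def cosine_similarity(u, v):
        dot = sum(w * v[k] for k, w in u.items() if k in v)
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

Because shared words contribute to the dot product even when full terms differ, a query term such as "mci financial health" can partially match a document term such as "financial data".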
While all CLARIT processing is designed to be fully automatic, we did not employ fully automatic processing in TREC tasks. In particular, there were two steps in the CLARIT-TREC process that required non-automatic processing: (1) initial review and weighting of the index terms automatically nominated and derived from each topic statement and (2) review of first-pass retrieved documents to identify 5-10 relevant ones for 'feedback'. The two steps involved minimal user intervention (and, in fact, required very little time and effort); however, they do qualify the CLARIT-TREC system as a manual process.

In general, we regard the CLARIT-TREC system as a minimal system for purposes of evaluation. The results of CLARIT-TREC processing are useful in helping us establish baseline performance for core but abbreviated CLARIT functions.

¹⁰Cf. [Salton & McGill 1983] for background on vector-space modeling in information retrieval applications.
¹¹Cf. [Evans et al. 1992] and [Hersh et al. 1992] for an evaluation of CLARIT vector-space 'similarity' measures.
¹²"IDF-TF" represents the standard inverse document frequency × intradocument term frequency score for terms.

3 Overview of CLARIT-TREC Processing

There were three major phases of processing for the CLARIT-TREC retrieval experiments. Initially, the entire corpus, along with the topic statements, was parsed to extract candidate NPs via CLARIT NLP. In the special case of topics, the candidate NPs were manually reviewed and evaluated to produce weighted query terms. Second, the entire corpus (in noun-phrase form) was passed through a quick, and somewhat rough, ranking procedure that was designed to nominate a large subset of documents for further analysis. This step is referred to as "partitioning". A "partitioning thesaurus", or list of weighted, representative terminology, was automatically created for each topic. In the final phase of processing, referred to as "querying", a "query vector" was produced for each topic. The query vector was used to retrieve (= rank) documents in the selected partition for the topic using a vector-space 'similarity' metric. The details of these phases of processing are presented below, along with a discussion of the different techniques used for "routing" and "ad-hoc" queries.

3.1 Design Philosophy: "Evoke" and "Discriminate"

In approaching the principal TREC task of returning 200 ranked documents for each topic, we used a two-stage processing strategy, illustrated in Figure 3. The first stage of processing was designed to identify candidate documents that seemed likely to contain information related to a topic. Of course, since the topic was represented as a set of weighted terms, this step involved scoring each document based on the set of terms. Because this step involved scoring every document in the database against every topic, it was important to design the scoring procedure so that it was not computationally expensive. In fact, it was based on summing the value and number of 'hits' between the topic's set of terms and the terms (NPs) in each document and was expected to result in an over-generated set of candidate documents. The highest-scoring documents were retained as a candidate 'partition' of the database with respect to the topic. The second stage was designed to find the subset of documents in each partition that best matched the topic. In theory, greater (= more discriminating) processing resources could be devoted to this second-stage task, as the total number of documents involved was small compared to the whole collection.
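The overall shape of this two-stage strategy can be sketched as follows (a schematic Python illustration under our own simplifying assumptions; the vectorization and similarity helpers stand in for the CLARIT components described elsewhere in this report).

    def evoke(documents, topic_terms, partition_size=2000):
        # Stage 1: cheap 'feature scoring' -- sum the weights of topic terms that
        # 'hit' each document's NP set, and keep a large, over-generated partition.
        # documents: doc_id -> set of NP strings; topic_terms: term -> weight.
        scores = {doc_id: sum(w for term, w in topic_terms.items() if term in nps)
                  for doc_id, nps in documents.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:partition_size]

    def discriminate(partition, documents, query_vector, vectorize, similarity, k=200):
        # Stage 2: spend more effort on the small partition only -- re-rank it with
        # a vector-space similarity measure and return the final top k documents.
        reranked = sorted(partition,
                          key=lambda d: similarity(query_vector, vectorize(documents[d])),
                          reverse=True)
        return reranked[:k]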
In practice, as illustrated in Figure 3, partitioning resulted in a set of 2,000 ranked documents. The top 200 documents from the partition were submitted to NIST as the CLARIT "A" set of results. Final querying or 'discrimination' among the documents in each partition yielded another, more accurately ranked set of 200 documents, which were submitted as the CLARIT "B" results.

3.2 Overview of the Task

As Figure 4 shows, different portions of the total TREC database were used for the "routing" and "ad-hoc" phases of the experiment. The routing task required 'training' of the first fifty topics on the first set of data (represented as the darkened block in Figure 4). In the second step of processing, the partitioning and query vectors that were derived from step one were used to identify, first, 2,000-document partitions in the second set of data (represented as a light block in Figure 4) and, second, the top-200 ranked documents in each partition. The ad-hoc query task involved the whole database, but the CLARIT team actually used the first set of data for a preliminary retrieval of documents (based on partitioning). A few (5-10) of the top 20-50 documents were chosen by quick manual inspection to supplement the query vector, and then a second automated round of partitioning over the total database was performed. The final top-200 ranked documents ultimately derived from these second-pass, 2,000-document partitions.

[Figure 3: Overview of CLARIT-TREC Processing. Schematic: partitioning via "feature scoring" yields a 2,000-document ranking per topic, whose top 200 documents are the CLARIT "A" results; discrimination via vector-space "similarity" ranking over the partition yields the final top 200, the CLARIT "B" results.]

[Figure 4: Overview of Processing for "Routing" vs. "Ad-Hoc" Queries. Schematic contrasting the portions of the database used in each task: 200 of 2,000 partitioned documents for routing; 5-10 of 50 reviewed documents and 200 of 2,000 partitioned documents for ad-hoc querying.]

WSJ891102-0187
McDermott International Inc. said its Babcock & Wilcox unit completed the sale of its Bailey Controls Operations to Finmeccanica S.p.A. for $295 million. Finmeccanica is an Italian state-owned holding company with interests in the mechanical engineering industry. Bailey Controls, based in Wickliffe, Ohio, makes computerized industrial controls systems. It employs 2,700 people and has annual revenue of about $370 million.
Figure 5: Sample of Data: Document After Text Formatting

4 Details of the CLARIT-TREC Experiments

Both "routing" and "ad-hoc" query experiments took advantage of basic CLARIT processing. There are several features the two experiments share. The experiments are distinct in that "routing" involved a special step of creating a partitioning thesaurus using larger sets of supplied relevant documents, whereas "ad-hoc" queries involved partitioning the document set once using only automatically derived (but manually weighted) query terms and then choosing a small set of relevant documents to expand the final query vector.

4.1 Preparing Data

Each TREC document had to be formatted for CLARIT processing. This involved making the unique text ID accessible to CLARIT as a special field and delimiting the beginning and end of each text in a file. Figure 5 gives a sample formatted document. As can be seen in the sample, the beginning and end of the record is marked by a backslash followed by "*". The unique ID is set off by a backslash followed by "#". The beginning and end of the text of the document is marked by a backslash followed by "!". Each paragraph is separated from the next by a backslash followed by "C".¹³
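Purely for illustration, the formatting step might look like the following Python sketch; the four marker characters come from the description above, but the exact record layout (ordering, spacing, and line breaks) is our assumption, not CLARIT's specification.

    def clarit_format(doc_id, paragraphs):
        # Wrap a raw TREC document in the record markers described above:
        # \* delimits the record, \# sets off the unique ID, \! delimits the
        # text, and \C separates paragraphs (hypothetical layout).
        body = " \\C ".join(paragraphs)
        return f"\\*\n\\# {doc_id}\n\\! {body} \\!\n\\*"

    record = clarit_format(
        "WSJ891102-0187",
        ["McDermott International Inc. said its Babcock & Wilcox unit completed "
         "the sale of its Bailey Controls Operations to Finmeccanica S.p.A. for "
         "$295 million.",
         "Bailey Controls, based in Wickliffe, Ohio, makes computerized "
         "industrial controls systems."])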
4.2 Processing TREC Corpora (NLP)

Figure 6 gives a schematic representation of the processing steps that occurred subsequent to data formatting. The process labeled "NLP" in the figure includes all the steps illustrated in the "NLP" portion of Figure 2: morphological analysis of words and parsing for simplex NPs. Simplex NPs were extracted for all TREC documents; words were morphologically normalized.¹⁴

¹³Though CLARIT data preparation demarcates paragraph units, the CLARIT-TREC process did not distinguish divisions of text at this level. For CLARIT-TREC purposes, all the text between the "!" marks was used as the source of information about a document. Thus, longer and shorter documents were treated uniformly as 'unit' texts.
¹⁴The manually supplied keywords attached to some TREC documents in a "keyword field" were discarded.

Step 1: Input: Document(s); Process: NLP; Output: Terms-Doc ("Parsed-Doc")
Step 2: Input: Topic(s); Process: NLP; Output: Terms-Topic
Step 3: Input: Terms-Topic; Process: Hand Filter (1. "Eliminate"; 2. "Weight" 3/2/1); Output: Weighted-Terms-Topic ("Source-Query")
Figure 6: Schematic Representation of Data Preparation

WSJ891102-0187
"mcdermott" na mcdermott "international" adj international "inc." ukw? inc. "said" vt-past say vt-pastprt say "its" gen its "babcock" na babcock "\&" *and* and "wilcox" na wilcox "unit" sn unit "completed" vt-past complete vt-pastprt complete "the" d the "sale" sn sale sn sell "of" prep of "its" gen its "bailey" sn bailey "controls" vt-pressg3 control pn control vt-pressg3 control "operations" pn operation "of" prep of "about" prep about "$370" ukw? $370 "million" quant million "\." *period* \.
Figure 7: Sample of Data: Document After Morphological Analysis

#1 WSJ891102-0187
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
(mcdermott) 0 0 (international inc.) 0 0 (babcock) 0 0 (wilcox unit) 0 0 (sale) 0 0 (bailey control operation) 0 0 (finmeccanica s.p.a.) 0 0 ($295) 0 0 (finmeccanica) 0 0 (italian state) 0 0 (owned holding company) 0 0 (interest) 0 0 (mechanical engineering industry) 0 0 (bailey control) 0 0 (wickliffe) 0 0 (ohio) 0 0 (computerized industrial control system) 0 0 (employ) 0 0 (people) 0 0 (annual revenue) 0 0 ($370) 0 0
Figure 8: Sample of Data: Document After NP Extraction

A sample of a document after morphological analysis is given in Figure 7. A sample of the same document after simplex-NP extraction is given in Figure 8. Note that "owned holding company" and "$295" or "$370" are treated as NPs along with legitimate phrases like "computerized industrial control system". While CLARIT does have facilities to discover and eliminate inappropriate participles (such as "owned" in isolation) and can recognize nonce adjectives, such as "state-owned", such processing was not employed in the TREC tasks. Hence, the correct expression, "Italian state-owned holding company", was not found or used in this case. In addition, as noted previously, the CLARIT-TREC system did not 'tokenize' company names or dates or other 'regular-expression'-like phrases; there was no time in our schedule for such processing.

All NLP (and other) processing steps were piped through the system; intermediate files were not retained. The parsed representation of all the texts took up approximately 98% of the space occupied by the original text. Intermediate (but unretained) files generated in CLARIT processing included a file of the words in each text, in their original order, annotated with morphological categories. Other files contained the output of the parser as a list of NPs in the order in which they occurred in each text. The parsed representation of the text was retained and used at all subsequent steps of processing. Indeed, hereafter, unless otherwise specified, any reference to a document or collection of documents refers to the CLARIT representation of the text, viz., a sequence of normalized simplex NPs.¹⁵
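CLARIT's parser relies on its lexicon and heuristic grammar; as a rough stand-in (our own simplification, with a tag set borrowed loosely from Figure 7, not CLARIT's grammar), the following Python sketch chunks maximal runs of adjective and noun tags from morphologically normalized, part-of-speech-tagged words into 'simplex' NPs.

    def simplex_nps(tagged_words):
        # tagged_words: list of (normalized_word, tag) pairs from morphological
        # analysis. A 'simplex' NP is approximated as a maximal run of adjective
        # and noun tags, with no embedded clauses or prepositional phrases.
        np_tags = {"adj", "sn", "pn", "na"}      # assumed noun-phrase-internal tags
        nps, current = [], []
        for word, tag in tagged_words:
            if tag in np_tags:
                current.append(word)
            else:
                if current:
                    nps.append(" ".join(current))
                current = []
        if current:
            nps.append(" ".join(current))
        return nps

    # simplex_nps([("computerized", "adj"), ("industrial", "adj"),
    #              ("control", "sn"), ("system", "sn"), ("\\.", "*period*")])
    # -> ["computerized industrial control system"]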
4.3 Identifying Terms from Topics

All fields of topic statements, such as the one given in Figure 9, were similarly processed for NPs. Team members reviewed the NPs and assigned weights of "1", "2", or "3" to each NP according to whether the term was central or peripheral to the topic. (Some extracted NPs were discarded as irrelevant or ill-formed; the vast majority were retained.) A sample set of weighted terms for the topic in Figure 9 is given in Figure 10. The manual review and weighting of terms from the topic statement took less than 5 minutes per topic. All subsequent processing of the query was performed automatically.

4.4 Establishing Sets of 'Relevant' Documents

Given the need to 'evoke' candidate documents and to 'partition' the database into subsets that were easier to manage, we were naturally interested in identifying features in the topics that would be useful as discriminators. We had little confidence, however, that the specific terms in topics, which constitute the "source query", were either most representative of the domain of the topic (= the 'satisfaction class') or reasonably comprehensive. We thus decided to supplement the source query with additional terms.

In particular, we used the CLARIT thesaurus-discovery technique on known relevant documents to identify terminology that might be better representative of the satisfaction-class documents than the source query alone. The process produced a list of terms from the available topic-relevant documents (or from a small sample of relevant documents that we may have found) and automatically nominated the top-ranked terms (approximately 20%) to supplement the original query (as derived from the topic statement) to produce a "routing/partitioning thesaurus" for the topic.

Since the routing topics already had accompanying relevant documents, we used these as a source of additional terminology. Ad-hoc queries, on the other hand, had no associated relevant documents, so we designed a preliminary, partial 'retrieval' step that would help us find candidate relevant documents. In practice, this required a partitioning of a sample of data and a review of the returned top-ranked documents. This phase of processing is illustrated in Figure 11.

¹⁵From the point of view of the CLARIT system, the information in a document is entirely represented by the extracted noun phrases.

<top>
<head> Tipster Topic Description
<num> Number: 057
<dom> Domain: U.S. Economics
<title> Topic: MCI
<desc> Description:
Document will discuss how MCI has been doing since the Bell System breakup.
<narr> Narrative:
A relevant document will discuss the financial health of MCI Communications Corp. since the breakup of the Bell System (AT&T and the seven regional Baby Bells) in January 1984. The status indicated may not necessarily be a direct or indirect result of the breakup of the system and ensuing regulation and deregulation of Ma Bell or of the restrictions placed upon the seven Bells; it may result from any number of factors, such as advances in telecommunications technology, MCI initiative, etc. MCI's financial health may be reported directly: a broad statement about its earnings or cash flow, or a report containing financial data such as a quarterly report; or it
may be reflected by one or more of the following: credit ratings, share of customers, volume growth, cuts in capital spending, $$ figure net loss, pre-tax charge, analysts' or MCI's own forecast about how well they will be doing, or MCI's response to price cuts that AT&T makes at its own initiative or under orders from the Federal Communications Commission (FCC), such as price reductions, layoffs of employees out of a perceived need to cut costs, etc. Daily OTC trading stock market and monthly short interest reports are NOT relevant; the inventory must be longer term, at least quarterly.
<con> Concept(s):
1. MCI Communications Corp.
2. Bell System breakup
3. Federal Communications Commission, FCC
4. regulation, deregulation
5. profits, revenue, net income, net loss, write-downs
6. NOT daily OTC trading, NOT monthly short interest
<fac> Factor(s):
Time: after January 1984
</fac>
<def> Definition(s):
</top>
Figure 9: Sample of Data: Topic 57

057
2 (bell system breakup) 0 0
2 (capital spending) 0 0
2 (cash flow) 0 0
2 (credit rating) 0 0
2 (customer) 0 0
1 (ma bell) 0 0
3 (mci communication corporation) 0 0
3 (mci financial health) 0 0
1 (mci initiative) 0 0
2 (mci) 0 0
2 (net income) 0 0
2 (net loss) 0 0
1 (order) 0 0
2 (pre tax charge) 0 0
2 (price cut) 0 0
2 (price reduction) 0 0
2 (profit) 0 0
2 (quarterly report) 0 0
1 (regional baby bell) 0 0
1 (telecommunication technology) 0 0
Figure 10: Sample of Data: Hand-Weighted Term Set for Topic 57

Step 4a: Input: Parsed-Doc, Weighted-Terms-Topic; Process: Feature Scoring; Output: Scored-Doc-Topic
Step 4b: Input: Scored-Doc-Topic; Process: Ranking; Output: Top-2000 Scored-Doc(s)-Topic
Step 4c: Input: Top 50 of the 2000 Scored-Doc(s)-Topic; Process: Hand Filter (= Review Top Docs); Output: 5-10 Rel-Doc(s)-Topic (the "Relevance-Feedback" step in ad-hoc cases)
Figure 11: Schematic Representation of Processing When 'Relevant' Documents Are Not Given

As shown in Figure 11, Step 4a, the weighted, relevant terms were taken as a query vector representing a subset of positive instances of concepts in the equivalence class of the topic. In the case of ad-hoc querying, the query vector was used to identify a sample of 50 candidate documents from a subset of the corpus, which were reviewed in rank order by team members until 5-10 'true' relevant documents were identified (Step 4c). This can be regarded as a 'relevance-feedback' step in the querying process. In the case of routing, the sample of 'true' relevants provided by the TREC organizers was accepted as valid and no review was performed.

4.5 Using Relevant Documents to Create 'Partitioning Thesauri'

As indicated in Figure 12, Step 4d, the 'authoritative' set of relevant documents was processed with CLARIT 'thesaurus-discovery' modules to produce a set of terms that (arguably) bear some relation to the topic. We refer to the output of this process as a "pseudo-thesaurus". The actual routing/partitioning thesaurus was generated by CLARIT by combining the set of weighted terms for the topic with the pseudo-thesaurus, as shown in Step 5 of Figure 12. Note that partial noun phrases, derived from pseudo-thesaurus entries and attested in the documents, were also added to the routing/partitioning thesaurus with a partial score.
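The merge in Step 5 can be pictured with the following Python sketch (a minimal rendering under our own assumptions about data shapes and default weights, not the CLARIT code): hand-weighted topic terms and pseudo-thesaurus terms are pooled, and sub-phrases of whole terms are added with a reduced, partial score, echoing the whole-term flag and weights shown in Figure 13.

    def build_partitioning_thesaurus(topic_id, hand_weighted_terms, pseudo_thesaurus,
                                     default_weight=1.0, partial_weight=0.5):
        # hand_weighted_terms: term -> hand weight (1, 2, or 3) from the topic.
        # pseudo_thesaurus: terms nominated by thesaurus discovery over relevant docs.
        # Returns term -> (topic_id, whole_flag, weight), as in Figure 13.
        thesaurus = {}
        for term, weight in hand_weighted_terms.items():
            thesaurus[term] = (topic_id, 1, float(weight))
        for term in pseudo_thesaurus:
            thesaurus.setdefault(term, (topic_id, 1, default_weight))
        # Add sub-phrases of whole terms with a partial score; CLARIT restricted
        # these to sub-phrases actually attested in the documents (not checked here).
        for term in list(thesaurus):
            words = term.split()
            for i in range(len(words)):
                for j in range(i + 1, len(words) + 1):
                    sub = " ".join(words[i:j])
                    if sub != term:
                        thesaurus.setdefault(sub, (topic_id, 0, partial_weight))
        return thesaurus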
As illustrated in Figure 13 (and Figure 14), the partitioning thesaurus itself is a list of terms, where each term has an associated vector of information specifying its importance in any number of topics. In the case illustrated for Topic 57, for example, the term "bell system breakup" has the triple "<057 1 2.0>" associated with it. The "057" indicates that the term is relevant to Topic 57; the "1" indicates that the term is a full term (not an attested sub-phrase of a term); and the "2.0" gives the term's relative weight or importance (in this case, reflecting the score that was assigned by hand).

4.6 'Feature Scoring' to Partition Documents

Figure 14 gives a portion of the composite or 'super thesaurus' for all 100 topics.

Step 4d: Input: Rel-Docs-Topic; Process: Thesaurus Discovery; Output: Pseudo-Thes-Topic
Step 5: Input: Pseudo-Thes-Topic, Weighted-Terms-Topic; Process: Merge; Output: Part-Thes-Topic
Step 6: Input: Parsed-Doc, Part-Thes-Topic; Process: Feature Scoring; Output: Scored-Doc-Topic
Step 7: Input: Scored-Doc-Topic; Process: Ranking; Output: Top-2000 Scored-Doc(s)-Topic
Figure 12: Schematic Representation of Processing When 'Relevant' Documents Are Available

advance | <057 1 1.0>
at&t | <057 1 1.0>
bell system breakup | <057 1 2.0>
bell system | <057 1 1.0>
bell | <057 1 1.0>
breakup | <057 1 1.0>
broad statement | <057 1 1.0>
capital spending | <057 1 2.0>
cash flow | <057 1 2.0>
credit rating | <057 1 2.0>
customer | <057 1 2.0>
cut cost | <057 1 1.0>
cut | <057 1 1.0>
deregulation | <057 1 1.0>
direct indirect result | <057 1 1.0>
ma bell | <057 1 1.0>
mci communication corporation | <057 1 3.0>
mci financial health | <057 1 3.0>
mci initiative | <057 1 1.0>
mci | <057 1 2.0>
net income | <057 1 2.0>
net loss | <057 1 2.0>
order | <057 1 1.0>
telecommunication technology | <057 1 1.0>
united states economics | <057 1 1.0>
volume growth | <057 1 2.0>
Figure 13: Sample of Data: 1,201-Term Partitioning Thesaurus for Topic 57

advance | <057 1 1.0> <065 1 0.30> <075 1 0.30> <076 1 0.30>
american telephone | <057 1 0.30>
analyst | <054 1 0.30> <055 1 0.50> <057 1 0.50> <074 1 0.50> <080 1 0.50> <082 1 0.50> <088 1 0.50>
announcement | <057 1 0.30>
at&t cut | <057 1 0.50>
at&t price | <057 1 0.50>
at&t | <057 1 1.0>
bell system breakup | <057 1 2.0>
bell system | <057 1 1.0>
bell | <057 1 1.0>
benefit | <057 1 0.30> <060 1 0.30> <073 1 0.50> <074 1 0.50> <075 1 0.50> <088 1 0.30> <099 1 0.30>
breakup | <057 1 1.0>
broad statement | <057 1 1.0>
business customer | <057 1 0.50>
capital spending | <057 1 2.0>
jack grubman | <057 1 0.50>
late price cut | <057 1 0.30>
layoff | <057 1 1.0>
least quarterly | <057 1 2.0>
local phone company | <057 1 0.50>
local telephone company | <057 1 0.30>
long distance carrier | <057 1 0.30>
long distance telephone rate | <057 1 0.30>
ma bell | <057 1 1.0>
margin | <057 1 0.30>
market share | <057 1 0.30> <089 1 0.30>
mci communication corporation | <057 1 3.0>
mci communication | <057 1 0.50>
mci earning | <057 1 0.50>
mci executive | <057 1 0.50>
mci financial health | <057 1 3.0>
mci initiative | <057 1 1.0>
mci move | <057 1 0.50>
mci official | <057 1 0.50>
mci price | <057 1 0.50>
mci spokesman | <057 1 0.50>
mci | <057 1 2.0>
result | <057 1 1.0> <070 1 1.0> <081 1 2.0>
revenue | <057 1 1.0> <081 1 2.0> <053 1 0.30> <054 1 0.30> <093 1 0.30>
rising cost | <057 1 0.30>
telecommunication technology | <057 1 1.0>
telegraph company | <057 1 0.30>
telegraph | <057 1 0.30>
united states economics | <057 1 1.0> <072 1 1.0>
united telecommunication inc. | <057 1 0.50>
united telecommunication | <057 1 0.50>
washington based mci | <057 1 0.50>
washington based telecommunication concern | <057 1 0.50>
william e conway jr. | <057 1 0.50>
Figure 14: Sample of Data: 15,287-Term 'Super' Partitioning Thesaurus

Data from Partitioning Thesaurus:
Thes_Weight(term): Real-number weight assigned to term in the partitioning thesaurus
Thes_Whole(term): Boolean value indicating that term is a whole term in the thesaurus (1) or an attested sub-term of a whole term (0)
Data from Document Text:
Tot_Terms: Number of terms in a document
Num_Terms: Number of unique terms (or sub-terms) found in the document that match terms (or sub-terms) in the partitioning thesaurus
Term_Freq(term): Frequency of term in the document
Term_Length(term): Number of words in term
Text_Whole(term): Boolean value indicating that term is a whole term in the text (1) or an attested sub-term of a whole term (0)
Figure 15: Feature Matching Score (Partitioning) Input Data

Doc_Score = [ Sum_{i=1..Num_Terms} Term_Score(term_i) / ln(Tot_Terms + 1.72) ] x [ ( Sum_{i=1..Num_Terms} Term_Freq(term_i) )^2 / (Tot_Terms + ...) ]

Term_Score(term) =
  Raw_Score(term)        if (Thes_Whole(term) = 1) and (Text_Whole(term) = 1)
  Raw_Score(term) / 4.0  if (Thes_Whole(term) = 1) and (Text_Whole(term) = 0)
  Raw_Score(term) / 8.0  if (Thes_Whole(term) = 0) and (Text_Whole(term) = 1)
  0.0                    if (Thes_Whole(term) = 0) and (Text_Whole(term) = 0)

Raw_Score(term) = 2^(min(Term_Length(term), 3) - 1) x Thes_Weight(term) x Term_Freq(term)

Figure 16: Formula for Scoring Documents in 'Partitioning'

Doc_Score = [Doc-Length Factor] x [Hit-Opportunity Factor] x Sum( term_weight x term_freq x [Phrasal Factor] x [Term-Status Factor] )
Figure 17: Schematization of the 'Partitioning' Formula

ZF09-435-245 7.720000
WSJ870123-0031 6.470000
FR89214-0026 6.310000
WSJ870519-0094 6.130000
WSJ900912-0046 5.830000
WSJ870305-0055 5.360000
ZF07-783-164 5.310000
ZF07-189-244 5.100000
ZF07-443-642 4.980000
AP881122-0107 4.060000
WSJ911018-0122 4.060000
ZF07-971-724 4.050000
ZF07-251-245 3.930000
WSJ870421-0065 3.780000
ZF08-084-048 3.740000
ZF07-621-948 3.670000
WSJ911030-0170 3.610000
ZF09-584-807 3.570000
ZF07-294-735 3.420000
ZF09-526-239 3.390000
ZF07-789-516 3.330000
ZF07-218-520 3.300000
ZF07-800-964 3.300000
ZF07-495-528 3.280000
WSJ900629-0110 3.210000
ZF09-559-173 3.200000
ZF07-118-812 3.170000
WSJ871030-0149 3.160000
ZF07-878-828 3.120000
WSJ870309-0110 3.070000
AP880419-0280 3.050000
Figure 18: Sample of Data: First-Pass Partitioning Results for Topic 57

Each TREC document was 'scored' against the super thesaurus in a single pass (Step 6 in Figure 12): effectively, each document was scored against the routing/partitioning thesaurus for each topic in parallel. In particular, every NP in each document was matched against the NPs (terms) in the routing thesaurus; partial matches were allowed. The definitions in Figure 15 and the formula in Figure 16 (given schematically in Figure 17) were used to yield a composite score for the document based on the number of exact and partial hits as a function of document length and term 'value'. The routing/partitioning thesaurus was used to score the full database, yielding a ranking of all documents relative to all topics simultaneously. As shown in Step 7 of Figure 12, the top 2,000 documents for each topic were retained as the partition for the topic for the next stage of processing. Figure 18 gives sample results of the rankings of documents based on feature scoring for Topic 57.
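Rendered as code, the partitioning score might look like the following Python sketch (our own reading of Figures 15-17; the Term_Score cases and the Raw_Score phrasal factor follow Figure 16, but the hit-opportunity denominator is not fully recoverable, so the (Tot_Terms + 1) used below is an assumption).

    import math

    def raw_score(term, thes_weight, term_freq):
        # Raw_Score (Figure 16): the phrasal factor doubles with term length, capped at 3 words.
        phrasal = 2 ** (min(len(term.split()), 3) - 1)
        return phrasal * thes_weight * term_freq

    def term_score(term, thes_weight, thes_whole, text_whole, term_freq):
        # Term_Score cases (Figure 16): full credit only for whole-term/whole-term matches.
        r = raw_score(term, thes_weight, term_freq)
        if thes_whole and text_whole:
            return r
        if thes_whole and not text_whole:
            return r / 4.0
        if text_whole:
            return r / 8.0
        return 0.0

    def doc_score(matches, tot_terms):
        # matches: list of (term, thes_weight, thes_whole, text_whole, term_freq)
        # for the unique thesaurus terms or sub-terms found in the document.
        if not matches:
            return 0.0
        summed = sum(term_score(*m) for m in matches)
        hits = sum(m[4] for m in matches)                    # total term-frequency 'hits'
        length_factor = 1.0 / math.log(tot_terms + 1.72)     # from Figure 16
        opportunity_factor = (hits ** 2) / (tot_terms + 1)   # denominator assumed
        return summed * length_factor * opportunity_factor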
Figure 19 shows the set of 'true' relevants chosen by manual review of the top 10-50 ranked documents.

WSJ861204-0059  WSJ870123-0031  ZF08-695-706  WSJ870305-0055  WSJ870519-0094
WSJ871030-0149  ZF08-096-680  WSJ861208-0024  ZF08-318-964  ZF08-338-122
Figure 19: Sample of Data: Hand-Selected 10 Relevants for Topic 57

4.7 Final 'Querying'

Figure 20 gives the final steps in the process. There were two essential phases in querying at this point: building the final query vector and querying a partition of the database to retrieve the final set of relevant documents. Note that the query was weighted based on statistics extracted from a partition of the database. For the ad-hoc queries, the partition used for the statistics was the same as the partition actually being queried. For the routing queries, however, the final query vector was fixed before processing the new text (i.e., the second set of TREC documents). In particular, in this case, the partition used to weight the routing-query vector was extracted from the training corpus (the first set of TREC documents); this vector was then queried against a partition extracted from the new, test corpus.

Step 8: Input: Source-Query, 2000 Docs; Process: IDF x TF Indexing; Output: Indexed Docs (Words Broken Out)
Step 9: Input: Rel-Docs (possibly empty), 2000 Docs; Process: Intersect; Output: Query-Sup-Docs
Step 10: Input: Source-Query, Query-Sup-Docs; Process: Merge/Calibrate; Output: Final Query Vector
Step 11: Input: Indexed-Docs, Final Query Vector; Process: Vector-Space Ranking; Output: Top 200 Docs
Figure 20: Schematic Representation of Final Steps in Processing

The NPs and their contained words among the documents in each partition were scored for distribution and frequency; each NP/term- and word-type was given an IDF-TF score. As noted above, for routing queries, the IDF-TF score was based on statistics from the original partition of 2,000 documents from the training corpus; it was a static query vector. For the ad-hoc queries, on the other hand, the final partition of 2,000 documents was used as the source of statistics for the IDF-TF scoring. Therefore, the scores for terms in the query vector for the ad-hoc queries could vary depending on the set of documents selected in the partitioning process.

Figure 21 gives a sample of a final query. The terms in each topic's routing/partitioning thesaurus were given IDF-TF scores based on the sample; original-query terms were added, and the factors of those terms ("1", "2", or "3") were used to multiply their IDF-TF-based scores; the combined terms and their contained words thus formed an extended-query vector (the final query vector).

58.274554 (mci) 0 0
54.311172 (mci communication corporation) 0 0
27.003252 (customer) 0 0
22.711744 (price cut) 0 0
20.039764 (sprint) 0 0
19.527712 (price reduction) 0 0
19.383471 (at&t) 0 0
15.882106 (pre tax charge) 0 0
15.090805 (make) 0 0
14.682230 (industrywide price cut) 0 0
14.592388 (price) 0 0
13.906499 (communication) 0 0
13.878939 (distance) 0 0
13.527033 (industrywide) 0 0
12.290086 (capital spending) 0 0
12.258577 (response) 0 0
11.278964 (cash flow) 0 0
10.960371 (telecommunication) 0 0
10.681927 (conway) 0 0
10.480049 (william e conway jr.) 0 0
1.260289 (product) 0 0
1.134040 (york) 0 0
1.133174 (computer) 0 0
1.121569 (reported) 0 0
1.121431 (gain) 0 0
Figure 21: Sample of Data: Final Query for Topic 57
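The construction of the final query vector can be sketched as follows (a self-contained Python illustration under our own assumptions; idf_tf is a simplified stand-in for CLARIT's IDF-TF scoring over the 2,000-document partition): thesaurus terms receive IDF-TF scores, original topic terms have their scores multiplied by the hand-assigned factor (1, 2, or 3), and contained words are broken out as additional dimensions.

    import math

    def final_query_vector(thesaurus_terms, source_query, doc_freq, n_docs=2000):
        # thesaurus_terms: terms from the topic's routing/partitioning thesaurus.
        # source_query: term -> hand factor (1, 2, or 3) from the topic statement.
        # doc_freq: term/word -> document frequency within the 2,000-document partition.
        def idf_tf(unit, tf=1):
            idf = math.log(n_docs / (1 + doc_freq.get(unit, 0)))
            return tf * max(idf, 0.0)

        vector = {}
        for term in thesaurus_terms:                       # partition-based IDF-TF scores
            vector[term] = vector.get(term, 0.0) + idf_tf(term)
        for term, factor in source_query.items():          # original-query terms, boosted
            vector[term] = vector.get(term, 0.0) + factor * idf_tf(term)
        for term in list(vector):                          # break out the contained words
            for word in term.split():
                if word != term:
                    vector[word] = vector.get(word, 0.0) + idf_tf(word)
        return vector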
The 2,000 documents for each topic were then modeled in vector space (in which all terms and their contained words formed the dimensions), and the final query vector was used to identify and rank the 200 'best' documents, which constituted our results.

4.8 Summary of the Process

Figures 22 and 23 summarize the CLARIT-TREC processes described in detail in the preceding sections. As noted previously, there were only two steps in the CLARIT-TREC process that required non-automatic processing: (1) initial review and weighting of the index terms automatically nominated and derived for the topic and (2) in the case of ad-hoc queries, review of first-pass retrieved documents to identify 5-10 relevant ones for use in creating a pseudo-thesaurus for further processing.

5 Results and Evaluation

This section presents the CLARIT-TREC results in several forms, including broad overviews of the performance, the "official" results tables, and tables of data that focus on statistics that are especially relevant to the CLARIT-TREC approach. Results are presented with only abbreviated explanations.¹⁶

As noted previously, the CLARIT team submitted both intermediate results ("A") and final results ("B"). The intermediate results were generated by taking the highest-scoring 200 (out of 2,000) documents as determined by the routing/partitioning process. Since the strategy of routing/partitioning was to nominate a moderately large candidate subset of documents in which all the true relevants would be found, and since the procedure and scoring were designed to over-generate candidates, we expected to have many 'false positives' in each set of 2,000. We had no reason to expect that the relative ranking of these documents by their evoking routing/partitioning scores would be a good measure of fit to the source topic. By contrast, we expected the final steps (which utilize subset-specific term scoring and vector-space similarity measures) to induce a relative ranking of documents that would represent a good fit to the source topic.

¹⁶More detailed analysis of the results is given in [Evans et al., in preparation].

[Figure 22: CLARIT "A" (Evoking). Schematic showing: topic and corpus; hand weighting yielding the source query; thesaurus discovery over relevant documents yielding a pseudo-thesaurus; merger into the partitioning thesaurus; feature scoring (with hand review of top documents where relevant documents were not supplied) yielding the 2,000-document partition and its top 200 documents, the CLARIT "A" results.]

[Figure 23: CLARIT "B". Schematic showing: IDF-TF scoring of the source query over the 2,000 scored documents; intersection of relevant documents with the partition yielding query-supplementing documents; merger into the final query vector; vector-space ranking yielding the top 200 documents, the CLARIT "B" results.]