The University of Massachusetts TIPSTER Project

W. Bruce Croft
Computer Science Department
University of Massachusetts
Amherst, MA 01003

The TIPSTER project in the Information Retrieval Laboratory of the Computer Science Department, University of Massachusetts, Amherst (which includes MCC and David Lewis of the University of Chicago as subcontractors), is focusing on the following goals:

* Improving the effectiveness of information retrieval techniques for large, full-text databases,

* Improving the effectiveness of routing techniques appropriate for long-term information needs, and

* Demonstrating the effectiveness of these retrieval and routing techniques for Japanese full-text databases.

Our general approach to achieving these goals has been to use improved representations of text and information needs in the framework of a new model of retrieval. Retrieval (and routing) is viewed as a probabilistic inference process which "compares" text representations based on different forms of linguistic and statistical evidence to representations of information needs based on similar evidence from natural language queries and user interaction. New techniques for learning (relevance feedback) and extracting term relationships from text are also being studied. The details and evaluation (with smaller test databases) of the new model, known as the inference net model, can be found in other papers [3, 2, 4].

Some of the specific research issues we are addressing are morphological analysis in English and Japanese, word sense disambiguation in English, the use of phrases and other syntactic structure in English and Japanese, the use of special purpose recognizers in representing documents and queries, analyzing natural language queries to build structured representations of information needs, learning techniques appropriate for routing and structured queries, and probability estimation techniques for indexing.
Comparing the TIPSTER experiments to previous IR experiments done using the standard test collections (e.g. CACM, CISI, NPL, etc.), there are a number of interesting differences:

* The size of the corpus is much larger than previous collections, both in terms of the number of documents and the amount of text. This presents a challenge to the robustness and efficiency of experimental information retrieval systems. Experiments with indexing, for example, can take days instead of minutes.

* The documents in TIPSTER are nearly all full text, rather than abstracts.

* The documents in TIPSTER are heterogeneous in terms of both subject and length. They cover the general area of science, technology and economics, but the sources are the Wall Street Journal, Associated Press newswire, Ziff magazines in the high technology area, Department of Energy abstracts, and the Federal Register.

* The queries (known as "topics" in TIPSTER) are longer and have more structure than those found in other test collections.

* The queries have specific and strict criteria specified for documents to be relevant. These criteria (specified in the "narrative" part of the topic) will reduce inconsistency between relevance judges but are sometimes difficult to handle in the context of an information retrieval system.

* The routing experiments are unlike any carried out before.

* The retrieval and routing experiments with Japanese are also unique.

The first TIPSTER evaluation was limited by a number of factors, the primary one being the lack of relevance judgements for the initial query set. This made it difficult to carry out experiments to select techniques appropriate for large, full-text databases. The results from this evaluation should, therefore, be regarded as preliminary, and indeed raise more questions than they answer.
In the retrieval experiment, 50 new "topics" were used to search the "old" database, which consisted of approximately 1 GByte of text. One of the major subjects of the evaluation was to try different forms of queries produced by processing the topics. Our basic approach to topic processing is to parse them, selecting parts to be indexed, recognizing phrases and "factors" such as locations, dates, companies, etc. Some factors, such as "developing country", which have been specifically identified as important in the topic, will be expanded using a synonym operator. Weights reflecting relative importance are attached to the concepts (words and phrases). Phrase-based concepts are represented by operators defined in the inference net language. These operators use proximity of the words making up the phrase as the major form of evidence for the presence of the concept [1]. The result of topic processing is an inference net representing the information need.

In addition to the automatic query processing, some query versions were generated by simulating simple user interaction with the results of the topic processing. The modifications to the automatically processed topics were limited to changing the weight of concepts, deleting concepts considered unimportant, and adding structure (such as specifying synonymous concepts). The most significant change in the last category was the introduction of "unordered window" operators to simulate paragraph-level retrieval. The equivalent in terms of a user interface would be to ask users to group concepts that should occur together.

The results of the first evaluation are described here in terms of the average precision in the top 5, 30 and 200 documents in the ranking produced by the inference net retrieval engine (INQUERY). This evaluation method was chosen because only the top 200 documents for each query were judged for relevance.
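An "unordered window" operator treats a group of concepts as present when all of them occur, in any order, within some span of tokens. A minimal sketch of that evidence test (the function name and exact window semantics are our illustrative assumptions, not the INQUERY implementation):

```python
import itertools

def unordered_window(tokens, terms, window):
    """True if every term in `terms` occurs, in any order, within a span
    of fewer than `window` tokens.  (Illustrative semantics only.)"""
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t]
                 for t in terms}
    if any(not p for p in positions.values()):
        return False  # some concept is missing entirely
    # Check every combination of occurrence positions for a tight span.
    for combo in itertools.product(*positions.values()):
        if max(combo) - min(combo) < window:
            return True
    return False
```

Grouping concepts this way approximates paragraph-level retrieval: a window of roughly paragraph size demands that the grouped concepts co-occur locally rather than anywhere in a long full-text document.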
The results were as follows:

Query Type        5 docs          30 docs         200 docs
T+D+C+F+phrase    .64             .52             .35
T+D+C+F           .62 (-3.1%)     .52 (0%)        .35 (0%)
1+N               .60 (-6.7%)     .50 (-3.8%)     .34 (-2.8%)
T+C+phrase        .66 (+3.1%)     .53 (+1.9%)     .36 (+2.8%)
1+man             .65 (+1.6%)     .56 (+7.7%)     .36 (+2.8%)
1+man+para        .72 (+12.5%)    .61 (+17.3%)    .39 (+10.3%)

Table 1: TIPSTER Retrieval Results (average precision, 50 topics): Query types refer to topic fields used. T is topic, D is description, C is concepts, F is factors, N is narrative, phrase means phrase constructs used, 1 refers to the baseline (the first line), man means manual modification, para means paragraph retrieval.

These results support two main conclusions: the first being that the effectiveness of the retrieval techniques is surprisingly good considering the difficulty of the queries; the second is that paragraph-level retrieval as simulated by manual creation of "unordered window" queries significantly improves effectiveness. Much of the short-term development of the inference net retrieval system will concentrate on techniques to accomplish paragraph-level retrieval automatically.

The major question raised by the results concerns the effectiveness of phrases. In previous experiments with medium-sized full-text collections, phrase-based retrieval led to significant effectiveness improvements. This is not evident in the results shown here. A possible explanation for this is the size of the TIPSTER topics, where queries may have more than 50 terms, but it should also be remembered that these results are very preliminary.

The routing experiments used 20 "old" topics to search the "new" database (approximately 1 GByte of text from the same sources as the "old" database, with the exception of DOE abstracts).
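The figures in Table 1 are precision values at fixed ranking cutoffs, averaged over the topic set. A minimal sketch of that computation (function names are ours, chosen for illustration):

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked document ids that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def mean_precision_at_k(runs, k):
    """Average precision-at-k over (ranking, relevant_set) pairs, one per topic."""
    return sum(precision_at_k(rank, rel, k) for rank, rel in runs) / len(runs)
```

Reporting precision only at cutoffs of 5, 30 and 200 matches the judging depth: with relevance judgements available only for the top 200 documents per query, recall-based measures over the full collection cannot be computed reliably.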
Since the aim of these experiments was to study techniques for representing and using long-term information needs, we assumed that users would be more involved in query formulation and thus the baseline used was the "1+man+para" queries. The other query types in this experiment used variations of relevance feedback to modify the baseline queries. These modifications consist of adding concepts to the query and reweighting the query concepts based on their frequency of occurrence in the identified relevant documents. For this experiment, we had a small number of relevance judgements based on documents retrieved by another system. The techniques used to select concepts to add to the query were based on local and global application of the EMIM measure of association [5]. The number of terms added to a query was limited to 5.

The results show that, once again, the effectiveness levels are quite good (note the 50% precision value at the 200 document level). The relevance feedback techniques were not effective, except at the high precision end of retrieval. The features selected were, on inspection, reasonable, but they do not appear to be the features required by the narrative in order to make a document relevant. No definite conclusions can be made about the feedback techniques until experiments with larger sets of relevance judgements are carried out.

Query Type           5 docs         30 docs        200 docs
man                  .66            .65            .50
man+weights          .68 (+3.0%)    .63 (-3.1%)    .48 (-4.0%)
man+EMIM+weights     .71 (+7.6%)    .61 (-6.2%)    .49 (-2.0%)
man+LEMIM+weights    .68 (+3.0%)    .64 (-1.5%)    .50 (0%)

Table 2: TIPSTER Routing Results (average precision, 20 topics): weights are based on frequency in relevant documents, EMIM is a global selection measure, LEMIM is a local (window-based) selection measure.

The third set of results are related to the retrieval of Japanese text. The goal of these experiments was to compare different approaches to morphological analysis or word segmentation.
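The EMIM (expected mutual information measure) used above for feedback term selection scores how strongly a term's occurrence is associated with relevance across the judged documents. A minimal sketch (the measure follows van Rijsbergen [5]; the interface and function names are our own illustrative assumptions):

```python
import math

def emim(judged, has_term):
    """Expected mutual information between a term's occurrence and relevance.

    `judged` is a list of (doc, is_relevant) pairs from the judged set;
    `has_term(doc)` reports whether the candidate term occurs in the document.
    """
    n = len(judged)
    counts = {(t, r): 0 for t in (0, 1) for r in (0, 1)}
    for doc, rel in judged:
        counts[(int(has_term(doc)), int(rel))] += 1
    score = 0.0
    for (t, r), c in counts.items():
        if c == 0:
            continue  # a zero count contributes nothing to the sum
        p_tr = c / n
        p_t = (counts[(t, 0)] + counts[(t, 1)]) / n
        p_r = (counts[(0, r)] + counts[(1, r)]) / n
        score += p_tr * math.log(p_tr / (p_t * p_r))
    return score
```

Terms whose presence perfectly predicts relevance score highest; terms distributed independently of relevance score near zero, which is why the measure can be ranked to pick a small number of expansion terms (here, at most 5 per query).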
Japanese text is made up of characters from a number of alphabets (Kanji, Katakana, Hiragana, and English). There are, however, no word separators and therefore a major part of indexing is deciding what to index. We tested two alternatives:

1. An efficient, relatively crude technique where individual Kanji (Chinese) characters and strings of Katakana characters are indexed.

2. A more sophisticated dictionary and grammar-based segmentation algorithm developed at Kyoto University (JUMAN).

There is a significant difference in the indexing times required by these techniques. With a database of 1,100 documents from a Japanese newspaper, the character-based indexing took 4 minutes while the word-based (JUMAN) indexing took 31 minutes. The relative effectiveness of the two text representations was then tested using the average precision in the top 10 documents for 30 queries. The queries were either treated as strings of characters, or were automatically structured using the JUMAN segmenter. In the character-based approach, words found in the query were expressed using the phrase operator to combine Kanji and Katakana characters. The results show that the retrieval performance using Japanese seems to be comparable to similar experiments with English databases, and the relatively simple character-based indexing technique is surprisingly effective compared to more sophisticated word-based techniques. The latter result is interesting, but the experiment must be repeated when the larger TIPSTER Japanese database and query set becomes available.

We are currently carrying out a range of more detailed experiments using the relevance judgements that are now available. The results from these experiments will allow us to tune the techniques being used and to make more definite conclusions about their relative effectiveness. In addition, we will continue to incorporate new approaches into the retrieval and routing software for the upcoming evaluations.
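The crude character-based alternative (option 1 above) can be sketched using standard Unicode block ranges; the function name and the exact ranges are assumptions of this sketch, not details taken from the TIPSTER system:

```python
import re

# Standard Unicode block assignments (an assumption of this sketch,
# not taken from the original system):
KANJI = r'[\u4e00-\u9fff]'        # one index term per Kanji character
KATAKANA = r'[\u30a0-\u30ff]+'    # a run of Katakana becomes one term

def char_index_terms(text):
    """Crude character-based indexing: individual Kanji plus Katakana runs.

    Hiragana and other characters are skipped, mirroring the idea that
    content words are carried mainly by Kanji and Katakana strings.
    """
    return re.findall(f'{KATAKANA}|{KANJI}', text)
```

Because no dictionary or grammar is consulted, this runs far faster than full segmentation, which is consistent with the 4-minute versus 31-minute indexing times reported above.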
Query Type                     Average Precision in Top Ten (30 queries)
Characters                     .61
Words using phrase operator    .63 (+3.3%)
Words                          .65 (+6.6%)

Table 3: TIPSTER Japanese Retrieval Results: the first two query types were run against the character-based index, the last against the word-based (JUMAN) index.

References

[1] W.B. Croft, H. Turtle, D. Lewis, "The Use of Phrases and Structured Queries in Information Retrieval", Proceedings of SIGIR 91, 32-45, (1991).

[2] W.B. Croft and H. Turtle, "Text Retrieval and Inference", in Text-Based Intelligent Systems, Paul Jacobs (ed.), Lawrence Erlbaum, New Jersey, 127-156, (1992).

[3] H.R. Turtle and W.B. Croft, "Evaluation of an Inference Network-Based Retrieval Model", ACM Transactions on Information Systems, 9(3), 187-222, (1991).

[4] H. Turtle and W.B. Croft, "A Comparison of Retrieval Models", Computer Journal, 35(3), 279-290, (1992).

[5] C.J. van Rijsbergen, Information Retrieval, Butterworths, (1979).