Okapi at TREC

Stephen E. Robertson, Stephen Walker, Micheline Hancock-Beaulieu, Aarron Gull, Marianna Lau
Centre for Interactive Systems Research, Department of Information Science, City University, Northampton Square, London EC1V 0HB, UK

Advisers: Karen Sparck Jones (University of Cambridge); Peter Willett (University of Sheffield); E. Michael Keen (University of Wales).

Abstract: The Okapi retrieval system is described, technically and in terms of its design principles. These include simplicity, robustness and ease of use. The version of Okapi used for TREC is further discussed. Designing experiments within the TREC constraints but using Okapi's supposed strengths proved problematic, and some compromise was necessary. The official TREC runs were (a) very simple automatic processing of the ad-hoc topics; (b) manually constructed ad-hoc queries; (c) feedback on the manual queries from searchers' relevance judgements; and (d) routing queries automatically obtained using the training set in a form of relevance feedback. The best run (manual with feedback), although not up to the best reported TREC results, was respectable, and an encouragement to further development within the same principles.

1. Introduction

Okapi is an experimental text retrieval system, designed to use simple, robust techniques internally and to present a user interface which requires no training and no knowledge of searching methods or techniques. It is presently accessible by academic users at City University, with the library catalogue and a scientific abstracts journal as databases. It is used for experimentation with and evaluation of novel retrieval techniques.

A design principle of Okapi is that simple techniques, without Boolean logic but with best-match searching, and with little in the way of a manually constructed knowledge base, can give effective and efficient retrieval. `Simple' also implies minimum effort, whether manual or machine, at the set-up stage, at input or at search time. In particular, relevance feedback (which requires little or no additional user effort, since users must make such judgements anyway) provides a mechanism whereby an initial query formulated with no great effort can be improved. Such a search process might be regarded as having something of the character of browsing: an exploration of a topic rather than a precise specification.

In some respects (e.g. highly elaborate topic specifications; no evaluation of interactive systems) TREC does not at all represent the kind of retrieval activities for which Okapi was designed. However, our approach to TREC has been to try to arrive at some compromise between the aims of Okapi and those of TREC. The resulting performance was not spectacular, but was (we believe) respectable enough to encourage us to pursue the ideas further.

2. Background: the Okapi project

The following is a description of Okapi as it existed before the start of TREC-related work ("interactive Okapi"). Section 3 discusses some changes which happened concurrently with, and were necessary for, the TREC work.

Okapi is a family of bibliographic retrieval systems, developed under a series of grants from the British Library. It is suitable for searching files of records whose fields contain textual data of variable length up to a few tens of thousands of characters.
It allows the implementation of a variety of search techniques based on the probabilistic retrieval model, with easy-to-use interfaces, on databases of operational size and under operational conditions (Walker, 1989; Walker & De Vere, 1990; Walker & Hancock-Beaulieu, 1991; Hancock-Beaulieu & Walker, 1992). The main purpose of the Okapi installation at City is to allow the use of a variety of evaluation methods, including live-user evaluation in the context of user information-seeking behaviour.

2.1 Search techniques

The interactive Okapi system uses probabilistic "best match" searching, and can handle queries of up to 32 terms. (There is no Boolean search facility in interactive Okapi -- but see 3.2 below concerning the development system.) Search terms may be keywords or phrases, or any other record component which has been indexed, and are extracted automatically by very simple "parsing" of an initial natural language query. Search terms are assigned weights, based on inverse document frequency in the absence of relevance information and on the F4 formula given in Robertson & Sparck Jones (1976) when relevance information is available. The match function is a simple sum-of-weights. There are facilities for "adjusting" the weighting to favour (for example) terms occurring in specified fields. There is also a limited alphabetical browsing facility (of records in index term order).

The F4 formula, point-5 version, is:

    w = log [ (r+0.5)(N-R-n+r+0.5) / ((R-r+0.5)(n-r+0.5)) ]

where
    N = collection size
    n = number of postings of term
    R = total known relevant documents
    r = number of these posted to the term

The inverse document frequency (IDF) weight is F4 with R = r = 0, i.e.

    w = log [ (N-n+0.5) / (n+0.5) ]

2.2 Relevance feedback and query expansion

The system can invite relevance judgments from the user, and following one or more positive relevance assessments it can perform an "expanded" search, using the original query terms together with additional terms extracted automatically from the relevant records. This procedure can be iterated.

2.3 Language processing

Very simple text and linguistic processing is applied during indexing and searching. There are two levels of automatic stemming, and a mainly rule-based procedure for conflating British and American spellings. There are facilities for constructing and using a simple linguistic knowledge base containing "go" phrases, classes of terms to be treated as synonymous, prefixes, stopwords and phrases, and "semi-stopwords" -- words and phrases to be treated as relatively unimportant in processing a query.

2.4 Usage

The interactive system is intended for highly interactive use by untrained users.

2.5 Logging

The system can produce detailed logs of both user and system activity, down to keystroke level and sub-second granularity.

2.6 Present use and status

The present use of Okapi is primarily as a tool for the evaluation of highly interactive bibliographic search systems with untrained users. It is also to be used in an investigation of the use of linguistic knowledge structures (e.g. thesauri) in text retrieval systems.

The system is not commercially available. It is not finished, maintained or documented to commercial standards. It is, however, designed for live use, and there has, over the years, been a considerable amount of use under live conditions. It is a set of functions from which experienced designers and programmers can construct retrieval systems, rather than a finished "product".
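To make the weighting concrete, the following minimal sketch (in C, since the Okapi code is written in C, though this is not the Okapi source) computes the F4 point-5 weight of section 2.1 and its IDF special case. The function names and the figures in main() are purely illustrative.

```c
#include <math.h>
#include <stdio.h>

/* F4 point-5 relevance weight (Robertson & Sparck Jones, 1976).
   N = collection size, n = postings of the term,
   R = known relevant documents, r = known relevant posted to the term.
   Illustrative only; not the Okapi source. */
double f4_weight(double N, double n, double R, double r)
{
    return log(((r + 0.5) * (N - R - n + r + 0.5)) /
               ((R - r + 0.5) * (n - r + 0.5)));
}

/* With no relevance information (R = r = 0) F4 reduces to an
   inverse document frequency weight. */
double idf_weight(double N, double n)
{
    return f4_weight(N, n, 0.0, 0.0);   /* log((N - n + 0.5)/(n + 0.5)) */
}

int main(void)
{
    /* Hypothetical figures: a term with 5000 postings in a 742450-document
       collection, posted to 8 of 12 known relevant documents. */
    printf("IDF weight: %.3f\n", idf_weight(742450.0, 5000.0));
    printf("F4 weight:  %.3f\n", f4_weight(742450.0, 5000.0, 12.0, 8.0));
    return 0;
}
```

Note that with R = r = 0 the general expression reduces exactly to the IDF form given above.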
3. Concurrent developments

3.1 Towards a distributed system

This development reflects a long-standing plan for the Okapi project, but was brought forward to facilitate work on the TREC database. Okapi has been split into a Basic Search System (BSS) and a number of front-end systems. The BSS is essentially a database engine offering basic text retrieval functionality, extended in various ways to allow weighting, ranking and relevance feedback etc. Although the front-end systems at present reside on the same machine, the dialogue between the front-end and the BSS is roughly comparable to that which might take place using the Z39.50 or Search & Retrieve protocols. It concerns mainly specifications for and descriptions of search sets, and involves actual records only at the time of display.

All automatic searching for the TREC project involved purpose-written front-ends to the BSS. A further front-end was developed for manual searching. This was designed to include most of the functionality of the old interactive version of Okapi, but not to emulate its user interface; it is command-driven.

3.2 Mixing Boolean and weighted searching

One characteristic of the BSS needs explaining. The BSS is capable of conducting Boolean searches as well as weighted (best match) searches. Furthermore, any Boolean expression (resulting in an undifferentiated search set) can be treated as if it were a single term in the weighted searching model. This is compatible with the approach taken in the Cirt system (which acted as a front-end to a Boolean host) (Robertson et al., 1986); particular examples of uses in Cirt include ORed synonyms and phrases constructed with the ADJ operator. The Okapi BSS does not at present allow proximity operators such as ADJ, but the principle is the same. To a very limited extent, this facility was used by the manual searchers (see 5.3).

3.3 Term selection for query expansion

Interactive Okapi automatically selected terms from relevant documents for query expansion by taking the top x (=20) terms according to their relevance weights. The BSS version uses the Robertson selection value (Robertson, 1990), approximately r*w (where w is the usual F4 weight). (See also the discussion in section 6.3, which shows that there was an error in taking this approximation.) Also, the interface used in the manual TREC experiments allows semi-automatic query expansion, in that the list of candidate terms can be displayed for the searcher to make selections from (and then entered manually), or the top 20 terms can be used automatically. Terms once selected are weighted using F4 in the usual way, except with the modification indicated below.

3.4 Bias towards query terms

In interactive Okapi, the terms in the original query held no special position in the query expansion process, except in the sense that a "semi-stopword" in the original query would be a candidate for the feedback query, whereas the same term occurring in a relevant document but not in the query would not be considered. For the TREC experiments, some bias in favour of query terms was built in, in the form of some hypothetical relevant documents assumed to contain the query terms (Harman, 1992; Bookstein, 1983). These hypothetical relevant documents then contributed to the calculation of F4. Different quantitative assumptions were made in different TREC experiments (see section 5), but once again an error crept into the implementation of this facility (see section 6.3).
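As an illustration of sections 3.3 and 3.4 (and not of the actual BSS implementation), the sketch below ranks candidate expansion terms by an approximate Robertson selection value and applies the query-term bias by adding hypothetical relevant documents to the F4 counts of terms that appeared in the original query. All names and figures are invented, and the w*p form of the selection value anticipates the correction discussed in section 6.3.

```c
#include <math.h>
#include <stdio.h>

/* F4 point-5 weight, as in section 2.1. */
static double f4(double N, double n, double R, double r)
{
    return log(((r + 0.5) * (N - R - n + r + 0.5)) /
               ((R - r + 0.5) * (n - r + 0.5)));
}

/* A candidate term for query expansion (illustrative structure only). */
struct term {
    const char *stem;
    double n;        /* postings in the collection             */
    double r;        /* known relevant documents containing it */
    int in_query;    /* 1 if it appeared in the original query */
};

int main(void)
{
    const double N = 742450.0;   /* collection size (section 4.4)       */
    const double R = 12.0;       /* known relevant documents (invented) */
    /* A "2 out of 3" bias: query terms behave as if 3 extra relevant
       documents existed, 2 of which contain the term (section 3.4). */
    const double bias_R = 3.0, bias_r = 2.0;

    struct term cand[] = {
        { "recessi",  20000.0, 9.0, 1 },
        { "unemploy", 15000.0, 7.0, 0 },
        { "forecast", 60000.0, 5.0, 0 },
    };

    for (int i = 0; i < 3; i++) {
        double R2 = R + (cand[i].in_query ? bias_R : 0.0);
        double r2 = cand[i].r + (cand[i].in_query ? bias_r : 0.0);
        double w  = f4(N, cand[i].n, R2, r2);
        double p  = r2 / R2;     /* estimated P(term occurs | relevant) */
        /* Robertson selection value, approximated as w*p (section 6.3);
           when R is the same for every term this ranks candidates in the
           same order as the simpler r*w of section 3.3. */
        printf("%-10s  weight %6.3f  selection value %6.3f\n",
               cand[i].stem, w, w * p);
    }
    return 0;
}
```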
4. Input processing

4.1 Converting the raw files

The Okapi system needs databases to be in its own format, in which each record consists of an identical sequence of fields in the form of terminated text strings. Fields are identified by sequence number only. Using the given information about the makeup and structure of the source material, together with quite a lot of trial and error, a program (lex + C) was written to convert all the raw datasets into a unified 25-field structure. The only fields common to all input were "text", document-ID and a "source" field containing "fr", "doe" etc., so all records consisted mainly of empty fields. The "source" fields were intended solely as "limit" criteria, but were unused except perhaps by one or two of the human searchers. Fields other than the three mentioned were used solely for display. Any records longer than 64K were truncated at 64K. This truncation only affected the text field (field 25).

Conversion, which included decompression, conversion to Okapi format using the lex-C program and a second-stage conversion to runtime format, ran at about 10 records/sec on a SPARC machine.

4.2 Inversion

The text field was reduced to "words", stemmed using the moderate-strength Porter algorithm (Porter, 1980) with modifications aimed at conflating British and American spellings, and filtered through a local database (GSL, see below) containing stopwords, semi-stopwords, prefixes, a few "go" phrases (phrases to be treated as words), and a list of classes of words and phrases to be treated as synonymous. The document-ID field was extracted unchanged.

Inversion took about 33 hours CPU on a SPARC machine (about 6 documents per second, though inversion time increases more than linearly with the number of documents). The result was a simple inverted file structure with no within-field positional information (insufficient disk space). There were facilities for limiting searches by source dataset, by various document length ranges and by odd/even half-collection (for comparison experiments).

4.3 The local GSL database

The Go-See-List (GSL) for the TREC experiments was based on existing databases, but was somewhat extended for TREC. Both original and extensions were derived in a fairly ad-hoc fashion (some entries were identified by examining a list of the most frequent terms in the first part of the TREC collection). This is not a sophisticated facility, and can only be said to scratch the surface of the problem.

Stopwords: 120

Semi-stopwords: 256
These were humanly selected following trial indexing runs. The criteria were (1) small retrieval value and (2) high posting count. Examples: 100, begin, carry, date, december, enough, include, meanwhile, run, take, why, without, yesterday.

Prefixes: 18
The purpose of this list is to cause prefix-word and prefixword to be treated identically for any value of word.

Go phrases: 27
Examples: cold war, middle class, saudi arabia

Synonym classes: 300, containing about 700 words
Examples:
    australia, australian, australasia, australasian
    buyout, buy out
    mit, massachusetts institute of technology
    porn, porno, pornography, pornographic

4.4 Some statistics

                                                      First part   Second part        Both
    Total documents                                       511514        230936      742450
    Truncated (over 64K)                                     603           531        1134
    Size (MB) (bibfile only, runtime format)                1107           759        1866
    Inversion overheads (%)                                   44           N/K          44
    Unique index terms (excluding document numbers)                                 1040415
    Mean unique index terms/document                                                    1.4
    Postings (excluding document numbers)                                          95898880
    Mean postings/document (document "length")                                          132
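The per-word treatment described in sections 4.2 and 4.3 can be pictured with the rough sketch below. The word lists are tiny stand-ins for the real GSL (120 stopwords, 256 semi-stopwords, about 300 synonym classes), the stemmer is a placeholder, and none of this is the Okapi indexing code.

```c
#include <stdio.h>
#include <string.h>

/* Tiny stand-in word lists; the real GSL held 120 stopwords,
   256 semi-stopwords and about 300 synonym classes. */
static const char *stopwords[]      = { "the", "of", "and", NULL };
static const char *semi_stopwords[] = { "december", "include", "yesterday", NULL };

/* One synonym class: all members are indexed under the first entry. */
static const char *synclass[] = { "porn", "porno", "pornography", "pornographic", NULL };

static int in_list(const char *w, const char **list)
{
    for (int i = 0; list[i]; i++)
        if (strcmp(w, list[i]) == 0) return 1;
    return 0;
}

/* Placeholder for the modified Porter stemmer used by Okapi. */
static const char *stem(const char *w) { return w; }

/* Decide how a word from the text field is indexed. */
static void index_word(const char *word)
{
    const char *s = stem(word);
    if (in_list(s, stopwords)) {
        printf("%-12s -> dropped (stopword)\n", word);
    } else if (in_list(s, synclass)) {
        printf("%-12s -> indexed as synonym class '%s'\n", word, synclass[0]);
    } else if (in_list(s, semi_stopwords)) {
        printf("%-12s -> indexed as '%s' (semi-stopword, low importance)\n", word, s);
    } else {
        printf("%-12s -> indexed as '%s'\n", word, s);
    }
}

int main(void)
{
    const char *sample[] = { "the", "pornography", "december", "recession" };
    for (int i = 0; i < 4; i++) index_word(sample[i]);
    return 0;
}
```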
5. Experiments

Following the TREC design, results were submitted for routing queries on the second set of records, and for ad-hoc queries on the combined set. Routing queries were processed automatically only (section 5.2; results table cityr1); ad-hoc queries were processed automatically (5.1; citya1) and manually, with feedback on the manual searches (5.3; results without feedback in table citym1, and with feedback in citym2).

In accordance with the general philosophy of Okapi, the TREC experiments were used to test the use of simple statistical techniques, with minimal linguistic processing, minimal searcher knowledge of techniques of searching, and indeed minimal effort generally. The routing test was intended to address mainly the value of relevance feedback (term selection and weighting) in a routing context where relevance judgements are accumulated from earlier runs. Automatic ad-hoc queries tested the weighting scheme without relevance information. Manual ad-hoc queries tested the combination of human intelligence with a simple weighting scheme, with and without feedback.

5.1 Automatic processing of topics

The basic principle was to take specific section(s) of the topic and parse them in standard Okapi fashion, as if they had been typed in verbatim by a searcher. Thus stopwords were removed; a few phrases and/or members of synonym classes were identified; remaining words were stemmed; all search terms (stems or phrases or synonym classes) were weighted using IDF (see section 2.1). No special account was taken of the negative phrases which appear in some of the TREC topics, so negated words would have been given positive weights by Okapi.

The selection of the topic sections was the subject of a very small amount of initial experimentation using the training set. The differences were not very consistent and in some cases small, and more testing would have been useful. However, marginally the best overall was Concepts only, and that was what we used for the returned results.

The results for the above automatic analysis of ad-hoc queries are given in the official tables as citya1.

5.2 Routing queries

The principle for the routing queries was to assume that all the known relevant documents from the first document set were already available for a relevance feedback process. Thus any actual searches conducted on the first document set, and their actual outputs, played no direct part in the formulation of the routing queries, with one exception discussed below. However, the terms extracted from the topics took part in the relevance feedback process in the manner indicated in section 3.4, with a bias equivalent to 10 supposed relevant documents in all of which the topic terms were supposed to occur (10 out of 10 bias).

The exception to the above statement was that for some topics, some additional relevance assessments were made (that is, additional to those provided centrally). These were based on the top-ranked documents retrieved in automatic searches on the first document set. (See section 6.1 for a discussion of the local relevance judgements and of the reasons for this decision.)

The results for the above analysis of routing queries are given in the official tables as cityr1.
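To make the automatic runs concrete, here is a small sketch (with invented terms, posting counts and documents) of the IDF weighting and simple sum-of-weights match described in sections 2.1 and 5.1; it is an illustration, not the BSS search code.

```c
#include <math.h>
#include <stdio.h>

#define NTERMS 4
#define NDOCS  3

/* IDF weight: F4 with R = r = 0 (section 2.1). */
static double idf(double N, double n)
{
    return log((N - n + 0.5) / (n + 0.5));
}

int main(void)
{
    const double N = 742450.0;              /* collection size (section 4.4) */
    /* Hypothetical stems taken from a Concepts field, with invented
       posting counts. */
    const char  *stems[NTERMS] = { "recessi", "unemploy", "gdp", "forecast" };
    const double n[NTERMS]     = { 20000.0, 15000.0, 8000.0, 60000.0 };

    /* Which query terms occur in each of three hypothetical documents. */
    const int present[NDOCS][NTERMS] = {
        { 1, 1, 0, 1 },
        { 1, 0, 0, 0 },
        { 0, 1, 1, 1 },
    };

    double w[NTERMS];
    for (int t = 0; t < NTERMS; t++) {
        w[t] = idf(N, n[t]);
        printf("term %-10s IDF weight %.3f\n", stems[t], w[t]);
    }

    /* Match function: a simple sum of the weights of the matching terms. */
    for (int d = 0; d < NDOCS; d++) {
        double score = 0.0;
        for (int t = 0; t < NTERMS; t++)
            if (present[d][t]) score += w[t];
        printf("document %d: score %.3f\n", d + 1, score);
    }
    return 0;
}
```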
5.3 Manual searching and feedback

The central idea behind these experiments was to approach as closely as possible the situation of a naive or inexperienced user. In other words, we wanted to gain some idea of how the system would perform if searched by an end-user with little or no knowledge of information retrieval. This intention reflects the design principles of the interactive Okapi, as discussed in section 2 above. To some degree, however, both the design of the TREC experiment in general and the constraints of the distributed system described in section 3.1 forced deviations from that ideal.

5.3.1 Searchers

The first constraint is of course that we had no access to end-users (and more particularly, no access to end-users with the specific characteristics of the TREC analysts). We used a panel of searchers, mainly information science students who could be said to have some knowledge of searching in general, limited domain knowledge (depending on the topic), and no particular knowledge of the system. (For reasons to do with the very limited time available for these searches, it was necessary to use project staff for a few searches; these staff obviously had more knowledge of the system.) The somewhat limited interface to the BSS which was used for this experiment required some training of the searchers.

Clearly one would in general expect end-users to have more domain or subject knowledge, especially for the kinds of queries provided for TREC. Highly interactive systems in general, and Okapi in particular, may be assumed to exploit such subject knowledge; clearly relevance feedback in ad-hoc searching can only work well if it is relatively easy for the user to find some relevant items from the initial search. In this sense, we see the present experiment as to some degree unfavourable to Okapi.

5.3.2 Searching

Searchers were expected to make whatever interpretations of the topic they deemed appropriate for the purpose of searching. In other words, they could use words or phrases taken from any part of the topic, or from their own general or specific knowledge. They could also have used other reference sources. However, they were encouraged to use the system to help them refine the search, in the way that an end-user might explore the possibilities within the system and try out different combinations of search terms. The combination of these ideas with the TREC rules was a little clumsy and artificial. The procedure was as follows:

(a) The searcher was given the topic in full, as received by us.
(b) The searcher examined the topic and chose some terms as candidates for searching (possibly including terms not in the topic as received).
(c) The searcher made exploratory searches, examining the results, making tentative relevance judgements and perhaps using the semi-automatic query expansion facility (see section 3.3) to suggest new terms.
(d) Having decided on an initial formulation, the searcher then finished the exploratory session and started the definitive session.
(e) The definitive session involved two stages, an initial search and a first-iteration feedback search. The initial search was strictly in accordance with the selected initial formulation; the searcher examined the top few documents, making relevance judgements.
(f) The first-iteration feedback was purely automatic from the relevance judgements, including re-weighting and automatic expansion. No further iterations were conducted.

The guidelines to the searchers included the following:

Time: Searchers were asked to allow very roughly 30 minutes per topic. In fact, the average was nearer 50 minutes.
Feedback: The guidance was to assess about the first 20 documents retrieved by the initial search, or to stop after finding about 8 relevant (if that was sooner).

Relevance: If it seemed to be difficult to find any relevant items, searchers were encouraged to make generous relevance judgements, so as to ensure that there was some basis for feedback (see also section 6.2 below).

5.3.3 Remarks on the system

The bias in favour of initial formulation terms in the relevance feedback formula was 2 out of 3 (i.e. 3 supposed relevant documents, of which 2 were supposed to contain the term). Searchers were able to use the Boolean facility described in section 3.2, for example to treat an expression such as (A and B) as if it were a single term, to be weighted like any other. However, the emphasis was on the usual (in the Okapi context) weighted searching of single terms, and this facility was used only occasionally, and only as part of larger best-match searches. In other words, this use did not compromise the characteristic of weighted searching as truly "best match", with all the flexibility that that implies.

5.3.4 Choice of terms

The terms chosen by the searchers may be briefly characterized by the following statistics:

    Average number of terms                12.9
    Terms appearing in the topic           10.5 (81%)
    Terms appearing in different fields:
        Description                         3.4
        Narrative                           6.0
        Concept                             7.5
        Others                              2.9

(These add up to more than the total because a term may occur in more than one field.) For comparison, the Concept field has around 19-20 terms on average.

The results for the manual ad-hoc queries are given in the official tables as citym1 (without feedback) and citym2 (with feedback). For a discussion of the results, and of the evaluation method for citym2, see section 7.

6. Some observations on the experiments

6.1 Local relevance judgements

We experimented with making our own relevance judgements, based on the topics as provided. Although these experiments were on a very small scale and not very systematic, our impression was that it was usually possible to reproduce the judgements provided centrally, with a high chance of agreement. If this is so, it presumably reflects (a) the relatively highly specified nature of the topics (as compared to most IR queries!), and (b) the fact that the centrally-provided judgements are made by experts other than the original requester. Thus we felt justified in attempting to improve our routing queries by providing some more relevance judgements of our own, particularly in cases where there were few centrally-provided ones.

Note that the relevance weighting method used (the F4 formula in section 2.1) takes account only of positive relevance judgements; items judged non-relevant are combined with items not judged (the complement method: Harper and van Rijsbergen, 1978).

However, as indicated in section 5.3, there were topics for which (under strict relevance criteria) the relevant documents were very sparse, and relevance feedback would not have had much effect. In these cases, for the manual searches only, searchers were encouraged to make more generous relevance judgements (i.e. to accept as relevant some documents that did not meet all the criteria precisely). The argument behind this guideline was that relevance feedback should work better given some partially-relevant items than with few or no relevant items. This argument obviously requires testing.
6.2 Bias to query terms

The bias in favour of original query terms discussed in section 3.4 was an attempt to represent the prior knowledge that a term chosen by the original requester or a searcher is likely to be good in terms of the probabilistic model. This argument relates to, but is not limited to, Harman's argument about negative weights (Harman, 1992). The point-5 formula used in the relevance weighting model actually has a built-in bias which might be described as "0.5 out of 1".

The biases used in the different TREC experiments (10 out of 10 and 2 out of 3) were chosen arbitrarily; unfortunately there was no time to do any extensive testing to enable a better-informed decision. A bias such as 2 out of 3 has the curious effect of downgrading some very good query terms (any term that occurs in all the known relevant documents). This was part of the reason for trying the 10 out of 10 bias. However, there may be good reason for this effect: even very good results on the known relevant documents should not persuade us that p is actually unity.

6.3 Two implementation errors

There were also two errors in the implementation of this bias. In the relevance weighting formula, the probability p (that the term occurs in a relevant document) is estimated directly from the known relevant documents; the bias is correctly used to modify this estimate (e.g. r -> r+2 and R -> R+3 in the formula p = r/R). But the corresponding non-relevance probability q is normally estimated by the complement method (i.e. all documents in the collection not known to be relevant are assumed to be non-relevant, q = (n-r)/(N-R)). In the implementation used for TREC, the modifications to r and R were incorrectly carried over to the q estimate.

The second error occurred in the term selection value for query expansion. The full selection value should be w(p-q). Since q is normally very small compared to p, this can be approximated by wp. Since in a simple relevance feedback version p = r/R, and R is the same for all terms (i.e. the number of known relevant documents), ranking in wp order is the same as ranking in wr order. So in the TREC implementation, wr was used. However, the modification to R for query terms invalidates the second assumption (that R is the same for all terms), so wp should have been used.

These errors will have had the effect of over-emphasizing some infrequent query terms, but will probably not have affected the overall results greatly.

7. Results and discussion

Full results can be seen in the official tables. The evaluation of the feedback run was treated in a somewhat special way, by agreement with the organizers. The original plan had been to do a "residual ranking" evaluation, i.e. to remove from the collection those items which were assessed for relevance for feedback purposes, and to evaluate the two runs (with and without feedback) on the reduced collection. This would have allowed a comparison between these two runs, but not between the feedback run and any of the other results presented. Instead, a "frozen rank" evaluation was used, in which the documents examined for relevance before feedback were retained as the top-ranking documents in the feedback run. This simulates a real search, in that those documents would have been seen (in some form) by the user and would therefore have to be regarded as part of the output of the system. Therefore it may be seen as a fairer evaluation of feedback than residual ranking, although it is likely to reduce the apparent effect of feedback.
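The "frozen rank" construction just described might look something like the sketch below, using invented document numbers: the documents examined before feedback keep the top positions, and the feedback search's ranking fills the rest, skipping anything the searcher has already seen. It is an illustration of the evaluation arrangement, not of any TREC scoring code.

```c
#include <stdio.h>

#define NSEEN 5
#define NFEED 8
#define NOUT  10

/* Was document d among those examined before feedback? */
static int seen(int d, const int examined[], int nseen)
{
    for (int i = 0; i < nseen; i++)
        if (examined[i] == d) return 1;
    return 0;
}

int main(void)
{
    /* Invented document numbers. Documents examined for relevance in the
       initial search, in the order they were ranked and shown. */
    const int examined[NSEEN] = { 17, 4, 92, 33, 58 };
    /* Ranked output of the feedback (expanded, re-weighted) search. */
    const int feedback[NFEED] = { 4, 71, 17, 9, 33, 105, 58, 12 };

    int out[NOUT], k = 0;

    /* Frozen ranks: the examined documents keep their original positions
       at the top of the final ranking... */
    for (int i = 0; i < NSEEN && k < NOUT; i++) out[k++] = examined[i];
    /* ...and the feedback ranking supplies the rest, with documents the
       searcher has already seen removed. */
    for (int i = 0; i < NFEED && k < NOUT; i++)
        if (!seen(feedback[i], examined, NSEEN)) out[k++] = feedback[i];

    for (int i = 0; i < k; i++) printf("rank %2d: document %d\n", i + 1, out[i]);
    return 0;
}
```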
A very brief summary of the results, taking just two measures from the tables, is as follows:

                                     11-point average   Precision at 5 docs
    citya1  (ad-hoc automatic)            12.1%               49.6%
    citym1  (manual)                      15.6%               57.6%
    citym2* (manual with feedback)        18.2%               58.8%
    cityr1  (routing)                     17.7%               54.8%

    (* frozen ranks evaluation)

The performance of the automatic ad-hoc run is really rather poor. The manual run without feedback is better. Feedback does clearly produce an improvement (though not, of course, given the frozen ranks evaluation, at the high-precision end). It seems that both the choice of terms and the liberal relevance judgements by non-expert students are effective at least to some degree. (We have yet to compare the individual judgements by the students with the "correct" ones provided by the TREC organizers, or to establish whether the "correct" judgements would have given us greater performance benefits.) The routing results seem reasonable.

In general, we believe that the simple, robust and minimum-effort methods we have adopted in Okapi have been shown to work, even with very different material (both documents and queries) from that for which Okapi was originally designed. Performance, both in absolute terms and relative to the other TREC entries, is respectable but by no means wonderful. We also believe that there is much scope for improvement; there are other simple and robust methods (such as other weighting formulae or different treatments of compound terms) to which Okapi would be hospitable, and which may bring performance up to a more acceptable level. We look forward to TREC 2.

References

Bookstein, A. (1983). Information retrieval: a sequential learning process. Journal of the American Society for Information Science 34(5), 331-342.

Hancock-Beaulieu, M. & Walker, S. (1992). An evaluation of automatic query expansion in an online library catalogue. Journal of Documentation 48(4), 406-421.

Harman, D. (1992). Relevance feedback revisited. In: SIGIR '92 -- Proc. 15th International Conference on Research and Development in Information Retrieval. ACM Press, 1-10.

Harper, D.J. & van Rijsbergen, C.J. (1978). An evaluation of feedback in document retrieval using co-occurrence data. Journal of Documentation 34(3), 189-216.

Porter, M.F. (1980). An algorithm for suffix stripping. Program 14(3), 130-137.

Robertson, S.E. (1990). On term selection for query expansion. Journal of Documentation 46(4), 359-364.

Robertson, S.E. & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129-146.

Robertson, S.E., Thompson, C.L., Macaskill, M.J. & Bovey, J. (1986). Weighting, ranking and relevance feedback in a front-end system. Journal of Information Science 12(1/2), 71-75.

Walker, S. (1989). The Okapi online catalogue research projects. In: The online catalogue: developments and directions, edited by Charles R. Hildreth. Library Association, 84-106.

Walker, S. & De Vere, R. (1990). Improving subject retrieval in online catalogues: 2. Relevance feedback and query expansion. British Library. (British Library Research Paper 72.) ISBN 0-7123-3219-7.

Walker, S. & Hancock-Beaulieu, M. (1991). Okapi at City: an evaluation facility for interactive IR. British Library Research Report 6056.
Appendix: System architecture

Platform

The system runs on Sun hardware. It should port fairly easily to other UNIX platforms, at least of the BSD type. All the search and indexing code is in C. Source file conversion programs and log analysis programs may be written in awk.

Database structure

A database consists of:

    a text file (bibliographic file): this is the dataset from which searches retrieve records;

    up to three indexes: each index consists of primary and secondary dictionaries and a posting file. There are several types of index. One contains no positional information below the level of records, and is suitable for "phrases" like personal names and titles. Others contain positional information in the form field, sentence, word number for every occurrence of every indexed term. An index can contain terms for up to 16 different types of search;

    a set of parameter files:
        database description parameters;
        indexing parameters (one set for each index): these define how indexing is to be performed in terms of linguistic knowledge base, stemming function, procedure for extracting index terms and the fields and subfields from which they are to be extracted;
        search type (or group) parameters: these are closely related to indexing parameters. They are used to determine the linguistic processing to be applied to queries and the parameters to be used for index lookup and for the extraction of terms for automatic query expansion;
        search mnemonics (e.g. ABS, AUTH) used in query parsing;
        display parameters, defining two levels of display;
        language knowledge bases: up to three of these may be associated with a database to allow linguistic processing to depend on the type of data being searched or extracted. Typically, these are common to a number of databases of similar type and usage.

Input

Source files are stored in a simple format where each record starts with a field directory giving the length of each field, followed by the text of the fields. Fields may contain a limited range of subfield or role markers, indicating the nature of the following data. There are facilities for importing a few types of bibliographic files, including UKMARC and ISO 2709. An "Okapi exchange" format also exists. Character coding is ASCII with a "shift" character (`\`) to allow the encoding of characters above hex 7F. No data compression is used.

Output (interactive Okapi only)

Output is to character-based terminals or windows, or hard copy. There are two levels of record display and printout -- brief (one line) and full. Record layout is determined by parameters and is fairly flexible.

Database maintenance

There are no record editing and no index updating facilities. Source file and indexes must be completely regenerated when necessary.

Indexing storage overheads

Several types of index are available. Depending on the nature of the database and the extent of indexing required, overheads range from about 10% to 120% of the bibliographic file size.

Performance

Index lookup is fast because each lookup only requires one disk access. A multi-term search runs in time approximately proportional to the total number of postings for all the terms in the query. Bibliographic record access is also fast because there is no indirection: the postings records directly address the bibliographic records, so again there is only one disk access per record. File inversion is relatively slow and cpu-bound because of the multi-pass linguistic processing during index term extraction. As a rough guide, inversion runs at about one minute per megabyte of indexable text on a lightly loaded Sun 4/330.
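The "no indirection" point about record access can be illustrated by a sketch such as the one below; the structure and file name are invented and do not reflect the real Okapi file layout, but they show the idea of a posting carrying the byte address of its record so that one seek reaches the record.

```c
#include <stdio.h>

/* Illustrative layout only; not the real Okapi file format. */
struct posting {
    long record_offset;   /* byte address of the record in the bibliographic file */
    /* positional indexes would also carry field / sentence / word numbers */
};

/* Fetch the first few bytes of a record given its posting: one fseek,
   one read, no intermediate record-number-to-address lookup. */
static int fetch_record(FILE *bibfile, const struct posting *p,
                        char *buf, size_t buflen)
{
    if (fseek(bibfile, p->record_offset, SEEK_SET) != 0) return -1;
    return (int)fread(buf, 1, buflen, bibfile);
}

int main(void)
{
    /* Hypothetical usage against some bibliographic file. */
    FILE *bibfile = fopen("bibfile.dat", "rb");
    if (!bibfile) { perror("bibfile.dat"); return 1; }

    struct posting p = { 102400L };   /* invented offset */
    char buf[64];
    int got = fetch_record(bibfile, &p, buf, sizeof buf);
    printf("read %d bytes of the record at offset %ld\n", got, p.record_offset);

    fclose(bibfile);
    return 0;
}
```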
Limits

    Maximum bibliographic file size: 32 gigabytes (but maximum index size 4 gigabytes)
    Number of records per database: no practical limit
    Postings per index term: no practical limit
    Maximum amount of data which can be treated as a "record" for retrieval purposes: this is a system parameter, usually set to 16 kilobytes; up to 64K or more is acceptable
    Maximum field length: same as record size
    Maximum number of fields per record: 31
    Maximum index term length: 127 characters
    Maximum number of terms in a single query: 32 (interactive Okapi only)