Okapi at TREC

Stephen E. Robertson, Stephen Walker, Micheline Hancock-Beaulieu, Aarron Gull, Marianna Lau
Centre for Interactive Systems Research, Department of Information Science, City University, Northampton Square, London EC1V 0HB, UK

Advisers: Karen Sparck Jones (University of Cambridge); Peter Willett (University of Sheffield); E. Michael Keen (University of Wales).

Abstract: The Okapi retrieval system is described, technically and in terms of its design principles. These include simplicity, robustness and ease of use. The version of Okapi used for TREC is further discussed. Designing experiments within the TREC constraints but using Okapi's supposed strengths proved problematic, and some compromise was necessary. The official TREC runs were (a) very simple automatic processing of the ad-hoc topics; (b) manually constructed ad-hoc queries; (c) feedback on the manual queries from searchers' relevance judgements; and (d) routing queries automatically obtained using the training set in a form of relevance feedback. The best run (manual with feedback), although not up to the best reported TREC results, was respectable, and an encouragement to further development within the same principles.

1. Introduction

Okapi is an experimental text retrieval system, designed to use simple, robust techniques internally and to present a user interface which requires no training and no knowledge of searching methods or techniques. It is presently accessible by academic users at City University, with the library catalogue and a scientific abstracts journal as databases. It is used for experimentation with and evaluation of novel retrieval techniques.

A design principle of Okapi is that simple techniques, without Boolean logic but with best-match searching, and with little in the way of a manually constructed knowledge base, can give effective and efficient retrieval. `Simple' also implies minimum effort, whether manual or machine, at the set-up stage, at input or at search time. In particular, relevance feedback (which requires little or no additional user effort, since users must make such judgements anyway) provides a mechanism whereby an initial query formulated with no great effort can be improved. Such a search process might be regarded as having something of the character of browsing: an exploration of a topic rather than a precise specification.

In some respects (e.g. highly elaborate topic specifications; no evaluation of interactive systems) TREC does not at all represent the kind of retrieval activities for which Okapi was designed. However, our approach to TREC has been to try to arrive at some compromise between the aims of Okapi and those of TREC. The resulting performance was not spectacular, but was (we believe) respectable enough to encourage us to pursue the ideas further.

2. Background: the Okapi project

The following is a description of Okapi as it existed before the start of TREC-related work ("interactive Okapi"). Section 3 discusses some changes which happened concurrently with, and were necessary for, the TREC work.

Okapi is a family of bibliographic retrieval systems, developed under a series of grants from the British Library. It is suitable for searching files of records whose fields contain textual data of variable length up to a few tens of thousands of characters.
It allows the implementation of a variety of search techniques based on the probabilistic retrieval model, with easy-to-use interfaces, on databases of operational size and under operational conditions (Walker, 1989; Walker & De Vere, 1990; Walker & Hancock-Beaulieu, 1991; Hancock-Beaulieu & Walker, 1992). The main purpose of the Okapi installation at City is to allow the use of a variety of evaluation methods, including live-user evaluation in the context of user information-seeking behaviour.

2.1 Search techniques

The interactive Okapi system uses probabilistic "best match" searching, and can handle queries of up to 32 terms. (There is no Boolean search facility in interactive Okapi -- but see 3.2 below concerning the development system.) Search terms may be keywords or phrases, or any other record component which has been indexed, and are extracted automatically by very simple "parsing" of an initial natural language query. Search terms are assigned weights, based on inverse document frequency in the absence of relevance information and on the F4 formula given in Robertson & Sparck Jones (1976) when relevance information is available. The match function is a simple sum-of-weights. There are facilities for "adjusting" the weighting to favour (for example) terms occurring in specified fields. There is also a limited alphabetical browsing facility (of records in index term order).

The F4 formula, point-5 version, is:

    w = log [ (r+0.5)(N-R-n+r+0.5) / ((R-r+0.5)(n-r+0.5)) ]

where
    N = collection size
    n = number of postings of term
    R = total known relevant documents
    r = number of these posted to the term

The inverse document frequency (IDF) weight is F4 with R = r = 0, i.e.

    w = log [ (N-n+0.5) / (n+0.5) ]

2.2 Relevance feedback and query expansion

The system can invite relevance judgments from the user, and following one or more positive relevance assessments it can perform an "expanded" search, using the original query terms together with additional terms extracted automatically from the relevant records. This procedure can be iterated.

2.3 Language processing

Very simple text and linguistic processing is applied during indexing and searching. There are two levels of automatic stemming, and a mainly rule-based procedure for conflating British and American spellings. There are facilities for constructing and using a simple linguistic knowledge base containing "go" phrases, classes of terms to be treated as synonymous, prefixes, stopwords and phrases, and "semi-stopwords" -- words and phrases to be treated as relatively unimportant in processing a query.

2.4 Usage

The interactive system is intended for highly interactive use by untrained users.

2.5 Logging

The system can produce detailed logs of both user and system activity, down to keystroke level and sub-second granularity.

2.6 Present use and status

The present use of Okapi is primarily as a tool for the evaluation of highly interactive bibliographic search systems with untrained users. It is also to be used in an investigation of the use of linguistic knowledge structures (e.g. thesauri) in text retrieval systems.

The system is not commercially available. It is not finished, maintained or documented to commercial standards. It is, however, designed for live use, and there has, over the years, been a considerable amount of use under live conditions. It is a set of functions from which experienced designers and programmers can construct retrieval systems, rather than a finished "product".
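To make the weighting concrete, the following minimal sketch (in C, since the Okapi code is written in C, though this is not the Okapi source) computes the F4 point-5 weight of section 2.1 and its IDF special case. The function names and the figures in main() are purely illustrative.

```c
#include <math.h>
#include <stdio.h>

/* F4 point-5 relevance weight (Robertson & Sparck Jones, 1976).
   N = collection size, n = postings of the term,
   R = known relevant documents, r = known relevant posted to the term.
   Illustrative only; not the Okapi source. */
double f4_weight(double N, double n, double R, double r)
{
    return log(((r + 0.5) * (N - R - n + r + 0.5)) /
               ((R - r + 0.5) * (n - r + 0.5)));
}

/* With no relevance information (R = r = 0) F4 reduces to an
   inverse document frequency weight. */
double idf_weight(double N, double n)
{
    return f4_weight(N, n, 0.0, 0.0);   /* log((N - n + 0.5)/(n + 0.5)) */
}

int main(void)
{
    /* Hypothetical figures: a term with 5000 postings in a 742450-document
       collection, posted to 8 of 12 known relevant documents. */
    printf("IDF weight: %.3f\n", idf_weight(742450.0, 5000.0));
    printf("F4 weight:  %.3f\n", f4_weight(742450.0, 5000.0, 12.0, 8.0));
    return 0;
}
```

Note that with R = r = 0 the general expression reduces exactly to the IDF form given above.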
3. Concurrent developments

3.1 Towards a distributed system

This development reflects a long-standing plan for the Okapi project, but was brought forward to facilitate work on the TREC database. Okapi has been split into a Basic Search System (BSS) and a number of front-end systems. The BSS is essentially a database engine offering basic text retrieval functionality, extended in various ways to allow weighting, ranking and relevance feedback etc. Although the front-end systems at present reside on the same machine, the dialogue between the front-end and the BSS is roughly comparable to that which might take place using the Z39.50 or Search & Retrieve protocols. It concerns mainly specifications for and descriptions of search sets, and involves actual records only at the time of display.

All automatic searching for the TREC project involved purpose-written front-ends to the BSS. A further front-end was developed for manual searching. This was designed to include most of the functionality of the old interactive version of Okapi, but not to emulate its user interface; it is command-driven.

3.2 Mixing Boolean and weighted searching

One characteristic of the BSS needs explaining. The BSS is capable of conducting Boolean searches as well as weighted (best match) searches. Furthermore, any Boolean expression (resulting in an undifferentiated search set) can be treated as if it were a single term in the weighted searching model. This is compatible with the approach taken in the Cirt system (which acted as a front-end to a Boolean host) (Robertson et al., 1986); particular examples of uses in Cirt include ORed synonyms and phrases constructed with the ADJ operator. The Okapi BSS does not at present allow proximity operators such as ADJ, but the principle is the same. To a very limited extent, this facility was used by the manual searchers (see 5.3).

3.3 Term selection for query expansion

Interactive Okapi automatically selected terms from relevant documents for query expansion by taking the top x (=20) terms according to their relevance weights. The BSS version uses the Robertson selection value (Robertson, 1990), approximately r*w (where w is the usual F4 weight). (See also the discussion in section 6.3, which shows that there was an error in taking this approximation.) Also, the interface used in the manual TREC experiments allows semi-automatic query expansion, in that the list of candidate terms can be displayed for the searcher to make selections from (and then entered manually), or the top 20 terms can be used automatically. Terms once selected are weighted using F4 in the usual way, except with the modification indicated below.

3.4 Bias towards query terms

In interactive Okapi, the terms in the original query held no special position in the query expansion process, except in the sense that a "semi-stopword" in the original query would be a candidate for the feedback query, whereas the same term occurring in a relevant document but not in the query would not be considered. For the TREC experiments, some bias in favour of query terms was built in, in the form of some hypothetical relevant documents assumed to contain the query terms (Harman, 1992; Bookstein, 1983). These hypothetical relevant documents then contributed to the calculation of F4. Different quantitative assumptions were made in different TREC experiments (see section 5), but once again an error crept into the implementation of this facility (see section 6.3).
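As an illustration of sections 3.3 and 3.4 (and not of the actual BSS implementation), the sketch below ranks candidate expansion terms by an approximate Robertson selection value and applies the query-term bias by adding hypothetical relevant documents to the F4 counts of terms that appeared in the original query. All names and figures are invented, and the w*p form of the selection value anticipates the correction discussed in section 6.3.

```c
#include <math.h>
#include <stdio.h>

/* F4 point-5 weight, as in section 2.1. */
static double f4(double N, double n, double R, double r)
{
    return log(((r + 0.5) * (N - R - n + r + 0.5)) /
               ((R - r + 0.5) * (n - r + 0.5)));
}

/* A candidate term for query expansion (illustrative structure only). */
struct term {
    const char *stem;
    double n;        /* postings in the collection             */
    double r;        /* known relevant documents containing it */
    int in_query;    /* 1 if it appeared in the original query */
};

int main(void)
{
    const double N = 742450.0;   /* collection size (section 4.4)       */
    const double R = 12.0;       /* known relevant documents (invented) */
    /* A "2 out of 3" bias: query terms behave as if 3 extra relevant
       documents existed, 2 of which contain the term (section 3.4). */
    const double bias_R = 3.0, bias_r = 2.0;

    struct term cand[] = {
        { "recessi",  20000.0, 9.0, 1 },
        { "unemploy", 15000.0, 7.0, 0 },
        { "forecast", 60000.0, 5.0, 0 },
    };

    for (int i = 0; i < 3; i++) {
        double R2 = R + (cand[i].in_query ? bias_R : 0.0);
        double r2 = cand[i].r + (cand[i].in_query ? bias_r : 0.0);
        double w  = f4(N, cand[i].n, R2, r2);
        double p  = r2 / R2;     /* estimated P(term occurs | relevant) */
        /* Robertson selection value, approximated as w*p (section 6.3);
           when R is the same for every term this ranks candidates in the
           same order as the simpler r*w of section 3.3. */
        printf("%-10s  weight %6.3f  selection value %6.3f\n",
               cand[i].stem, w, w * p);
    }
    return 0;
}
```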
4. Input processing

4.1 Converting the raw files

The Okapi system needs databases to be in its own format, in which each record consists of an identical sequence of fields in the form of terminated text strings. Fields are identified by sequence number only. Using the given information about the makeup and structure of the source material, together with quite a lot of trial and error, a program (lex + C) was written to convert all the raw datasets into a unified 25-field structure. The only fields common to all input were "text", document-ID and a "source" field containing "fr", "doe" etc., so all records consisted mainly of empty fields. The "source" fields were intended solely as "limit" criteria, but were unused except perhaps by one or two of the human searchers. Fields other than the three mentioned were used solely for display. Any records longer than 64K were truncated at 64K. This truncation only affected the text field (field 25).

Conversion, which included decompression, conversion to Okapi format using the lex-C program and a second-stage conversion to runtime format, ran at about 10 records/sec on a SPARC machine.

4.2 Inversion

The text field was reduced to "words", stemmed using the moderate-strength Porter algorithm (Porter, 1980) with modifications aimed at conflating British and American spellings, and filtered through a local database (GSL, see below) containing stopwords, semi-stopwords, prefixes, a few "go" phrases (phrases to be treated as words), and a list of classes of words and phrases to be treated as synonymous. The document-ID field was extracted unchanged.

Inversion took about 33 hours CPU on a SPARC machine (about 6 documents per second, though inversion time increases more than linearly with the number of documents). The result was a simple inverted file structure with no within-field positional information (insufficient disk space). There were facilities for limiting searches by source dataset, by various document length ranges and by odd/even half-collection (for comparison experiments).

4.3 The local GSL database

The Go-See-List (GSL) for the TREC experiments was based on existing databases, but was somewhat extended for TREC. Both original and extensions were derived in a fairly ad-hoc fashion (some entries were identified by examining a list of the most frequent terms in the first part of the TREC collection). This is not a sophisticated facility, and can only be said to scratch the surface of the problem.

Stopwords: 120

Semi-stopwords: 256
These were humanly selected following trial indexing runs. The criteria were (1) small retrieval value and (2) high posting count. Examples: 100, begin, carry, date, december, enough, include, meanwhile, run, take, why, without, yesterday.

Prefixes: 18
The purpose of this list is to cause prefix-word and prefixword to be treated identically for any value of word.

Go phrases: 27
Examples: cold war, middle class, saudi arabia

Synonym classes: 300, containing about 700 words
Examples:
    australia, australian, australasia, australasian
    buyout, buy out
    mit, massachusetts institute of technology
    porn, porno, pornography, pornographic

4.4 Some statistics

                                                      First part   Second part        Both
    Total documents                                       511514        230936      742450
    Truncated (over 64K)                                     603           531        1134
    Size (MB) (bibfile only, runtime format)                1107           759        1866
    Inversion overheads (%)                                   44           N/K          44
    Unique index terms (excluding document numbers)                                 1040415
    Mean unique index terms/document                                                    1.4
    Postings (excluding document numbers)                                          95898880
    Mean postings/document (document "length")                                          132
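The per-word treatment described in sections 4.2 and 4.3 can be pictured with the rough sketch below. The word lists are tiny stand-ins for the real GSL (120 stopwords, 256 semi-stopwords, about 300 synonym classes), the stemmer is a placeholder, and none of this is the Okapi indexing code.

```c
#include <stdio.h>
#include <string.h>

/* Tiny stand-in word lists; the real GSL held 120 stopwords,
   256 semi-stopwords and about 300 synonym classes. */
static const char *stopwords[]      = { "the", "of", "and", NULL };
static const char *semi_stopwords[] = { "december", "include", "yesterday", NULL };

/* One synonym class: all members are indexed under the first entry. */
static const char *synclass[] = { "porn", "porno", "pornography", "pornographic", NULL };

static int in_list(const char *w, const char **list)
{
    for (int i = 0; list[i]; i++)
        if (strcmp(w, list[i]) == 0) return 1;
    return 0;
}

/* Placeholder for the modified Porter stemmer used by Okapi. */
static const char *stem(const char *w) { return w; }

/* Decide how a word from the text field is indexed. */
static void index_word(const char *word)
{
    const char *s = stem(word);
    if (in_list(s, stopwords)) {
        printf("%-12s -> dropped (stopword)\n", word);
    } else if (in_list(s, synclass)) {
        printf("%-12s -> indexed as synonym class '%s'\n", word, synclass[0]);
    } else if (in_list(s, semi_stopwords)) {
        printf("%-12s -> indexed as '%s' (semi-stopword, low importance)\n", word, s);
    } else {
        printf("%-12s -> indexed as '%s'\n", word, s);
    }
}

int main(void)
{
    const char *sample[] = { "the", "pornography", "december", "recession" };
    for (int i = 0; i < 4; i++) index_word(sample[i]);
    return 0;
}
```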
5. Experiments

Following the TREC design, results were submitted for routing queries on the second set of records, and for ad-hoc queries on the combined set. Routing queries were processed automatically only (section 5.2; results table cityr1); ad-hoc queries were processed automatically (5.1; citya1) and manually, with feedback on the manual searches (5.3; results without feedback in table citym1, and with feedback in citym2).

In accordance with the general philosophy of Okapi, the TREC experiments were used to test the use of simple statistical techniques, with minimal linguistic processing, minimal searcher knowledge of techniques of searching, and indeed minimal effort generally. The routing test was intended to address mainly the value of relevance feedback (term selection and weighting) in a routing context where relevance judgements are accumulated from earlier runs. Automatic ad-hoc queries tested the weighting scheme without relevance information. Manual ad-hoc queries tested the combination of human intelligence with a simple weighting scheme, with and without feedback.

5.1 Automatic processing of topics

The basic principle was to take specific section(s) of the topic and parse them in standard Okapi fashion, as if they had been typed in verbatim by a searcher. Thus stopwords were removed; a few phrases and/or members of synonym classes were identified; remaining words were stemmed; all search terms (stems or phrases or synonym classes) were weighted using IDF (see section 2.1). No special account was taken of the negative phrases which appear in some of the TREC topics, so negated words would have been given positive weights by Okapi.

The selection of the topic sections was the subject of a very small amount of initial experimentation using the training set. The differences were not very consistent and in some cases small, and more testing would have been useful. However, marginally the best overall was Concepts only, and that was what we used for the returned results.

The results for the above automatic analysis of ad-hoc queries are given in the official tables as citya1.

5.2 Routing queries

The principle for the routing queries was to assume that all the known relevant documents from the first document set were already available for a relevance feedback process. Thus any actual searches conducted on the first document set, and their actual outputs, played no direct part in the formulation of the routing queries, with one exception discussed below. However, the terms extracted from the topics took part in the relevance feedback process in the manner indicated in section 3.4, with a bias equivalent to 10 supposed relevant documents in all of which the topic terms were supposed to occur (10 out of 10 bias).

The exception to the above statement was that for some topics, some additional relevance assessments were made (that is, additional to those provided centrally). These were based on the top-ranked documents retrieved in automatic searches on the first document set. (See section 6.1 for a discussion of the local relevance judgements and of the reasons for this decision.)

The results for the above analysis of routing queries are given in the official tables as cityr1.
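To make the automatic runs concrete, here is a small sketch (with invented terms, posting counts and documents) of the IDF weighting and simple sum-of-weights match described in sections 2.1 and 5.1; it is an illustration, not the BSS search code.

```c
#include <math.h>
#include <stdio.h>

#define NTERMS 4
#define NDOCS  3

/* IDF weight: F4 with R = r = 0 (section 2.1). */
static double idf(double N, double n)
{
    return log((N - n + 0.5) / (n + 0.5));
}

int main(void)
{
    const double N = 742450.0;              /* collection size (section 4.4) */
    /* Hypothetical stems taken from a Concepts field, with invented
       posting counts. */
    const char  *stems[NTERMS] = { "recessi", "unemploy", "gdp", "forecast" };
    const double n[NTERMS]     = { 20000.0, 15000.0, 8000.0, 60000.0 };

    /* Which query terms occur in each of three hypothetical documents. */
    const int present[NDOCS][NTERMS] = {
        { 1, 1, 0, 1 },
        { 1, 0, 0, 0 },
        { 0, 1, 1, 1 },
    };

    double w[NTERMS];
    for (int t = 0; t < NTERMS; t++) {
        w[t] = idf(N, n[t]);
        printf("term %-10s IDF weight %.3f\n", stems[t], w[t]);
    }

    /* Match function: a simple sum of the weights of the matching terms. */
    for (int d = 0; d < NDOCS; d++) {
        double score = 0.0;
        for (int t = 0; t < NTERMS; t++)
            if (present[d][t]) score += w[t];
        printf("document %d: score %.3f\n", d + 1, score);
    }
    return 0;
}
```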
5.3 Manual searching and feedback

The central idea behind these experiments was to approach as closely as possible the situation of a naive or inexperienced user. In other words, we wanted to gain some idea of how the system would perform if searched by an end-user with little or no knowledge of information retrieval. This intention reflects the design principles of the interactive Okapi, as discussed in section 2 above. To some degree, however, both the design of the TREC experiment in general and the constraints of the distributed system described in section 3.1 forced deviations from that ideal.

5.3.1 Searchers

The first constraint is of course that we had no access to end-users (and more particularly, no access to end-users with the specific characteristics of the TREC analysts). We used a panel of searchers, mainly information science students who could be said to have some knowledge of searching in general, limited domain knowledge (depending on the topic), and no particular knowledge of the system. (For reasons to do with the very limited time available for these searches, it was necessary to use project staff for a few searches; these staff obviously had more knowledge of the system.) The somewhat limited interface to the BSS which was used for this experiment required some training of the searchers.

Clearly one would in general expect end-users to have more domain or subject knowledge, especially for the kinds of queries provided for TREC. Highly interactive systems in general, and Okapi in particular, may be assumed to exploit such subject knowledge; clearly relevance feedback in ad-hoc searching can only work well if it is relatively easy for the user to find some relevant items from the initial search. In this sense, we see the present experiment as to some degree unfavourable to Okapi.

5.3.2 Searching

Searchers were expected to make whatever interpretations of the topic they deemed appropriate for the purpose of searching. In other words, they could use words or phrases taken from any part of the topic, or from their own general or specific knowledge. They could also have used other reference sources. However, they were encouraged to use the system to help them refine the search, in the way that an end-user might explore the possibilities within the system and try out different combinations of search terms. The combination of these ideas with the TREC rules was a little clumsy and artificial. The procedure was as follows:

(a) The searcher was given the topic in full, as received by us.
(b) The searcher examined the topic and chose some terms as candidates for searching (possibly including terms not in the topic as received).
(c) The searcher made exploratory searches, examining the results, making tentative relevance judgements and perhaps using the semi-automatic query expansion facility (see section 3.3) to suggest new terms.
(d) Having decided on an initial formulation, the searcher then finished the exploratory session and started the definitive session.
(e) The definitive session involved two stages, an initial search and a first-iteration feedback search. The initial search was strictly in accordance with the selected initial formulation; the searcher examined the top few documents, making relevance judgements.
(f) The first-iteration feedback was purely automatic from the relevance judgements, including re-weighting and automatic expansion. No further iterations were conducted.

The guidelines to the searchers included the following:

Time: Searchers were asked to allow very roughly 30 minutes per topic. In fact, the average was nearer 50 minutes.
Feedback: The guidance was to assess about the first 20 documents retrieved by the initial search, or to stop after finding about 8 relevant (if that was sooner).

Relevance: If it seemed to be difficult to find any relevant items, searchers were encouraged to make generous relevance judgements, so as to ensure that there was some basis for feedback (see also section 6.2 below).

5.3.3 Remarks on the system

The bias in favour of initial formulation terms in the relevance feedback formula was 2 out of 3 (i.e. 3 supposed relevant documents, of which 2 were supposed to contain the term). Searchers were able to use the Boolean facility described in section 3.2, for example to treat an expression such as (A and B) as if it were a single term, to be weighted like any other. However, the emphasis was on the usual (in the Okapi context) weighted searching of single terms, and this facility was used only occasionally, and only as part of larger best-match searches. In other words, this use did not compromise the characteristic of weighted searching as truly "best match", with all the flexibility that that implies.

5.3.4 Choice of terms

The terms chosen by the searchers may be briefly characterized by the following statistics:

    Average number of terms                12.9
    Terms appearing in the topic           10.5 (81%)
    Terms appearing in different fields:
        Description                         3.4
        Narrative                           6.0
        Concept                             7.5
        Others                              2.9

(These add up to more than the total because a term may occur in more than one field.) For comparison, the Concept field has around 19-20 terms on average.

The results for the manual ad-hoc queries are given in the official tables as citym1 (without feedback) and citym2 (with feedback). For a discussion of the results, and of the evaluation method for citym2, see section 7.

6. Some observations on the experiments

6.1 Local relevance judgements

We experimented with making our own relevance judgements, based on the topics as provided. Although these experiments were on a very small scale and not very systematic, our impression was that it was usually possible to reproduce the judgements provided centrally, with a high chance of agreement. If this is so, it presumably reflects (a) the relatively highly specified nature of the topics (as compared to most IR queries!), and (b) the fact that the centrally-provided judgements are made by experts other than the original requester. Thus we felt justified in attempting to improve our routing queries by providing some more relevance judgements of our own, particularly in cases where there were few centrally-provided ones.

Note that the relevance weighting method used (the F4 formula in section 2.1) takes account only of positive relevance judgements; items judged non-relevant are combined with items not judged (the complement method: Harper and van Rijsbergen, 1978).

However, as indicated in section 5.3, there were topics for which (under strict relevance criteria) the relevant documents were very sparse, and relevance feedback would not have had much effect. In these cases, for the manual searches only, searchers were encouraged to make more generous relevance judgements (i.e. to accept as relevant some documents that did not meet all the criteria precisely). The argument behind this guideline was that relevance feedback should work better given some partially-relevant items than with few or no relevant items. This argument obviously requires testing.
6.2 Bias to query terms

The bias in favour of original query terms discussed in section 3.4 was an attempt to represent the prior knowledge that a term chosen by the original requester or a searcher is likely to be good in terms of the probabilistic model. This argument relates to, but is not limited to, Harman's argument about negative weights (Harman, 1992). The point-5 formula used in the relevance weighting model actually has a built-in bias which might be described as "0.5 out of 1".

The biases used in the different TREC experiments (10 out of 10 and 2 out of 3) were chosen arbitrarily; unfortunately there was no time to do any extensive testing to enable a better-informed decision. A bias such as 2 out of 3 has the curious effect of downgrading some very good query terms (any term that occurs in all the known relevant documents). This was part of the reason for trying the 10 out of 10 bias. However, there may be good reason for this effect: even very good results on the known relevant documents should not persuade us that p is actually unity.

6.3 Two implementation errors

There were also two errors in the implementation of this bias. In the relevance weighting formula, the probability p (that the term occurs in a relevant document) is estimated directly from the known relevant documents; the bias is correctly used to modify this estimate (e.g. r -> r+2 and R -> R+3 in the formula p = r/R). But the corresponding non-relevance probability q is normally estimated by the complement method (i.e. all documents in the collection not known to be relevant are assumed to be non-relevant, q = (n-r)/(N-R)). In the implementation used for TREC, the modifications to r and R were incorrectly carried over to the q estimate.

The second error occurred in the term selection value for query expansion. The full selection value should be w(p-q). Since q is normally very small compared to p, this can be approximated by wp. Since in a simple relevance feedback version p = r/R, and R is the same for all terms (i.e. the number of known relevant documents), ranking in wp order is the same as ranking in wr order. So in the TREC implementation, wr was used. However, the modification to R for query terms invalidates the second assumption (that R is the same for all terms), so wp should have been used.

These errors will have had the effect of over-emphasizing some infrequent query terms, but will probably not have affected the overall results greatly.

7. Results and discussion

Full results can be seen in the official tables. The evaluation of the feedback run was treated in a somewhat special way, by agreement with the organizers. The original plan had been to do a "residual ranking" evaluation, i.e. to remove from the collection those items which were assessed for relevance for feedback purposes, and to evaluate the two runs (with and without feedback) on the reduced collection. This would have allowed a comparison between these two runs, but not between the feedback run and any of the other results presented. Instead, a "frozen rank" evaluation was used, in which the documents examined for relevance before feedback were retained as the top-ranking documents in the feedback run. This simulates a real search, in that those documents would have been seen (in some form) by the user and would therefore have to be regarded as part of the output of the system. Therefore it may be seen as a fairer evaluation of feedback than residual ranking, although it is likely to reduce the apparent effect of feedback.
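The "frozen rank" construction just described might look something like the sketch below, using invented document numbers: the documents examined before feedback keep the top positions, and the feedback search's ranking fills the rest, skipping anything the searcher has already seen. It is an illustration of the evaluation arrangement, not of any TREC scoring code.

```c
#include <stdio.h>

#define NSEEN 5
#define NFEED 8
#define NOUT  10

/* Was document d among those examined before feedback? */
static int seen(int d, const int examined[], int nseen)
{
    for (int i = 0; i < nseen; i++)
        if (examined[i] == d) return 1;
    return 0;
}

int main(void)
{
    /* Invented document numbers. Documents examined for relevance in the
       initial search, in the order they were ranked and shown. */
    const int examined[NSEEN] = { 17, 4, 92, 33, 58 };
    /* Ranked output of the feedback (expanded, re-weighted) search. */
    const int feedback[NFEED] = { 4, 71, 17, 9, 33, 105, 58, 12 };

    int out[NOUT], k = 0;

    /* Frozen ranks: the examined documents keep their original positions
       at the top of the final ranking... */
    for (int i = 0; i < NSEEN && k < NOUT; i++) out[k++] = examined[i];
    /* ...and the feedback ranking supplies the rest, with documents the
       searcher has already seen removed. */
    for (int i = 0; i < NFEED && k < NOUT; i++)
        if (!seen(feedback[i], examined, NSEEN)) out[k++] = feedback[i];

    for (int i = 0; i < k; i++) printf("rank %2d: document %d\n", i + 1, out[i]);
    return 0;
}
```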
A very brief summary of the results, taking just two measures from the tables, is as follows:

                                     11-point average   Precision at 5 docs
    citya1  (ad-hoc automatic)            12.1%               49.6%
    citym1  (manual)                      15.6%               57.6%
    citym2* (manual with feedback)        18.2%               58.8%
    cityr1  (routing)                     17.7%               54.8%

    (* frozen ranks evaluation)

The performance of the automatic ad-hoc run is really rather poor. The manual run without feedback is better. Feedback does clearly produce an improvement (though not, of course, given the frozen ranks evaluation, at the high-precision end). It seems that both the choice of terms and the liberal relevance judgements by non-expert students are effective at least to some degree. (We have yet to compare the individual judgements by the students with the "correct" ones provided by the TREC organizers, or to establish whether the "correct" judgements would have given us greater performance benefits.) The routing results seem reasonable.

In general, we believe that the simple, robust and minimum-effort methods we have adopted in Okapi have been shown to work, even with very different material (both documents and queries) from that for which Okapi was originally designed. Performance, both in absolute terms and relative to the other TREC entries, is respectable but by no means wonderful. We also believe that there is much scope for improvement; there are other simple and robust methods (such as other weighting formulae or different treatments of compound terms) to which Okapi would be hospitable, and which may bring performance up to a more acceptable level. We look forward to TREC 2.

References

Bookstein, A. (1983). Information retrieval: a sequential learning process. Journal of the American Society for Information Science 34(5), 331-342.

Hancock-Beaulieu, M. & Walker, S. (1992). An evaluation of automatic query expansion in an online library catalogue. Journal of Documentation 48(4), 406-421.

Harman, D. (1992). Relevance feedback revisited. In: SIGIR '92 -- Proc. 15th International Conference on Research and Development in Information Retrieval. ACM Press, 1-10.

Harper, D.J. & van Rijsbergen, C.J. (1978). An evaluation of feedback in document retrieval using co-occurrence data. Journal of Documentation 34(3), 189-216.

Porter, M.F. (1980). An algorithm for suffix stripping. Program 14(3), 130-137.

Robertson, S.E. (1990). On term selection for query expansion. Journal of Documentation 46(4), 359-364.

Robertson, S.E. & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129-146.

Robertson, S.E., Thompson, C.L., Macaskill, M.J. & Bovey, J. (1986). Weighting, ranking and relevance feedback in a front-end system. Journal of Information Science 12(1/2), 71-75.

Walker, S. (1989). The Okapi online catalogue research projects. In: The online catalogue: developments and directions, edited by Charles R. Hildreth. Library Association, 84-106.

Walker, S. & De Vere, R. (1990). Improving subject retrieval in online catalogues: 2. Relevance feedback and query expansion. British Library. (British Library Research Paper 72.) ISBN 0-7123-3219-7.

Walker, S. & Hancock-Beaulieu, M. (1991). Okapi at City: an evaluation facility for interactive IR. British Library Research Report 6056.
Appendix: System architecture

Platform

The system runs on Sun hardware. It should port fairly easily to other UNIX platforms, at least of the BSD type. All the search and indexing code is in C. Source file conversion programs and log analysis programs may be written in awk.

Database structure

A database consists of:

    a text file (bibliographic file): this is the dataset from which searches retrieve records;

    up to three indexes: each index consists of primary and secondary dictionaries and a posting file. There are several types of index. One contains no positional information below the level of records, and is suitable for "phrases" like personal names and titles. Others contain positional information in the form field, sentence, word number for every occurrence of every indexed term. An index can contain terms for up to 16 different types of search;

    a set of parameter files:
        database description parameters;
        indexing parameters (one set for each index): these define how indexing is to be performed in terms of linguistic knowledge base, stemming function, procedure for extracting index terms and the fields and subfields from which they are to be extracted;
        search type (or group) parameters: these are closely related to indexing parameters. They are used to determine the linguistic processing to be applied to queries and the parameters to be used for index lookup and for the extraction of terms for automatic query expansion;
        search mnemonics (e.g. ABS, AUTH) used in query parsing;
        display parameters, defining two levels of display;
        language knowledge bases: up to three of these may be associated with a database to allow linguistic processing to depend on the type of data being searched or extracted. Typically, these are common to a number of databases of similar type and usage.

Input

Source files are stored in a simple format where each record starts with a field directory giving the length of each field, followed by the text of the fields. Fields may contain a limited range of subfield or role markers, indicating the nature of the following data. There are facilities for importing a few types of bibliographic files, including UKMARC and ISO 2709. An "Okapi exchange" format also exists. Character coding is ASCII with a "shift" character (`\`) to allow the encoding of characters above hex 7F. No data compression is used.

Output (interactive Okapi only)

Output is to character-based terminals or windows, or hard copy. There are two levels of record display and printout -- brief (one line) and full. Record layout is determined by parameters and is fairly flexible.

Database maintenance

There are no record editing and no index updating facilities. Source file and indexes must be completely regenerated when necessary.

Indexing storage overheads

Several types of index are available. Depending on the nature of the database and the extent of indexing required, overheads range from about 10% to 120% of the bibliographic file size.

Performance

Index lookup is fast because each lookup only requires one disk access. A multi-term search runs in time approximately proportional to the total number of postings for all the terms in the query. Bibliographic record access is also fast because there is no indirection: the postings records directly address the bibliographic records, so again there is only one disk access per record. File inversion is relatively slow and cpu-bound because of the multi-pass linguistic processing during index term extraction. As a rough guide, inversion runs at about one minute per megabyte of indexable text on a lightly loaded Sun 4/330.
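The "no indirection" point about record access can be illustrated by a sketch such as the one below; the structure and file name are invented and do not reflect the real Okapi file layout, but they show the idea of a posting carrying the byte address of its record so that one seek reaches the record.

```c
#include <stdio.h>

/* Illustrative layout only; not the real Okapi file format. */
struct posting {
    long record_offset;   /* byte address of the record in the bibliographic file */
    /* positional indexes would also carry field / sentence / word numbers */
};

/* Fetch the first few bytes of a record given its posting: one fseek,
   one read, no intermediate record-number-to-address lookup. */
static int fetch_record(FILE *bibfile, const struct posting *p,
                        char *buf, size_t buflen)
{
    if (fseek(bibfile, p->record_offset, SEEK_SET) != 0) return -1;
    return (int)fread(buf, 1, buflen, bibfile);
}

int main(void)
{
    /* Hypothetical usage against some bibliographic file. */
    FILE *bibfile = fopen("bibfile.dat", "rb");
    if (!bibfile) { perror("bibfile.dat"); return 1; }

    struct posting p = { 102400L };   /* invented offset */
    char buf[64];
    int got = fetch_record(bibfile, &p, buf, sizeof buf);
    printf("read %d bytes of the record at offset %ld\n", got, p.record_offset);

    fclose(bibfile);
    return 0;
}
```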
Limits

    Maximum bibliographic file size: 32 gigabytes (but maximum index size 4 gigabytes)
    Number of records per database: no practical limit
    Postings per index term: no practical limit
    Maximum amount of data which can be treated as a "record" for retrieval purposes: this is a system parameter, usually set to 16 kilobytes; up to 64K or more is acceptable
    Maximum field length: same as record size
    Maximum number of fields per record: 31
    Maximum index term length: 127 characters
    Maximum number of terms in a single query: 32 (interactive Okapi only)