Query Improvement In Information Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project

Jing-Jye Yang, Robert R. Korfhage (Department of Information Science) and Edie Rasmussen (Department of Library Science), School of Library and Information Science, University of Pittsburgh

Abstract

We have been developing an adaptive method using genetic algorithms to modify user queries, based on relevance judgments. This algorithm was adapted for the Text REtrieval Conference (TREC). The method is shown to be applicable to large text collections, with additional relevant documents being presented to users as the genetic modification proceeds. The algorithm also exhibits some interesting phenomena, such as parallel searching. Further studies are planned to adjust the system parameters to improve its effectiveness.

1. Introduction

Information retrieval can be viewed as searching a high-dimensional document space to bring relevant documents to users. It is known, however, that this is not an easy task. Because of the complexity of document writing styles and the difficulty users have in expressing their information requests, the retrieval results often frustrate users. Returning to the system with a modified query becomes unavoidable if a user wants to improve the retrieval results. However, most systems provide little or no guidance to the user for modifying the original query, adding to the frustration. Using relevance feedback to help users solve this problem has long been viewed as a promising approach. Although the study of relevance feedback began more than twenty years ago, it has attracted increasing research attention in recent years. With advances in computer hardware and software, the feedback process, which requires more computing time and a friendlier user interface than traditional methods, will come to dominate the information retrieval area. However, feedback techniques also need to be improved.
They should include more intelligent mechanisms which embed the ability to adapt to a changing environment and handle the feedback process automatically. We have been developing an adaptive method using genetic algorithms to modify user queries automatically. Our preliminary experiments, based on several simple data collections, showed promising results (Yang and Korfhage, 1992a, 1992b). This algorithm was applied to the TREC project. This paper reports the experiments and some of the results.

2. The Model

Our system is based on a vector space model. Within the model, both documents and queries are represented as vectors. A document vector D_i with n keywords can be represented as:

D_i = (t_i1, t_i2, ..., t_in)

where t_ij (j = 1, ..., n) is the assigned value of the jth keyword. Either weighted or unweighted document keywords can be used. For unweighted document keywords, t_ij is either 0 or 1. For weighted document keywords, the value of t_ij is a real number in the range [0, 1]. In this model a query Q with m keywords is also represented in vector format as:

Q = (qt_1, qt_2, ..., qt_m)

where qt_k (k = 1, ..., m) is the kth keyword in the query, the keywords being extracted from the user's natural-language request. However, no weights are assigned to query terms in the initial stage; query term weights are assigned by the system during the processing to be described later. In general, under vector space models, document retrieval is based on a similarity measurement between the query and the documents. This means that documents with high similarity to the query are judged more relevant to the query and should be retrieved first. In our model, the similarity measure is based on a metric, the Lp metric, which measures the distance between document vectors and query vectors. This means that the shorter the distance between a document and the query, the higher the similarity between them.
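As an illustrative sketch only (the function and variable names below are ours, not from the system described in this paper), the distance-based similarity just described, with p = 2 and binary document term weights, can be computed as:

```python
# Sketch of an Lp distance between a document and a query, computed
# only over terms present in the query. Document terms are unweighted
# (binary); query weights lie in [0, 1].
def lp_distance(doc_terms, query_weights, p=2):
    """doc_terms: set of keywords present in the document.
    query_weights: dict mapping each query term to its weight."""
    total = 0.0
    for term, qw in query_weights.items():
        t = 1.0 if term in doc_terms else 0.0  # binary document weight
        total += abs(t - qw) ** p
    return total ** (1.0 / p)

# A document sharing more highly weighted query terms lies closer to the query.
d_near = lp_distance({"oil", "spill"}, {"oil": 0.9, "spill": 0.8})
d_far = lp_distance({"oil"}, {"oil": 0.9, "spill": 0.8})
```

In the actual system a document with no terms in common with the query is never considered at all; the sketch above omits that filtering step.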
The distance value between a document D_i and the query Q is calculated, using the Lp metric, by the formula:

DIS(D_i, Q) = || D_i - Q || = ( Σ_k | t_ik - qt_k |^p )^(1/p)

where k (k = 1, ..., m) ranges over the set of query terms. Theoretically, the value of p can be in the range [1, ∞). In the current experiments, p is set to 2; that is, the distance measure is the Euclidean distance. Note that only terms which are present in the query are used in the formula. This means that if a term is present in a document but not in the query, the term is assumed to be of no concern to, or not known by, the user. In other words, a document will be considered in the retrieval process only if it has at least one term in common with the query. However, variations exist in the actual implementation. The keyterms in documents are unweighted in the TREC databases we have used. That is, the elements in the document vectors are binary, 0 or 1. This choice was made for two reasons. First, for a large database of this kind, obtaining document term weights using general term weighting methods, such as the inverse document frequency (e.g., Salton and Buckley, 1988), requires counting term frequencies over all of the documents in the database, which is resource intensive for a database of this size. Second, previous experiments using the Cranfield database indicate that our system performs equally well with either weighted or unweighted document terms. Each query vector is generated from a topic provided in the TREC project. In general, a topic includes several descriptive items: the sequential number of the topic, domain, title, description, narrative, concepts, factors, nationality, etc. The terms of a query vector are derived from the title and the concepts of the topic. In some cases the nationality is added, if it is important to meet the requirements of the narrative. The query vector term weights are assigned by the system at the beginning of each run.

3.
The Algorithm

The method we developed is based on the relevance feedback concept. However, the algorithm underlying the process is totally different from those used in other relevance feedback studies. Our system is also based on genetic algorithms, which are known for their efficiency and effectiveness in searching large problem spaces to find optimal solutions (e.g., Holland, 1975; Goldberg, 1989). The basic concept behind genetic algorithms is that there exists a population of individual variants (the individuals can be of various types, depending on the actual problem to be solved) in an environment, each trying to survive by exchanging information and competing with the others. Genetic algorithms use methods analogous to natural evolution mechanisms to generate those individuals that provide the optimal or near-optimal solution to the problem. The adaptability of genetic algorithms is due to the fact that individual selection and manipulation are adaptive to the existing environment. The environment evaluates the fitness of the individuals, and the results form the feedback from the environment to the system where the genetic process is applied. The efficiency of the algorithm is due to parallel searching, with a set of individuals exploring the problem space simultaneously. Information about better structures is stored, exchanged, and transmitted from generation to generation, and combined to produce individuals having all or most of the better features of their predecessors. A few researchers have applied genetic algorithms to information retrieval. Gordon (1988) used a genetic algorithm to modify document descriptions to facilitate the retrieval of relevant documents for a group of users who are interested in similar topics. Frieder and Siegelmann (1991) developed a genetic algorithm to obtain an optimal allocation of documents onto nodes in distributed multiprocessors.
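The generic evolutionary cycle just described can be sketched as follows (a toy fitness function and parameter values of our own choosing, not the exact operators of the system described in this paper):

```python
import random

# One generation of a generic genetic algorithm: evaluate, select,
# reproduce, mutate, cross over. The fitness function here is a toy
# (sum of genes) purely for illustration.
def ga_step(population, fitness, mutation_rate=0.1):
    scores = [fitness(ind) for ind in population]
    avg = sum(scores) / len(scores)
    # Selection: keep individuals at or above the average fitness.
    survivors = [ind for ind, s in zip(population, scores) if s >= avg]
    # Reproduction: refill the population by copying survivors.
    offspring = [list(random.choice(survivors)) for _ in population]
    # Mutation: occasionally assign a gene a new random value in [0, 1).
    for ind in offspring:
        if random.random() < mutation_rate:
            ind[random.randrange(len(ind))] = random.random()
    # Two-point crossover: exchange a middle segment between pairs.
    for a, b in zip(offspring[::2], offspring[1::2]):
        i, j = sorted(random.sample(range(len(a)), 2))
        a[i:j], b[i:j] = b[i:j], a[i:j]
    return offspring

# Ten individuals of five genes each, evolved for three generations.
population = [[random.random() for _ in range(5)] for _ in range(10)]
for _ in range(3):
    population = ga_step(population, fitness=sum)
```

In the retrieval setting described below, each individual is a vector of query term weights and the fitness comes from relevance judgments rather than a closed-form function.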
In vector space models document retrieval can be seen as searching a very high-dimensional document space to find the area(s) where most of the documents relevant to a user query are located. We have been developing a genetic algorithm for document retrieval that works through the modification of query term weights. The steps of the genetic query modification process can be described as follows.

(1) Generate query variants: As described above, genetic algorithms apply to situations where there is a group of individuals, so a set of query individuals is needed; they are generated in this step. Each query individual vector has the same elements as the original (user's) query. Each element within a query vector represents a weight corresponding to the related term in the original query. The weights are assigned randomly using a random number generator, with the values of the weights scaled to the range [0, 1). The number of query variants is set at the beginning of the experiments and remains fixed for the rest of each run.

(2) Individual query performance evaluation: This process evaluates the performance of each individual query variant. First, the distances between query variants and documents are calculated, using the formula given in Section 2. Second, for each query individual, those documents whose distance values to this query individual are less than a given threshold are viewed as relevant to the user request and are retrieved. Finally, the retrieval performance (P) of each query individual is assessed using the formula:

P = A - B - C

where A is the number of retrieved documents which are judged relevant to the user request; B is the number of retrieved documents which are not relevant; and C is the number of relevant documents that are not retrieved by the query individual. The performance values for the query variants are then used in the genetic modification.
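As a minimal sketch (the helper name and set-based inputs are our own), the performance formula can be computed as:

```python
# Sketch of the performance measure P = A - B - C for one query variant.
# A: retrieved and relevant; B: retrieved but not relevant;
# C: relevant but not retrieved. Inputs are sets of document identifiers.
def performance(retrieved, relevant):
    a = len(retrieved & relevant)
    b = len(retrieved - relevant)
    c = len(relevant - retrieved)
    return a - b - c

# Example: 3 hits, 2 false drops, 1 miss -> P = 3 - 2 - 1 = 0.
p = performance({"d1", "d2", "d3", "d4", "d5"}, {"d1", "d2", "d3", "d6"})
```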
(Note: the P value can also take the form P = A - B, in case the C value is unknown, as is normal in actual online retrieval. Note also that P is an internal value used by the system and is unknown to the user.)

(3) The genetic process: After the evaluation process, genetic operators are used to modify the structures of the query variants, with the aim of improving the retrieval effectiveness of the variants.

(a) Selection: In this step those query individuals whose performances are greater than the average performance of all the query variants are selected and retained for further processing. The other query individuals are discarded.

(b) Reproduction: In this process the survivors from the previous step are reproduced. In our algorithm, the reproduction algorithm developed by Baker (1987) is used. The number of offspring for each individual depends on the ratio of its performance to the total performance of all the survivors, and on the population size. Therefore, the surviving individual with the highest performance will have the most offspring. Also in this step, the query variants are rearranged in a random sequence for later processing. We call the set of query variants after reproduction the semi-new generation.

(c) Mutation: In the mutation step, some individuals are selected at random from the semi-new generation, and some of their genes (i.e., the term weights) are selected and assigned new values, also in the range [0, 1). The number of individuals to be selected depends on the mutation ratio chosen for the experiment. The selection of genes and the new value assignments are also done randomly.

(d) Crossover: The crossover process mates pairs of query individuals in the semi-new generation and exchanges parts of their term weights. Here, the two-point crossover method is used. For each pair, two positions in the term vectors are selected randomly, and all term weights between the two positions are exchanged between the two individuals.
Thus, two new individuals are born. However, it may be the case that not all individuals are chosen to mate with others; the crossover ratio controls the number of mates. After the crossover operation, a totally new generation has been created, and the process goes back to step 2 and continues until a preset condition is satisfied: either a fixed number of generations has been processed or no more relevant documents are retrieved.

The genetic retrieval algorithm is distinguished from other relevance feedback research in several ways: (1) Several query individual variants exist and explore the document space simultaneously. (2) To modify the term weights, it is not necessary to calculate and analyze the document term weights, at least in the current study. Query term weight modification is done by the genetic mechanisms, through the interchange of term weights between query individuals and the introduction of new weights by the mutation operation. (3) Only term weights are modified; no new terms are introduced and no terms are deleted. The importance of a term is shown by the value of its weight.

4. Preprocessing of Documents and Queries

As in most information retrieval systems, the documents and topics in the TREC project need to be preprocessed before they can be used by our algorithm.

(1) Document processing: Document retrieval in our system is based on keyword matching. In the current system, a keyword(1) is a single word. For the documents, to facilitate retrieval we need to create inverted files. The processes for creating the inverted files can be described as follows.

(a) Keyword extraction: Keywords are extracted for each document. Stopwords (2,529 words in total on the list) are deleted, and keywords are stemmed using the Porter stemming algorithm as implemented by Fox (Frakes, 1992).
The document number from which a keyword is extracted is also stored with that keyword, for it will be used to create the inverted files and later to facilitate retrieval of the full text of the document. Also, an "address file" is generated which indicates, for each document, the offset of its location within the original file in which the document is stored.

(b) Creation of inverted files: An inverted file is created for each database to facilitate the retrieval process. The file is organized as a keyword followed by a list of the document numbers from which the keyword was extracted. Three levels of index files are generated for the inverted file to reduce the search time. Each index file includes keywords, their offsets in the inverted file, and the offset in the immediately higher-level file (for the second and third levels).

(2) Query processing: Queries are generated from the topics provided in the TREC project. For each topic a single query vector (the original query) is generated, which consists of a list of keywords from the title and the concepts of the topic. For some queries information about the nationality is also included if it is necessary to satisfy the requests in the narrative descriptions of the related topics. The stemming algorithm is also applied to the terms in the query. For the training queries we did not add any terms from the other descriptive items in the topics. But for some ad hoc queries, we added several keywords from the narrative because we thought that those keywords would be useful for identifying relevant documents. The routing queries are the query individuals from the last generation of the training queries. Ten query individual vectors were generated for each original query. For the training queries the initial query term weights for the query individuals were assigned randomly, and then modified by the genetic algorithm.
The final query individual vectors from the training set were used as routing queries with no weights being modified.

(1) Keyword, keyterm and term will be used interchangeably in this paper.

The assignment of term weights to the ad hoc queries is different. Only one query vector (called query individual one) is generated for each ad hoc query. The term weights in query individual one were assigned either by using the weights of the same terms in the final generation of the training topics, if they existed, or by the researchers, by referring to the weights of related terms in the final generation of the training topics or to their importance for the topic. Moreover, in the ad hoc queries, a term weight in query individual one could be assigned different values depending on which database the query is run against. Based on query individual one, another nine query individuals were generated, with the weight of each term in the nine individuals being normally distributed around the corresponding term weight in query individual one.

One interesting question in query processing is how to handle the situation where a concept with a NOT requirement is present in a user's request (i.e., in the TREC topics). This is hard to deal with in some retrieval systems, and in the TREC topics there are several cases which include NOT in the concepts and the narrative items. Our solution to this problem is to assign a negative weight to a keyword if it is governed by the NOT in a concept or narrative item. Since our system is based on a distance measure, a negative assignment causes the documents which include this keyword to lie at a longer distance from the query than those without the keyword.

5. System Configuration

Some features of our system in the TREC project are described in this section.
(1) Window size for document retrieval: In much feedback retrieval research, there is a fixed window size which determines the number of documents to be retrieved. The value was usually set between 5 and 20 (e.g., Salton, 1971). Aalbersberg (1992) used a window size of one in the feedback retrieval process and showed that the precision values for four standard databases are, on average, higher than those achieved with a size greater than one. However, Aalbersberg also suggested that variable relevance feedback (Ide, 1971) is worth implementing in term-weighting IR systems. In our original experiments on small databases a threshold value was generated for each query at the beginning of an experiment, as the criterion for determining the retrieval window. If the distance between a document and a query individual was less than the threshold value, the document was viewed as relevant to the query and was retrieved. Thus we had a variable window size. However, in dealing with the large databases we added another factor to decide the window size. As in the previous method, a threshold value is generated first as each original query is read into the system. Documents whose distance is less than the threshold are regarded as relevant to the query and are printed for evaluation. However, if more than forty documents satisfy the condition, only the top forty are retrieved for evaluation; if there is a tie, the selection is random among those tied. This number seems reasonable given the time constraints and focus of the study.

(2) User involvement and relevance judgment: Since our system uses relevance feedback (on the training queries and the ad hoc queries), users (evaluators) are needed to evaluate the retrieved documents. The evaluators are our researchers in the TREC project. The document evaluation process runs as follows.
At each iteration, for each query individual the retrieved documents are printed with both their full text (or abstracts or summaries, if available) and their document numbers. An evaluator examines the retrieved documents based on his/her reading of the text (or abstract or summary) and understanding of the topic, and marks the document numbers which are judged relevant to the topic. The results of the judgments from the evaluators are fed back to the system. The system then applies the genetic modification process (as described in Section 3) based on the feedback information.

(3) The feedback process: The feedback process can be described as follows. The system uses the relevance judgments to calculate the performance (P) of each query individual based on the function described in Section 3. However, the parameters in the formula were changed as follows. First, since there are no known relevant documents for each query, the value C is dropped. Second, in our initial trials on a few training topics we found that for these very large databases many nonrelevant documents are retrieved in the first generation. This would cause individuals which retrieved a few relevant and many nonrelevant documents to get lower P values than those which retrieved no documents or only a few nonrelevant documents. So instead of using equal weights for the values of A and B, we gave more weight to A. The formula became: P = 10*A - B. This increases the chance of survival for those individuals which retrieve a few relevant documents. The weight 10 was chosen for the parameter A because in several initial trials (with the weight equal to 1, 5 and 10), 10 brought more relevant documents into the modified generation than the other weights. However, values higher than 10 were not used, to keep the query variants from converging too quickly to a final query that might be a local optimum. After computing the performance for each variant, genetic operations are applied to generate a set of new query individuals.
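The adjusted fitness P = 10*A - B can be sketched as follows (the helper name and example document sets are ours, for illustration only):

```python
# Sketch of the adjusted fitness P = 10*A - B: relevant retrievals (A)
# are weighted ten times as heavily as nonrelevant ones (B).
def trec_performance(retrieved, relevant, a_weight=10):
    a = len(retrieved & relevant)   # retrieved and judged relevant
    b = len(retrieved - relevant)   # retrieved but nonrelevant
    return a_weight * a - b

# One relevant hit outweighs several nonrelevant retrievals:
# 10*1 - 3 = 7, versus 0 for a variant that retrieves nothing.
p = trec_performance({"d1", "d2", "d3", "d4"}, {"d1"})
```

Under the unweighted formula P = A - B this variant would have scored 1 - 3 = -2, below the empty-handed variant's 0, which is exactly the effect the heavier A weight is meant to avoid.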
Those new query individuals are then used by the system to retrieve documents. However, documents which have been retrieved in previous generations are not presented in the new generation, even if they satisfy the selection condition. This strategy is similar to the residual collection method used by Chang et al. (1971). The feedback process continues until either no more relevant documents are retrieved for any query variant or the user (the evaluator) decides to stop the process, mainly because of time constraints.

6. Statistical Data

(1) Computing time: Time is given for two types of processing.

(a) Time for document processing: Document processing time includes keyword extraction, sorting and merging, and building the inverted files. Two systems were used, a SUN-670 and SUN SPARC/IPC(s), depending on availability. Table 1 gives the time used for several stages of document processing.

(b) Time for document retrieval per generation: The time required to retrieve documents for each iteration (generation) depends on several factors: the number of terms in a query, the generation, and the computing system used. Table 2 gives the average retrieval time for three generations, based on topics 1-50 and the first dataset (disk one).

(2) Storage space for data structures: Several types of data structure were created to facilitate retrieval: inverted files, indexed files, and address files, which consist of document numbers and their locations in the databases. The storage space used for these files (disk one only) is shown in Table 3.

(3) Machine capability: The capabilities of our systems are: SUN-670: 32 Megabytes RAM and 40 MHz CPU clock rate; SUN SPARC/IPC: 24 Megabytes RAM and 25 MHz CPU clock rate.

(4) Manpower: In total four people participated in the project, two Ph.D. students and two faculty members. One Ph.D. student worked full time on this project and the other part time; one faculty member worked half time and the other part time.

Table 1.
Time for document processing (Unit: minutes)

                         DOE    AP           ZIFF         WSJ          FR
Dataset                  1st    1st    2nd   1st    2nd   1st    2nd   1st   2nd
Keyword extraction       129    148    169   130    43    182    125   19    16
Sorting and merging      293    281    NA    248    72    404    398   18    11
Create inverted files    29     NA     27    28     11    NA     34    2     3
* NA: Not Available

Table 2. Average retrieval time for topics 1-50 on the first dataset (Unit: minutes)

Generation 0   Generation 1   Generation 2
    3:56           2:19           2:05

Table 3. Total amount of storage for inverted and indexed files -- disk one only (Unit: Megabytes)

                 DOE     AP      ZIFF    WSJ
Inverted files   162.3   199.8   143.7   223.4
Indexed files    3.0     2.2     2.4     2.1
Address files    4.3     1.7     1.7     2.6

7. Results

This section describes several results of using the genetic algorithm on the TREC document collection. The examples provided are from training queries (topics 1 to 50) on the DOE database.

(1) Query convergence: In large document collections like the TREC databases, the genetic algorithm caused the query variants to converge within 3 to 6 generations in most cases. As an example, Table 4 shows the term weights of the query individuals in the first generation, 0, and the last generation, 5, for topic 3. For most of the query terms the weights of the query individuals converged to a single value in the final generation. Although a few variations existed, they were caused by the mutation operation. Table 5 shows a similar situation for topic 12, where the final generation is 4. An interesting phenomenon is how the query term weights changed. The genetic operators select the query individuals which have higher performance values than the average performance of all the individuals and exchange parts of their term weights using mutation and crossover. Although the two operations are random, the results are interesting.
The experiments show that from generation to generation the average weight of each term in the query individuals gradually moves toward the value in the final converged generation, although small variations may exist. As an example, Table 6 shows the changes in the average term weights for topic 3, from generation 0 to generation 5. The values at generation 5 are almost identical to those in Table 4.

(2) Effects of query term weight modification: The goal of query convergence using the genetic algorithm is to find the query individual with the highest performance, that is, the one retrieving more relevant documents than its predecessors. Evidence from the experiments has shown that the GA works as expected. In most cases, new relevant documents were brought to the user in each generation, until convergence at the final generation. Tables 7 through 11 show the numbers of new relevant documents retrieved in each generation for the five databases. Observing the results, one interesting phenomenon arises: for a given topic the algorithm may retrieve different numbers of relevant documents on distinct databases. This seems reasonable, since the five databases may concentrate on different areas. Moreover, the retrieval patterns for the WSJ and AP databases, which presumably cover similar topics, look similar to each other.
Table 4. Query individuals on Topic 3 (11 terms)

Generation = 0
0  0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39
1  0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15 0.63
2  0.31 0.35 0.92 0.52 0.40 0.61 0.79 0.93 0.87 0.87 0.67
3  0.76 0.58 0.39 0.36 0.20 0.83 0.42 0.46 0.98 0.13 0.21
4  0.96 0.74 0.41 0.78 0.76 0.96 0.03 0.32 0.76 0.24 0.59
5  0.04 0.96 0.32 0.06 0.44 0.92 0.57 0.12 0.57 0.25 0.50
6  0.24 0.48 0.41 0.87 0.43 0.36 0.38 0.04 0.16 0.52 0.70
7  0.10 0.40 0.77 0.24 0.34 0.23 0.30 0.30 0.89 0.04 0.65
8  0.40 0.68 0.73 0.94 0.23 0.84 0.97 0.78 0.43 0.67 0.81
9  0.16 0.28 0.14 0.86 0.75 0.21 0.14 0.29 0.80 0.22 0.56

Generation = 5
0  0.28 0.37 0.98 0.54 0.77 0.76 0.03 0.45 0.69 0.24 0.78
1  0.28 0.22 0.98 0.54 0.77 0.56 0.03 0.32 0.76 0.24 0.58
2  0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
3  0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
4  0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
5  0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.45 0.69 0.24 0.78
6  0.28 0.42 0.98 0.54 0.77 0.66 0.13 0.45 0.69 0.24 0.78
7  0.28 0.22 0.98 0.54 0.77 0.66 0.03 0.32 0.76 0.24 0.58
8  0.28 0.37 0.98 0.54 0.77 0.67 0.13 0.32 0.76 0.24 0.55
9  0.28 0.37 0.98 0.54 0.77 0.66 0.03 0.32 0.76 0.24 0.58

Table 5. Query individuals on Topic 12 (12 terms)

Generation = 0
0  0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39 0.28
1  0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15 0.63 0.31 0.35
2  0.92 0.52 0.40 0.61 0.79 0.93 0.87 0.87 0.67 0.76 0.58 0.39
3  0.36 0.20 0.83 0.42 0.46 0.98 0.13 0.21 0.96 0.74 0.41 0.78
4  0.76 0.96 0.03 0.32 0.76 0.24 0.59 0.04 0.96 0.32 0.06 0.44
5  0.92 0.57 0.12 0.57 0.25 0.50 0.24 0.48 0.41 0.87 0.43 0.36
6  0.38 0.04 0.16 0.52 0.70 0.10 0.40 0.77 0.24 0.34 0.23 0.30
7  0.30 0.89 0.04 0.65 0.40 0.68 0.73 0.94 0.23 0.84 0.97 0.78
8  0.43 0.67 0.81 0.16 0.28 0.14 0.86 0.75 0.21 0.14 0.29 0.80
9  0.22 0.56 0.72 0.20 0.99 0.25 0.43 0.76 0.86 0.89 0.98 0.40

Generation = 4
0  0.92 0.57 0.12 0.67 0.25 0.50 0.28 0.28 0.96 0.84 0.41 0.60
1  0.92 0.57 0.12 0.57 0.25 0.50 0.07 0.15 0.96 0.74 0.41 0.40
2  0.92 0.57 0.12 0.47 0.25 0.50 0.09 0.35 0.96 0.98 0.43 0.36
3  0.92 0.57 0.12 0.57 0.25 0.50 0.09 0.15 0.96 0.77 0.41 0.47
4  0.92 0.57 0.12 0.57 0.25 0.50 0.09 0.21 0.96 0.74 0.41 0.47
5  0.92 0.57 0.12 0.57 0.25 0.50 0.09 0.28 0.96 0.84 0.41 0.47
6  0.92 0.57 0.12 0.72 0.25 0.50 0.26 0.31 0.96 0.84 0.41 0.47
7  0.92 0.57 0.12 0.52 0.25 0.50 0.09 0.12 0.96 0.74 0.41 0.52
8  0.92 0.57 0.12 0.67 0.25 0.50 0.26 0.15 0.96 0.74 0.41 0.52
9  0.92 0.57 0.12 0.67 0.25 0.50 0.26 0.15 0.96 0.74 0.41 0.52

Table 6. Average query term weights in each generation for topic 3

Gen.  t1    t2    t3    t4    t5    t6    t7    t8    t9    t10   t11
0     .343  .515  .560  .612  .449  .631  .460  .451  .640  .317  .571
1     .514  .493  .747  .688  .585  .772  .592  .672  .734  .426  .671
2     .396  .386  .881  .552  .510  .787  .315  .516  .763  .348  .684
3     .301  .349  .938  .526  .511  .811  .262  .444  .785  .213  .700
4     .286  .342  .968  .556  .696  .746  .060  .346  .746  .240  .649
5     .280  .285  .980  .540  .770  .661  .050  .398  .718  .240  .700

Table 7. # of relevant documents retrieved - DOE database

Topic  Generations 0 1 2 3 4 5
01  0
02  0
03  5 8 5 9 2
04  0
05  0
06  0
07  0
08  0
09  0
10  2 0
11  4 3
12  29 15 1 2
13  23
14  0
15  0
16  0
17  0
18  0
19  0
20  0
21  27 5
22  0
23  0
24  36 15 5 2
25  10 0 1 10
26  6 0 5
27  58 22 15 4 22
28  1
29  0
30  0
31  0
32  0
33  10 3 10
34  0
35  0
36  0
37  0
38  3
39  14
40  2 2
41  0
42  3 1
43  4
44  0
45  3 2 0
46  0 1
47  0
48  4
49  9
50  0

Table 8. # of relevant documents retrieved - FR database

Topic  Generations 0 1 2 3 4 5
01  0
02  0
03  0
04  0
05  1
06  0
07  0
08  0
09  0
10  0
11  1
12  2
13  0
14  0
15  0
16  0
17  19 18 6
18  0
19  0
20  0
21  0
22  0 1 2
23  0
24  0
25  0
26  0
27  0
28  0
29  0
30  0
31  0
32  0
33  0
34  0
35  0
36  0
37  0
38  0
39  0
40  0
41  0
42  0
43  0
44  0
45  0
46  0
47  0
48  0
49  0
50  0

Table 9. # of relevant documents retrieved - WSJ database. Table 10.
# of relevant documents retrieved - AP database

Topic Generation Topic Generation 0 1 2 3 4 5 6 0 1 2 3 4 5 6 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 0 29 1 30 0 30 20 31 0 31 3 4 32 0 32 0 32 0 33 6 33 0 1 1 34 3 34 0 35 2 2 35 0 1 36 1 36 0 37 0 37 2 38 1 38 0 39 0 39 0 40 1 40 0 41 0 41 0 42 0 42 2 1 43 0 43 0 44 0 44 7 5 45 0 45 0 46 1 14 46 0 19 47 22 47 8 48 0 48 1 8 49 23 20 19 1 49 12 50 0 50 0

* means more relevant documents were retrieved after this generation, but their number has been added into this generation.

01 15 10 4 8 02 24 9 1 03 30 5 04 11 9 7 8 2 05 3 06 26 2 5 5 5 7 07 4 5 08 15 11 8 09 22 4 5 2 10 43 24 6 5 3 11 10 3 2 12 26 2 13 29 14 2 15 45 16 0 17 2 18 31 19 50 20 1 21 18 22 21 23 2 24 28 25 1 26 6 27 1 28 6 16 8 12 8 2 22* 50 30 5 31 23 32* 6 10 13 7 3 7 2 3 9 3 7 1 25 13 2 7 9 1 3 1 1 12 6 19 5 12 11 2 17 5 7 7 6 0 2 1 11 14 14 8 3 7 26* 5 4 0 2 0 3 14 2 1 30 18 11 31 16 23 8 20 14 21 45 38 3 1 67 6 0 12 0 3 0 4 8 10 21 2 5* 28 35 62 74 88 29 48* 0 1 4 3 2 3 1 65 15 24 7 3 4 5* 1 2 16 14 10 2 0 0 1 2 0 3 1

Table 11. # of relevant documents retrieved - ZIFF database

Topic  Generations 0 1 2 3 4 5 6
01  3 1 2
02  10 1 0
03  24 10 7 6
04  0
05  2
06  0
07  1
08  0
09  0
10  0
11  3
12  0
13  0
14  0
15  47 3 2
16  0
17  0
18  0
19  0
20  18 11 15
21  2 1
22  4
23  0
24  0 4
25  0
26  73 75 10 3 5 5
27  28 27 17
28  34 29 8
29  9 4 1
30  8 0 2
31  11 5 1
32  18 4
33  13 20 13 15 7 6 9*
34  24 29 7 2
35  2
36  3
37  59 38 1
38  6 4 3 7 14 2
39  6
40  3
41  15
42  12
43  2
44  43
45  10 6 1
46  14 20
47  26 13 14 2
48  1 0 5
49  72 14 5
50  2

(3) Effects of the genetic operators - mutation and crossover: The previous examples have shown the overall effect of the GA operations, where the genetic modification of the query term weights has improved retrieval results.
However, in most situations the function of the two important operators - mutation, which assigns a new weight to a randomly selected term, and crossover, which exchanges individual term weights between randomly selected pairs - is difficult to display. However, in experiments on the DOE database we found one example, topic 13, which shows how mutation and crossover can create a new query individual retrieving, in this example, all of the relevant documents which were retrieved partly by one of its parents and partly by the other, together with another new query individual retrieving none. Table 12 depicts this situation. The top part of the table shows the term weights of the query individuals: the left side shows the two original query individuals - the parents - and the right side the children after mutation and crossover. The rest of the table shows the documents retrieved by parent 1, parent 2, and the children. We can see that one of the children has inherited all of the proper term weights from the parents, and this child retrieved all of the relevant documents retrieved by both parents. Note that although two of the term weights were changed by mutation, this does not affect the combination effect, because the difference between the changed weights and the original ones from one of the parents is not significant. Of course, the effects of crossover and mutation are not always this clear-cut.

(4) Parallel search using multiple query individuals: The genetic algorithm, using multiple individuals in the document retrieval process, provides the capability of parallel search, with different query individuals searching the document space simultaneously. Each individual, with the same terms but different weights, may retrieve different documents located in different areas of the document space. This makes the genetic search distinct from other search methods, where only one query is used.
The effects of parallel search can be explained by showing the documents retrieved by several query individuals. Here we give the results from topic 24. Table 13 displays the term weights of the query individuals in the first generation, and Table 14 shows the relevant documents retrieved by four different query individuals from Table 13. We can see that for these four queries there is no overlap among the relevant documents retrieved. Each query individual emphasizes different concepts by assigning high weights to them, and documents more closely related to those concepts are retrieved by different query individuals. The situation can also be viewed as having the query individuals handle the AND and OR operations simultaneously, which is impossible for methods using only one query unless they use the Boolean model. However, many researchers reject the Boolean model because it is difficult for users to apply the AND/OR operations correctly (e.g., Bookstein, 1985).

Table 12. The combined effect of crossover and mutation, topic 13

Term weights (t1-t5):
  parent 1: .77 .65 .77 .78 .82
  parent 2: .46 .98 .13 .21 .96
  child 1:  [weights partly illegible in the source; includes .65 and .77]
  child 2:  [weights partly illegible in the source; includes .98 and .13]
Note: in the original, underlining marks a weight set by crossover and double underlining marks a weight changed by mutation.

Documents retrieved by parent 1: DOE1-05-0747, DOE1-11-0383, DOE1-44-0234, DOE1-49-0051, DOE1-69-0957, DOE1-81-0200, DOE2-20-0798, DOE2-27-0146, DOE2-32-1039, DOE2-37-0525, DOE2-48-0100, DOE2-49-0342, DOE2-71-0973.
Documents retrieved by parent 2: DOE1-24-0861, DOE1-26-0555, DOE1-36-0791, DOE1-50-0948, DOE1-79-0679, DOE2-27-0093, DOE2-69-1023, DOE2-70-0787, DOE2-71-0280, DOE2-80-0365.
Documents retrieved by child 1: all of the documents retrieved by parent 1 and parent 2.
Documents retrieved by child 2: none.
The parallel search can continue if the individuals retrieving different relevant documents have the same power, that is, if the differences in the number of relevant documents brought by each of them are small. The results in Table 15 and Table 16 come from the same topic as Tables 13 and 14, but from the second generation (generation 1), after feedback and genetic modification of the first generation. It can be seen that, due to the genetic operations, parts of several query individuals have begun converging. At this point the queries group into three clusters of similar queries: (0, 3, 7, 8, 9), (1, 5, 6), and (2, 4) (Table 15). Within each group several term weights are identical. However, parallel search still continues: differences in the other weights result in query individuals that retrieve completely different sets of documents. For example, contrast query 7 with the others in its cluster. Other examples from this study show that this effect may be observed with differences in only one or two term weights. Tables 17 and 18 show that the parallel search for this topic continued to the final generation, 3, where for each term almost all of the weights had converged to a single value. But query 7 has two different values, at terms 4 and 21, which cause it to retrieve totally different relevant documents than the other query individuals. However, this is not always the case. In most situations the genetic algorithm will force the query variants to converge to a single query; that is, all of the query individuals in the final generation retrieve the same relevant documents. The example on topic 24 also shows, as in previous cases, that new relevant documents are brought to the user in each generation, as indicated by the `*` following the document numbers.

Table 13.
Term weights of query individuals of topic 24, the first generation (generation 0). [The 21 term weights (t1-t21) for the ten query individuals (0-9) could not be recovered from the source; the column layout was lost.]

Table 14. Relevant documents retrieved by query individuals 0, 3, 6, and 8 for topic 24, the first generation (generation 0). [The check-mark layout associating each document with the individual that retrieved it could not be recovered from the source.]

(5) Precision and recall

The average performance of the current genetic algorithm can be shown using precision and recall values. Here we use data provided by NIST to show the results for the ad hoc queries and the routing queries. Note that in our experiment the ad hoc queries were used only on the second dataset (disk two).
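For reference, the 11-point and 3-point average precision figures reported in the evaluation tables are simple means of the precision values at fixed recall levels. A minimal sketch follows, using the ad hoc values from Table 19; the interpolation performed by the official evaluation program is not reproduced here:

```python
def n_point_average(prec_at_recall, points):
    """Mean of the precision values at the given recall levels."""
    return sum(prec_at_recall[r] for r in points) / len(points)

# Ad hoc run: recall level -> average precision (values from Table 19)
adhoc = {0.0: 0.5622, 0.1: 0.2723, 0.2: 0.2025, 0.3: 0.1382,
         0.4: 0.0708, 0.5: 0.0309, 0.6: 0.0279, 0.7: 0.0155,
         0.8: 0.0136, 0.9: 0.0000, 1.0: 0.0000}

eleven_pt = n_point_average(adhoc, sorted(adhoc))    # ~0.1213, as in Table 19
three_pt = n_point_average(adhoc, [0.2, 0.5, 0.8])   # ~0.082; Table 19 reports
                                                     # 0.0824 from unrounded values
```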
Table 19 and Table 20 display the average precision at the 11 recall levels for the ad hoc queries and the routing queries, respectively.

8. Discussion

The experimental results on the TREC document collection have shown that the genetic approach to query modification can be applied to large and varied databases. Evidence has indicated that genetic query modification leads to the convergence of the query term weights. The results have also shown that additional relevant documents can be retrieved by modifying the query term weights using the genetic approach. We have also shown, through specific examples, the effects of crossover and mutation, and how variations in query term weights affect the retrieval of relevant documents. We believe that the genetic algorithm technique introduces new information into the retrieval process, allowing the system to do a better job of query modification. Evidence has shown that even without the use of genetic techniques our system employs parallel search, although the genetic modification brings additional relevant documents. For the system designer interested in Boolean techniques, this permits exploration of several Boolean variants simultaneously, and also facilitates use of Boolean combinations of the individual query variants.

Table 15.
Term weights of query individuals of topic 24, the second generation (generation 1)

    t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13 t14 t15 t16 t17 t18 t19 t20 t21
0  .18 .31 .53 .95 .17 .70 .23 .49 .12 .08 .39 .28 .37 .96 .38 .48 .69 .77 .78 .82 .15
1  .60 .24 .45 .79 .08 .48 .15 .25 .94 .61 .99 .48 .80 .76 .54 .77 .49 .10 .59 .35 .14
2  .49 .61 .42 .13 .26 .04 .98 .11 .38 .65 .35 .55 .36 .57 .48 .16 .74 .57 .55 .29 .87
3  .18 .31 .53 .95 .28 .70 .23 .49 .12 .08 .39 .28 .37 .98 .54 .77 .53 .37 .78 .82 .15
4  .49 .61 .42 .13 .26 .04 .67 .25 .94 .61 .99 .48 .80 .75 .48 .16 .62 .17 .55 .29 .87
5  .60 .24 .45 .79 .08 .48 .47 .11 .38 .65 .35 .55 .36 .55 .38 .48 .53 .10 .59 .35 .14
6  .60 .24 .45 .79 .08 .48 .15 .25 .94 .61 .79 .28 .37 .98 .54 .77 .65 .77 .78 .69 .14
7  .18 .31 .53 .95 .17 .70 .23 .49 .12 .08 .59 .48 .80 .74 .38 .48 .53 .10 .59 .48 .98
8  .18 .31 .53 .95 .17 .70 .23 .49 .12 .08 .39 .28 .37 .98 .54 .77 .65 .77 .78 .82 .15
9  .18 .31 .53 .95 .17 .70 .23 .49 .12 .08 .39 .28 .37 .98 .54 .77 .65 .77 .78 .82 .15

Table 16. Relevant documents retrieved by query individuals for topic 24, the second generation (* marks newly retrieved documents). [The check-mark layout associating each document with the individuals (0-9) that retrieved it could not be recovered from the source.]

Table 17. Query individuals on topic 24 (generation 3)

    t1   t2   t3   t4   t5   t6   t7   t8   t9   t10  t11  t12  t13  t14  t15  t16  t17  t18  t19  t20  t21
0  0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15
1  0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15
2  0.18 0.31 0.53 0.95 0.17 0.70 0.33 0.59 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.75 0.77 0.78 0.82 0.15
3  0.18 0.31 0.53 0.95 0.28 0.70 0.23 0.49 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.55 0.77 0.78 0.82 0.15
4  0.18 0.31 0.53 0.95 0.17 0.70 0.33 0.59 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15
5  0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15
6  0.18 0.31 0.53 0.95 0.17 0.70 0.33 0.59 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15
7  0.18 0.31 0.53 0.95 0.17 0.70 0.13 0.39 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.98
8  0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15
9  0.18 0.31 0.53 0.95 0.17 0.70 0.23 0.49 0.12 0.08 0.39 0.28 0.37 0.98 0.54 0.77 0.65 0.77 0.78 0.82 0.15

Table 18. Relevant documents retrieved by query individuals for topic 24 (generation 3) (* marks newly retrieved documents). [The check-mark layout could not be recovered from the source; one group of documents, including DOE1-57-0764 and DOE1-75-0356, was retrieved by most query individuals, while a second group of documents was each retrieved by only a single individual.]

Table 19.
Precision and recall for ad hoc queries

Top ranked evaluation. Run number: upitt3, all. Number of queries: 50.
Total number of documents over all queries - Retrieved: 5383; Relevant: 5923; Rel_ret: 1249.

Recall - Precision Averages:
  at 0.00  0.5622
  at 0.10  0.2723
  at 0.20  0.2025
  at 0.30  0.1382
  at 0.40  0.0708
  at 0.50  0.0309
  at 0.60  0.0279
  at 0.70  0.0155
  at 0.80  0.0136
  at 0.90  0.0000
  at 1.00  0.0000
Average precision for all points (11-pt Avg): 0.1213
Average precision for 3 intermediate points (0.20, 0.50, 0.80) (3-pt Avg): 0.0824

Recall:
  at 5 docs    0.0192
  at 15 docs   0.0509
  at 30 docs   0.0849
  at 100 docs  0.1753
  at 200 docs  0.2206
Precision:
  at 5 docs    0.3280
  at 15 docs   0.2867
  at 30 docs   0.2533
  at 100 docs  0.1844
  at 200 docs  0.1249

Table 20. Precision and recall for routing queries

Top ranked evaluation. Run number: upitt3, all. Number of queries: 50.
Total number of documents over all queries - Retrieved: 7611; Relevant: 14216; Rel_ret: 1277.

Recall - Precision Averages:
  at 0.00  0.5285
  at 0.10  0.2057
  at 0.20  0.0649
  at 0.30  0.0085
  at 0.40  0.0000
  at 0.50  0.0000
  at 0.60  0.0000
  at 0.70  0.0000
  at 0.80  0.0000
  at 0.90  0.0000
  at 1.00  0.0000
Average precision for all points (11-pt Avg): 0.0734
Average precision for 3 intermediate points (0.20, 0.50, 0.80) (3-pt Avg): 0.0216

Recall:
  at 5 docs    0.0090
  at 15 docs   0.0215
  at 30 docs   0.0362
  at 100 docs  0.0842
  at 200 docs  0.1007
Precision:
  at 5 docs    0.3520
  at 15 docs   0.3013
  at 30 docs   0.2727
  at 100 docs  0.2102
  at 200 docs  0.1277

9. Failure Analysis

Comparing our results with those provided by NIST, the precision values at the eleven recall points indicate that our system performs better than the median level on about half of the topics for the ad hoc queries (Figure 1). In several queries (54, 60, 79, 81, 84, 91, and 100) we sent only a few documents (fewer than 10), and for queries 59, 61, 68, 80, and 99 we sent no documents, because the threshold values limited the number of documents retrieved. Thus the precision values for those queries are zero or very low.
The same situation happened for some routing queries. The precision values for the routing queries in our system are lower than the median level in most cases (Figure 2). Besides the threshold inhibition mentioned above, we observed that, due to the query convergence in the final generation of the training topic, most query variants retrieved the same documents. Some query individuals in the intermediate generations which retrieved different relevant documents than the last generation may not have survived. We think this caused the situation in which fewer relevant documents were retrieved for the routing queries. Problems also arose with specific queries due to our pre-processing of the documents. Several circumstances were not considered in designing our system. For example, some special keywords, such as AT&T, and "M", which was used in some documents to represent "million", were not processed, but were significant in some topics. Another factor is that in the AP, WSJ, and ZIFF databases more than one text, each on a different topic, could comprise a single document. Since we did not separate them, the keyword match could cause a document to be retrieved because keywords from different text parts matched the query, even though the document itself is not relevant.

Figure 1. Precision values on 50 ad hoc queries. [The plot could not be recovered from the source; it compares the Upitt precision values with the NIST median by topic number.]

Figure 2. Precision values on 32 routing queries. [The plot could not be recovered from the source.]
10. Further Studies

In dealing with these large databases, our algorithm shows many interesting phenomena which we could not pursue due to the short time period of TREC, and which could form the agenda for further studies.

(1) The document term weights

Although we use the vector space model described in Section 2, unweighted keywords are assigned to describe the documents. We have already mentioned the two factors on which this decision was based. Moreover, the databases cover a variety of subjects, which may make general term weights less reliable than in most experimental databases, which deal with only a narrow area. While our Cranfield study supports this decision, the effect of weighted versus unweighted document keywords in large databases should be studied further.

(2) Window size for document retrieval

Although we established the maximum size of the retrieved document set as 40, it would be interesting to set the window size smaller than 40 at the beginning of the process and gradually increase it as the query individuals move toward convergence. Various thresholds and cutoff points are often set arbitrarily; further experimentation is planned to investigate the effect of changes in these values.

(3) Document classification

In this experiment we found many cases in which different query individuals retrieved different documents. This may arise because the database contains relevant documents that belong to different classes and can be retrieved only by using different queries. This fact would never be discovered by systems that rely on a single query. Our present algorithm aims at convergence to a single query form. We need a mechanism which will detect the development of distinct classes and automatically separate the query into two or more distinct queries. Similar ideas, not based on genetic algorithms, have been discussed in document classification (Worona, 1971) and in query splitting (Borodin et al., 1971).
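A class-detection mechanism of this kind could start from the retrieved sets themselves: if groups of query individuals retrieve mutually disjoint documents, the query is a candidate for splitting. A hypothetical sketch; the function name and the overlap criterion are our illustrative assumptions, not part of the paper's algorithm:

```python
def split_candidates(retrieved):
    """Group query individuals whose retrieved-document sets overlap.
    Each resulting group of individuals is a candidate for a separate query."""
    groups = []  # each entry: (merged document set, list of individual ids)
    for qid, docs in retrieved.items():
        docs = set(docs)
        overlapping = [g for g in groups if g[0] & docs]
        for g in overlapping:           # merge every group this individual touches
            groups.remove(g)
            docs |= g[0]
        members = [qid] + [m for g in overlapping for m in g[1]]
        groups.append((docs, members))
    return [sorted(m) for _, m in groups]

# Individuals 0 and 3 share a document; individual 7 retrieves a disjoint set,
# suggesting the query should be split into two distinct queries.
print(split_candidates({0: ["d1", "d2"], 3: ["d2", "d5"], 7: ["d9"]}))
# -> [[0, 3], [7]]
```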
(4) Term addition and deletion

In our feedback process, no terms were added to or removed from the original query. This may cause lower recall. However, determining which terms should be added or removed is not an easy task (e.g., Harman, 1988, 1992). It would be interesting to see how this problem can be handled in the genetic process.

References

Aalbersberg, I.J. (1992). Incremental relevance feedback. Proceedings of the 15th Annual International SIGIR Conference, pp. 11-22. Copenhagen, Denmark (June, 1992).

Baker, J.E. (1987). Reducing bias and inefficiency in the selection algorithm. In J.J. Grefenstette (Ed.), Proceedings of the Second International Conference on Genetic Algorithms, pp. 14-21. NJ: Lawrence Erlbaum Associates, Publishers.

Bookstein, A. (1985). Probability and fuzzy-set applications to information retrieval. In M.E. Williams (Ed.), Annual Review of Information Science and Technology, Vol. 20, pp. 117-151.

Borodin, A., Kerr, L. and Lewis, F. (1971). Query splitting in relevance feedback systems. In G. Salton (Ed.), The SMART Retrieval System: experiments in automatic document processing. NJ: Prentice-Hall.

Chang, Y.K., Cirillo, C. and Razon, J. (1971). Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton (Ed.), The SMART Retrieval System: experiments in automatic document processing. New Jersey: Prentice-Hall, Inc.

Frakes, W.B. (1992). Stemming algorithms. In W.B. Frakes and R. Baeza-Yates (Eds.), Information Retrieval: data structures and algorithms, pp. 131-160. NJ: Prentice Hall.

Frieder, O. and Siegelmann, H. (1991). On the allocation of documents in multiprocessor information retrieval systems. Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research & Development in Information Retrieval, pp. 230-239. Chicago, Illinois.

Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning.
New York: Addison-Wesley Publishing Company, Inc.

Gordon, M.D. (1988). The necessity for adaptation in modified Boolean document retrieval systems. Information Processing & Management, 24(3), pp. 339-347.

Harman, D. (1988). Towards interactive query expansion. Proceedings of the Eleventh Annual International ACM/SIGIR Conference, pp. 321-331.

Harman, D. (1992). Relevance feedback revisited. Proceedings of the 15th Annual International SIGIR Conference, pp. 1-10. Copenhagen, Denmark (June, 1992).

Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press.

Ide, E. (1971). New experiments in relevance feedback. In G. Salton (Ed.), The SMART Retrieval System: experiments in automatic document processing, Chapter 16. New Jersey: Prentice-Hall, Inc.

Salton, G. (Ed.). (1971). The SMART Retrieval System: Experiments in Automatic Document Processing. NJ: Prentice-Hall, Inc.

Salton, G. (1986). Recent trends in automatic information retrieval. Proceedings of ACM/SIGIR Conference on Research and Development in Information Retrieval, pp. 1-10. Pisa, Italy (September, 1986).

Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), pp. 513-523.

Worona, S. (1971). Query clustering in a large document space. In G. Salton (Ed.), The SMART Retrieval System: experiments in automatic document processing, Chapter 12. New Jersey: Prentice-Hall, Inc.

Yang, J.J. and Korfhage, R.R. (1992a). Adaptive information retrieval systems in the vector space model. Proceedings of the Symposium on Document Analysis and Information Retrieval, pp. 134-150. Las Vegas, Nevada (March, 1992).

Yang, J.J. and Korfhage, R.R. (1992b). Query modification using genetic algorithms in vector space models. Research Reports, L!5045/1592001, Department of Information Science, University of Pittsburgh.