On Expanding Query Vectors with Lexically Related Words

Ellen M. Voorhees
Siemens Corporate Research, Inc.
755 College Road East
Princeton, NJ 08540
ellen@learning.scr.siemens.com

Abstract

Experiments performed on small collections suggest that expanding query vectors with words that are lexically related to the original query words can improve retrieval effectiveness. Prior experiments using WordNet to automatically expand vectors in the large TREC-1 collection were inconclusive regarding effectiveness gains from lexically related words, since any such effects were dominated by the choice of words to expand. This paper specifically investigates the effect of expansion by selecting the query concepts to be expanded by hand. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results suggest that this query expansion technique makes little difference in retrieval effectiveness within the TREC environment, presumably because the TREC topic statements provide such a rich description of the information being sought.

1 Introduction

The IR group at Siemens Corporate Research is investigating how concept spaces - data structures that define semantic relationships among ideas - can be used to improve retrieval effectiveness in systems designed to satisfy large-scale information needs. As part of this research, we expanded document and query vectors automatically using selected synonyms of original text words for TREC-1 [5]. The retrieval results indicated that this expansion technique improved the performance of some queries, but degraded the performance of other queries. We concluded that improving the consistency of the method would require both a better method for determining the important concepts of a text and a better method for determining the correct sense of an ambiguous word.

We took TREC-2 as an opportunity to investigate the effectiveness of vector expansion when good concepts are chosen to be expanded. As in TREC-1, query vectors were expanded using WordNet synonym sets. However, the synonym sets associated with each query were selected manually (by the author). These results therefore represent an upper bound on the effectiveness to be expected from a completely automatic expansion process.

The results of the TREC-2 evaluation indicate that the query expansion procedure used does not significantly affect retrieval performance even when important concepts are identified by hand. Some expanded queries are more effective than their unexpanded counterparts, but for other queries the unexpanded version is more effective. In either case, the effectiveness difference between the two versions is seldom large. Further testing suggests that more extreme expansion procedures can cause larger differences in retrieval performance, but the net effect over a set of queries is degraded performance compared to no expansion at all.

The remainder of the paper discusses the experiments in detail. The next section describes the retrieval environment, including a description of WordNet. Section 3 provides evaluation results for both the official TREC-2 runs and some additional supporting runs. The final section explores the issue of why the expansion fails to improve retrieval performance.
2 The Retrieval Environment

The expansion procedure used in this work relies heavily on the information recorded in WordNet, a manually constructed lexical system developed by George Miller and his colleagues at the Cognitive Science Laboratory at Princeton University [4]. WordNet's basic object is a set of strict synonyms, called a synset. Synsets are organized by the lexical relations defined on them, which differ depending on part of speech. For nouns, the only part of WordNet used in this study, the lexical relations include antonymy, hypernymy/hyponymy (the is-a relation), and three different meronym/holonym (part-of) relations. The is-a relation is the dominant relationship, and organizes the synsets into a set of approximately ten hierarchies [1].

[1] The actual structure is not quite a hierarchy since a few synsets have more than one parent.

Figure 1 shows a piece of WordNet. The figure contains all the ancestors and descendents as defined by the is-a relation for the six senses of the noun swing. Also shown is that one of the senses, a child's toy, is part-of a playground.

[Figure 1: Relations defined for the six senses of the noun swing in WordNet.]

Given a synset, there is a wide choice of words to add to a query vector: one can add only the synonyms within the synset, or all descendents in the is-a hierarchy, or all words in synsets one link away from the original synset regardless of link type, etc. One of the goals of this work is to discover which such strategies are effective. Wang et al. found expanding vectors from relational thesauri to be effective [6], but based those conclusions on experiments performed on one small collection. Experiments we performed as part of our TREC-1 work showed serious degradation when anything other than synonyms was used in the expansion, but the TREC-1 results were dominated by the problem of finding good synsets to expand. This work examines the effectiveness of the different relation types assuming good synsets are used as the basis.

Siemens' official TREC-2 runs consist of one routing run (topics 51-100 against the documents on disk three) and two ad hoc runs (topics 101-150 against the documents on the first two disks). All of the runs are manual since the input text of the topics was modified by hand. There are two types of modifications: parts of the topic statement that explicitly list things that are not relevant were removed, and synsets containing nouns germane to the topic statement were added as a new section of the topic text. Document text was indexed completely automatically (once the errors were fixed [2]) using the standard SMART indexing routines [1] (i.e., tokenization, stop word removal, and stemming). In general, only the "text" fields of the documents were indexed. For example, only the title, abstract, detailed claims, claims, and design claims sections were indexed for the patent subcollection. The manually assigned keywords included in some of the Ziff documents were not used, nor were the photograph captions of the San Jose Mercury collection.

[2] There were seven errors total, in the files patni)14 and patn~51, that were not on the official list but that caused the files not to conform to the patent collection's readme file. These errors - missing `/TEXT' tags, `TEXT' tags preceding `OREF' tags, and the like - were also fixed manually.

The goal in selecting synsets to be included in a topic statement was to pick synsets that emphasized important concepts of the topic. One aspect of the problem is sense resolution: selecting the synset that contains the correct sense of an ambiguous original topic word.
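As a concrete illustration of the per-sense information available when making that choice, the sketch below prints each noun sense of a word together with its synonyms and the synsets one link away, grouped by relation type. It assumes the NLTK interface to a current WordNet release rather than the version 1.3 noun database used in these experiments, so the senses and links it prints will not match Figure 1 exactly; it is an illustrative aid, not part of the system described here.

```python
# Illustrative only: inspect the noun senses of a word via NLTK's WordNet
# interface (requires nltk.download('wordnet')). The WordNet release NLTK
# ships differs from the version 1.3 database used in this paper.
from nltk.corpus import wordnet as wn

def show_noun_senses(word):
    """Print each noun sense of `word` with its synonyms and the synsets
    reachable by a single link, grouped by relation type."""
    for synset in wn.synsets(word, pos=wn.NOUN):
        print(synset.name(), "-", synset.definition())
        print("  synonyms:           ", synset.lemma_names())
        print("  is-a (hypernyms):   ", [s.name() for s in synset.hypernyms()])
        print("  is-a (hyponyms):    ", [s.name() for s in synset.hyponyms()])
        print("  part-of (holonyms): ", [s.name() for s in synset.part_holonyms()])
        print("  has-part (meronyms):", [s.name() for s in synset.part_meronyms()])

show_noun_senses("swing")
```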
However, since one purpose of the experiments is to investigate how effective lexical relations are in expanding queries assuming good starting concepts, I did not restrict myself to adding only synsets that contain some original topic word. For example, topic 93 asks for information about the support of the NRA and never mentions the word gun. Nevertheless, I believed gun to be an important concept of the topic and added the synset containing gun meaning "a weapon that discharges a missile from a metal tube or barrel" to the topic. (Rifle, a word that does appear in the topic statement, is a grandchild of this synset in WordNet, with the intervening synset being {firearm, piece, small-arm}.) Synset selection was also influenced by the fact that these synsets would be used to expand the query. Early experiments demonstrated that expansion worked poorly when synsets with very many children in the is-a hierarchy (e.g., country) were used, so those synsets were avoided. Furthermore, when selecting one sense among the different senses in WordNet was difficult, I frequently used the words related to the synsets as a way of making a decision. Figure 2 shows the original text of topic 93 and the synsets that were added to it.

Topic: What Backing Does the National Rifle Association Have?

<desc> Description:
Document must describe or identify supporters of the National Rifle Association (NRA), or its assets.

<narr> Narrative:
To be relevant, a document must describe or name individuals or organizations who are members of the NRA, or who contribute money to it. A document is also relevant if it quantifies the NRA's financial assets or identifies any other NRA holdings.

<con> Concept(s):
1. National Rifle Association, NRA
2. contributor, member, supporter
3. holdings, assets, finances

<syn>
{funds, finance, monetary_resource, cash_in_hand, pecuniary_resource}
{supporter, protagonist, champion, admirer, booster}
{gun}

Figure 2: Topic 093 and the synonym sets selected for it.

Some topics contained important concepts that had no corresponding synset. Occasionally, the missing synset was a gap in WordNet; for example, toxic waste, genetic engineering, and sanctions meaning economic disciplinary measures are not in version 1.3 of WordNet. More often, the important concept was a proper noun or highly technical term that one wouldn't expect to be in WordNet. NRA or National Rifle Association, for example, is an important concept for topic 93 but does not occur in WordNet. Nothing was added to the topic texts for concepts that lacked corresponding synsets in these experiments, although making some provision for them would improve retrieval performance.

Once the text of the topics is annotated with synsets, the remainder of the processing is automatic. Selected fields of the topic statements (the title, nationality, narrative, factors, description, and concept fields) are indexed using the standard SMART routines. The terms derived from these sections are the "original query terms". The expansion procedure is invoked when the synonym set section is reached. The procedure is controlled by a set of parameters that specifies, for each relation type included in WordNet, the maximum length of a chain of that type of link that may be followed. A chain begins at each synset listed in the synset section of the topic text and may contain only links of a single type. All synonyms contained within a synset of the chain are added to the query. Collocations such as change_of_location in Figure 1 are broken into their component words, stop words such as of are removed, and the remaining words are stemmed. The word stems, plus a tag indicating the lexical relation through which the stems are related to the original synset, are then appended to the original query terms.
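The sketch below illustrates the chain-controlled expansion step just described: starting from a selected synset, it follows chains of a single link type up to a per-relation maximum length, splits collocations into component words, drops stop words, and stems what remains, keeping the results grouped by relation. It is a minimal approximation under several assumptions - the NLTK WordNet interface stands in for the WordNet 1.3 data, a Porter stemmer and a tiny stop list stand in for SMART's routines, and only three link types are shown - so it is not the exact procedure used in the official runs.

```python
# Minimal sketch of chain-limited synset expansion, grouped by relation type.
# NLTK WordNet, PorterStemmer, and the stop list are illustrative stand-ins.
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer

STOP_WORDS = {"of", "the", "a", "an", "in"}   # stand-in for SMART's stop list
stemmer = PorterStemmer()

FOLLOW = {                                     # link type -> how to follow it
    "hyponym": lambda s: s.hyponyms(),
    "hypernym": lambda s: s.hypernyms(),
    "part_holonym": lambda s: s.part_holonyms(),
}

def stems_of(synsets):
    """Split collocations apart, drop stop words, and stem the rest."""
    stems = set()
    for s in synsets:
        for lemma in s.lemma_names():
            for word in lemma.lower().split("_"):
                if word not in STOP_WORDS:
                    stems.add(stemmer.stem(word))
    return stems

def expand_synset(synset, max_chain):
    """Return {relation: stems} for chains of a single link type, each at
    most max_chain[relation] links long, starting from `synset`."""
    expanded = {"synonym": stems_of([synset])}   # the synset's own members
    for relation, follow in FOLLOW.items():
        frontier, reached = [synset], []
        for _ in range(max_chain.get(relation, 0)):
            frontier = [t for s in frontier for t in follow(s)]
            reached.extend(frontier)
        if reached:
            expanded[relation] = stems_of(reached)
    return expanded

# One link of each type from the first noun sense of "swing" (whichever sense
# that is in the installed WordNet release).
first_sense = wn.synsets("swing", pos=wn.NOUN)[0]
print(expand_synset(first_sense, {"hyponym": 1, "hypernym": 1, "part_holonym": 1}))
```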
As an example of the expansion process, consider the synsets for swing shown in Figure 1. If the synset added to the topic is the synset containing golf_stroke, and any number of hyponym (child) links may be traversed, then the stems of golf, stroke, swing, shot, slice, hook, drive, putt, approach, chip, and pitch would be added to the query vector. If hyponym chains are limited to length one, then chip and pitch would not be added. If the synset added to the topic is the one containing swing meaning plaything and any link type may be followed for one link, then the stems of swing, mechanical, device, plaything, toy, playground, and trapeze would be added to the query.

Stems added through different lexical relations are kept separate using the extended vector space model introduced by Fox [3]. Each query vector is comprised of subvectors of different concept types (called ctypes), where each ctype corresponds to a different lexical relation. A query vector potentially has eleven ctypes: one for original query terms, one for synonyms, and one each for the other relation types contained within the noun portion of WordNet (each half of a symmetric relation has its own ctype). An original query term that is a member of a synset selected for that query appears in both of the respective ctypes. Similarly, a word that is related to a synset through two different relations appears in both ctypes.

The similarity between a document vector D and an extended query vector Q is computed as the weighted sum of the similarities between D and each of the query's subvectors:

    sim(D, Q) = \sum_{i \in ctypes} \alpha_i \, (D \cdot Q_i)

where \cdot denotes the inner product of two vectors, Q_i is the ith subvector of Q, and \alpha_i, a real number, reflects the importance of ctype i relative to the other ctypes. Terms in document vectors are weighted using the lnc weights suggested by Buckley et al. [2]; that is, the weight of a term is set to 1.0 + ln(tf), where tf is the number of times the term occurs in the document, and is then normalized by the square root of the sum of the squares of the weights in the vector (cosine normalization). Query terms are weighted using ltN: the log term frequency factor above is multiplied by the term's inverse document frequency, and the weights in the ctype representing original query terms are normalized by the cosine factor. Weights in additional ctypes are normalized using the length computed for the original terms' ctype. This normalization strategy allows the original query term weights to be unaffected by the expansion process and keeps the weights in each ctype comparable with one another.
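The following sketch restates the extended-vector similarity and the lnc/ltN weighting just described in code form. Vectors are plain dictionaries from terms to weights, and a query is a dictionary from ctype name to subvector. The toy idf values, example terms, and ctype names are illustrative assumptions; only the arithmetic follows the description in the text.

```python
# Sketch of the extended-vector similarity with lnc document weights and
# ltN query weights. Example data below is purely illustrative.
import math

def lnc(term_counts):
    """Document weights: 1 + ln(tf), cosine-normalized, no idf factor."""
    w = {t: 1.0 + math.log(tf) for t, tf in term_counts.items()}
    norm = math.sqrt(sum(x * x for x in w.values()))
    return {t: x / norm for t, x in w.items()}

def ltN(query_ctypes, idf):
    """Query weights: (1 + ln(tf)) * idf. Only the 'original' ctype is
    cosine-normalized; added ctypes reuse the original ctype's length, so
    expansion never changes the original term weights."""
    raw = {c: {t: (1.0 + math.log(tf)) * idf.get(t, 1.0) for t, tf in v.items()}
           for c, v in query_ctypes.items()}
    norm = math.sqrt(sum(x * x for x in raw["original"].values()))
    return {c: {t: x / norm for t, x in v.items()} for c, v in raw.items()}

def similarity(doc, query_ctypes, alpha):
    """sim(D, Q) = sum over ctypes i of alpha_i * (D . Q_i)."""
    return sum(alpha.get(c, 0.0) *
               sum(w * doc.get(t, 0.0) for t, w in subvec.items())
               for c, subvec in query_ctypes.items())

# Toy example: original terms plus one synonym ctype weighted at 0.5.
doc = lnc({"rifle": 2, "gun": 1, "member": 3})
query = ltN({"original": {"nra": 1, "supporter": 1},
             "synonym": {"gun": 1, "booster": 1}},
            idf={"nra": 3.0, "supporter": 2.0, "gun": 2.5, "booster": 2.8})
print(similarity(doc, query, {"original": 1.0, "synonym": 0.5}))
```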
3 Experiments

The training data for the routing queries was used both to refine the synsets that were included in the topic text and to select the type of relations used to expand the query vectors. In some cases a synset that appears to be a logical choice for a query is nonetheless detrimental. For example, adding the synset for death to topic 59 (weather fatalities) causes the query to retrieve far too many articles reporting on deaths that have no relation to the weather. I produced five different versions of synset-annotated topic texts, although the differences between versions are not very large. The version used in the official routing run added an average of 2.9 synsets to a topic statement, with a minimum of 0 synsets added and a maximum of 6 synsets added.

Of course, the utility of a synset depends in part on how that synset is expanded and the relative weights given to the different link types (the \alpha's in the similarity function above). Table 1 lists the various combinations that were evaluated using the training data. Four different expansion strategies were tried: expansion by synonyms only, expansion by synonyms plus all descendents in the is-a hierarchy, expansion by synonyms plus parents and all descendents in the is-a hierarchy, and expansion by synonyms plus any synset directly related to the given synset (i.e., a chain of length 1 for all link types). Different \alpha values were also investigated. Assuming original query terms are more important than added terms, the \alpha for the original terms subvector was set to one and the \alpha for other subvectors varied between zero and one as shown in Table 1.

The most effective run was the one that expanded a query synset by any synset directly related to it and had \alpha = .5 for all added subvectors. Therefore, this strategy was used to produce the official routing queries from the final version of the annotated text. The scheme added an average of 24.7 words to a query vector (minimum 0, maximum 70), and an average of 20.2 (0, 66) words that are not part of the original text. The average number of relevant documents retrieved at rank 100 for this run is 40.7 and at rank 1000 is 133.3; the mean "average precision" is .2984. In general, the individual query results are at or slightly above the median, with a few queries significantly below the median. Of more interest to this study is how the expanded queries compare to unexpanded queries. A plot of average recall versus average precision for these two runs is given in Figure 3. As can be seen, the effectiveness of the two query sets is very similar.

Since there was no way to evaluate the relative effectiveness of different expansion schemes for the ad hoc queries, the same expansion scheme as was used for the official routing run (chains of length one for any relation type and all \alpha's = .5) was used for the ad hoc run. Furthermore, there could be no refining of which synsets to add, so only one version of synset-annotated text was produced. An average of 2.7 (minimum 0, maximum 6) synonym sets was added to an ad hoc topic text. The expansion process added an average of 17.2 (0, 66) terms and 12.8 (0, 55) terms that are not part of the original text.

Siemens actually submitted two ad hoc runs. The first was the expanded queries with the \alpha's for added subvectors set to 0, a run that is equivalent to no expansion and is used as a base case. The second Siemens ad hoc run used the .5 \alpha values. A plot of the effectiveness of the two ad hoc runs is given in Figure 4.
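For readers unfamiliar with the effectiveness figures quoted in this section, the sketch below computes the two simplest of them for a single query: the number of relevant documents retrieved at a rank cutoff, and non-interpolated average precision (averaged over queries, the latter gives the mean "average precision"). The toy ranking and relevance judgments are invented for illustration, and the official TREC evaluation software reports additional interpolated figures beyond these.

```python
# Back-of-the-envelope versions of the measures reported in this section.
def relevant_at(ranked_ids, relevant_ids, k):
    """Number of relevant documents among the top k retrieved."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids)

def average_precision(ranked_ids, relevant_ids):
    """Sum of precision at the rank of each relevant document retrieved,
    divided by the total number of relevant documents for the query."""
    hits, precisions = 0, []
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

# Toy ranking with relevant documents at ranks 1, 3, and 6.
ranking = ["d4", "d9", "d1", "d7", "d2", "d5"]
relevant = {"d4", "d1", "d5"}
print(relevant_at(ranking, relevant, 5))               # -> 2
print(round(average_precision(ranking, relevant), 3))  # -> 0.722
```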
The difference in effectiveness between unexpanded and expanded queries is even smaller for the ad hoc queries than it is for the routing queries. The average number of relevant documents retrieved at rank 100 is 46.9 for both the unexpanded and expanded queries. The average number of relevant documents retrieved at rank 1000 is 161.4 for the unexpanded queries and 161.3 for the expanded queries. The mean "average precision" is .3408 and .3397, respectively.

A possible explanation for the little difference made by expanding the queries is that the expansion parameters used were too conservative. To test this hypothesis, additional runs were made using the same set of synsets but allowing longer chains of links and/or using greater relative link weights (the \alpha's). Table 2 lists the additional combinations tested using both the ad hoc queries versus the documents on disks one and two, and the routing queries versus the documents on disk 3. As was the case for the routing training runs, the strategy used for the official TREC-2 runs (all links of length one, \alpha's = .5) was the most effective expansion strategy. The more aggressive expansion strategies did make larger differences in retrieval effectiveness compared to the unexpanded queries, but across the set of queries the aggregate difference was negative. Hence it is unlikely that the conservative expansion strategy is the reason for the lack of improvement.

Table 1: Combinations of expansion strategies and relation weights tested.

  Expansion by synonyms only
    orig terms \alpha   synonyms \alpha
    1                   .1
    1                   .3
    1                   .5
    1                   .8

  Expansion by synonyms plus all descendents
    orig terms \alpha   synonyms \alpha   descendents \alpha
    1                   .1                .1
    1                   .3                .1
    1                   .3                .3
    1                   .5                .1
    1                   .5                .3
    1                   .5                .5
    1                   .8                .1
    1                   .8                .3
    1                   .8                .5

  Expansion by synonyms plus parents and all descendents
    orig terms \alpha   synonyms \alpha   descendents \alpha   parents \alpha
    1                   .1                .1                   .1
    1                   .3                .1                   .1
    1                   .3                .3                   .3
    1                   .5                .1                   .1
    1                   .5                .3                   .1
    1                   .5                .3                   .3
    1                   .5                .5                   .1
    1                   .5                .5                   .3
    1                   .5                .5                   .5
    1                   .8                .1                   .1
    1                   .8                .3                   .1
    1                   .8                .3                   .3
    1                   .8                .5                   .1
    1                   .8                .5                   .3
    1                   .8                .5                   .5

  Expansion by synonyms plus any directly related synset
    orig terms \alpha   synonyms \alpha   other \alpha
    1                   .3                .1
    1                   .3                .3
    1                   .5                .1
    1                   .5                .3
    1                   .5                .5
    1                   .3                .5

[Figure 3: Effectiveness of expanded versus unexpanded routing queries on test documents (disk 3). Recall-precision plot; not reproduced.]

[Figure 4: Effectiveness of expanded versus unexpanded ad hoc queries. Recall-precision plot; not reproduced.]

Table 2: Additional combinations of expansion strategies and relation weights tested.

  Expansion by synonyms plus parents and all descendents
    orig terms \alpha   synonyms \alpha   descendents \alpha   parents \alpha
    1                   .5                .5                   .5
    1                   1                 .5                   .5
    1                   1                 1                    1
    1                   2                 1                    1

  Expansion by synonyms plus any directly related synset
    orig terms \alpha   synonyms \alpha   other \alpha
    1                   0                 0
    1                   .5                .5
    1                   1                 1

4 Conclusion

The experimental evidence clearly shows this query expansion technique provides little benefit in the TREC environment. The most likely reason why this should be so is the completeness of the TREC topic descriptions. Query expansion is a recall-enhancing technique, and TREC topic descriptions are already large compared to queries found in traditional IR collections. Although most of the expanded queries did have some new terms added to them, the most important terms frequently appeared in both the original term set and the set of expanded terms. This had an effect on the relative weight of those terms in the overall similarity computed for a document, especially when some important terms had no corresponding synset.
Topic 112 provides an example of this effect. The topic is concerned with the world-wide investment in biotechnology. I added synonym sets for investment and capital to the topic. WordNet does not contain biotechnology, although it does contain biomedical_science. Thus, I also added the biomedical_science synset and a synset containing gene. The resulting expanded query performed significantly worse than the unexpanded version (33 relevant retrieved in the first 100 versus 52 relevant retrieved). The problem is that the expanded query places too much emphasis on money and not enough on biotechnology. Thus these results indicate that simply recognizing which are the important concepts in a query statement is not sufficient to ensure improved retrieval performance. An expansion procedure must also preserve the relative weights of those concepts.

Another possible explanation is that WordNet is not suited for this task - it was not designed to be used in this manner, and it may not contain the necessary links. Even if this is true, however, it is unlikely that any other broad-coverage knowledge base would be better suited. The success of relevance feedback and other routing techniques suggests that the most useful relations are specific and idiosyncratic.

A second goal of this work was to characterize the effectiveness of different types of lexical relations when used to expand a query. Assuming the set of words to be expanded is well chosen, any closely related word - regardless of the type of relation - may be a good additional word. Wang et al. reached a similar conclusion [6]. Nevertheless, an added word should be weighted less than the original word that caused it to be included. Runs in which added words were weighted equally to or more heavily than original words were consistently less effective than the more conservatively weighted runs. Similarly, runs that added words that were loosely related to original words (i.e., when long paths of links were followed) were consistently less effective than runs that used only near relatives.

Acknowledgements

Geoff Towell carefully read a draft of this paper and suggested changes that improved its presentation and clarity.

References

[1] Chris Buckley. Implementation of the SMART information retrieval system. Technical Report 85-686, Computer Science Department, Cornell University, Ithaca, New York, May 1985.

[2] Chris Buckley, Gerard Salton, and James Allan. Automatic retrieval with locality information using SMART. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 59-72. NIST Special Publication 500-207, March 1993.

[3] Edward A. Fox. Extending the Boolean and Vector Space Models of Information Retrieval with P-norm Queries and Multiple Concept Types. PhD thesis, Cornell University, 1983. University Microfilms, Ann Arbor, MI.

[4] George Miller. Special Issue, WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 1990.

[5] Ellen M. Voorhees and Yuan-Wang Hou. Vector expansion in a large collection. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 343-351. NIST Special Publication 500-207, March 1993.

[6] Yih-Chen Wang, James Vandendorpe, and Martha Evens. Relational thesauri in information retrieval. Journal of the American Society for Information Science, 36(1):15-27, January 1985.