PROXIMITY-CORRELATION FOR DOCUMENT RANKING:
The PARA Group's TREC Experiment

by Mark Zimmermann
P.O. Box 598
Kensington, Maryland 20895-0598 USA
(zimm@alumni.caltech.edu)

Abstract

The PARA Group's simple document routing method achieved surprisingly good results in the first TREC experiment. The system works by awarding points to documents with many query terms in near proximity to each other. The current implementation of this system is described in general terms; this note is followed by a listing of the complete source code used to rank documents for the 50 TREC test questions, written in Awk. Possible improvements, and directions for further research, are suggested.

Acknowledgements

The PARA Group is a loose affiliation of people with common interests in free-text information retrieval, hypermedia, and free software. (For further information, or to join, send a message to "para-request@cs.cmu.edu" via the Internet.) For the TREC relevance-ranking document routing test, I consulted with other PARA Group members and implemented concepts that we discussed communally.

I would like to thank Dr. Donna Harman, NIST, for allowing me to participate in TREC and for encouraging me to write up my results. I also thank the members of the PARA Group for their helpful advice. I made extensive use of, and am grateful for, software from the Free Software Foundation - in particular, the GNU Emacs text editing system, and the Gawk version of the Awk programming language. (Disclaimer: My Employer Is In No Way Responsible For This Work!)

Approach

I began with the subjective observation that, in my personal experience, the documents which I like most tend to have local clusters of "interesting" words. I also began with the constraint that I had only a few hours of programming time to invest in my TREC experiment; on the other hand, I had a NeXT workstation with an optical disk and plenty of unused background CPU cycles available.
This led me to try a quick-and-dirty approach using the regular expression pattern-matching and other programming facilities of Gawk, a free version of the Awk language. I decided to work on the document routing task using the full TREC data set.

I took the 50 TREC questions and manually constructed simple regular expressions ("regexps") for each of the key terms in them. Thus, for Topic 001, on pending antitrust cases, I had /ANTITRUST/, /CASE/, and /PEND/; for Topic 002, acquisitions or mergers involving US and foreign companies, I came up with /ACQUISITION|BUYOUT|MERGER|TAKEOVER/, etc. For equivalent terms which were implicitly boolean-OR'd together, I wrote a single regexp with "|" joining the words. I spent approximately two minutes per TREC query writing these patterns, a total of about two hours, and used words contained in the TREC topic statements plus a few obvious synonyms which occurred to me as I typed in the queries. I converted all characters in the TREC document set to upper-case before processing, so my regexps ignored case issues.

To handle the proximity (boolean-AND-like) requirement among separate terms, and to generate estimated relevances, I invented a very simple scoring system. Every time a line in a document matched one of my regexps, I added an arbitrary 5 points to that regexp's score. A line in the document got a score equal to the product of the regexp scores for that query. When moving on to the next line of a document, I multiplied all regexp scores by 0.9, to make them fade away with a characteristic length scale of 10 lines (= 1/(1-0.9)). The estimated relevance for a document as a whole was the maximum relevance of any line in that document. For terms which were to be negatively-weighted (boolean-NOT-like), as in TREC Topic 026, instead of multiplying in the regexp's score to get a line's relevance, I multiplied in 5 minus that score.
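As a rough illustration of the scoring rule just described, here is a minimal sketch for a single made-up query; it is not the code used for the TREC runs. The three patterns (ANTITRUST, CASE, and a NOT-like SETTLED) and the names score_docs, emit, a, b, n, and best are assumptions chosen for the example. It assumes upper-cased input in which every document starts with a <DOCNO> line whose second field is the document ID, and it prints the final document from an END block.

```sh
# Sketch only: score one hypothetical query over upper-cased text on stdin.
# Prints "<docno> <integer part of the best line score>" for each document.
score_docs() {
  awk '
    # print the document that has just ended, if there was one
    function emit() { if (docno != "") printf "%s %d\n", docno, best }

    # a new document starts: report the previous one and reset all scores
    /<DOCNO>/   { emit(); docno = $2; a = 0; b = 0; n = 0; best = 0 }

    # each matching line adds 5 points to the score of that pattern
    /ANTITRUST/ { a += 5 }
    /CASE/      { b += 5 }
    /SETTLED/   { n += 5 }   # hypothetical NOT-like term

    {
      # line score: product of term scores; a NOT-like term enters as 5 minus its score
      s = a * b * (5 - n)
      if (s > best) best = s
      # fade every term score by 0.9 before the next line
      a *= 0.9; b *= 0.9; n *= 0.9
    }

    END { emit() }
  '
}

# Possible use (docs.txt is a hypothetical file name):
#   tr a-z A-Z < docs.txt | score_docs
```

Because the line score is a product, a document scores above zero only when every positive term has matched within a few lines of the others, which is what gives the method its AND-like behavior.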
I experimented briefly with different weights and relevance-length-scales for different terms, but decided that there was neither sufficient time nor much benefit to be gained that way, and so I settled on the 5-points-per-instance, 0.9 degradation-rate standard.

Implementation

I performed my TREC experiments over a period of a few weeks in background on a NeXT workstation (an old Cube), using the (rather slow) built-in writeable optical disk to hold data copied from the TREC CD-ROMs. On the average, each TREC query took about one-third of a second per document to execute. My implementation ran 10 queries at a time and generated a line of output for each document, listing the document ID followed by (the integer part of) its estimated relevance for each query, in 10 columns. I then used additional UNIX shell scripts containing comm and sort and head to tabulate the 200 highest-ranked documents for each topic. A final Gawk program converted my results into the standard TREC format. The total wall-clock computation time which I used was about 11 days for each CD-ROM of data.

Late in the process of scoring documents, I discovered that a bug in my programs caused the last document in a data set not to be ranked - but I had no time to fix the error, and it probably did not affect my overall results significantly.

Further Work

I believe that the proximity-correlation approach used in my TREC experiment has promise for other relevance-ranking tasks. Almost certainly, the use of a compiled language (and perhaps a simpler, faster pattern-matching facility) rather than Gawk would result in a speed increase of an order of magnitude or more. Much higher speeds should be achievable by inverted-index methods. User feedback might be valuable in modifying the default settings for term weights and relevance-lengths. A thesaurus option to automatically generate synonyms could help automate the query-creation process.

References

[1] Mark Zimmermann.
"The FreeText Project: Large-Scale Personal Information Retrieval", in Delany & Landow, TEXT-BASED COMPUTING IN THE HUMANITIES (to be published by MIT Press, early 1993), pp. 51-66.


TRECscore.Q.01-10.gawk

# score documents delimited by <DOCNO> lines
# outputs scores for each document in format:
#   <DOCNO> [score1] [score2] ... [score10]

BEGIN {
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0; s2c = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0; s4c = 0; s4d = 0;
  s5a = 0; s5b = 0; s5c = 0;
  s6a = 0; s6b = 0; s6c = 0; s6d = 0;
  s7a = 0; s7b = 0; s7c = 0; s7d = 0;
  s8a = 0; s8b = 0; s8c = 0; s8d = 0;
  s9a = 0; s9b = 0; s9c = 0;
  s10a = 0; s10b = 0;
  docno = "";
}

/<DOCNO>/ {
  printf ("%-20s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n",
    docno, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10);
  docno = $2;
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0; s2c = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0; s4c = 0; s4d = 0;
  s5a = 0; s5b = 0; s5c = 0;
  s6a = 0; s6b = 0; s6c = 0; s6d = 0;
  s7a = 0; s7b = 0; s7c = 0; s7d = 0;
  s8a = 0; s8b = 0; s8c = 0; s8d = 0;
  s9a = 0; s9b = 0; s9c = 0;
  s10a = 0; s10b = 0;
}

# topic 001 --- antitrust cases pending
/ANTITRUST/ { s1a += 5; }
/CASE/ { s1b += 5; }
/PEND/ { s1c += 5; }
{
  s1 = s1a * s1b * s1c;
  if (s1 > m1) m1 = s1;
  s1a *= .9; s1b *= .9; s1c *= .9;
}

# topic 002 --- acquisition/merger/etc. involving US & foreign companies
/ACQUISITION|BUYOUT|MERGER|TAKEOVER/ { s2a += 5; }
/US|U\.S\.|AMERICAN/ { s2b += 5; }
/FOREIGN/ { s2c += 5; }
{
  s2 = s2a * s2b * s2c;
  if (s2 > m2) m2 = s2;
  s2a *= .9; s2b *= .9; s2c *= .9;
}

# topic 003 --- Japanese joint venture
/JAPAN/ { s3a += 5; }
/JOINT/ { s3b += 5; }
/VENTURE/ { s3c += 5; }
{
  s3 = s3a * s3b * s3c;
  if (s3 > m3) m3 = s3;
  s3a *= .9; s3b *= .9; s3c *= .9;
}

# topic 004 --- debt rescheduling developing/third-world country/nation
/DEBT/ { s4a += 5; }
/RESCHEDUL/ { s4b += 5; }
/COUNTR|NATION/ { s4c += 5; }
/DEVELOP|THIRD.WORLD/ { s4d += 5; }
{
  s4 = s4a * s4b * s4c * s4d;
  if (s4 > m4) m4 = s4;
  s4a *= .9; s4b *= .9; s4c *= .9; s4d *= .9;
}

# topic 005 --- dumping charges US/EC vs. Japan
/DUMPING/ { s5a += 5; }
/US|U\.S\.|EC|E\.C\.|EUROPEAN COMMUNITY/ { s5b += 5; }
/JAPAN/ { s5c += 5; }
{
  s5 = s5a * s5b * s5c;
  if (s5 > m5) m5 = s5;
  s5a *= .9; s5b *= .9; s5c *= .9;
}

# topic 006 --- third world (or developing nation) debt relief
/THIRD.WORLD|DEVELOP/ { s6a += 5; }
/COUNTR|NATION/ { s6b += 5; }
/DEBT/ { s6c += 5; }
/RELIEF|FORGIVE|RESCHEDUL/ { s6d += 5; }
{
  s6 = s6a * s6b * s6c * s6d;
  if (s6 > m6) m6 = s6;
  s6a *= .9; s6b *= .9; s6c *= .9; s6d *= .9;
}

# topic 007 --- US budget deficit decrease/reduction
# (a fourth alternative in the first pattern is illegible)
/US|U\.S\.|FEDERAL/ { s7a += 5; }
/BUDGET/ { s7b += 5; }
/DEFICIT|SHORTFALL|SPEND/ { s7c += 5; }
/REDUC|DECREAS|CUT|ELIMINAT/ { s7d += 5; }
{
  s7 = s7a * s7b * s7c * s7d;
  if (s7 > m7) m7 = s7;
  s7a *= .9; s7b *= .9; s7c *= .9; s7d *= .9;
}

# topic 008 --- non-US economic indicator/index projections/forecasts
/US|U\.S\.|AMERICA/ { s8a += 5; }
/ECONOM/ { s8b += 5; }
/INDICATOR|INDEX/ { s8c += 5; }
/PROJECT|FORECAST/ { s8d += 5; }
{
  s8 = (5 - s8a) * s8b * s8c * s8d;
  if (s8 > m8) m8 = s8;
  s8a *= .9; s8b *= .9; s8c *= .9; s8d *= .9;
}

# topic 009 --- 1988 presidential candidate sightings/locations (see name list)
# (the start of the name pattern is cut off)
/DUKAKIS|JESSE.*JACKSON|GARY.*HART|JO.*BIDEN|AL.*GORE|PAUL.*SIMON|RICHARD.*GEPHARDT|BRUCE.*BABBITT/ { s9a += 5; }
/ IN | AT | TO | NEAR / { s9b += 5; }
/WASHINGTON|NEW YORK|LOS ANGELES|MOSCOW|LONDON|PARIS|TOKYO|WHITE HOUSE|CAMP DAVID|TEXAS|NEW HAMPSHIRE|IOWA|EUROPE|CAPITOL HILL|CITY|STATE|TOWN|STREET|BUILDING/ { s9c += 5; }
{
  s9 = s9a * s9b * s9c;
  if (s9 > m9) m9 = s9;
  s9a *= .9; s9b *= .9; s9c *= .9;
}

# topic 010 --- AIDS treatment
/AIDS/ { s10a += 5; }
/TREATMENT|DRUG/ { s10b += 5; }
{
  s10 = s10a * s10b;
  if (s10 > m10) m10 = s10;
  s10a *= .9; s10b *= .9;
}


TRECscore.Q.11-20.gawk

# score documents delimited by <DOCNO> lines from TREC data
# for second 10 questions on TREC list
# ^z 920527-29, 0601
# usage:
#   gawk -f TRECscore.Q.11-20.gawk
# typically will want to do something like:
#   zcat wsj/19*/*.Z | tr a-z A-Z | gawk -f TRECscore.Q.11-20.gawk >Q.11-20.TRECscores.out
# reads from stdin and outputs scores for each document to stdout in format:
#   <DOCNO> [score1] [score2] ... [score10]

BEGIN {
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0; s4c = 0; s4d = 0;
  s5a = 0; s5b = 0; s5c = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0; s7c = 0;
  s8a = 0; s8b = 0; s8c = 0;
  s9a = 0; s9b = 0; s9c = 0;
  s10a = 0; s10b = 0; s10c = 0;
  docno = "";
}

/<DOCNO>/ {
  printf ("%-20s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n",
    docno, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10);
  docno = $2;
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0; s4c = 0; s4d = 0;
  s5a = 0; s5b = 0; s5c = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0; s7c = 0;
  s8a = 0; s8b = 0; s8c = 0;
  s9a = 0; s9b = 0; s9c = 0;
  s10a = 0; s10b = 0; s10c = 0;
}

# topic 011 --- space program
/SPACE/ { s1a += 5; }
/PROGRAM|PROJECT/ { s1b += 5; }
/GOAL|PLAN/ { s1c += 5; }
{
  s1 = s1a * s1b * s1c;
  if (s1 > m1) m1 = s1;
  s1a *= .9; s1b *= .9; s1c *= .9;
}

# topic 012 --- water pollution
/WATER/ { s2a += 5; }
/POLLUTION/ { s2b += 5; }
{
  s2 = s2a * s2b;
  if (s2 > m2) m2 = s2;
  s2a *= .9; s2b *= .9;
}

# topic 013 --- Mitsubishi Heavy Industries Ltd.
/MITSUBISHI/ { s3a += 5; }
/HEAVY/ { s3b += 5; }
/INDUSTR/ { s3c += 5; }
{
  s3 = s3a * s3b * s3c;
  if (s3 > m3) m3 = s3;
  s3a *= .9; s3b *= .9; s3c *= .9;
}

# topic 014 --- drug approval
# (left margin cut off; the s4b pattern and the first alternative
#  of the s4d pattern are illegible)
/DRUG|MEDICINE/ { s4a += 5; }
# [pattern illegible] { s4b += 5; }
/APPROVAL|MARKET|CLEAR/ { s4c += 5; }
/GENERIC|BRAND/ { s4d += 5; }
{
  s4 = s4a * s4b * s4c * s4d;
  if (s4 > m4) m4 = s4;
  s4a *= .9; s4b *= .9; s4c *= .9; s4d *= .9;
}

# topic 015 --- CEO (new appointment or resignation)
/CEO|CHIEF EXECUTIVE OFFICER/ { s5a += 5; }
/COMPANY|CORPORATION|INCORPORATED|LTD|INC/ { s5b += 5; }
/APPOINT|RESIGN|CHOSE/ { s5c += 5; }
{
  s5 = s5a * s5b * s5c;
  if (s5 > m5) m5 = s5;
  s5a *= .9; s5b *= .9; s5c *= .9;
}

# topic 016 --- marketing agrochemicals
# (first alternative of the s6a pattern is illegible)
/MARKET/ { s6a += 5; }
/AGRICULTUR|CROP|AGRO/ { s6b += 5; }
/CHEMICAL|PESTICIDE|HERBICIDE|FUNGICIDE|INSECTICIDE|FERTILIZER/ { s6c += 5; }
{
  s6 = s6a * s6b * s6c;
  if (s6 > m6) m6 = s6;
  s6a *= .9; s6b *= .9; s6c *= .9;
}

# topic 017 --- measures to control agrochemicals
# (the start of the s7a pattern may be cut off)
/REGULAT|CONTROL|RESTRICT|CURB/ { s7a += 5; }
/AGRICULTUR|CROP|AGRO/ { s7b += 5; }
/CHEMICAL|PESTICIDE|HERBICIDE|FUNGICIDE|INSECTICIDE|FERTILIZER/ { s7c += 5; }
{
  s7 = s7a * s7b * s7c;
  if (s7 > m7) m7 = s7;
  s7a *= .9; s7b *= .9; s7c *= .9;
}

# topic 018 --- Japanese stock market trends
/JAPAN|NIKKEI|TOKYO/ { s8a += 5; }
/STOCK|MARKET|AVERAGE/ { s8b += 5; }
/TREND|CHANGE|RISE|FALL/ { s8c += 5; }
{
  s8 = s8a * s8b * s8c;
  if (s8 > m8) m8 = s8;
  s8a *= .9; s8b *= .9; s8c *= .9;
}

# topic 019 --- global stock market trends
/STOCK/ { s9a += 5; }
/MARKET/ { s9b += 5; }
/TREND|CHANGE|RISE|FALL/ { s9c += 5; }
{
  s9 = s9a * s9b * s9c;
  if (s9 > m9) m9 = s9;
  s9a *= .9; s9b *= .9; s9c *= .9;
}

# topic 020 --- patent infringement lawsuits
/PATENT/ { s10a += 5; }
/INFRING/ { s10b += 5; }
/LAWSUIT|TRIAL|COURT/ { s10c += 5; }
{
  s10 = s10a * s10b * s10c;
  if (s10 > m10) m10 = s10;
  s10a *= .9; s10b *= .9; s10c *= .9;
}


TRECscore.Q.21-30.gawk

# score documents delimited by <DOCNO> lines from TREC data
# for third 10 questions on TREC list
# ^z 920527-29, 0601, 0602
# usage:
#   gawk -f TRECscore.Q.21-30.gawk
# typically will want to do something like:
#   zcat wsj/19*/*.Z | tr a-z A-Z | gawk -f TRECscore.Q.21-30.gawk >Q.21-30.TRECscores.out
# (maybe prefixed by nohup)
# then typically will want to do something like:
#   sort -n +1 TRECscores.out | tail -1000 >Q21.best
# this program reads from stdin and outputs scores for each document to stdout in format:
#   <DOCNO> [score1] [score2] ... [score10]

BEGIN {
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0; s4c = 0;
  s5a = 0; s5b = 0; s5c = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0;
  s8a = 0; s8b = 0; s8c = 0;
  s9a = 0; s9b = 0; s9c = 0; s9d = 0;
  s10a = 0; s10b = 0;
  docno = "<null>";
}

/<DOCNO>/ {
  printf ("%-20s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n", \
    docno, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10);
  docno = $2;
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0; s4c = 0;
  s5a = 0; s5b = 0; s5c = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0;
  s8a = 0; s8b = 0; s8c = 0;
  s9a = 0; s9b = 0; s9c = 0; s9d = 0;
  s10a = 0; s10b = 0;
}

# topic 021 --- superconductivity breakthrough with commercial application
/SUPERCONDUCT/ { s1a += 5; }
/DISCOVER|BREAKTHR|FIRST|ADVANC/ { s1b += 5; }
/COMMERC|APPLI|PRACTIC/ { s1c += 5; }
{
  s1 = s1a * s1b * s1c;
  if (s1 > m1) m1 = s1;
  s1a *= .9; s1b *= .9; s1c *= .9;
}

# topic 022 --- counternarcotics
# (first alternative of the s2b pattern is illegible)
/DRUG|NARCOTIC|COCAINE|HEROIN|OPIUM|MARIJUANA/ { s2a += 5; }
/SMUGGL|CARTEL|TRAFFIC/ { s2b += 5; }
{
  s2 = s2a * s2b;
  if (s2 > m2) m2 = s2;
  s2a *= .9; s2b *= .9;
}

# topic 023 --- legal repercussions of agrochemical use
# (the s3a and s3b patterns and the first alternative of the
#  s3c pattern are illegible)
# [pattern illegible] { s3a += 5; }
# [pattern illegible] { s3b += 5; }
/VICTIM|ACCIDENT|TRAGEDY|TRAGIC|DISASTER/ { s3c += 5; }
{
  s3 = s3a * s3b * s3c;
  if (s3 > m3) m3 = s3;
  s3a *= .9; s3b *= .9; s3c *= .9;
}

# topic 024 --- new medical technology
/DRUG|MEDIC/ { s4a += 5; }
/COMPANY|HOSPITAL|RESEARCH|INSTITUT/ { s4b += 5; }
/DRUG|EQUIPMENT|TREATMENT|PROCEDURE/ { s4c += 5; }
{
  s4 = s4a * s4b * s4c;
  if (s4 > m4) m4 = s4;
  s4a *= .9; s4b *= .9; s4c *= .9;
}

# topic 025 --- aftermath of Chernobyl, European gov't actions
/CHERNOBYL/ { s5a += 5; }
/EUROPE|BULGARIA|RUMANIA|ROMANIA|BELGI|NETHERLANDS|HOLLAND|FRANCE|FRENCH|GERMAN|SCANDINAVIA|U\.K\.|UK|BRITAIN|ENGLAND|UNITED KINGDOM|NORWAY|NORWEGIAN|SWEDE|FINLAND|FINNISH|POLAND|POLISH|HUNGAR|CZECHOSLOVAK|AUSTRIA|SWITZERLAND|SWISS|GREEK|GREECE|YUGOSLAV|ITALIAN|ITALY|SPAIN|SPANISH|PORTUG/ { s5b += 5; }
/FOOD|TESTING|FALLOUT|CONSEQUENCE|EVACUAT|HEALTH|CANCER/ { s5c += 5; }
{
  s5 = s5a * s5b * s5c;
  if (s5 > m5) m5 = s5;
  s5a *= .9; s5b *= .9; s5c *= .9;
}

# topic 026 --- tracking influential players in multimedia
/CD-ROM|MULTIMEDIA|MULTI-MEDIA/ { s6a += 5; }
/APPLICATION|DEVELOPER/ { s6b += 5; }
/APPLE|IBM|MICROSOFT/ { s6c += 5; }
{
  s6 = s6a * s6b * (5 - s6c);
  if (s6 > m6) m6 = s6;
  s6a *= .9; s6b *= .9; s6c *= .9;
}

# topic 027 --- expert systems or neural networks in business or manufacturing
/EXPERT.SYSTEM|RULE.BASED|KNOWLEDGE.BASE|ARTIFICIAL INTELLIGEN|NEURAL NET/ { s7a += 5; }
/BUSINESS|MANUFACTUR/ { s7b += 5; }
{
  s7 = s7a * s7b;
  if (s7 > m7) m7 = s7;
  s7a *= .9; s7b *= .9;
}

# topic 028 --- AT&T's technical efforts
# (the start of the s8a pattern may be cut off)
/AMERICAN TELEPHONE|BELL SYSTEM|BELL LAB/ { s8a += 5; }
/PRODUCT|TECHNOLOG|COMPUTER|COMMUNICATION/ { s8b += 5; }
/BABY BELL|BELLCORE|UNIX/ { s8c += 5; }
{
  s8 = s8a * s8b * (5 - s8c);
  if (s8 > m8) m8 = s8;
  s8a *= .9; s8b *= .9; s8c *= .9;
}

# topic 029 --- foreign installation of AT&T communications products
/FOREIGN|NATION|COUNTRY/ { s9a += 5; }
/AT\&T|AMERICAN TELEPHONE|BELL/ { s9b += 5; }
/TECHNOLOG|SWITCHING|FIBER|NETWORK/ { s9c += 5; }
/U\.K\.|UNITED KINGDOM|ENGLAND|BRITAIN|CANADA/ { s9d += 5; }
{
  s9 = s9a * s9b * s9c * (5 - s9d);
  if (s9 > m9) m9 = s9;
  s9a *= .9; s9b *= .9; s9c *= .9; s9d *= .9;
}

# topic 030 --- OS/2 problems
/OS\/2/ { s10a += 5; }
/PROBLEM|TROUBLE|DELAY|IMMATUR/ { s10b += 5; }
{
  s10 = s10a * s10b;
  if (s10 > m10) m10 = s10;
  s10a *= .9; s10b *= .9;
}


TRECscore.Q.31-40.gawk

# score documents delimited by <DOCNO> lines from TREC data
# for fourth 10 questions on TREC list
# ^z 920527-29, 0601, 0602, 0603
# usage:
#   awk -f TRECscore.Q.31-40.awk
# typically will want to do something like:
#   zcat wsj/19*/*.Z | tr a-z A-Z | awk -f TRECscore.Q.31-40.awk >Q.31-40.TRECscores.out
# (maybe prefixed by nohup)
# then typically will want to do something like:
#   sort -n +1 TRECscores.out | tail -1000 >Q31.best
# etc....
# this program reads from stdin and outputs scores for each document to stdout in format:
#   <DOCNO> [score1] [score2] ... [score10]

BEGIN {
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0;
  s5a = 0; s5b = 0; s5c = 0; s5d = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0;
  s8a = 0; s8b = 0; s8c = 0;
  s9a = 0; s9b = 0;
  s10a = 0; s10b = 0;
  docno = "<null>";
}

/<DOCNO>/ {
  printf ("%-20s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n",
    docno, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10);
  docno = $2;
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0;
  s4a = 0; s4b = 0;
  s5a = 0; s5b = 0; s5c = 0; s5d = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0;
  s8a = 0; s8b = 0; s8c = 0;
  s9a = 0; s9b = 0;
  s10a = 0; s10b = 0;
}

# topic 031 --- advantages of OS/2
/OS\/2/ { s1a += 5; }
/ADVANTAG|STRENGTH/ { s1b += 5; }
/WINDOWS|X.WINDOWS|DOS/ { s1c += 5; }
{
  s1 = s1a * s1b * s1c;
  if (s1 > m1) m1 = s1;
  s1a *= .9; s1b *= .9; s1c *= .9;
}

# topic 032 --- outsourcing computer work
/CONTRACT.OUT|OUTSOURCING/ { s2a += 5; }
/COMPUTER|DATA|NETWORK/ { s2b += 5; }
{
  s2 = s2a * s2b;
  if (s2 > m2) m2 = s2;
  s2a *= .9; s2b *= .9;
}

# topic 033 --- companies capable of producing document management systems
/DOCUMENT/ { s3a += 5; }
/MANAGEMENT|PROCESSING|AUTOMATION|OCR|OPTICAL CHARACTER RECOGNI/ { s3b += 5; }
/COMPANY|CORP|CO\.|INC\.|LTD\.|INCORPORATED/ { s3c += 5; }
{
  s3 = s3a * s3b * s3c;
  if (s3 > m3) m3 = s3;
  s3a *= .9; s3b *= .9; s3c *= .9;
}

# topic 034 --- ISDN applications/exploitation
/ISDN|INTEGRATED SERVICES DIGITAL NETWORK/ { s4a += 5; }
/STRATEGY|APPLICATION|PRODUCT/ { s4b += 5; }
{
  s4 = s4a * s4b;
  if (s4 > m4) m4 = s4;
  s4a *= .9; s4b *= .9;
}

# topic 035 --- alternatives to Postscript
# (note: TRUETYPE implicit in ranking there; is this cheating??)
/POSTSCRIPT/ { s5a += 5; }
/ADOBE|APPLE|MICROSOFT/ { s5b += 5; }
/ALTERNATIVE|SUBSTITUTE|COMPETIT/ { s5c += 5; }
/TRUETYPE/ { s5d += 5; }
{
  s5 = s5a * s5b * s5c * s5d;
  if (s5 > m5) m5 = s5;
  s5a *= .9; s5b *= .9; s5c *= .9; s5d *= .9;
}

# topic 036 --- how rewriteable optical disks work
/OPTICAL/ { s6a += 5; }
/DISK/ { s6b += 5; }
/REWRIT/ { s6c += 5; }
{
  s6 = s6a * s6b * s6c;
  if (s6 > m6) m6 = s6;
  s6a *= .9; s6b *= .9; s6c *= .9;
}

# topic 037 --- SAA components
/SAA|SYSTEMS APPLICATION ARCHITECTURE/ { s7a += 5; }
/OFFICEVISION|COMPONENT|COMPLIANT|ADHERE|CONFORM|COMPLY/ { s7b += 5; }
{
  s7 = s7a * s7b;
  if (s7 > m7) m7 = s7;
  s7a *= .9; s7b *= .9;
}

# topic 038 --- role of minicomputers/mainframes in LAN/PC/workstation environments
/MINICOMPUTER|VAX|MAINFRAME/ { s8a += 5; }
/LAN|PC/ { s8b += 5; }
/ROLE|TRANSITION|CHANGE|ENVIRONMENT/ { s8c += 5; }
{
  s8 = s8a * s8b * s8c;
  if (s8 > m8) m8 = s8;
  s8a *= .9; s8b *= .9; s8c *= .9;
}

# topic 039 --- client-server plans/expectations
/CLIENT.*SERVER/ { s9a += 5; }
/IMPLEMENT|BUILD|PLAN|EXPECTATION/ { s9b += 5; }
{
  s9 = s9a * s9b;
  if (s9 > m9) m9 = s9;
  s9a *= .9; s9b *= .9;
}

# topic 040 --- impact of info systems tech on orgs
/ORGANIZATION|CORPORATION|COMPANY|EFFICIENC|PRODUCTIVITY|INNOVATION|COLLABORAT/ { s10a += 5; }
/GRAPHICAL USER INTERFACE|GUI|LOCAL AREA NETWORK|LAN|FACSIMILE|FAX|E.MAIL|EMAIL|SPREADSHEET|DATABASE|DESKTOP PUBLISH|WORKGROUP/ { s10b += 5; }
{
  s10 = s10a * s10b;
  if (s10 > m10) m10 = s10;
  s10a *= .9; s10b *= .9;
}


TRECscore.Q.41-50.gawk

# score documents delimited by <DOCNO> lines ...
#   ... >Q.41-50.TRECscores.out
# (maybe prefixed by nohup)
# then typically will want to do something like:
#   sort -n +1 TRECscores.out | tail -1000 >Q41.best
# this program reads from stdin and outputs scores for each document to stdout in format:
#   <DOCNO> [score1] [score2] ... [score10]

BEGIN {
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0; s3d = 0;
  s4a = 0; s4b = 0; s4c = 0;
  s5a = 0; s5b = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0; s7c = 0;
  s8a = 0; s8b = 0;
  s9a = 0; s9b = 0;
  s10a = 0; s10b = 0;
  docno = "<null>";
}

/<DOCNO>/ {
  printf ("%-20s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n", \
    docno, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10);
  docno = $2;
  m1 = 0; m2 = 0; m3 = 0; m4 = 0; m5 = 0;
  m6 = 0; m7 = 0; m8 = 0; m9 = 0; m10 = 0;
  s1a = 0; s1b = 0; s1c = 0;
  s2a = 0; s2b = 0;
  s3a = 0; s3b = 0; s3c = 0; s3d = 0;
  s4a = 0; s4b = 0; s4c = 0;
  s5a = 0; s5b = 0;
  s6a = 0; s6b = 0; s6c = 0;
  s7a = 0; s7b = 0; s7c = 0;
  s8a = 0; s8b = 0;
  s9a = 0; s9b = 0;
  s10a = 0; s10b = 0;
}

# topic 041 --- computer or communications system upgrade
/COMPUTER|COMMUNICATION.*SYSTEM/ { s1a += 5; }
/UPGRADE|TRANSITION|PHAS.*OUT/ { s1b += 5; }
/COMPANY|CORP|CO\.|INC\.|LTD\.|INCORPORATED/ { s1c += 5; }
{
  s1 = s1a * s1b * s1c;
  if (s1 > m1) m1 = s1;
  s1a *= .9; s1b *= .9; s1c *= .9;
}

# topic 042 --- end user computing
# (a final alternative in the s2a pattern is illegible)
/ENDUSER|END.USER|DECENTRALIZ/ { s2a += 5; }
/COMPUT/ { s2b += 5; }
{
  s2 = s2a * s2b;
  if (s2 > m2) m2 = s2;
  s2a *= .9; s2b *= .9;
}

# topic 043 --- AI conferences between 1 Feb and 20 Mar 91
/ AI |ARTIFICIAL INTELLIGENCE/ { s3a += 5; }
/CONFERENCE|SYMPOSIUM/ { s3b += 5; }
/1991/ { s3c += 5; }
/FEBRUARY|FEB|MARCH|MAR/ { s3d += 5; }
{
  s3 = s3a * s3b * s3c * s3d;
  if (s3 > m3) m3 = s3;
  s3a *= .9; s3b *= .9; s3c *= .9; s3d *= .9;
}

# topic 044 --- staff reductions at computer/communications companies
/STAFF|WORKERS/ { s4a += 5; }
/REDUCTION|LAYOFF|CUTBACK|ATTRITION|FIRING|FIRED/ { s4b += 5; }
/COMPUTER|COMMUNICATION|TELEPHONE/ { s4c += 5; }
{
  s4 = s4a * s4b * s4c;
  if (s4 > m4) m4 = s4;
  s4a *= .9; s4b *= .9; s4c *= .9;
}

# topic 045 --- CASE success/failure
/CASE |COMPUTER.*AIDED.*SOFTWARE.*ENGINEERING|\(CASE\)/ { s5a += 5; }
/SUCCESS|FAIL|SUCCEED|PRODUCTIVITY|IMPROVEMENT/ { s5b += 5; }
{
  s5 = s5a * s5b;
  if (s5 > m5) m5 = s5;
  s5a *= .9; s5b *= .9;
}

# topic 046 --- computer virus outbreak
/COMPUTER|INTERNET/ { s6a += 5; }
/VIRUS|WORM/ { s6b += 5; }
/OUTBREAK|ORGANIZATION|VICTIM|COMPANY|CORP|CO\.|INC\.|LTD\.|INCORPORATED/ { s6c += 5; }
{
  s6 = s6a * s6b * s6c;
  if (s6 > m6) m6 = s6;
  s6a *= .9; s6b *= .9; s6c *= .9;
}

# topic 047 --- computer system contracts over 1 M$
/COMPUTER|COMMUNICATION|INFORMATION/ { s7a += 5; }
/CONTRACT|ACQUI|PROCURE|AWARD|PURCHASE/ { s7b += 5; }
/MILLION|000\.000|BILLION/ { s7c += 5; }
{
  s7 = s7a * s7b * s7c;
  if (s7 > m7) m7 = s7;
  s7a *= .9; s7b *= .9; s7c *= .9;
}

# topic 048 --- purchase of modern communications equipment
/LAN|LOCAL AREA NET|NETWORK MANAGEMENT|FAX/ { s8a += 5; }
/PURCHASE|ACQUI|PROCURE|AWARD|CONTACT/ { s8b += 5; }
{
  s8 = s8a * s8b;
  if (s8 > m8) m8 = s8;
  s8a *= .9; s8b *= .9;
}

# topic 049 --- supercomputer operation/programming/purchase
/SUPERCOMPUTER|CRAY|IBM.*3090/ { s9a += 5; }
/PURCHASE|ACQUI|PROCURE|AWARD|CONTACT|RESEARCH|OPERAT|PROGRAM|CENTER/ { s9b += 5; }
{
  s9 = s9a * s9b;
  if (s9 > m9) m9 = s9;
  s9a *= .9; s9b *= .9;
}

# topic 050 --- military interest in Virtual Reality
/VIRTUAL REALITY| VR |\(VR\)|CYBERSPACE/ { s10a += 5; }
/MILITARY|ARMY|ARPA|NAVY|AIR FORCE|MARINE| DOD |DEFENSE|DEFENCE/ { s10b += 5; }
{
  s10 = s10a * s10b;
  if (s10 > m10) m10 = s10;
  s10a *= .9; s10b *= .9;
}
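
The tabulation step described under Implementation (picking the highest-ranked documents for each topic out of the 10-column score table) might be done along the following lines on a current system. This is a sketch under stated assumptions, not the original shell scripts: the function name top_docs and its two arguments are hypothetical, and it uses the `sort -k` key syntax in place of the older `sort -n +1` form that appears in the usage comments of the listings.

```sh
# Sketch only: top_docs COLUMN N
# Reads a score table (docno followed by score columns) on stdin and prints
# the N highest-scoring documents for one column; column 2 holds the first
# topic of a batch, column 3 the second, and so on.
top_docs() {
  col=$1
  n=$2
  # numeric, descending sort on the chosen column only
  sort -k "${col},${col}nr" |
    head -n "$n" |
    awk -v c="$col" '{ print $1, $c }'
}

# Possible use (file name taken from the usage comments in the listings):
#   top_docs 2 200 < Q.21-30.TRECscores.out
```

Documents with equal scores come out in whatever order sort gives them, so any cutoff that falls inside a run of ties is arbitrary.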