Expert search guidelines (9 Aug 2005)
-------------------------------------

At the workshop last year there was significant interest in searching for relationships between "entities" in an organisation: people, groups, products etc. The expert search task is a pilot experiment in that direction.

The scenario is that the user wants to know "who is an expert on topic X", enters a topical query and the system retrieves a ranked list of people. The system operates based on the query, a corpus and a list of candidate experts (e.g. a staff list).

We did not ask W3C people to participate in an "expertise survey" because we were told that they might take a dim view of such an unsolicited request. In fact, W3C did not even provide us with a list of candidate experts. Instead this year's experiment is based on an existing database of W3C working groups. The topics are names of working groups and the experts are members of the group. The overall candidate list is the union of all group members.

It is expert search, because we assume that the current members of e.g. the "semantic web" working group are experts on the semantic web (relative to those who are members of other groups). There will be no further "relevance assessment" to expand that list i.e. we take the lists as ground truth. Using the lists in this way gives us a pilot entity ranking task, and the research question is how best to use the corpus for effective entity ranking.

Participants are provided with:
- The full W3C corpus
- A list of 1092 people, each with a unique personid
- A set of 10 training topics, each with a list of personids
- A set of 50 test topics

Each group can submit up to 5 runs. A run is a ranked list of up to 100 personids for each test topic. The best system is one that on average tends to retrieve the correct working group members near the top of the ranking, based on standard IR measures. Deadline is October 5th.
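As a rough illustration of how a ranked list might be scored against a topic's group members, the sketch below computes average precision and its mean over topics. The guidelines only say "standard IR measures", so the choice of average precision is an assumption, and the function names and toy personids are made up for this example.

```python
def average_precision(ranked_personids, relevant_personids):
    """Average precision of one ranked list against one topic's members.

    Assumes each personid appears at most once in the ranked list.
    """
    relevant = set(relevant_personids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, personid in enumerate(ranked_personids, start=1):
        if personid in relevant:
            hits += 1
            # Precision at this rank, counted only where a member is found.
            precision_sum += hits / rank
    # Members that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of average precision over all topics in qrels.

    run:   dict mapping topic -> ranked list of personids
    qrels: dict mapping topic -> collection of correct personids
    """
    topics = list(qrels)
    total = sum(average_precision(run.get(t, []), qrels[t]) for t in topics)
    return total / len(topics)


# Toy data: hypothetical topics and personids, not taken from the track files.
toy_qrels = {"T1": {"p1", "p3"}, "T2": {"p2"}}
toy_run = {"T1": ["p9", "p1", "p3"], "T2": ["p2", "p7"]}
toy_map = mean_average_precision(toy_run, toy_qrels)
```

A ranking that places group members higher gets a larger score, which matches the stated goal of retrieving the correct members near the top of the ranking.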
Ranking approaches might include: statistical analysis of the corpus with respect to queries and person names. Models for matching variants of person names and name disambiguation. Analysis of document structure. Special treatment of different subcollections, for example looking at message authors in the 'lists' subcollection. QA techniques.

Extra kudos points for explaining how your expert ranking can be implemented efficiently, to respond to arbitrary user queries in real time.

The submission format is the standard TREC format, the same one used in the discussion search and known-item tasks:

  topic Q0 personid rank sim tag

where
- topic is the topic number
- Q0 is a literal 'Q0'
- personid is the ID of the person
- rank is your rank for this person
- sim is a score output by your system
- tag is the run tag

Interesting issues:

1) When you write about these experiments in a paper or on a Web page, try not to use the real email addresses (and maybe even real names). If W3C people get more spam email because we put their email addresses online, that is a bad outcome.

2) Not all of the 1092 people are represented in the corpus. Having deployed an expert search system at CSIRO, I can tell you this is entirely realistic: given an official list of people, only a subset will be represented in your corpus. It is even possible that we lose an entire topic this way. The task is to do a good job of ranking the 80% of candidates that are in the corpus.

3) In the course of designing and debugging your system you may look at the training topics and associated person names. You should certainly not look at the 50 test topics. You should also avoid working too closely with the full list of 1092 names. The usual rule in TREC is not to make changes to your system based on knowledge of the test topics, and we should try to follow that rule here for test topics and the full candidate list.

4) Some pages in the corpus contain the ground truth.
These lists may be from out-of-date documents, whereas the ground truth comes from 2005. There are a number of points to make:

   a) Some topics are very well covered by a page in the corpus, and we chose some of these topics for our training set.

   b) Do not just hand-tool a ground truth extractor that looks for such lists. Doing so would not be very interesting science, and in any case it is not guaranteed to give good results. An interesting participation is one that uses well-founded general-purpose techniques for entity ranking. These techniques must be described in the workbook paper so that others can replicate your results.

   c) Do not extract ground truth from some external source (i.e. outside the corpus) and use that for your submission or for training.

   d) In the corpus we may have partial working group info from 1993 to 2004, and our ground truth comes from 2005. So it is possible to get the right answers for 1999 or the right answers for 2004 but still be considered slightly wrong, because our ground truth is a slightly different list from 2005. Note however that all topics (or almost all) have useful information with respect to 2005 membership.

5) Some people are listed in the ground truth database twice with variations in their name or email address. In cases where I detected this, both variants are listed on separate lines of the candidates file, but with the same personid.

Files:

  ent05.expert.candidates     - the list of possible "experts"
  ent05.expert.trainingtopics - a set of 10 training topics
  ent05.expert.trainingqrels  - qrels for the training topics
  ent05.expert.topics         - the 50 test topics
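The submission format given earlier in these guidelines (topic Q0 personid rank sim tag) can be sketched as a small formatting helper. This is only an illustration: the function name `format_run`, the topic number, the personids, the scores and the run tag are all made up. The helper also keeps one entry per personid, on the assumption that name variants sharing a personid should appear once in a run, and it truncates to the 100 personids allowed per topic.

```python
MAX_PER_TOPIC = 100  # the guidelines allow up to 100 personids per topic


def format_run(topic, scored_people, tag, limit=MAX_PER_TOPIC):
    """Return run lines 'topic Q0 personid rank sim tag' for one topic.

    scored_people: iterable of (personid, score) pairs, in any order.
    """
    best = {}
    for personid, score in scored_people:
        # Assumption: if a personid is scored more than once (for example
        # via two name variants), keep only its highest score.
        if personid not in best or score > best[personid]:
            best[personid] = score

    # Highest score first; ties broken by personid for a stable order.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))

    lines = []
    for rank, (personid, score) in enumerate(ranked[:limit], start=1):
        lines.append(f"{topic} Q0 {personid} {rank} {score:.4f} {tag}")
    return lines


# Hypothetical scores for one topic; "person-b" was matched under two names.
example_lines = format_run(
    1,
    [("person-b", 0.5), ("person-a", 0.9), ("person-b", 0.7)],
    "myrun1",
)
```

Writing the returned lines for every test topic to one file, one line per entry, would give a run in the layout described above.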