Expert search guidelines (9 Aug 2005)
-------------------------------------

At the workshop last year there was significant interest in searching
for relationships between "entities" in an organisation: people,
groups, products etc.  The expert search task is a pilot experiment in
that direction.  The scenario is that the user wants to know "who is
an expert on topic X", enters a topical query and the system retrieves
a ranked list of people.  The system operates based on the query, a
corpus and a list of candidate experts (e.g. a staff list).

We did not ask W3C people to participate in an "expertise survey"
because we were told that they might take a dim view of such an
unsolicited request.  In fact, W3C did not even provide us with a list
of candidate experts.

Instead this year's experiment is based on an existing database of W3C
working groups.  The topics are names of working groups and the
experts are members of the group.  The overall candidate list is the
union of all group members.  It is expert search, because we assume
that the current members of e.g. the "semantic web" working group are
experts on the semantic web (relative to those who are members of
other groups).  There will be no further "relevance assessment" to
expand that list, i.e. we take the lists as ground truth.  Using the
lists in this way gives us a pilot entity ranking task, and the
research question is how best to use the corpus for effective entity
ranking.

Participants are provided with:
- The full W3C corpus
- A list of 1092 people, each with a unique personid
- A set of 10 training topics, each with a list of personids
- A set of 50 test topics

Each group can submit up to 5 runs.  A run is a ranked list of up to
100 personids for each test topic.  The best system is one that on
average tends to retrieve the correct working group members near the
top of the ranking, based on standard IR measures.  Deadline is
October 5th.
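
As an illustration of the kind of standard IR measure meant here, the
following sketch computes average precision for one topic and its mean
over topics (MAP).  The choice of MAP, the function names and the
in-memory data layout are assumptions made for this example; these
guidelines do not say which measures will be reported.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    # Drop repeated ids, keeping the first (highest-ranked) occurrence.
    ranked = list(dict.fromkeys(ranked_ids))
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of average precision over all topics in qrels.

    run:   dict topic -> ranked list of personids (hypothetical layout)
    qrels: dict topic -> set of correct personids (hypothetical layout)
    """
    if not qrels:
        return 0.0
    scores = [average_precision(run.get(topic, []), relevant)
              for topic, relevant in qrels.items()]
    return sum(scores) / len(scores)
```

A topic with no submitted ranking scores zero, so it pays to return
something for every topic.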

Ranking approaches might include:
- Statistical analysis of the corpus with respect to queries and
  person names
- Models for matching variants of person names and name
  disambiguation
- Analysis of document structure
- Special treatment of different subcollections, for example looking
  at message authors in the 'lists' subcollection
- QA techniques

Extra kudos points for explaining how your expert ranking can be
implemented efficiently, to respond to arbitrary user queries in real
time.
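
To make the first of these approaches concrete, here is a deliberately
naive sketch of corpus statistics over queries and person names.  Each
candidate is scored by the number of documents that mention at least
one query term and at least one of the candidate's name variants.  The
in-memory document list, the name table and the function name
rank_candidates are all hypothetical; a real system would need proper
tokenisation, name matching and disambiguation.

```python
from collections import Counter


def rank_candidates(query, documents, candidate_names, limit=100):
    """Rank personids by co-occurrence of query terms and name variants.

    documents:       iterable of document texts (hypothetical corpus)
    candidate_names: dict personid -> list of name variants (hypothetical)
    Returns up to `limit` (personid, score) pairs, best first.
    """
    query_terms = set(query.lower().split())
    scores = Counter()
    for text in documents:
        lowered = text.lower()
        if not query_terms & set(lowered.split()):
            continue  # document mentions none of the query terms
        for pid, names in candidate_names.items():
            # Plain substring match on the name; no disambiguation.
            if any(name.lower() in lowered for name in names):
                scores[pid] += 1
    # Sort by descending score, then personid, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

This scans every document for every query, so it does not answer the
efficiency question; precomputing a term index and a name index would
be the obvious next step.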

The submission format is the standard TREC format, which is the same
as is being used in the discussion search and known-item tasks:

topic Q0 personid rank sim tag

where
 - topic is the topic number
 - Q0 is a literal 'Q0'
 - personid is the ID of the person
 - rank is your rank for this person
 - sim is a score output by your system
 - tag is the run tag
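
A minimal sketch of producing lines in this format for one topic
follows.  The function name and the four-decimal score formatting are
choices made for this example, and starting ranks at 1 is an
assumption; the guidelines above specify only the six fields and their
order.

```python
def format_run_lines(topic, scored_people, run_tag, limit=100):
    """Return run lines 'topic Q0 personid rank sim tag' for one topic.

    scored_people: iterable of (personid, score) pairs in any order.
    At most `limit` lines are produced, matching the 100-per-topic cap.
    """
    # Highest score first; sorted() is stable, so ties keep input order.
    ranked = sorted(scored_people, key=lambda pair: pair[1], reverse=True)
    lines = []
    for rank, (personid, score) in enumerate(ranked[:limit], start=1):
        lines.append(f"{topic} Q0 {personid} {rank} {score:.4f} {run_tag}")
    return lines
```

Joining the returned lines for all 50 test topics with newlines gives
one run file.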


Interesting issues:
1) When you write about these experiments in a paper or on a Web page,
   try not to use the real email addresses (and maybe even real
   names).  If W3C people get more spam email because we put their
   email addresses online, that is a bad outcome.
2) Not all of the 1092 people are represented in the corpus.  Having
   deployed an expert search system at CSIRO, I can tell you this is
   entirely realistic: given an official list of people, only a subset
   will be represented in your corpus.  It is even possible that we
   lose an entire topic this way.  The task is to do a good job of
   ranking the 80% of candidates that are in the corpus.
3) In the course of designing and debugging your system you may look
   at the training topics and associated person names.  You should
   certainly not look at the 50 test topics.  You should also avoid
   working too closely with the full list of 1092 names.  The usual
   rule in TREC is not to make changes to your system based on
   knowledge of the test topics, and we should try to follow that rule
   here for test topics and the full candidate list.
4) Some pages in the corpus contain working group membership lists,
   i.e. the ground truth.  These lists may come from out-of-date
   documents, whereas our ground truth comes from 2005.  There are a
   number of points to make:
   a) Some topics are very well covered by a page in the corpus, and
      we chose some of these topics for our training set.
   b) Do not just hand-tool a ground truth extractor that looks for
      such lists.  Doing so would not be very interesting science, and
      in any case it is not guaranteed to give good results.  An
      interesting participation is one that uses well-founded
      general-purpose techniques for entity ranking.  These techniques
      must be described in the workbook paper so that others can
      replicate your results.
   c) Do not extract ground truth from some external source
      (i.e. outside the corpus) and use that for your submission or
      for training.
   d) In the corpus we may have partial working group info from 1993
      to 2004, and our ground truth comes from 2005.  So it is
      possible to get the right answers for 1999 or the right answers
      for 2004 but still be considered slightly wrong, because our
      ground truth is a slightly different list from 2005.  Note
      however that all topics (or almost all) have useful information
      with respect to 2005 membership.
5) Some people are listed in the ground truth database twice with
   variations in their name or email address.  In cases where I
   detected this, both variants are listed on separate lines of the
   candidates file, but with the same personid.
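
For point 5, one way to handle the repeated personids is to collect
all name and email variants under a single key before matching against
the corpus.  The sketch below assumes the candidates file has already
been parsed into (personid, name, email) tuples; the parsing is left
out because the file layout is not described here, and the function
name is hypothetical.

```python
from collections import defaultdict


def group_variants(records):
    """Map each personid to the distinct (name, email) variants listed.

    records: iterable of (personid, name, email) tuples, assumed to be
    already parsed from the candidates file.
    """
    variants = defaultdict(list)
    for personid, name, email in records:
        entry = (name, email)
        if entry not in variants[personid]:
            variants[personid].append(entry)  # keep first-seen order
    return dict(variants)
```

Scoring per personid rather than per line avoids returning the same
person twice in a ranking.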

Files:
ent05.expert.candidates     - the list of possible "experts"
ent05.expert.trainingtopics - a set of 10 training topics
ent05.expert.trainingqrels  - qrels for the training topics
ent05.expert.topics         - the 50 test topics