TREC 2025 Proceedings
rm3_hedge_neg
Submission Details
- Organization
- UAmsterdam
- Track
- Tip-of-the-Tongue Search
- Task
- Retrieval Task
- Date
- 2025-09-10
Run Description
- Please describe in detail how this run was generated
- Corpus and index: TREC ToT 2025 Wikipedia JSONL; PyTerrier/Terrier index over title + full text.
Software/config: PyTerrier 0.10.0, Terrier 5.11, terrier-prf plugin; parse=false.
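The indexing setup above could be sketched as follows; the index path, JSONL filename, and field names are assumptions for illustration, not the actual configuration:

```python
import json
import pyterrier as pt  # PyTerrier 0.10.0 / Terrier 5.11 assumed

if not pt.started():
    pt.init()

def read_corpus(path="tot_wikipedia.jsonl"):  # hypothetical corpus path
    """Yield docs with docno, title, and full text from the Wikipedia JSONL."""
    with open(path) as f:
        for line in f:
            doc = json.loads(line)
            yield {"docno": doc["id"], "title": doc["title"], "text": doc["text"]}

# index title + full text, keeping docno as retrievable metadata
indexer = pt.IterDictIndexer("./tot_index", meta={"docno": 32})
indexref = indexer.index(read_corpus(), fields=["title", "text"])
```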
Query processing: Parser-safe normalization combined with removal of hedge/uncertainty phrases (data/hedges.txt), applied as case-insensitive, longest-first deletion of whole hedge phrases only.
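A minimal sketch of the hedge-removal step; the sample hedge phrases are illustrative, not the actual contents of data/hedges.txt:

```python
import re

def remove_hedges(query: str, hedges: list[str]) -> str:
    """Delete hedge phrases from the query, longest phrase first,
    matching case-insensitively at word boundaries only."""
    for phrase in sorted(hedges, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        query = re.sub(pattern, " ", query, flags=re.IGNORECASE)
    return " ".join(query.split())  # collapse leftover whitespace

# e.g. with a few entries one might imagine in data/hedges.txt:
hedges = ["i think", "maybe", "i vaguely remember", "possibly"]
remove_hedges("I think the movie was maybe from the 80s", hedges)
# -> "the movie was from the 80s"
```

Sorting longest-first ensures multi-word hedges are deleted before any shorter phrase they contain could match first.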
Negation detection: We detect negated spans in the normalized query by matching single-token cues (not/no/never/without/cannot), two-token cues consisting of an auxiliary (do/does/did/is/are/was/were/should/could/would/will) followed by “not”, and split contractions produced by normalization (“don t”, “isn t”, etc.). We capture up to four subsequent tokens and retain the span only if it contains an attribute head from data/neg_heads.txt (e.g., version/remake/year/language/color/cut).
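One plausible implementation of this detector; the attribute-head set shown is only the example subset mentioned above, not the full data/neg_heads.txt:

```python
SINGLE_CUES = {"not", "no", "never", "without", "cannot"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were",
               "should", "could", "would", "will"}
NEG_HEADS = {"version", "remake", "year", "language", "color", "cut"}

def detect_negated_spans(tokens, window=4):
    """Scan normalized query tokens for negation cues and return the
    following spans (up to `window` tokens), keeping only spans that
    contain an attribute head."""
    spans = []
    i = 0
    while i < len(tokens):
        cue_len = 0
        if tokens[i] in SINGLE_CUES:
            cue_len = 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "not" and tokens[i] in AUXILIARIES:
            cue_len = 2  # e.g. "did not", "is not"
        elif (i + 1 < len(tokens) and tokens[i + 1] == "t"
              and tokens[i].endswith("n") and tokens[i][:-1] in AUXILIARIES):
            cue_len = 2  # split contraction after normalization, e.g. "don t"
        if cue_len:
            span = tokens[i + cue_len : i + cue_len + window]
            if any(tok in NEG_HEADS for tok in span):
                spans.append(" ".join(span))
            i += cue_len
        else:
            i += 1
    return spans

detect_negated_spans("i don t remember the year it came out".split())
# -> ["remember the year it"]
```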
Retrieval (pseudo-relevance feedback): BM25 with feedback depth 50 on the hedges-removed query → RM3 (fb_docs=10, fb_terms=20) → BM25 final retrieval with 1000 results per query.
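The retrieval chain could be wired up in PyTerrier roughly as below; the index location and the boot-package coordinates for the terrier-prf plugin are assumptions about the environment:

```python
import pyterrier as pt

# terrier-prf provides RM3; loading it via boot_packages is the usual idiom
if not pt.started():
    pt.init(boot_packages=["com.github.terrierteam:terrier-prf:-SNAPSHOT"])

index = pt.IndexFactory.of("./tot_index")  # hypothetical index location

bm25_feedback = pt.BatchRetrieve(index, wmodel="BM25", num_results=50)  # feedback depth 50
rm3 = pt.rewrite.RM3(index, fb_docs=10, fb_terms=20)
bm25_final = pt.BatchRetrieve(index, wmodel="BM25", num_results=1000)

# hedges-removed query in -> expanded query -> 1000 results per query
pipeline = bm25_feedback >> rm3 >> bm25_final
```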
Negation-aware re-scoring: After final retrieval, we apply a soft penalty if a negated span appears within a candidate’s title-like window (~first 128 chars, −2.0) or early lead (~first 400 chars, −1.0). No hard filtering and no removal of query terms.
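One reading of this soft penalty as a scoring function; the per-span accumulation and the sample document are illustrative assumptions:

```python
def negation_penalty(doc_text, neg_spans,
                     title_window=128, lead_window=400,
                     title_pen=-2.0, lead_pen=-1.0):
    """Add -2.0 per negated span found in the title-like window
    (first ~128 chars), else -1.0 if it only appears in the early lead
    (first ~400 chars). The document is never filtered out."""
    text = doc_text.lower()
    penalty = 0.0
    for span in neg_spans:
        s = span.lower()
        if s in text[:title_window]:
            penalty += title_pen
        elif s in text[:lead_window]:
            penalty += lead_pen
    return penalty

doc = "The Colour Version (restored cut) " + "lorem " * 100
negation_penalty(doc, ["colour version"])  # -> -2.0
```

The adjusted score is then the retrieval score plus this (non-positive) penalty, so matching documents are demoted rather than removed.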
Ranking/output: Sort by adjusted score; guarantee exactly 1000 docs per query; TREC format with run_id rm3_hedge_neg.
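The output step amounts to a standard TREC runfile writer; the in-memory result shape ({qid: [(docno, score), ...]}) is an illustrative assumption:

```python
def write_trec_run(results, path, run_id="rm3_hedge_neg", depth=1000):
    """Write results in TREC run format, one line per result:
    qid Q0 docno rank score run_id, sorted by adjusted score, top `depth`."""
    with open(path, "w") as f:
        for qid in sorted(results):
            ranked = sorted(results[qid], key=lambda d: d[1], reverse=True)[:depth]
            for rank, (docno, score) in enumerate(ranked, start=1):
                f.write(f"{qid} Q0 {docno} {rank} {score:.4f} {run_id}\n")
```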
External resources/baselines: No LLMs or official baseline runfiles used.
Run type: Automatic.
- Specify datasets used in this run.
- ['Other']
- (if you checked "other", describe here)
- none
- Are you 100% confident that no data from https://github.com/microsoft/Tip-of-the-Tongue-Known-Item-Retrieval-Dataset-for-Movie-Identification or iRememberThisMovie.com (besides the training data provided as part of this year's track) was used for producing this run (including any data used for pretraining models that you are building on top of)?
- Yes I am confident that no data from those sources except the official track training data was used to produce this run
- Did you use any of the official baseline runs in any way to produce this run?
- no
- If you did use any of the official baseline runs in any way to produce this run, please describe how below in sufficient detail (e.g., as reranking candidates or in ensemble with other approaches).
Evaluation Files
Paper