TREC 2025 Proceedings

bm25_hedges_neg

Submission Details

Organization
UAmsterdam
Track
Tip-of-the-Tongue Search
Task
Retrieval Task
Date
2025-09-10

Run Description

Please describe in detail how this run was generated
Corpus and index: TREC ToT 2025 Wikipedia JSONL; PyTerrier/Terrier index over title + full text.
Software/config: PyTerrier 0.10.0, Terrier 5.11, parse=false.
Query processing: parser-safe normalization, followed by hedge/uncertainty removal using a fixed lexicon (data/hedges.txt). Removal is case-insensitive, phrase-level, and longest-first; only matched hedge phrases are deleted, and content words are preserved.
Negation detection: the normalized (pre-removal) query text is analyzed for negation cues. We match single-token cues (not, no, never, without, cannot), two-token "not" forms (do/does/did/is/are/was/were/should/could/would/will + not), and split contractions (e.g., "don t", "isn t"). We capture up to four subsequent tokens and retain a span only if it includes an attribute head from data/neg_heads.txt (version/remake/year/language/color/cut/etc.).
Retrieval: BM25 first-stage retrieval on the hedges-removed query, returning 1000 documents per query.
Negation-aware re-scoring: after retrieval, we penalize candidates whose title/lead foreground a negated span: −2.0 if the span is matched in roughly the first 128 characters, −1.0 if it appears in the early lead (roughly the first 400 characters). There is no hard filtering; body-only mentions are not penalized.
Ranking/output: sort by adjusted score; guarantee exactly 1000 results per query (topping up from base BM25 if needed); TREC format with run_id bm25_hedges_neg.
External resources/baselines: no LLMs or official baseline runfiles were used.
Run type: Automatic.
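The hedge-removal step described above (case-insensitive, phrase-level, longest-first deletion from a fixed lexicon) can be sketched as follows. This is a minimal illustration, not the run's actual code; the hedge phrases shown are examples, whereas the run loads its lexicon from data/hedges.txt.

```python
import re

def remove_hedges(query: str, hedges: list[str]) -> str:
    """Delete hedge phrases from the query, longest phrase first,
    case-insensitively; content words are preserved."""
    out = query
    for phrase in sorted(hedges, key=len, reverse=True):
        # Word boundaries keep partial-word matches from firing.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        out = re.sub(pattern, " ", out, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by deletions.
    return re.sub(r"\s+", " ", out).strip()
```

Deleting longest phrases first prevents a short hedge (e.g. "think") from breaking up a longer one ("i think") before it can match.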
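The negation-detection step (single-token cues, aux + "not" forms, split contractions, a four-token capture window, and an attribute-head filter) might look roughly like the sketch below. The cue sets and contraction stems are illustrative reconstructions from the description; the attribute heads come from data/neg_heads.txt in the actual run.

```python
SINGLE_CUES = {"not", "no", "never", "without", "cannot"}
NOT_AUX = {"do", "does", "did", "is", "are", "was", "were",
           "should", "could", "would", "will"}
# After parser-safe normalization, split contractions ("don t",
# "isn t", ...) appear as two tokens; stems here are illustrative.
SPLIT_CONTRACTIONS = {"don", "doesn", "didn", "isn", "aren", "wasn",
                      "weren", "shouldn", "couldn", "wouldn", "won", "can"}

def negated_spans(text: str, neg_heads: set, window: int = 4) -> list:
    """Return spans of up to `window` tokens following a negation cue,
    keeping a span only if it contains an attribute head."""
    tokens = text.lower().split()
    spans = []
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tokens[i] in NOT_AUX and nxt == "not":
            cue_len = 2                       # e.g. "was not"
        elif tokens[i] in SPLIT_CONTRACTIONS and nxt == "t":
            cue_len = 2                       # e.g. "don t"
        elif tokens[i] in SINGLE_CUES:
            cue_len = 1                       # e.g. "without"
        else:
            cue_len = 0
        if cue_len:
            span = tokens[i + cue_len:i + cue_len + window]
            if any(tok in neg_heads for tok in span):
                spans.append(" ".join(span))
            i += cue_len
        else:
            i += 1
    return spans
```

The attribute-head filter is what keeps generic negations ("it was not good") from producing spans; only spans that mention a retained attribute (version, remake, year, ...) survive.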
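The negation-aware re-scoring step (−2.0 for a span matched in roughly the first 128 characters of title/lead, −1.0 in the early lead, no penalty for body-only mentions) can be sketched as below. The function name and the single-penalty-per-document choice are assumptions for illustration; the thresholds follow the description above.

```python
def rescore(bm25_score: float, doc_text: str, neg_spans: list) -> float:
    """Apply a soft penalty when a negated span is foregrounded in the
    document's title/lead; body-only mentions are left untouched."""
    lowered = doc_text.lower()
    penalty = 0.0
    for span in neg_spans:
        pos = lowered.find(span.lower())
        if 0 <= pos < 128:
            penalty = min(penalty, -2.0)   # matched in ~first 128 chars
        elif 0 <= pos < 400:
            penalty = min(penalty, -1.0)   # matched in early lead
        # pos == -1 (absent) or pos >= 400 (body-only): no penalty
    return bm25_score + penalty
```

Because this is a soft re-scoring rather than a hard filter, a strong BM25 match can still rank well even when it foregrounds a negated attribute.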
Specify datasets used in this run.
['Other']
(if you checked "other", describe here)
none
Are you 100% confident that no data from https://github.com/microsoft/Tip-of-the-Tongue-Known-Item-Retrieval-Dataset-for-Movie-Identification or iRememberThisMovie.com (besides the training data provided as part of this year's track) was used for producing this run (including any data used for pretraining models that you are building on top of)?
Yes I am confident that no data from those sources except the official track training data was used to produce this run
Did you use any of the official baseline runs in any way to produce this run?
no
If you did use any of the official baseline runs in any way to produce this run, please describe how below in sufficient detail (e.g., as reranking candidates or in ensemble with other approaches).

Evaluation Files

Paper