TREC 2025 Proceedings

bm25_negations

Submission Details

Organization
UAmsterdam
Track
Tip-of-the-Tongue Search
Task
Retrieval Task
Date
2025-09-10

Run Description

Please describe in detail how this run was generated
Corpus and index: TREC ToT 2025 Wikipedia JSONL; PyTerrier/Terrier index over title + full text.
Software/config: PyTerrier 0.10.0, Terrier 5.11, parse=false.
Query processing: Parser-safe normalization only (removal of punctuation that can break the Terrier query parser). We do not remove hedge phrases in this run; the normalized query text is used as-is for retrieval.
Negation detection: We detect negation cues in the normalized query by matching single-token cues (not, no, never, without, cannot), two-token auxiliary forms ("do not", "does not", "did not", "is not", "are not", "was not", "were not", "should not", "could not", "would not", "will not"), and split-contraction forms that arise after normalization (e.g., "don t", "isn t"). After a cue, we capture up to four subsequent tokens and keep a span only if it contains an attribute head from a fixed vocabulary (data/neg_heads.txt: version/remake/year/language/color/cut/etc.). This focuses on facets users commonly negate and avoids misinterpreting "No …" titles, which rarely include such heads.
Retrieval: BM25 first-stage retrieval on the normalized query, returning 1000 documents per query. The negated words are not removed from the query.
Negation-aware re-scoring: After retrieval, we examine each candidate's title plus the first ~1000 characters of the page (the lead). If a negated span appears within the first ~128 characters (a title-like window), we subtract 2.0 from the score; if it appears within the early lead (the first ~400 characters), we subtract 1.0. No hard filtering is applied, and body-only mentions are not penalized. This preserves recall and only demotes items that strongly foreground the negated facet.
Ranking/output: Sort by the adjusted score; ensure exactly 1000 documents per query by topping up from the BM25 list if needed; TREC format with run_id bm25_negpen.
External resources/baselines: No LLMs or official baseline runfiles were used.
Run type: Automatic.
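The negation-detection step described above can be sketched as follows. This is a minimal illustration, not the run's actual code: the function name negated_spans is hypothetical, and the head vocabulary is inlined here rather than loaded from data/neg_heads.txt (only the examples named in the description are included).

```python
import re

# Cue inventories from the run description (sketch; not the actual run code).
SINGLE_CUES = {"not", "no", "never", "without", "cannot"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were",
               "should", "could", "would", "will"}
# Split-contraction stems that arise after normalization, e.g. "don t", "isn t".
SPLIT_CONTRACTIONS = {"don", "doesn", "didn", "isn", "aren", "wasn", "weren",
                      "shouldn", "couldn", "wouldn", "won", "can"}
# In the run this vocabulary is loaded from data/neg_heads.txt; inlined here.
NEG_HEADS = {"version", "remake", "year", "language", "color", "cut"}

def negated_spans(query, max_span=4):
    """Return up-to-4-token spans after a negation cue that contain an attribute head."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    spans = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        cue_end = None
        if tok in AUXILIARIES and nxt == "not":            # e.g. "does not"
            cue_end = i + 2
        elif tok in SPLIT_CONTRACTIONS and nxt == "t":     # e.g. "don t"
            cue_end = i + 2
        elif tok in SINGLE_CUES:                           # e.g. "without"
            cue_end = i + 1
        if cue_end is not None:
            span = tokens[cue_end:cue_end + max_span]
            # Keep the span only if it mentions an attribute head; this is what
            # prevents "No ..." titles from being treated as negations.
            if any(t in NEG_HEADS for t in span):
                spans.append(span)
            i = cue_end
        else:
            i += 1
    return spans
```

For example, "it was not an english language film" yields one span containing the head "language", while "no country for old men" yields none because no attribute head follows the cue.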
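The re-scoring rule above amounts to a positional penalty on the candidate's title-plus-lead text. A minimal sketch, with two stated assumptions: the function name adjust_score is hypothetical, and when multiple negated spans match, only the single largest penalty is applied (the description does not say whether penalties stack).

```python
def adjust_score(bm25_score, title, lead, spans,
                 title_window=128, early_lead_window=400,
                 title_penalty=2.0, lead_penalty=1.0):
    """Demote a candidate whose title/early lead foregrounds a negated facet."""
    # Title plus the first ~1000 characters of the page, as in the run description.
    text = (title + " " + lead[:1000]).lower()
    penalty = 0.0
    for span in spans:  # spans: lists of tokens, e.g. [["english", "language"]]
        pos = text.find(" ".join(span))
        if pos == -1:
            continue  # absent (or body-only) mentions are not penalized
        if pos < title_window:            # title-like window: strong demotion
            penalty = max(penalty, title_penalty)
        elif pos < early_lead_window:     # early lead: mild demotion
            penalty = max(penalty, lead_penalty)
    return bm25_score - penalty           # soft demotion only, no hard filtering
```

Because the adjustment is a bounded subtraction rather than a filter, every BM25 candidate survives with a finite score, which is what lets the final step keep exactly 1000 documents per query.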
Specify datasets used in this run.
['Other']
(if you checked "other", describe here)
none
Are you 100% confident that no data from https://github.com/microsoft/Tip-of-the-Tongue-Known-Item-Retrieval-Dataset-for-Movie-Identification or iRememberThisMovie.com (besides the training data provided as part of this year's track) was used for producing this run (including any data used for pretraining models that you are building on top of)?
Yes I am confident that no data from those sources except the official track training data was used to produce this run
Did you use any of the official baseline runs in any way to produce this run?
no
If you did use any of the official baseline runs in any way to produce this run, please describe how below in sufficient detail (e.g., as reranking candidates or in ensemble with other approaches).

Evaluation Files

Paper