TREC 2025 Proceedings
scrb-tot-03
Submission Details
- Organization
- SRCB
- Track
- Tip-of-the-Tongue Search
- Task
- Retrieval Task
- Date
- 2025-08-31
Run Description
- Please describe in detail how this run was generated
- A pipeline composed of a dense retriever, a reranker, an LLM retriever, and an LLM reranker
Query processing: all queries are converted to a list of cues by DeepSeek-V3.
Dense Retriever based on Qwen/Qwen3-Embedding-8B:
- For the movie domain: fine-tuned on movie data (augmented from train, dev1, and 5,000 samples from the tomt-kis dataset); the index covers 500k+ movie docs filtered by Wikidata properties
- For other domains: use the original Qwen3-Embedding-8B to create the index over all docs
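To illustrate the dense-retrieval step above, here is a minimal sketch using cosine similarity over precomputed vectors; the stub embeddings and all names are hypothetical stand-ins for Qwen3-Embedding-8B outputs, not the actual run code:

```python
import numpy as np

def dense_retrieve(query_vec, doc_vecs, doc_ids, k=2000):
    """Return the top-k doc ids by cosine similarity (illustrative)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    order = np.argsort(-scores)[:k]     # descending, truncated to k
    return [(doc_ids[i], float(scores[i])) for i in order]

# Toy stand-ins for the real 500k+ document index.
doc_ids = ["m1", "m2", "m3"]
doc_vecs = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
top = dense_retrieve(np.array([1.0, 0.1]), doc_vecs, doc_ids, k=2)
```

In the actual run the query side would embed the DeepSeek-V3 cue list rather than the raw query.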
Reranker: Qwen3-Reranker-8B fine-tuned on data augmented from train, dev1, dev2, and 300 samples from the tomt-kis dataset. It reranks the top 2000 results from the retriever.
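The reranking step can be sketched as re-scoring the retriever's candidate pool and re-sorting; the `score_fn` here is a trivial word-overlap stub standing in for the fine-tuned Qwen3-Reranker-8B (all names are illustrative):

```python
def rerank(query, candidates, score_fn, top_n=2000):
    """Re-score the retriever's top candidates and sort by the new score."""
    pool = candidates[:top_n]
    scored = [(doc, score_fn(query, doc)) for doc in pool]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Stub scorer: shared-word count stands in for the cross-encoder score.
def overlap(query, doc):
    return len(set(query.split()) & set(doc.split()))

ranked = rerank("black and white silent film",
                ["a color musical", "a silent black and white film"],
                overlap)
```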
LLM Retriever: use DeepSeek-R1 to retrieve up to 10 Wikipedia entities and align them with doc ids in the corpus.
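The alignment step amounts to mapping LLM-generated Wikipedia titles onto corpus doc ids and dropping titles with no match; a minimal sketch, assuming a normalized title-to-id lookup (the lookup table and names are hypothetical):

```python
def align_entities(entities, title_to_docid):
    """Map LLM-retrieved Wikipedia titles to corpus doc ids; drop misses."""
    aligned = []
    for title in entities:
        key = title.strip().lower()  # simple normalization; real matching may be fuzzier
        if key in title_to_docid:
            aligned.append(title_to_docid[key])
    return aligned

corpus = {"the third man": "doc_42", "metropolis": "doc_7"}
ids = align_entities(["The Third Man", "Unknown Film"], corpus)  # → ["doc_42"]
```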
Listwise Reranker using DeepSeek-V3:
We design a three-stage ranking pipeline. First, the LLM retrieval results are inserted into the candidate list starting from rank 6, while ranks 1–5 are preserved from the baseline ranking. Second, we apply DeepSeek-V3 in a listwise ranking setting to reorder candidates from rank 2 through rank 10. Third, from the resulting ranking, we select the top four titles and conduct a fine-grained reranking using GPT-5 with the analyze-ranking strategy. The final output is obtained from this refined ranking.
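The three-stage merge logic above can be sketched as follows; `listwise_fn` and `finegrained_fn` are placeholders for the DeepSeek-V3 listwise call and the GPT-5 analyze-ranking call, and all names are illustrative:

```python
def merge_llm_results(baseline, llm_results):
    """Stage 1: keep baseline ranks 1-5, splice LLM-retrieved ids in from rank 6."""
    head = baseline[:5]
    new = [d for d in llm_results if d not in head]
    tail = [d for d in baseline[5:] if d not in new]  # avoid duplicates
    return head + new + tail

def three_stage_rank(baseline, llm_results, listwise_fn, finegrained_fn):
    merged = merge_llm_results(baseline, llm_results)
    # Stage 2: listwise reorder of ranks 2-10 (rank 1 stays fixed).
    merged[1:10] = listwise_fn(merged[1:10])
    # Stage 3: fine-grained rerank of the top four titles.
    merged[:4] = finegrained_fn(merged[:4])
    return merged

baseline = [f"d{i}" for i in range(1, 12)]
# Identity stubs in place of the actual LLM calls.
final = three_stage_rank(baseline, ["x1", "d7"], lambda xs: xs, lambda xs: xs)
```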
- Specify datasets used in this run.
- ["This year's TREC TOT training data", 'Other']
- (if you checked "other", describe here)
- Additional movie data from webis/tip-of-my-tongue-known-item-search-triplets
- Wikidata dumps used to filter documents in the movie domain
- Are you 100% confident that no data from https://github.com/microsoft/Tip-of-the-Tongue-Known-Item-Retrieval-Dataset-for-Movie-Identification or iRememberThisMovie.com (besides the training data provided as part of this year's track) was used for producing this run (including any data used for pretraining models that you are building on top of)?
- no
- Did you use any of the official baseline runs in any way to produce this run?
- no
- If you did use any of the official baseline runs in any way to produce this run, please describe how below in sufficient detail (e.g., as reranking candidates or in ensemble with other approaches).
Evaluation Files
Paper