TREC 2025 Proceedings

scrb-tot-01

Submission Details

Organization
SRCB
Track
Tip-of-the-Tongue Search
Task
Retrieval Task
Date
2025-08-25

Run Description

Please describe in detail how this run was generated
A pipeline composed of a Dense Retriever, a Reranker, and an LLM Reranker.

Query processing: all queries are converted to a list of cues by DeepSeek-V3.

Dense Retriever based on Qwen/Qwen3-Embedding-8B:
- Movie domain: fine-tuned on movie data (augmented data based on train, dev1, and 5,000 samples from the tomt-kis dataset); an index was created for 500k+ movie documents filtered by Wikidata properties.
- Other domains: the original Qwen3-Embedding-8B was used to create the index for all documents.

Reranker: Qwen3-Reranker-8B fine-tuned on augmented data based on train, dev1, dev2, and 300 samples from the tomt-kis dataset; reranks the top 2,000 results from the retriever.

Listwise Reranker using DeepSeek-V3: reranks the top-20 results from the reranker.
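The three-stage pipeline above can be sketched as follows. This is a hypothetical illustration only: every function body is a placeholder stub standing in for the actual models (DeepSeek-V3 cue extraction, Qwen3-Embedding-8B dense retrieval, the fine-tuned Qwen3-Reranker-8B, and DeepSeek-V3 listwise reranking), and all names are invented for this sketch.

```python
def extract_cues(query: str) -> list[str]:
    # Stand-in for DeepSeek-V3 query processing: split the query into cue phrases.
    return [c.strip() for c in query.split(",") if c.strip()]

def dense_retrieve(cues: list[str], docs: list[str], top_k: int = 2000) -> list[str]:
    # Stand-in for Qwen3-Embedding-8B retrieval: score documents by cue overlap.
    def score(doc: str) -> int:
        return sum(cue.lower() in doc.lower() for cue in cues)
    return sorted(docs, key=score, reverse=True)[:top_k]

def rerank(cues: list[str], candidates: list[str]) -> list[str]:
    # Stand-in for the fine-tuned Qwen3-Reranker-8B (identity stub here).
    return candidates

def listwise_rerank(cues: list[str], candidates: list[str]) -> list[str]:
    # Stand-in for DeepSeek-V3 listwise reranking of the head of the ranking.
    return candidates

def run_pipeline(query: str, docs: list[str],
                 k_retrieve: int = 2000, k_listwise: int = 20) -> list[str]:
    # Retrieve top-k_retrieve, rerank them, then listwise-rerank the top-k_listwise.
    cues = extract_cues(query)
    candidates = dense_retrieve(cues, docs, top_k=k_retrieve)
    reranked = rerank(cues, candidates)
    head = listwise_rerank(cues, reranked[:k_listwise])
    return head + reranked[k_listwise:]
```

The stubs preserve only the control flow of the submission (retrieve 2,000, rerank, listwise-rerank the top 20); swapping in the real models would not change this structure.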
Specify datasets used in this run.
["This year's TREC TOT training data", 'Other']
(if you checked "other", describe here)
Additional movie data from webis/tip-of-my-tongue-known-item-search-triplets. Wikidata dumps were used to filter the documents of the movie domain.
Are you 100% confident that no data from https://github.com/microsoft/Tip-of-the-Tongue-Known-Item-Retrieval-Dataset-for-Movie-Identification or iRememberThisMovie.com (besides the training data provided as part of this year's track) was used for producing this run (including any data used for pretraining models that you are building on top of)?
no
Did you use any of the official baseline runs in any way to produce this run?
no
If you did use any of the official baseline runs in any way to produce this run, please describe how below in sufficient detail (e.g., as reranking candidates or in ensemble with other approaches).

Paper