TREC 2025 Proceedings
runid1
Submission Details
- Organization
- ufmg
- Track
- Tip-of-the-Tongue Search
- Task
- Retrieval Task
- Date
- 2025-09-10
Run Description
- Please describe in detail how this run was generated
- This run was generated using a Direct Preference Optimization (DPO)-based query rewriting system, in which a single general-purpose language model was fine-tuned to align its rewrites with the preferences of dense and cross-encoder retrieval systems. The model was applied uniformly across all queries, without any domain classification.
1. Rewrite Model with DPO
A fixed pool of rewrite candidates was generated for each training query using a base language model. These candidates were ranked by:
- A dense retriever using all-mpnet-base-v2 over the corpus
- A cross-encoder reranker using cross-encoder/ms-marco-MiniLM-L-12-v2
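Under this scheme, the dense signal is the rank of the target item under cosine similarity over L2-normalized embeddings. A minimal sketch of that rank computation (the function name and array shapes are illustrative, not the run's actual code):

```python
import numpy as np

def rank_target(candidate_embs, doc_embs, target_idx):
    """1-based rank of the target document for each rewrite candidate.

    Embeddings are L2-normalized, so cosine similarity reduces to a
    dot product over the normalized vector index."""
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = c @ d.T                      # (n_rewrites, n_docs) similarity matrix
    order = np.argsort(-sims, axis=1)   # best-first document indices per rewrite
    return 1 + np.argmax(order == target_idx, axis=1)
```

A rewrite that moves the target item to a lower (better) rank is the one the preference signal favors.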
From these scores, preference pairs were derived based on improvements in the rank or cross-encoder logit of the target item. These preferences were then used to fine-tune two separate LoRA adapters on top of meta-llama/Llama-3.1-8B-Instruct via DPO.
Both adapters were trained using the same general-purpose query distribution (no domain filtering), enabling the system to generalize across a wide range of vague, incomplete, or ambiguous user inputs.
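A hypothetical sketch of how such preference pairs could be built from the two scores; the exact pairing rule is an assumption, but the prompt/chosen/rejected record layout matches what DPO trainers such as TRL's DPOTrainer consume:

```python
# Illustrative preference-pair construction; the tie-breaking rule and
# field names are assumptions, not the run's exact configuration.

def build_preference_pairs(query, candidates):
    """candidates: list of (rewrite, target_rank, target_logit) tuples,
    scored by the dense retriever (rank) and the cross-encoder (logit)."""
    pairs = []
    for i, (rw_a, rank_a, logit_a) in enumerate(candidates):
        for rw_b, rank_b, logit_b in candidates[i + 1:]:
            # Prefer the rewrite that ranks the target item higher
            # (lower rank number); break ties with the cross-encoder logit.
            if rank_a < rank_b or (rank_a == rank_b and logit_a > logit_b):
                pairs.append({"prompt": query, "chosen": rw_a, "rejected": rw_b})
            elif rank_b < rank_a or (rank_a == rank_b and logit_b > logit_a):
                pairs.append({"prompt": query, "chosen": rw_b, "rejected": rw_a})
    return pairs
```

In this sketch one pair is emitted per ordered candidate pair with a clear winner; candidates tied on both signals contribute nothing.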
2. Rewrite Inference
At test time:
- All queries were passed through the same fixed pipeline, with no classification or routing.
- The dense rewrite was generated using the dense adapter and a prompt instructing the model to expand on vague details while maintaining specificity.
- The cross rewrite was generated using the cross-encoder adapter and a prompt encouraging compact, fact-based formulations.
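The two-prompt setup can be sketched as a small dispatch step; the prompt wording and adapter names below are illustrative placeholders, not the exact ones used in the run:

```python
# Hypothetical prompt/adapter selection for the two rewrite styles.
DENSE_PROMPT = ("Rewrite the query, expanding on vague details while "
                "keeping every specific detail the user mentioned:\n{query}")
CROSS_PROMPT = ("Rewrite the query as a compact, fact-based "
                "statement:\n{query}")

# Placeholder LoRA adapter identifiers (assumed names).
ADAPTERS = {"dense": "lora-dense-rewriter", "cross": "lora-cross-rewriter"}

def build_rewrite_request(query, mode):
    """Select the LoRA adapter and prompt for one rewrite style
    ('dense' or 'cross'); generation itself is omitted here."""
    template = DENSE_PROMPT if mode == "dense" else CROSS_PROMPT
    return {"adapter": ADAPTERS[mode], "prompt": template.format(query=query)}
```

At inference the selected adapter would be attached to the base Llama-3.1-8B-Instruct model (e.g., via PEFT) before generating the rewrite.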
3. Retrieval via Tree-of-Thoughts (ToT)
The generated rewrites were passed to a Tree-of-Thoughts search module, which simulates iterative refinement of the query through LLM-generated hypotheses ("thoughts") and new rewrites:
- Embedding generation used all-mpnet-base-v2
- Dense retrieval was performed using cosine similarity on a normalized vector index
- Reranking was done with the same cross-encoder used during training
- The search proceeded greedily by expanding the nodes that produced the highest reranking scores
- Final result aggregation was done at the document level using the reranking score
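The greedy expansion loop described above can be sketched with stubbed expand/score functions; this is a simplification under assumed parameters (beam width, depth), whereas the real module generates thoughts with the LLM and scores candidates with the cross-encoder:

```python
def tot_search(root_query, expand, score, beam=3, depth=2):
    """Greedy Tree-of-Thoughts search over query rewrites (sketch).

    expand(query) -> list of child rewrites ("thoughts")
    score(query)  -> reranking score of the best document for that query

    At each level, only the top-scoring nodes are expanded further;
    the best (score, query) pair seen anywhere in the tree is returned.
    """
    best = (score(root_query), root_query)
    frontier = [root_query]
    for _ in range(depth):
        children = [c for q in frontier for c in expand(q)]
        if not children:
            break
        scored = sorted(((score(c), c) for c in children), reverse=True)
        frontier = [c for _, c in scored[:beam]]  # greedy: keep top nodes only
        if scored[0] > best:
            best = scored[0]
    return best
```

In the actual pipeline, document-level aggregation would merge the retrieved documents from all visited nodes, keeping each document's best reranking score.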
- Specify datasets used in this run.
- ["This year's TREC TOT training data"]
- (if you checked "other", describe here)
- Are you 100% confident that no data from https://github.com/microsoft/Tip-of-the-Tongue-Known-Item-Retrieval-Dataset-for-Movie-Identification or iRememberThisMovie.com (besides the training data provided as part of this year's track) was used for producing this run (including any data used for pretraining models that you are building on top of)?
- no
- Did you use any of the official baseline runs in any way to produce this run?
- no
- If you did use any of the official baseline runs in any way to produce this run, please describe how below in sufficient detail (e.g., as reranking candidates or in ensemble with other approaches).
Evaluation Files
Paper