TREC 2025 Proceedings

runid3

Submission Details

Organization
ufmg
Track
Tip-of-the-Tongue Search
Task
Retrieval Task
Date
2025-09-10

Run Description

Please describe in detail how this run was generated
This run was generated with a Dense + Cross-Encoder preference-based rewriting pipeline, in which a LLaMA 3.1-8B model was fine-tuned using Direct Preference Optimization (DPO) to generate query reformulations aligned with the behavior of both dense and cross-encoder retrievers.

1. Rewrite Preference Modeling via DPO

We began by creating a pool of candidate rewrites for each training query using a pretrained LLM. These rewrites were evaluated with two retrieval modules:
- Dense retriever: MPNet embeddings over the corpus.
- Cross-encoder reranker: cross-encoder/ms-marco-MiniLM-L-12-v2.

For each query, pairwise preferences were derived by comparing rewrites on their downstream retrieval performance (e.g., the rank at which relevant items were retrieved). We applied a threshold over the NDCG difference or rank delta to keep only consistent preference pairs. These pairs were then used to train a DPO objective on the base model meta-llama/Llama-3.1-8B-Instruct, resulting in several LoRA adapters specialized for rewrite generation in different domains.

2. Prompt-Guided Inference for Rewrites

At inference time, for each test query:
- A domain classifier (movie vs. other) routed the input to the appropriate set of adapters and prompts.
- The LLaMA model loaded the selected LoRA adapter and generated multiple rewrites: three rewrites using different dense-focused adapters and prompts, and one cross-encoder-aligned rewrite using the cross adapter.
- Each rewrite was generated with a specialized prompt instructing the model to be precise, short, and factual (for reranking) and to preserve user-specified context and details.

3. Retrieval via Tree-of-Thoughts Framework

The output rewrites were passed into a Tree-of-Thoughts (ToT) search module that performed:
- Dense embedding retrieval using MPNet.
- Cross-encoder reranking.
- Greedy tree expansion with LLM-generated thoughts and rewrites.
- Node evaluation based on the reranker score.
- Final aggregation of results by score.
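The preference-pair filtering in step 1 can be sketched as follows. This is a minimal illustration, not the run's actual code: the function name, the rank-delta threshold value, and the toy rewrites are all assumptions; in the real pipeline the "rank" would come from the dense retriever or cross-encoder scores over the TREC TOT corpus.

```python
# Hypothetical sketch of step 1: turning per-rewrite retrieval ranks
# into DPO (prompt, chosen, rejected) preference pairs.

def build_preference_pairs(query, rewrites, rank_of_target, min_rank_delta=5):
    """For each pair of rewrites, prefer the one that retrieves the
    target item at a better (lower) rank, keeping only pairs whose
    rank difference exceeds a consistency threshold."""
    pairs = []
    for i, a in enumerate(rewrites):
        for b in rewrites[i + 1:]:
            delta = rank_of_target[b] - rank_of_target[a]
            if delta >= min_rank_delta:        # a ranks the target much higher
                pairs.append({"prompt": query, "chosen": a, "rejected": b})
            elif delta <= -min_rank_delta:     # b ranks the target much higher
                pairs.append({"prompt": query, "chosen": b, "rejected": a})
    return pairs

# Toy example: three candidate rewrites with the rank at which each
# one retrieved the target item.
ranks = {"rewrite A": 2, "rewrite B": 40, "rewrite C": 4}
pairs = build_preference_pairs("movie about dreams", list(ranks), ranks)
```

With these toy ranks, "rewrite B" is rejected against both other rewrites, while the A-vs-C pair is dropped because its rank delta (2) falls below the threshold; the resulting pairs are what a DPO trainer would consume.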
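The greedy ToT loop in step 3 can be outlined as below. This is a simplified sketch under stated assumptions: retrieve(), rerank(), and expand() are hypothetical callables standing in for MPNet dense retrieval, the MiniLM cross-encoder, and LLM thought/rewrite generation, and the single-path greedy expansion and max-over-nodes aggregation are illustrative choices.

```python
# Hypothetical sketch of step 3: greedy tree expansion over rewrites,
# node evaluation by reranker score, and score-based aggregation.

def tot_search(rewrites, retrieve, rerank, expand, max_depth=2):
    results = {}  # doc id -> best reranker score seen at any node
    frontier = [(r, 0) for r in rewrites]  # each rewrite seeds a tree root
    while frontier:
        node, depth = frontier.pop(0)
        docs = retrieve(node)                          # dense retrieval
        scored = [(doc, rerank(node, doc)) for doc in docs]
        for doc, score in scored:                      # node evaluation
            results[doc] = max(score, results.get(doc, float("-inf")))
        if depth < max_depth and scored:
            # Greedy expansion: generate child thoughts/rewrites only
            # from the best-scoring evidence at this node.
            best_doc, _ = max(scored, key=lambda pair: pair[1])
            frontier.extend((child, depth + 1) for child in expand(node, best_doc))
    # Final aggregation: rank documents by their best score.
    return sorted(results, key=results.get, reverse=True)
```

Taking the maximum reranker score per document across nodes is one plausible aggregation; a sum or rank-fusion scheme would slot into the same loop.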
Specify datasets used in this run.
["This year's TREC TOT training data"]
(if you checked "other", describe here)
Are you 100% confident that no data from https://github.com/microsoft/Tip-of-the-Tongue-Known-Item-Retrieval-Dataset-for-Movie-Identification or iRememberThisMovie.com (besides the training data provided as part of this year's track) was used for producing this run (including any data used for pretraining models that you are building on top of)?
no
Did you use any of the official baseline runs in any way to produce this run?
no
If you did use any of the official baseline runs in any way to produce this run, please describe how below in sufficient detail (e.g., as reranking candidates or in ensemble with other approaches).

Evaluation Files

Paper