TREC 2025 Proceedings

mllm-indelab-09-17

Submission Details

Organization
indelab
Track
Million LLM
Task
LLM Ranking Task
Date
2025-09-18

Run Description

Is the run manual or automatic?
automatic
Did you use the response metadata?
yes
Did you use any additional data or external knowledge?
no
Did you use the development set?
yes
Did you train on the development set?
yes
Provide a description of this run, including details about your answers above.
The mllm-indelab-09-17 submission is an automatic run that ranks LLMs with an ensemble of five dual-encoder neural ranking models. Query embeddings are processed through 4-head attention (384→96×4→384) followed by dense layers (384→256→192→128), while LLM representations use a learned embedding (1131→256) followed by a 4-layer tower (256→512→384→256→128). The final score is the cosine similarity between the two 128-D representations, averaged across the 5 best-performing cross-validation folds (selected post-hoc by validation nDCG@10). Training data consisted of 4.45M examples: 90% of the development-set queries per fold (~347K examples with human qrels 0-3) plus 4.06M weakly labeled examples derived from discovery response metadata (responses longer than 50 characters received qrel = 1.0, all others qrel = 0.0). Under 10-fold cross-validation, each fold trained on its 90% split of the development queries plus all weakly labeled examples, and validated on the held-out 10% of original dev queries. No additional data or external knowledge was used beyond the provided development and discovery datasets.
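The two encoder towers described above can be sketched as follows. This is a minimal PyTorch reconstruction from the stated layer dimensions only; class names, activation choices, and the single-token attention input are assumptions, not the submitted implementation.

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    # 4-head self-attention over the 384-D query embedding (4 heads x 96 dims,
    # i.e. 384 -> 96x4 -> 384), then a dense tower 384 -> 256 -> 192 -> 128.
    # ReLU activations are an assumption; the run description does not specify them.
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=384, num_heads=4, batch_first=True)
        self.tower = nn.Sequential(
            nn.Linear(384, 256), nn.ReLU(),
            nn.Linear(256, 192), nn.ReLU(),
            nn.Linear(192, 128),
        )

    def forward(self, q):  # q: (batch, seq_len, 384)
        attn_out, _ = self.attn(q, q, q)          # self-attention over query tokens
        return self.tower(attn_out.mean(dim=1))   # pool, then project to 128-D

class LLMEncoder(nn.Module):
    # Learned embedding over the 1131 LLM ids (1131 -> 256), then a 4-layer
    # tower 256 -> 512 -> 384 -> 256 -> 128.
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(1131, 256)
        self.tower = nn.Sequential(
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 384), nn.ReLU(),
            nn.Linear(384, 256), nn.ReLU(),
            nn.Linear(256, 128),
        )

    def forward(self, llm_ids):  # llm_ids: (batch,) integer LLM indices
        return self.tower(self.emb(llm_ids))  # (batch, 128)
```

Both towers end in the same 128-D space so that a single cosine similarity can compare a query to an LLM.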
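The weak-labeling rule applied to the discovery response metadata is stated exactly, so it can be written down directly; the function name and the parameterized threshold are illustrative.

```python
def weak_qrel(response_text: str, min_length: int = 50) -> float:
    """Binary weak relevance label from discovery response metadata:
    responses longer than 50 characters are labeled relevant (1.0),
    all others non-relevant (0.0)."""
    return 1.0 if len(response_text) > min_length else 0.0
```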
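The ensemble scoring step (cosine similarity between 128-D representations, averaged over the five selected folds) reduces to a few lines; a minimal numpy sketch, with hypothetical function names, assuming each selected fold contributes one query vector and one LLM vector:

```python
import numpy as np

def cosine(a, b) -> float:
    """Cosine similarity between two dense vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ensemble_score(query_vecs, llm_vecs) -> float:
    """Average per-fold cosine similarity across the selected folds.
    query_vecs, llm_vecs: one 128-D vector per fold for the same
    (query, LLM) pair."""
    return float(np.mean([cosine(q, v) for q, v in zip(query_vecs, llm_vecs)]))
```

For each query, LLMs would then be ranked by this averaged score in descending order.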
Priority for pooling
1 (top)

Evaluation Files