TREC 2025 Proceedings
mllm-indelab-09-17
Submission Details
- Organization
- indelab
- Track
- Million LLM
- Task
- LLM Ranking Task
- Date
- 2025-09-18
Run Description
- Is the run manual or automatic?
- automatic
- Did you use the response metadata?
- yes
- Did you use any additional data or external knowledge?
- no
- Did you use the development set?
- yes
- Did you train on the development set?
- yes
- Provide a description of this run, including details about your answers above.
- The mllm-indelab-09-17 submission is an automatic run using an ensemble of five dual-encoder neural ranking models. Query embeddings are processed through 4-head attention (384→96×4→384) and dense layers (384→256→192→128), while LLM representations use learned embeddings (1131→256) followed by a 4-layer tower (256→512→384→256→128). Final ranking uses cosine similarity between the 128-D representations, with ensemble scores averaged across the 5 best-performing cross-validation folds (selected post-hoc based on validation nDCG@10).
Training data consisted of 4.45M examples per fold: 90% of the development-set queries (~347K examples with human qrels 0-3) plus 4.06M weakly labeled examples generated from discovery response metadata (specifically, qrel = 1.0 if the response length exceeded 50 characters, otherwise qrel = 0.0). Under 10-fold cross-validation, each fold trained on its 90% subset of the development queries plus all weakly labeled examples, and validated on the held-out 10% of original dev queries. No additional data or external knowledge was used beyond the provided development and discovery datasets.
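The scoring step described above (cosine similarity between 128-D representations, averaged over the 5 selected folds) can be sketched as follows. This is a minimal illustration, not code from the actual run; the function names and the use of NumPy are assumptions.

```python
import numpy as np

def cosine_score(query_vec: np.ndarray, llm_vec: np.ndarray) -> float:
    """Cosine similarity between the 128-D query and LLM representations."""
    q = query_vec / np.linalg.norm(query_vec)
    v = llm_vec / np.linalg.norm(llm_vec)
    return float(q @ v)

def ensemble_score(per_fold_scores: list[float]) -> float:
    """Average the scores produced by the 5 selected cross-validation folds."""
    return float(np.mean(per_fold_scores))
```

At ranking time, each query-LLM pair would receive one `cosine_score` per fold, and LLMs would be sorted by the averaged `ensemble_score`.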
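The length-based weak labeling rule applied to the discovery responses is simple enough to state as code; `weak_qrel` is a hypothetical name used only for illustration.

```python
def weak_qrel(response_text: str) -> float:
    """Weak relevance label from discovery response metadata:
    responses longer than 50 characters get qrel 1.0, otherwise 0.0."""
    return 1.0 if len(response_text) > 50 else 0.0
```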
- Priority for pooling
- 1 (top)
Evaluation Files