TREC 2025 Proceedings
lightgbm_job266431
Submission Details
- Organization
- uvairlab
- Track
- Million LLM
- Task
- LLM Ranking Task
- Date
- 2025-09-21
Run Description
- Is the run manual or automatic?
- automatic
- Did you use the response metadata?
- yes
- Did you use any additional data or external knowledge?
- no
- Did you use the development set?
- yes
- Did you train on the development set?
- no
- Provide a description of this run, including details about your answers above.
- We use LightGBM to rank LLMs using global statistics derived from their responses across all queries. Specifically, we compute the global mean and standard deviation of response confidence, estimated from the top 1% of confidence scores (1 out of every 100) to reduce the influence of low-confidence outputs. Training labels are weak signals derived from query–LLM pairs. For evaluation, we use the development set and qrels to compute NDCG@5, NDCG@10, and MRR.
- Priority for pooling
- 1 (top)
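The confidence statistics and evaluation metrics named in the run description can be sketched as below. This is a minimal illustration, not the submitted pipeline: the function names are hypothetical, "top 1%" is assumed to mean the highest-valued 1 out of every 100 scores, the standard deviation is taken as the population form, and DCG is assumed to use linear gains with a log2 discount. The LightGBM training step is omitted.

```python
import math
from statistics import mean, pstdev


def top_fraction_stats(confidences, every=100):
    """Mean and std of the highest 1-in-`every` confidence scores.

    Assumption: "top 1%" selects the largest values, keeping at least one.
    """
    if not confidences:
        raise ValueError("need at least one confidence score")
    k = max(1, len(confidences) // every)
    top = sorted(confidences, reverse=True)[:k]
    return mean(top), pstdev(top)


def dcg(gains):
    """Discounted cumulative gain with linear gains (an assumption)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_gains, k):
    """NDCG@k for one query; `ranked_gains` are relevance labels in ranked order."""
    ideal = dcg(sorted(ranked_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked_gains):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, gain in enumerate(ranked_gains, start=1):
        if gain > 0:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # Hypothetical confidence scores for one LLM across 200 responses.
    scores = [i / 200 for i in range(200)]
    print(top_fraction_stats(scores))
    # Hypothetical relevance labels of LLMs in ranked order for one query.
    print(ndcg_at_k([0, 1, 0, 1, 0], k=5), reciprocal_rank([0, 1, 0, 1, 0]))
```

MRR is the mean of reciprocal_rank over all queries, and the reported NDCG@5 and NDCG@10 are likewise averages of the per-query values.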