lightgbm_job266431 — LLM Ranking Task

Is the run manual or automatic?: automatic
Did you use the response metadata?: yes
Did you use any additional data or external knowledge?: no
Did you use the development set?: yes
Did you train on the development set?: no
Provide a description of this run, including details about your answers above.: We use LightGBM to rank LLMs based on global statistics computed from their responses across all queries. Specifically, for each LLM we compute the mean and standard deviation of response confidence, estimated from only the top 1% (1 out of every 100) of confidence scores to reduce the influence of low-confidence outputs. Training labels are weak signals derived from query–LLM pairs. For evaluation, we use the development set and qrels to compute NDCG@5, NDCG@10, and MRR.
Priority for pooling: 1 (top)
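
Illustrative sketch (not the submitted pipeline): the run description mentions two computations that a short example may make concrete, namely the top-1% confidence statistics and the ranking metrics. The Python below is a minimal sketch under stated assumptions. The function names are hypothetical, the population standard deviation and a linear gain in NDCG are assumed choices, and the ideal ranking is taken from the supplied list only. The LightGBM model itself is not shown.

```python
import math
from statistics import mean, pstdev


def top_fraction_stats(confidences, fraction=0.01):
    """Mean and population std of the highest `fraction` of scores.

    Assumption: "top 1%" means the largest 1% of an LLM's confidence
    scores; at least one score is always kept.
    """
    if not confidences:
        raise ValueError("need at least one confidence score")
    k = max(1, math.ceil(len(confidences) * fraction))
    top = sorted(confidences, reverse=True)[:k]
    return mean(top), pstdev(top)


def ndcg_at_k(ranked_gains, k):
    """NDCG@k for relevance gains listed in predicted rank order.

    Assumption: linear gain, and the ideal order is built from the
    same list (all judged items are present in the ranking).
    """
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked_gains):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, gain in enumerate(ranked_gains, start=1):
        if gain > 0:
            return 1.0 / rank
    return 0.0
```

MRR would then be the average of `reciprocal_rank` over all development queries, and the two values returned by `top_fraction_stats` could serve as per-LLM features for the ranker.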