lambdamart_profiles — LLM Ranking Task

Is the run manual or automatic?: automatic
Did you use the response metadata?: yes
Did you use any additional data or external knowledge?: yes
Did you use the development set?: yes
Did you train on the development set?: yes
Provide a description of this run, including details about your answers above.: This run uses listwise learning to rank with LambdaMART to predict query-LLM relevance. Each LLM is represented by a behavioral profile built with the Qwen3-Embedding-8B model; the profile encodes domain expertise (10 categories), task capabilities (7 types), and response metrics (log-probs, refusal rate, confidence). For each query, a 64-dimensional feature vector is constructed by combining the profile features with query descriptors and statistical descriptors.
Priority for pooling: 1 (top)
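
Illustrative sketch (not the submitted code): the feature construction described above could be laid out roughly as follows. The counts 10, 7, and 64 come from the run description. Everything else is an assumption made for illustration: that each response metric is a single scalar, that the remaining 44 dimensions hold the query and statistical descriptors, and the names LLMProfile, build_feature_vector, and build_query_group, which are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

N_DOMAINS = 10    # domain-expertise categories (from the run description)
N_TASKS = 7       # task-capability types (from the run description)
N_METRICS = 3     # log-prob, refusal rate, confidence (assumed one scalar each)
FEATURE_DIM = 64  # total feature vector length (from the run description)
# Assumption: the remaining dimensions carry query and statistical descriptors.
N_QUERY_STATS = FEATURE_DIM - (N_DOMAINS + N_TASKS + N_METRICS)


@dataclass
class LLMProfile:
    """Hypothetical container for one LLM's behavioral profile."""
    domain_scores: List[float]
    task_scores: List[float]
    response_metrics: List[float]


def build_feature_vector(profile: LLMProfile, query_stats: List[float]) -> List[float]:
    """Concatenate profile features with query/statistical descriptors."""
    parts = [
        (profile.domain_scores, N_DOMAINS),
        (profile.task_scores, N_TASKS),
        (profile.response_metrics, N_METRICS),
        (query_stats, N_QUERY_STATS),
    ]
    vector: List[float] = []
    for values, expected in parts:
        if len(values) != expected:
            raise ValueError(f"expected {expected} values, got {len(values)}")
        vector.extend(values)
    return vector


def build_query_group(
    profiles: List[LLMProfile], query_stats: List[float]
) -> Tuple[List[List[float]], int]:
    """Build one row per candidate LLM for a single query.

    A listwise ranker is trained on groups of rows that share a query,
    so the group size is returned alongside the rows.
    """
    rows = [build_feature_vector(p, query_stats) for p in profiles]
    return rows, len(rows)
```

Rows built this way, together with per-query group sizes and relevance labels, are the usual input shape for a LambdaMART implementation; which implementation the run used is not stated here.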