TREC 2025 Proceedings

infolab_UD_run1

Submission Details

Organization
UDInfo
Track
Million LLM
Task
LLM Ranking Task
Date
2025-09-21

Run Description

Is the run manual or automatic?
automatic
Did you use the response metadata?
no
Did you use any additional data or external knowledge?
no
Did you use the development set?
yes
Did you train on the development set?
yes
Provide a description of this run, including details about your answers above.
Run 1: Hierarchical single-index ranking model with two main components.

1. Cluster Profiles (broad scoring): answers provided on the discovery datasets were embedded, and the embeddings were clustered; the resulting centroids represent each LLM's areas of expertise.
2. Answer Embedding Index (refine scoring): each LLM's answers were stored in an index for fast nearest-neighbor retrieval.

At inference, a new query is encoded and compared to each LLM's centroids; the maximum similarity gives the broad score. The query is also searched against each LLM's answer index, and the average similarity of the top-k retrieved answers gives the refine score. The final score is a weighted combination of the broad and refine scores.

Run type: automatic and internal. No manual intervention was performed after seeing the test queries, and only the discovery data was used to build profiles.
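The two-stage scoring described above can be sketched as follows. This is an illustrative reconstruction, not the submitted system: the embedding dimensionality, the number of clusters, the top-k value, and the combination weight `alpha` are all assumptions, and a simple k-means loop stands in for whatever clustering and indexing tools were actually used.

```python
# Hypothetical sketch of the hierarchical broad + refine scoring pipeline.
# All parameter values (n_clusters, k, alpha) are illustrative assumptions.
import numpy as np

def cosine(vec, mat):
    # Cosine similarity between one vector and each row of a matrix.
    vec = vec / np.linalg.norm(vec)
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    return mat @ vec

def build_profiles(answer_embs, n_clusters=3, seed=0):
    # Broad component: cluster one LLM's answer embeddings into centroids
    # (plain k-means with cosine assignment; any clustering library works).
    rng = np.random.default_rng(seed)
    cents = answer_embs[rng.choice(len(answer_embs), n_clusters, replace=False)]
    for _ in range(10):
        assign = np.argmax(np.stack([cosine(c, answer_embs) for c in cents]), axis=0)
        for j in range(n_clusters):
            if np.any(assign == j):
                cents[j] = answer_embs[assign == j].mean(axis=0)
    return cents

def score_llm(query_emb, centroids, answer_index, k=5, alpha=0.5):
    # Broad score: max similarity of the query to any cluster centroid.
    broad = cosine(query_emb, centroids).max()
    # Refine score: mean similarity of the top-k nearest stored answers.
    sims = cosine(query_emb, answer_index)
    refine = np.sort(sims)[-k:].mean()
    # Final score: weighted combination of broad and refine components.
    return alpha * broad + (1 - alpha) * refine
```

Scoring every candidate LLM with `score_llm` and sorting descending would yield the ranking; in practice the answer index would be an approximate nearest-neighbor structure rather than a brute-force matrix scan.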
Priority for pooling
1 (top)

Evaluation Files

Paper