TREC 2025 Proceedings
duth_stablelm2_rj_v1
Submission Details
- Organization
- DUTH
- Track
- Retrieval-Augmented Generation
- Task
- Relevance Judgment subtask
- Date
- 2025-08-28
Run Description
- Is this a manual (human intervention) or automatic run?
- automatic
- Does this run leverage neural networks?
- yes
- Does this run leverage proprietary models in any step of the retrieval pipeline?
- no
- Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline?
- no
- Does this run leverage smaller open-weight language models in any step of the retrieval pipeline?
- yes
- What would you categorize this run as?
- Multi-Stage Pipeline pointwise
- Please provide a short description of this run
- Automatic run for the Relevance Judgment subtask. We use an open-weight LLM (stabilityai/stablelm-2-1_6b-chat) as an automatic assessor at the segment level. The prompt encodes the TREC rubric (0–4); decoding is deterministic (do_sample=False, max_new_tokens=16). The model outputs "LABEL, CONFIDENCE", which we parse to produce lines of the form: qid Q0 docid label confidence run_id. We submit exactly the top-k segments.
- Please give this run a priority for inclusion in manual assessments.
- 1 (top)
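The parsing step described above ("LABEL, CONFIDENCE" completions turned into submission lines) can be sketched as follows. This is a minimal illustration, not the submitted code: the function name, the regex, and the fallback behavior for unparseable completions are assumptions; only the output line format (qid Q0 docid label confidence run_id) and the 0–4 rubric come from the run description.

```python
import re

def parse_assessment(raw: str, qid: str, docid: str,
                     run_id: str = "duth_stablelm2_rj_v1"):
    """Parse a 'LABEL, CONFIDENCE' model completion into one submission line.

    Hypothetical helper: extracts the first rubric label (0-4) and a
    confidence in [0, 1] from the raw completion. Returns None when the
    completion does not match, leaving the fallback policy to the caller.
    """
    m = re.search(r"\b([0-4])\s*,\s*([01](?:\.\d+)?)", raw)
    if m is None:
        return None
    label = int(m.group(1))
    confidence = float(m.group(2))
    return f"{qid} Q0 {docid} {label} {confidence} {run_id}"
```

For example, a completion such as `"3, 0.85"` for query `q1` and segment `d1` would yield the line `q1 Q0 d1 3 0.85 duth_stablelm2_rj_v1`; the deterministic decoding settings (do_sample=False, max_new_tokens=16) keep completions short enough for this kind of pattern match.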
Evaluation Files
Paper