TREC 2025 Proceedings

duth_stablelm2_rj_v1

Submission Details

Organization
DUTH
Track
Retrieval-Augmented Generation
Task
Relevance Judgment subtask
Date
2025-08-28

Run Description

Is this a manual (human intervention) or automatic run?
manual
Does this run leverage neural networks?
yes
Does this run leverage proprietary models in any step of the retrieval pipeline?
no
Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline?
no
Does this run leverage smaller open-weight language models in any step of the retrieval pipeline?
yes
What would you categorize this run as?
Multi-Stage Pipeline (pointwise)
Please provide a short description of this run
Automatic run for the Relevance Judgment subtask. We use an open-weight LLM (stabilityai/stablelm-2-1_6b-chat) as an automatic assessor at the segment level. The prompt encodes the TREC relevance rubric (0–4); decoding is deterministic (do_sample=False, max_new_tokens=16). The model outputs "LABEL, CONFIDENCE", which we parse to produce lines of the form: qid Q0 docid label confidence run_id. We submit exactly the top-k segments.
Please give this run a priority for inclusion in manual assessments.
1 (top)

Evaluation Files

Paper