duth.hybrid.stableri — Relevance Judgment subtask

Is this a manual (human intervention) or automatic run?: automatic
Does this run leverage neural networks?: yes
Does this run leverage proprietary models in any step of the retrieval pipeline?: no
Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline?: no
Does this run leverage smaller open-weight language models in any step of the retrieval pipeline?: yes
What would you categorize this run as?: Multi-Stage Pipeline pointwise
Please provide a short description of this run: Automatic RJ run with StableLM-2-1.6B. Same hybrid confidence (LLM + Jaccard + baseline). Calibrated to emphasize recall at label=2 (floor-2=4; th1=0.30, th2=0.38, th3=0.56, th4=0.70; cap4=2, cap34=6). Focus: many trustworthy 2s plus some 3/4.
Please give this run a priority for inclusion in manual assessments.: 2