duth.hybrid.qwen.cal — Relevance Judgment subtask

Is this a manual (human intervention) or automatic run?: automatic
Does this run leverage neural networks?: yes
Does this run leverage proprietary models in any step of the retrieval pipeline?: no
Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline?: no
Does this run leverage smaller open-weight language models in any step of the retrieval pipeline?: yes
What would you categorize this run as?: Multi-Stage Pipeline pointwise
Please provide a short description of this run: Automatic relevance judgment (RJ) run. A hybrid judge blends three signals into a single confidence score: the Qwen2.5-3B output, Jaccard overlap between the narrative and the segment (narrative↔segment), and normalized baseline scores (top-20). Per-topic caps and floors are then applied, followed by a final calibration (th1=0.40, th2=0.52, th3=0.66, th4=0.78; cap4=2, cap34=5). Focus: strong 3/4 judgments while keeping a healthy share of 2s.
Please give this run a priority for inclusion in manual assessments.: 1 (top)
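The hybrid scoring in the run description can be sketched as follows. This is a minimal, non-authoritative illustration and not the run's actual code. Several points are assumptions: the blend weights (0.6/0.2/0.2) are hypothetical because the description does not give them; the Qwen2.5-3B output is treated as a relevance probability in [0, 1]; th1..th4 are read as the minimum confidence for grades 1..4 on a 0-4 scale; cap4=2 and cap34=5 are read as "at most 2 grade-4 and at most 5 grade-3-or-4 segments per topic", with overflow demoted one tier. Per-topic floors are omitted because their values are not stated. All function names are hypothetical.

```python
import re

# HYPOTHETICAL blend weights (not given in the run description).
# They sum to 1, so the blended confidence stays in [0, 1].
W_LLM, W_JACCARD, W_BASELINE = 0.6, 0.2, 0.2

# Calibration values as stated in the run description.
THRESHOLDS = (0.40, 0.52, 0.66, 0.78)  # th1..th4 -> grades 1..4 (assumed mapping)
CAP4 = 2             # assumed: at most 2 segments per topic keep grade 4
CAP34 = 5            # assumed: at most 5 segments per topic keep grade >= 3
BASELINE_DEPTH = 20  # only the top-20 baseline hits get a baseline score


def tokens(text):
    """Lowercased word-token set used for the overlap signal."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| between two token sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def minmax(values):
    """Min-max normalize to [0, 1]; a constant list maps to 1.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def grade(conf):
    """Return the highest grade whose threshold the confidence reaches."""
    g = 0
    for level, th in enumerate(THRESHOLDS, start=1):
        if conf >= th:
            g = level
    return g


def judge_topic(narrative, segments, llm_probs, baseline_scores):
    """segments: list of (seg_id, text).
    llm_probs: seg_id -> relevance probability from the small LM (assumed in [0, 1]).
    baseline_scores: seg_id -> raw first-stage score."""
    top = sorted(baseline_scores.items(), key=lambda kv: kv[1], reverse=True)
    top = top[:BASELINE_DEPTH]
    norm = dict(zip([sid for sid, _ in top], minmax([s for _, s in top]))) if top else {}

    # Blend the three signals; segments outside the top-20 get no baseline credit.
    conf = {
        sid: W_LLM * llm_probs.get(sid, 0.0)
        + W_JACCARD * jaccard(narrative, text)
        + W_BASELINE * norm.get(sid, 0.0)
        for sid, text in segments
    }

    # Apply per-topic caps in descending confidence order: once CAP34 slots
    # are used, further 3/4s drop to 2; once CAP4 slots are used, further 4s drop to 3.
    grades, n4, n34 = {}, 0, 0
    for sid in sorted(conf, key=conf.get, reverse=True):
        g = grade(conf[sid])
        if g >= 3 and n34 >= CAP34:
            g = 2
        elif g == 4 and n4 >= CAP4:
            g = 3
        n4 += int(g == 4)
        n34 += int(g >= 3)
        grades[sid] = g
    return grades
```

Demoting overflow by one tier, rather than dropping it to 0, is one way to read "strong 3/4 with healthy 2s": a topic with many high-confidence segments keeps a few top grades and pushes the rest into grade 2 instead of discarding them.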