TREC 2025 Proceedings

garamp_qwen25_14b_r4

Submission Details

Organization
DUTH
Track
Detection, Retrieval, and Generation for Understanding News
Task
Report Generation Task
Date
2025-08-23

Run Description

Is this run manual or automatic?
automatic
Is this run based on the provided starter kit?
no
Briefly describe this run
BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index (msmarco-v2.1-doc-segmented.20240418.4f9675). For each topic we retrieve up to k=40 candidate segments and keep at most 18 evidence passages (after deduplication and length filtering) to fit the context window. A single LLM pass generates a <=250-word report of 3–5 sentences; each sentence cites up to 3 segment docids. Post-processing clips citations to <=3, validates the JSON output, and aligns outputs to the official topic list (one line per topic).
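The evidence-selection and citation-clipping steps above can be sketched as follows. Function names, the length threshold, and the JSON layout (a "sentences" list with per-sentence "citations") are illustrative assumptions, not the run's actual code:

```python
def filter_evidence(hits, max_passages=18, max_chars=2000):
    """Deduplicate retrieved segments by docid, drop over-long passages,
    and keep at most max_passages. `hits` is assumed to be a list of
    (docid, text) pairs already sorted by BM25 score; the character
    threshold is an illustrative stand-in for the length filter."""
    seen, kept = set(), []
    for docid, text in hits:
        if docid in seen or len(text) > max_chars:
            continue
        seen.add(docid)
        kept.append((docid, text))
        if len(kept) == max_passages:
            break
    return kept

def clip_citations(report, max_cites=3):
    """Clip each sentence's citation list to at most max_cites docids.
    The report is assumed to be parsed JSON with a 'sentences' list."""
    for sentence in report.get("sentences", []):
        sentence["citations"] = sentence.get("citations", [])[:max_cites]
    return report
```

In this sketch, truncating citation lists (rather than rejecting over-cited sentences) keeps every generated sentence while still satisfying the <=3-docids constraint.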
What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
MS MARCO V2.1 (Segmented) (prebuilt Lucene index). Pyserini/Anserini (Lucene). Hugging Face Transformers. Local GPU; no manual browsing or external web data beyond the MS MARCO collection.
Briefly describe LLMs used for this run (optional)
Primary model: Qwen/Qwen2.5-14B-Instruct (HF). Inference via the Transformers pipeline (temperature=0.2, top_p=0.9, max_new_tokens capped to fit the context). The instruction asks the model to write a well-attributed trustworthiness report for the given article, focusing on source bias/motivation, evidence cited in the article, and alternative viewpoints. The prompt explicitly requires (i) grounding only in the provided MS MARCO segments, (ii) 3–5 sentences totaling <=250 words, and (iii) per-sentence citations of up to three MS MARCO segment docids. If no answer exists in the collection for a sub-question, the model must skip it rather than speculate.
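A prompt embedding the constraints above might be assembled as follows. The exact wording of the run's actual instruction is not reproduced here; this is a hypothetical sketch that encodes the same requirements:

```python
def build_prompt(topic, passages):
    """Assemble the single-pass generation prompt from retrieved MS MARCO
    segments. `passages` is assumed to be a list of (docid, text) pairs;
    the instruction wording below is illustrative."""
    evidence = "\n".join(f"[{docid}] {text}" for docid, text in passages)
    return (
        "Write a well-attributed trustworthiness report for the article "
        "described by the topic below, focusing on source bias/motivation, "
        "evidence cited in the article, and alternative viewpoints.\n"
        "Rules: ground every claim ONLY in the provided segments; write 3-5 "
        "sentences and at most 250 words total; end each sentence with up to "
        "three segment docids in brackets; if the segments do not answer a "
        "sub-question, skip it rather than speculate.\n\n"
        f"Topic: {topic}\n\nSegments:\n{evidence}\n\nReport:"
    )
```

The prompt string would then be passed to the Transformers text-generation pipeline with the sampling settings noted above (temperature=0.2, top_p=0.9).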
Please give this run a priority for inclusion in manual assessments.
1 (top)

Evaluation Files

Paper