TREC 2025 Proceedings
garamp_dragun_t2_q7b
Submission Details
- Organization
- DUTH
- Track
- Detection, Retrieval, and Generation for Understanding News
- Task
- Report Generation Task
- Date
- 2025-08-23
Run Description
- Is this run manual or automatic?
- automatic
- Is this run based on the provided starter kit?
- no
- Briefly describe this run
- BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index. For each topic we retrieve k=40 segments and keep up to 8 evidence passages after de-duplication and length filtering. A single LLM pass (Qwen2.5-7B-Instruct) generates a <=250-word report in 4 sentences; each sentence cites up to 3 MS MARCO segment docids. Post-processing validates the JSON, clips citations to <=3 per sentence, and aligns outputs 1-to-1 with the official topics list.
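The retrieve-then-filter stage described above could be sketched as below. The Pyserini searcher call in the comment assumes the prebuilt segmented index name; the function name, the (docid, text) hit representation, and the 200-character minimum length are illustrative assumptions, not the run's exact settings.

```python
# Evidence selection after BM25 retrieval (illustrative sketch).
# Retrieval itself would use Pyserini, e.g.:
#   from pyserini.search.lucene import LuceneSearcher
#   searcher = LuceneSearcher.from_prebuilt_index('msmarco-v2.1-doc-segmented')
#   hits = searcher.search(topic_query, k=40)

def filter_evidence(hits, max_passages=8, min_chars=200):
    """Deduplicate by docid, drop very short segments, keep up to 8 passages.

    `hits` is a list of (docid, text) pairs ordered by descending BM25 score.
    The 200-character minimum is an assumed length filter.
    """
    seen, kept = set(), []
    for docid, text in hits:
        if docid in seen or len(text) < min_chars:
            continue  # skip duplicates and too-short segments
        seen.add(docid)
        kept.append((docid, text))
        if len(kept) == max_passages:
            break  # cap at the evidence budget
    return kept
```

The top-ranked surviving passages are then concatenated into the generation prompt.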
- What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
- MS MARCO V2.1 (Segmented) prebuilt Lucene index; Pyserini/Anserini (Lucene, Java 21); Hugging Face Transformers; local GPU inference. No external web data beyond the MS MARCO collection.
- Briefly describe LLMs used for this run (optional)
- Qwen/Qwen2.5-7B-Instruct. Inference via HF Transformers (temperature=0.2, top_p=0.9, max_new_tokens capped to fit the context window). The instruction prompts the model to write a well-attributed trustworthiness report grounded only in the provided MS MARCO segments: discuss source bias/motivation, assess the article’s cited evidence, and include alternative viewpoints. Hard constraints: 3–5 sentences, <=250 words total; each sentence includes up to three MS MARCO segment docids as citations; if the collection lacks evidence for a sub-question, the model must skip it rather than speculate. Outputs are JSONL with per-topic objects and are validated after generation.
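The post-generation validation step could look roughly like the following. The per-topic object schema (a "responses" list of sentence/citation pairs) and the function name are assumptions for illustration; only the numeric constraints (3–5 sentences, <=250 words, <=3 citations per sentence) come from the run description.

```python
import json

def validate_report(raw_json_line, max_words=250, max_cites=3):
    """Validate one per-topic JSONL object and clip citations.

    Assumes each object holds a "responses" list of
    {"text": sentence, "citations": [docid, ...]} entries (illustrative schema).
    """
    obj = json.loads(raw_json_line)  # raises ValueError on malformed JSON
    sentences = obj["responses"]
    assert 3 <= len(sentences) <= 5, "report must be 3-5 sentences"
    total_words = sum(len(s["text"].split()) for s in sentences)
    assert total_words <= max_words, "report exceeds the 250-word budget"
    for s in sentences:
        s["citations"] = s["citations"][:max_cites]  # clip to <=3 docids
    return obj
```

Reports failing validation would be regenerated or repaired before alignment with the official topics list.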
- Please give this run a priority for inclusion in manual assessments.
- 2
Evaluation Files
Paper