TREC 2025 Proceedings

garamp_yi9b_t2_v1

Submission Details

Organization
DUTH
Track
Detection, Retrieval, and Generation for Understanding News
Task
Report Generation Task
Date
2025-08-23

Run Description

Is this run manual or automatic?
automatic
Is this run based on the provided starter kit?
no
Briefly describe this run
BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index. For each topic we retrieve k=40 segments and keep up to 8 evidence passages after de-dup/length filtering. A single LLM pass (Yi-1.5-9B-Chat) produces a ≤250-word report in ~4 sentences; each sentence cites up to 3 MS MARCO segment docids. Post-processing validates JSON, clips citations to ≤3, and aligns outputs 1:1 with the official topic list.
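The de-dup/length filtering step described above can be sketched as follows. This is a minimal illustration, not the actual run code: the cap of 8 passages comes from the run description, while the minimum-length threshold and the normalization used for de-duplication are assumptions.

```python
# Sketch of the evidence-filtering step: after BM25 retrieval of k=40
# segments, keep up to 8 passages, dropping near-duplicates and very
# short texts while preserving BM25 rank order.

MAX_EVIDENCE = 8   # cap from the run description
MIN_CHARS = 200    # hypothetical minimum passage length for the length filter

def filter_evidence(hits, max_keep=MAX_EVIDENCE, min_chars=MIN_CHARS):
    """`hits` is a rank-ordered list of (docid, text) pairs. Returns up to
    `max_keep` pairs after de-duplication and length filtering."""
    kept, seen = [], set()
    for docid, text in hits:
        # Normalize case and whitespace so trivially reformatted copies
        # of the same segment are treated as duplicates.
        norm = " ".join(text.lower().split())
        if len(text) < min_chars or norm in seen:
            continue
        seen.add(norm)
        kept.append((docid, text))
        if len(kept) == max_keep:
            break
    return kept
```

In the actual run, `hits` would come from Pyserini's `LuceneSearcher` over the prebuilt MS MARCO V2.1 segmented index (e.g. `searcher.search(topic, k=40)`).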
What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
MS MARCO V2.1 (Segmented) prebuilt Lucene index; Pyserini/Anserini (Lucene, Java 21); Hugging Face Transformers; local GPU inference. No external web data beyond the MS MARCO collection.
Briefly describe LLMs used for this run (optional)
01-ai/Yi-1.5-9B-Chat via Hugging Face Transformers (temperature≈0.2, top_p≈0.9, max_new_tokens capped to fit the context window). The instruction asks the model to write a well-attributed trustworthiness report grounded only in the provided MS MARCO segments: discuss source bias/motivation, assess the article’s cited evidence, and present alternative viewpoints. Hard constraints: 3–5 sentences, total ≤250 words; each sentence includes up to three MS MARCO segment docids as citations. If the collection lacks evidence for a sub-question, the model must skip it rather than speculate. Outputs are validated (JSON, citation count/format) and aligned to the official topic list.
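The post-processing step (validate JSON, clip citations to ≤3 per sentence) can be sketched as below. The ≤3-citation cap is stated in the run description; the report schema used here (a JSON list of `{"text", "citations"}` objects) is a hypothetical stand-in for the track's actual output format.

```python
import json

MAX_CITES = 3  # cap from the run description: up to three docids per sentence

def postprocess(raw_json):
    """Parse the model's output as JSON and clip each sentence's citation
    list to MAX_CITES, preserving citation order. Raises ValueError if the
    output is not valid JSON, so malformed generations can be caught and
    regenerated or dropped upstream."""
    sentences = json.loads(raw_json)
    for sent in sentences:
        sent["citations"] = sent["citations"][:MAX_CITES]
    return sentences
```

A final alignment pass would then map these validated reports 1:1 onto the official topic list before writing the run file.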
Please give this run a priority for inclusion in manual assessments.
3

Evaluation Files

Paper