TREC 2025 Proceedings
Team02_Run01_1000SegmentsExpansion
Submission Details
- Organization
- SCIAI
- Track
- Detection, Retrieval, and Augmented Generation for Understanding News
- Task
- Report Generation Task
- Date
- 2025-08-15
Run Description
- Is this run manual or automatic?
- automatic
- Is this run based on the provided starter kit?
- no
- Briefly describe this run
- A set of 60 questions is generated from the article contents via three LLM calls. These are narrowed down to 10 by ranking them with a pretrained model and removing questions that are too similar to others. The remaining questions are used to generate additional queries. Each query retrieves the top 1000 segments from MS MARCO V2.1 (Segmented); the results are reranked, and an LLM selects the most relevant segments for each question. An LLM then answers as many questions as possible from the retrieved segments before the final report reaches the 250-word limit.
- What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
- ChatGPT (gpt-4o model), Llama pretrained model ("s-emanuilov/query-expansion-Qwen2.5-7B-GGUF", specifically using "query-expansion.Q4_K_M.gguf"), Yet Another Keyword Extractor (YAKE), NLTK data (stopwords_en, averaged_perceptron_tagger_eng)
- Briefly describe LLMs used for this run (optional)
- Please give this run a priority for inclusion in manual assessments.
- 1 (top)
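Illustrative sketch (not part of the submission)
Two steps from the run description, narrowing the ranked questions to a small, non-redundant set and filling the report until the word limit is reached, could look roughly like the Python below. This is a minimal sketch under stated assumptions, not the team's code: the function names, the token-overlap (Jaccard) similarity, the 0.6 threshold, and the rule of stopping at the first answer that would exceed the budget are all hypothetical stand-ins, since the description does not name the similarity measure or the exact cutoff rule.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed stand-in for the run's similarity check)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_questions(
    scored: List[Tuple[str, float]],
    k: int = 10,            # the run keeps 10 questions
    max_sim: float = 0.6,   # hypothetical similarity threshold
) -> List[str]:
    """Greedily keep the highest-scored questions that are not near-duplicates."""
    kept: List[str] = []
    for question, _score in sorted(scored, key=lambda p: p[1], reverse=True):
        # Skip a question if it is too similar to one already kept.
        if all(jaccard(question, other) < max_sim for other in kept):
            kept.append(question)
        if len(kept) == k:
            break
    return kept


def assemble_report(answers: List[str], word_limit: int = 250) -> str:
    """Append answers in order until the next one would exceed the word limit."""
    report: List[str] = []
    used = 0
    for answer in answers:
        n_words = len(answer.split())
        if used + n_words > word_limit:
            break  # assumed rule: stop at the first answer that does not fit
        report.append(answer)
        used += n_words
    return " ".join(report)
```

In this sketch the scores would come from the pretrained ranking model and the answers from the LLM; both are passed in as plain Python values so the selection and budgeting logic can be read on its own.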
Evaluation Files
Paper