TREC 2025 Proceedings

Team02_Run02_100SegmentsExpansion

Submission Details

Organization
SCIAI
Track
Detection, Retr., and Gen for Understanding News
Task
Report Generation Task
Date
2025-08-15

Run Description

Is this run manual or automatic?
automatic
Is this run based on the provided starter kit?
no
Briefly describe this run
A set of 60 questions is generated from the article contents via three LLM calls. These questions are narrowed down to 10 by ranking them with a pretrained model and removing questions that are too similar to others. The remaining questions are used to generate additional queries. Each query retrieves the top 100 segments from MS MARCO V2.1 (Segmented); reranking techniques and an LLM then select the most relevant segments for each question. Finally, an LLM answers as many questions as possible from the retrieved segments before the final report reaches the 250-word limit.
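The question-narrowing and report-length steps described above can be illustrated with a minimal sketch. It is not the run's actual code: the function names (`select_questions`, `pack_answers`), the Jaccard token-overlap similarity, the `max_sim` threshold, and the stop-at-first-overflow rule are all assumptions made for illustration. The run description does not say which similarity measure or cutoff was used, and the scores here stand in for the output of the pretrained ranking model.

```python
import re


def tokens(text):
    """Lowercased word set used for a crude lexical similarity (assumed measure)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the word sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_questions(scored, k=10, max_sim=0.6):
    """Greedily keep the k best-scoring questions, skipping near-duplicates.

    scored: list of (question, score) pairs; the scores are assumed to come
    from some external ranking model. max_sim is a hypothetical threshold.
    """
    kept = []
    for question, _ in sorted(scored, key=lambda p: p[1], reverse=True):
        # Keep a question only if it is not too similar to any already kept.
        if all(jaccard(question, other) < max_sim for other in kept):
            kept.append(question)
        if len(kept) == k:
            break
    return kept


def pack_answers(answers, word_limit=250):
    """Append answers in order until the next one would exceed the word limit."""
    report, used = [], 0
    for answer in answers:
        n = len(answer.split())
        if used + n > word_limit:
            break  # assumed policy: stop at the first answer that does not fit
        report.append(answer)
        used += n
    return " ".join(report)
```

In this sketch, `select_questions` would be called with the 60 scored questions and `k=10`, and `pack_answers` with the per-question answers in priority order. A greedy pass is the simplest way to combine ranking with de-duplication; an embedding-based similarity could be swapped in for `jaccard` without changing the rest.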
What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
ChatGPT (gpt-4o model), Llama pretrained model ("s-emanuilov/query-expansion-Qwen2.5-7B-GGUF", specifically using "query-expansion.Q4_K_M.gguf"), Yet Another Keyword Extractor (YAKE), NLTK data (stopwords_en, averaged_perceptron_tagger_eng)
Briefly describe LLMs used for this run (optional)
Please give this run a priority for inclusion in manual assessments.
2

Evaluation Files

Paper