TREC 2025 Proceedings
cfda-auto-4
Submission Details
- Organization
- cfdalab
- Track
- Interactive Knowledge Acquisition Track
- Task
- Passage Ranking and Response Generation
- Date
- 2025-07-27
Run Description
- What type of manually annotated information does the system use?
- automatic: system does not use any manually annotated data and relies only on the user utterance and system responses (canonical responses of previous turns)
- How is conversation understanding (NLP/rewriting) performed in this run (check all that apply)?
- ['method identifies sub-topic switches', 'method uses large language models like LLaMA and GPT-x.']
- What data is used for conversational query understanding in this run (check all that apply)?
- ['method uses other external data (please specify in the external resources field below)']
- How is ranking performed in this run (check all that apply)?
- ['method uses learned sparse retrieval (e.g., SPLADE, etc.)', 'method performs re-ranking with a pre-trained neural language model (BERT, Roberta, T5, etc.) (please describe specifics in the description field below)', 'method uses other ranking method (please describe below)']
- What data is used to develop the ranking method in this run (check all that apply)?
- ['method uses provided automatic baseline', 'method is trained on other datasets (please describe below)']
- Please specify all the methods used to handle feedback or clarification responses from the user (check all that apply).
- ['method does not treat them specially']
- Please describe the method used to generate the final conversational responses from one or more retrieved passages (check all that apply).
- ['method uses multiple sources (multiple passages)', 'method uses large language models to generate the summary.']
- Please describe how you integrate the PTKBs in your run (check all that apply)
- [" method integrates PTKBs in the response generation method (e.g. include in the LLM's prompt)"]
- Which LLM did you use to generate the final response?
- ['method uses closed-source commercial LLMs (e.g. GPT-*)']
- Please describe the external resources used by this run, if applicable.
- This run uses the collection and dataset from SCAI-QReCC-22 as the primary source of training data.
Additionally, we incorporate selected data from InfoCQR to construct qrel files for training the AdaRewriter reward model.
- Please provide a short description of this run.
- Overall Pipeline of This Work:
0) Query Rewriting
We use CHIQ-AD and LLM4CS, two prompt-based rewriting methods, to generate N=10 rewrite candidates per query, independently for each method. AdaRewriter, a learned reward model, then scores the candidates and selects the best rewrite from each method.
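The selection step above can be sketched as best-of-N scoring with a reward model. This is an illustrative stub, not the run's actual AdaRewriter code; the `reward_model` callable stands in for the trained deberta-v3-base scorer.

```python
def select_best_rewrite(context, candidates, reward_model):
    """Score each rewrite candidate against the conversation context
    and return the highest-scoring one (best-of-N selection)."""
    scores = [reward_model(context, cand) for cand in candidates]
    best_idx = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_idx]
```

In the real pipeline this is run twice, once over the CHIQ-AD candidates and once over the LLM4CS candidates, yielding one selected rewrite per method.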
1) Passage Ranking
We perform sparse retrieval using SPLADE for each of the two selected rewrites separately. The retrieved results are fused with reciprocal rank fusion (RRF), followed by reranking with naver/trecdl22-crossencoder-debertav3.
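The fusion step can be sketched with standard reciprocal rank fusion over the two SPLADE result lists. The constant k=60 is the common RRF default, not necessarily the value used in this run.

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists of doc ids: each document scores
    1 / (k + rank) per list it appears in; sum and sort descending."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```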
2) PTKB Statement Classification
We implement a prompt-based method to simulate dynamic PTKBs. For each conversation turn, we classify PTKB statements by relevance using gpt-4o-mini.
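A minimal sketch of the per-turn classification follows. The prompt wording and the comma-separated-index output format are illustrative assumptions, not the run's actual prompt; `llm` abstracts the gpt-4o-mini call.

```python
def classify_ptkb(query, ptkb_statements, llm):
    """Ask the LLM which PTKB statements are relevant to the current
    query; expects a comma-separated list of statement indices back."""
    prompt = (
        "Given the user query, list the indices of the relevant "
        "personal knowledge statements, comma-separated.\n"
        f"Query: {query}\n"
        + "\n".join(f"{i}: {s}" for i, s in enumerate(ptkb_statements))
    )
    reply = llm(prompt)
    indices = [int(tok) for tok in reply.split(",") if tok.strip().isdigit()]
    return [ptkb_statements[i] for i in indices if i < len(ptkb_statements)]
```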
3) Response Generation
We use gpt-4o-mini to summarize the top-k retrieved passages together with the selected PTKB statements. The final response is generated by the same model with tailored prompts.
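The grounding of the final response can be sketched as below. The prompt text is an illustrative assumption rather than the run's actual prompt, and `llm` again abstracts the gpt-4o-mini call.

```python
def generate_response(query, passages, ptkbs, llm, k=3):
    """Build a prompt from the top-k passages and the selected PTKB
    statements, then let the LLM produce the final response."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages[:k]))
    persona = "\n".join(f"- {s}" for s in ptkbs)
    prompt = (
        "Answer the question using only the passages below, "
        "tailored to the user's background.\n"
        f"User background:\n{persona}\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return llm(prompt)
```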
====
Component Details:
1) CHIQ-AD
A prompt-based query rewriting method originally designed with a History Summarization (HS) module.
DIFF: In our setup, we omit the HS component.
Generation is performed with gpt-4o-mini.
2) LLM4CS
A prompt-based query rewriting method that typically uses pseudo-responses concatenated with the input.
DIFF: We omit the pseudo-response concatenation step.
Generation is performed with gpt-4o-mini.
3) AdaRewriter
A reward model based on deberta-v3-base, used to score and select the best rewrite among candidates.
DIFF: We remove dense retrieval score terms from its scoring function, since our pipeline relies solely on sparse retrieval.
Training was performed for up to 13 epochs on the QReCC dataset, supplemented with qrel files built from the InfoCQR dataset.
- Please provide a priority for assessing this run. (If resources do not allow all runs to be assessed, NIST will work in priority order, resolving ties arbitrarily).
- 2
Evaluation Files
Paper