TREC 2025 Proceedings

cosine-orconvqa-sum-top10

Submission Details

Organization
guidance
Track
Interactive Knowledge Acquisition Track
Task
Passage Ranking and Response Generation
Date
2025-07-27

Run Description

What type of manually annotated information does the system use?
Automatic: the system does not use any manually annotated data and relies only on user utterances and system responses (canonical responses from previous turns).
How is conversation understanding (NLP/rewriting) performed in this run (check all that apply)?
- Method identifies intent/discourse/sentiment (e.g., feedback, clarification, etc.)
- Method uses large language models such as LLaMA and GPT-x.
What data is used for conversational query understanding in this run (check all that apply)?
- Method uses iKAT 23 data.
How is ranking performed in this run (check all that apply)?
- Method uses learned sparse retrieval (e.g., SPLADE).
- Method performs re-ranking with large language models (LLaMA, GPT-x, etc.); see the description field below for specifics.
What data is used to develop the ranking method in this run (check all that apply)?
- Method is trained with the TREC Deep Learning Track and/or MS MARCO dataset.
- Method is trained on other datasets (described below).
Please specify all the methods used to handle feedback or clarification responses from the user (check all that apply).
- Method does not treat them specially.
Please describe the method used to generate the final conversational responses from one or more retrieved passages (check all that apply).
- Method uses multiple sources (multiple passages).
- Method uses large language models to generate the summary.
Please describe how you integrate the PTKBs in your run (check all that apply)
- Method uses a PTKB relevance model to detect the relevant ones.
- Method integrates PTKBs in the response generation method (e.g., included in the LLM's prompt).
Which LLM did you use to generate the final response?
- Method uses closed-source commercial LLMs (e.g., GPT-*).
Please describe the external resources used by this run, if applicable.
- GReCC dataset (used to train the first-stage ranker)
Please provide a short description of this run.
Retrieval: a CoSPLADE model with a SPLADEv3 backbone, trained on OrConvQA with a cosine loss and a history size of 1. Generation: a GPT-4.1 pipeline that extracts the relevant PTKBs, detects whether a clarifying question is needed, and answers the user's query. The top-10 documents from retrieval are used, and each is summarized before the answering query is issued.
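The generation stage described above can be sketched as a small pipeline: summarize each of the top-10 retrieved passages, select relevant PTKB statements, and assemble a single answering prompt. This is a minimal illustrative sketch, not the run's actual implementation; all function names and prompt wordings are hypothetical, and the GPT-4.1 call is stubbed out so the example is self-contained.

```python
def llm(prompt: str) -> str:
    """Stand-in for a GPT-4.1 API call; replace with a real client."""
    return f"[LLM output for: {prompt[:40]}...]"

def summarize(passages):
    # One summarization call per retrieved passage (top-10 in the run).
    return [llm(f"Summarize this passage for answering the query:\n{p}")
            for p in passages]

def select_ptkbs(ptkbs, query):
    # The run uses an LLM to detect relevant PTKBs; here we illustratively
    # ask per statement and keep those the model affirms.
    return [s for s in ptkbs
            if "yes" in llm(f"Is this relevant to '{query}'? {s}").lower()]

def respond(query, passages, ptkbs):
    summaries = summarize(passages[:10])
    relevant = select_ptkbs(ptkbs, query)
    prompt = (
        "User background:\n" + "\n".join(relevant) +
        "\n\nEvidence summaries:\n" + "\n".join(summaries) +
        f"\n\nAnswer the user's query: {query}"
    )
    return llm(prompt)
```

A clarification-detection step (asking the LLM whether a clarifying question is needed before answering) would slot in between PTKB selection and the final answering call.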
Please provide a priority for assessing this run. (If resources do not allow all runs to be assessed, NIST will work in priority order, resolving ties arbitrarily).
1 (top)

Evaluation Files

Paper