The Thirty-Third Text REtrieval Conference
(TREC 2024)

Medical Video Question Answering: Query-Focused Instructional Step Captioning Task Appendix

Runtag	Org	Describe the LLMs/LVLMs used to generate the captions	Please provide a short description of this run	Additional Details/Comments	Please give this run a priority for inclusion in manual assessments
LLaVA-NeXT-Video-32B-Qwen + GPT4o	PolySmart	LLaVA-NeXT-Video-32B-Qwen + GPT4o	LLaVA-NeXT-Video-32B-Qwen + GPT4o	LLaVA-NeXT-Video-32B-Qwen + GPT4o	1 (highest priority)
chatGPT_zeroshot_prompt (paper)	DoshishaUzlDfki	GPT-4o	First, we generate the step segment timestamps using the GPT-4o API with our zero-shot prompting, and then we create the step captions using the same LLM with our meta prompting.	There is no big difference between our first 4 runs, so we assign them an arbitrary order; only the 5th run is placed deliberately.	1 (highest priority)
mistral_meta_prompt (paper)	DoshishaUzlDfki	Mistral-large-latest	First, we generate the step segment timestamps using the Mistral-large-latest API with our meta prompting, and then we create the step captions using the same LLM with our meta prompting.	There is no big difference between our first 4 runs, so we assign them an arbitrary order; only the 5th run is placed deliberately.	2
mistral_fewshot_prompt (paper)	DoshishaUzlDfki	Mistral-large-latest	First, we generate the step segment timestamps using the Mistral-large-latest API with our few-shot prompting, and then we create the step captions using the same LLM with our meta prompting.	There is no big difference between our first 4 runs, so we assign them an arbitrary order; only the 5th run is placed deliberately.	3
GPT_meta_prompt (paper)	DoshishaUzlDfki	GPT-4o	First, we generate the step segment timestamps using the GPT-4o API with our meta prompting, and then we create the step captions using the same LLM with our meta prompting.	There is no big difference between our first 4 runs, so we assign them an arbitrary order; only the 5th run is placed deliberately.	4
CoSeg_meta_prompt (paper)	DoshishaUzlDfki	Mistral-large-latest	First, we generate the step segment timestamps using our trained CoSeg model, and then we create the step captions using the Mistral-large-latest API with our meta prompting.	We rank this run 5th because we think it performs worse than our first 4 runs.	5 (lowest priority)