The Thirty-Third Text REtrieval Conference
(TREC 2024)

Medical Video Question Answering, Query-focused instructional step captioning task: Appendix

Runtag | Org | Describe the LLMs/LVLMs used to generate the captions | Please provide a short description of this run | Additional Details/Comments | Please give this run a priority for inclusion in manual assessments
Runtag: LLaVA-NeXT-Video-32B-Qwen + GPT4o
Org: PolySmart
LLMs/LVLMs: LLaVA-NeXT-Video-32B-Qwen + GPT4o
Description: LLaVA-NeXT-Video-32B-Qwen + GPT4o
Comments: LLaVA-NeXT-Video-32B-Qwen + GPT4o
Priority: 1 (highest priority)
Runtag: chatGPT_zeroshot_prompt (paper)
Org: DoshishaUzlDfki
LLMs/LVLMs: GPT-4o
Description: First, we generate the step segment timestamps using the GPT-4o API with our zero-shot prompting, then create the step captions using the same LLM with our meta prompting.
Comments: There is no big difference among our first 4 runs, so their order is arbitrary; only the 5th run is ranked deliberately.
Priority: 1 (highest priority)
Runtag: mistral_meta_prompt (paper)
Org: DoshishaUzlDfki
LLMs/LVLMs: Mistral-large-latest
Description: First, we generate the step segment timestamps using the Mistral-large-latest API with our meta prompting, then create the step captions using the same LLM with our meta prompting.
Comments: There is no big difference among our first 4 runs, so their order is arbitrary; only the 5th run is ranked deliberately.
Priority: 2
Runtag: mistral_fewshot_prompt (paper)
Org: DoshishaUzlDfki
LLMs/LVLMs: Mistral-large-latest
Description: First, we generate the step segment timestamps using the Mistral-large-latest API with our few-shot prompting, then create the step captions using the same LLM with our meta prompting.
Comments: There is no big difference among our first 4 runs, so their order is arbitrary; only the 5th run is ranked deliberately.
Priority: 3
Runtag: GPT_meta_prompt (paper)
Org: DoshishaUzlDfki
LLMs/LVLMs: GPT-4o
Description: First, we generate the step segment timestamps using the GPT-4o API with our meta prompting, then create the step captions using the same LLM with our meta prompting.
Comments: There is no big difference among our first 4 runs, so their order is arbitrary; only the 5th run is ranked deliberately.
Priority: 4
Runtag: CoSeg_meta_prompt (paper)
Org: DoshishaUzlDfki
LLMs/LVLMs: Mistral-large-latest
Description: First, we generate the step segment timestamps using our trained CoSeg model, then create the step captions using the Mistral-large-latest API with our meta prompting.
Comments: We rank this run 5th because we think it performs worse than our first 4 runs.
Priority: 5 (lowest priority)
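Several run descriptions above share a two-stage shape: first obtain step segment timestamps, then caption each segment with an LLM. The sketch below illustrates only that shape and is not any participant's implementation. The `LLM` callable, the prompt wording, the use of a transcript as input, the 'start,end' reply format, and the canned `fake_llm` are all assumptions made for illustration; none of them come from the submitted runs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Step:
    start: float       # segment start, in seconds
    end: float         # segment end, in seconds
    caption: str = ""  # filled in by stage 2


def segment_steps(llm: LLM, transcript: str, question: str) -> List[Step]:
    """Stage 1: ask the model for step boundaries, one 'start,end' pair per line."""
    prompt = (
        "List the time spans (in seconds) of the steps that answer the question.\n"
        "Reply with one 'start,end' pair per line.\n"
        f"Question: {question}\nTranscript:\n{transcript}"
    )
    steps: List[Step] = []
    for line in llm(prompt).splitlines():
        parts = line.split(",")
        if len(parts) != 2:
            continue  # skip lines that are not a 'start,end' pair
        try:
            start, end = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip pairs that are not numeric
        if start < end:
            steps.append(Step(start, end))
    return steps


def caption_steps(llm: LLM, steps: List[Step], transcript: str, question: str) -> List[Step]:
    """Stage 2: ask the model for a short caption for each stage-1 segment."""
    for step in steps:
        prompt = (
            f"Write one short instruction for the step from {step.start}s to {step.end}s.\n"
            f"Question: {question}\nTranscript:\n{transcript}"
        )
        step.caption = llm(prompt).strip()
    return steps


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real API client, so the sketch runs offline."""
    if prompt.startswith("List the time spans"):
        return "0,12.5\nnot a span\n12.5,30"
    return " Wrap the bandage around the wrist. "


question = "How do I bandage a wrist?"  # made-up example query
transcript = "..."                      # placeholder input text
steps = caption_steps(fake_llm, segment_steps(fake_llm, transcript, question), transcript, question)
```

Passing the model in as a plain callable keeps the two stages independent, so stage 1 could be swapped for a trained segmentation model (as the CoSeg run describes) without touching the captioning stage.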