TREC 2025 Proceedings
clap
Submission Details
- Organization
- ncsu-las
- Track
- Adhoc Video Search
- Task
- Video Search Task
- Date
- 2025-07-28
Run Description
- Is this run manual or automatic?
- automatic
- Describe the retrieval model used.
- gpt-4.1-mini decomposes the query into a visual component and a (non-speech) audio component. The visual component is searched using SigLIP2-base-patch16-naflex embeddings, and the audio component is searched using CLAP embeddings. The normalized scores from the two searches are added together to produce the final ranking. If the LLM decides the query has no audio component, only the SigLIP2 embeddings are used.
- Describe any external resources used.
- gpt-4.1-mini is used to decompose the query, SigLIP2-base-patch16-naflex embeddings are used for visual search, and LAION-AI/CLAP is used for audio (sound) search.
- Training type:
- D
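The score fusion described in the retrieval model entry can be sketched as follows. This is a minimal illustration, not the run's actual code: the run description does not say how scores were normalized, so min-max normalization is an assumption here, and the function names (`min_max_normalize`, `fuse_scores`) and shot IDs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]. Min-max is an assumption; the run
    description only says the scores were 'normalized'."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking signal, so map everything to 0.
        return {shot_id: 0.0 for shot_id in scores}
    return {shot_id: (s - lo) / (hi - lo) for shot_id, s in scores.items()}


def fuse_scores(
    visual: Dict[str, float],
    audio: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Add normalized visual and audio scores per shot and rank descending.

    `audio` is None (or empty) when the query has no audio component,
    in which case only the visual scores determine the ranking.
    """
    fused = min_max_normalize(visual)
    if audio:
        for shot_id, s in min_max_normalize(audio).items():
            fused[shot_id] = fused.get(shot_id, 0.0) + s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical similarity scores for three shots.
visual_scores = {"shot_a": 0.2, "shot_b": 0.6, "shot_c": 0.4}
audio_scores = {"shot_a": 0.9, "shot_b": 0.5, "shot_c": 0.1}

fused_ranking = [shot for shot, _ in fuse_scores(visual_scores, audio_scores)]
visual_only_ranking = [shot for shot, _ in fuse_scores(visual_scores)]
```

In this toy example the audio scores lift `shot_a` above `shot_c` in the fused ranking, while the visual-only fallback ranks purely by the visual scores.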
Evaluation Files
Paper