TREC 2025 Proceedings

clap

Submission Details

Organization
ncsu-las
Track
Adhoc Video Search
Task
Video Search Task
Date
2025-07-28

Run Description

Is this run manual or automatic?
automatic
Describe the retrieval model used.
gpt-4.1-mini decomposes the query into visual and (non-speech) audio components. The visual component is searched using SigLIP2-base-patch16-naflex embeddings, and the audio component is searched using CLAP embeddings. The normalized scores from both searches are summed to produce the final ranking. If the LLM decides there is no audio component, only the SigLIP2 embeddings are used.
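The score fusion described above can be sketched as follows. This is a minimal illustration, not the run's actual code: the description does not specify the normalization method, so min-max normalization is assumed, and all function and variable names are hypothetical.

```python
# Hedged sketch of late fusion over visual (SigLIP2) and audio (CLAP)
# retrieval scores. Min-max normalization is an assumption; the run
# description only says "normalized scores ... are added together".

def min_max_normalize(scores):
    """Scale a {shot_id: score} dict into [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def fuse(visual_scores, audio_scores=None):
    """Rank shots by the sum of normalized visual and audio scores.
    When the LLM found no audio component, audio_scores is None and
    the ranking falls back to the visual scores alone."""
    v = min_max_normalize(visual_scores)
    if audio_scores is None:
        fused = v
    else:
        a = min_max_normalize(audio_scores)
        fused = {k: v.get(k, 0.0) + a.get(k, 0.0) for k in set(v) | set(a)}
    return sorted(fused, key=fused.get, reverse=True)
```

For example, a shot ranked mid-list by the visual search can rise to the top when the audio search scores it highly, which is the intended effect of combining the two modalities.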
Describe any external resources used.
gpt-4.1-mini is used to decompose the query, SigLIP2-base-patch16-naflex embeddings are used for visual search, and LAION-AI/CLAP is used for audio (sound) search.
Training type:
D

Evaluation Files

Paper