TREC 2025 Proceedings

fg-clip

Submission Details

Organization
ncsu-las
Track
Adhoc Video Search
Task
Video Search Task
Date
2025-07-28

Run Description

Is this run manual or automatic?
automatic
Describe the retrieval model used.
This run uses FG-CLIP embeddings to retrieve the most relevant keyframes. FG-CLIP is a fine-tuned version of OpenAI's clip-vit-base-patch32, trained on V3C1 keyframes with captions generated by Phi-3-Vision. Fine-tuning used a modified loss function that supports fine-grained, token-level comparison.
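As an illustration of the retrieval step described above, the following is a minimal sketch of ranking keyframes by cosine similarity between a query embedding and keyframe embeddings. It is not the run's actual code: the function names, the keyframe IDs, and the 3-dimensional toy vectors are hypothetical, and the choice of cosine similarity is an assumption (it is the usual scoring for CLIP-style embeddings, but the run description does not state the metric). In practice the vectors would come from the fine-tuned text and image encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_keyframes(query_embedding, keyframe_embeddings, top_k=3):
    """Return the top_k (keyframe_id, score) pairs, highest cosine similarity first.

    keyframe_embeddings maps a keyframe ID to its embedding vector.
    Both the IDs and the vectors are placeholders in this sketch.
    """
    q = l2_normalize(query_embedding)
    scored = []
    for keyframe_id, emb in keyframe_embeddings.items():
        k = l2_normalize(emb)
        score = sum(a * b for a, b in zip(q, k))
        scored.append((keyframe_id, score))
    # Sort by similarity, most similar first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (hypothetical): real embeddings would have hundreds of dimensions.
toy_keyframes = {
    "shot00001_kf": [0.9, 0.1, 0.0],
    "shot00002_kf": [0.0, 1.0, 0.0],
    "shot00003_kf": [0.5, 0.5, 0.0],
}
toy_query = [1.0, 0.0, 0.0]

results = rank_keyframes(toy_query, toy_keyframes, top_k=2)
for keyframe_id, score in results:
    print(f"{keyframe_id}\t{score:.4f}")
```

For a collection the size of V3C1, a brute-force loop like this would normally be replaced by a batched matrix product or an approximate nearest-neighbour index; the ranking logic stays the same.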
Describe any external resources used.
The search uses embeddings from OpenAI's clip-vit-base-patch32, fine-tuned on V3C1 keyframes. The training captions for the V3C1 keyframes were generated by Phi-3-Vision.
Training type:
A

Evaluation Files

Paper