fg-clip — Video Search Task

Is this run manual or automatic?: automatic
Describe the retrieval model used.: This run uses FG-CLIP embeddings to retrieve the most relevant keyframes. FG-CLIP is a version of OpenAI's clip-vit-base-patch32 fine-tuned on V3C1 keyframes paired with captions generated by Phi-3-Vision. Fine-tuning used a modified loss function designed for fine-grained, token-level comparison.
Describe any external resources used.: The search uses embeddings from clip-vit-base-patch32 fine-tuned on V3C1 keyframes. The captions used for training on V3C1 were generated by Phi-3-Vision.
Training type: A
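The retrieval step described above ranks keyframes by how close their embeddings are to the query embedding. Below is a minimal sketch of that ranking only. It assumes the text-query and keyframe embeddings have already been computed by the model, and it uses cosine similarity, the usual scoring for CLIP-style embeddings. It does not reproduce FG-CLIP itself or its token-level loss. The function names `cosine_similarity` and `rank_keyframes` and the keyframe IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_keyframes(
    query_embedding: Sequence[float],
    keyframe_embeddings: Dict[str, Sequence[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the top_k (keyframe_id, score) pairs, highest similarity first.

    Assumes embeddings were precomputed (hypothetically, by an FG-CLIP text
    and image encoder); this function only scores and sorts them.
    """
    scored = [
        (keyframe_id, cosine_similarity(query_embedding, embedding))
        for keyframe_id, embedding in keyframe_embeddings.items()
    ]
    # Sort by descending similarity; ties keep insertion order (stable sort).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, with toy 3-dimensional embeddings `{"kf_a": [1.0, 0.0, 0.0], "kf_b": [0.6, 0.8, 0.0], "kf_c": [0.0, 0.0, 1.0]}` and the query `[1.0, 0.0, 0.0]`, `rank_keyframes(query, keyframes, top_k=2)` returns `kf_a` (score about 1.0) ahead of `kf_b` (score about 0.6). Real CLIP embeddings are high-dimensional and are typically L2-normalized, in which case cosine similarity reduces to a dot product. A large collection such as V3C1 would normally use a vectorized or approximate-nearest-neighbor index rather than this linear scan.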