Runtag | Org | Is this run manual or automatic? | Describe the retrieval model used. | Describe any external resources used. |
---|---|---|---|---|
SoftbankMeisei - Progress Run 1 (paper) | softbank-meisei | automatic | Pretrained embedding models, mostly from OpenCLIP | LLM (GPT), image generation (Stable Diffusion) |
SoftbankMeisei - Progress Run 2 (paper) | softbank-meisei | automatic | Pretrained embedding models, mostly from OpenCLIP | LLM (GPT), image generation (Stable Diffusion) |
SoftbankMeisei - Progress Run 3 (paper) | softbank-meisei | automatic | Pretrained embedding models, mostly from OpenCLIP | LLM (GPT), image generation (Stable Diffusion) |
SoftbankMeisei - Progress Run 4 (paper) | softbank-meisei | automatic | Pretrained embedding models, mostly from OpenCLIP | LLM (GPT), image generation (Stable Diffusion) |
Expansion_Fusion_Reranking_Progress | NII_UIT | automatic | InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 DataComp, OpenCLIP-H/14 Laion2B, OpenCLIP-H/14 DFN5b, OpenAI RN101, BLIP-2 (COCO), XCLIP, VILA-1.5 | We used pre-trained weights for all models |
Expansion_Fusion_Rerank_Auto_Decompose_P | NII_UIT | automatic | InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 DataComp, OpenCLIP-H/14 Laion2B, OpenCLIP-H/14 DFN5b, OpenAI RN101, BLIP-2 (COCO), XCLIP, VILA-1.5 | We used pre-trained weights for all models |
Expansion_Fusion_Rerank_Manual_Decompose_P | NII_UIT | manual | InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 DataComp, OpenCLIP-H/14 Laion2B, OpenCLIP-H/14 DFN5b, OpenAI RN101, BLIP-2 (COCO), XCLIP, VILA-1.5 | We used pre-trained weights for all models |
certh.iti.avs.24.progress.run.1 (paper) | CERTH-ITI | automatic | A trainable network was trained to combine text-video similarities from various cross-modal networks. The similarities were normalized, considering the queries from 2022, 2023, and 2024. | The model was trained on MSR-VTT, TGIF, VATEX, and ActivityNet Captions |
certh.iti.avs.24.progress.run.2 (paper) | CERTH-ITI | automatic | A trainable network was trained to combine text-video similarities from various cross-modal networks. The similarities were normalized, considering only this year's queries. | The model was trained on MSR-VTT, TGIF, VATEX, and ActivityNet Captions |
certh.iti.avs.24.progress.run.3 (paper) | CERTH-ITI | automatic | A trainable network was trained to combine text-video similarities from various cross-modal networks. No normalization of the similarities was performed. | The model was trained on MSR-VTT, TGIF, VATEX, and ActivityNet Captions |
Expan_Fu_Rerank_M_Decompose_P_CRerank | NII_UIT | manual | InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 DataComp, OpenCLIP-H/14 Laion2B, OpenCLIP-H/14 DFN5b, OpenAI RN101, BLIP-2 (COCO), XCLIP, VILA-1.5 | We used pre-trained weights for all models |
rucmm_avs_P_run1 | RUCMM | automatic | An average ensemble of 7 LAFF models trained on ChinaOpen-100k, V3C1-PC, and TGIF-MSRVTT10K-VATEX. Models are selected based on infAP and Spearman's coefficient. | None. |
rucmm_avs_P_run3 | RUCMM | automatic | An ensemble of 6 LAFF models trained on ChinaOpen-100k, V3C1-PC, and TGIF-MSRVTT10K-VATEX. The models' weights are learned with gradient descent and greedy search to maximize infAP on a mixed version of TV22 and TV23. | None. |
rucmm_avs_P_run2 | RUCMM | automatic | A LAFF model that maximizes its performance on TV22-23, with CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 as its text features and CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 + irCSN + BEiT + WSL + Video-LLaMA + DINOv2 as its video features. It is pre-trained on V3C1-PC and fine-tuned on TGIF-MSRVTT10K-VATEX, with reranking by BLIP-2, OpenCLIP, and the open-vocabulary detection model YOLOv8x-worldv2. | None. |
rucmm_avs_P_run4 | RUCMM | automatic | A LAFF model that maximizes its performance on TV22-23, with CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 as its text features and CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 + irCSN + BEiT + WSL + Video-LLaMA + DINOv2 as its video features. It is pre-trained on V3C1-PC and fine-tuned on TGIF-MSRVTT10K-VATEX, with reranking by BLIP-2 and OpenCLIP. | None. |
Fusion_Query_No_Reranking_P | NII_UIT | automatic | InternVL-G, BEiT-3, OpenCLIP-L/14 DataComp, OpenCLIP-H/14 Laion2B, OpenCLIP-H/14 DFN5b, OpenAI RN101, BLIP-2 (COCO), XCLIP | We used pre-trained weights for all models |
Expansion_Fusion_Rerank_Auto_Decompose_P_Pij | NII_UIT | automatic | InternVL-G, BEiT-3, OpenCLIP-L/14 DataComp, OpenCLIP-H/14 Laion2B, OpenCLIP-H/14 DFN5b, OpenAI RN101, BLIP-2 (COCO), XCLIP, VILA-1.5 | We used pre-trained weights for all models |
PolySmartAndVIREO_progress_run4 | PolySmart | automatic | Original query, progress run | Improved-ITV model |
progress_manual_run4 | PolySmart | manual | Progress manual run 4 | Improved-ITV model |
PolySmartAndVIREO_progressrun_manual_run3 | PolySmart | manual | Manually selected generated images (genImg) | Improved-ITV feature |
PolySmartAndVIREO_progressrun_manual_run2 | PolySmart | manual | run2 | run2 |
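Several of the runs above (the NII_UIT fusion runs, the CERTH-ITI trainable combination network, and the RUCMM ensembles) rest on the same late-fusion idea: normalize each model's text-video similarity scores onto a common scale, then combine them with (possibly learned) weights before ranking shots. The sketch below is a minimal, hypothetical illustration of that pattern, not any team's actual implementation; the min-max normalization, the uniform default weights, and the toy scores are all assumptions.

```python
import numpy as np

def min_max_normalize(scores):
    """Scale one model's similarity scores into [0, 1] so models are comparable."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:  # constant scores carry no ranking information
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

def fuse_scores(score_lists, weights=None):
    """Normalize each model's scores, then take a weighted average across models."""
    normalized = np.stack(
        [min_max_normalize(np.asarray(s, dtype=float)) for s in score_lists]
    )
    if weights is None:  # default: plain average ensemble
        weights = np.ones(len(score_lists))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Weighted sum over the model axis -> one fused score per video shot
    return np.tensordot(weights, normalized, axes=1)

# Hypothetical similarities from two retrieval models over four candidate shots
model_a = [0.2, 0.8, 0.5, 0.1]    # e.g. cosine similarities in [-1, 1]
model_b = [10.0, 30.0, 20.0, 40.0]  # e.g. unbounded dot-product scores
fused = fuse_scores([model_a, model_b])
ranking = np.argsort(-fused)  # shot indices, best first
```

In a learned variant (as in the CERTH-ITI and rucmm_avs_P_run3 descriptions), the `weights` vector would be fit on held-out queries (e.g. to maximize infAP) rather than left uniform.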