The Thirty-Third Text REtrieval Conference
(TREC 2024)

Ad-hoc Video Search Progress Task Appendix

Each entry below lists the run tag (Run), the submitting organization (Org), whether the run is manual or automatic (Type), a description of the retrieval model used, and a description of any external resources used.
Run: SoftbankMeisei - Progress Run 1
Org: softbank-meisei
Type: automatic
Retrieval model: Pretrained embedding models, mostly from OpenCLIP.
External resources: LLM (GPT); image generation (Stable Diffusion).

Run: SoftbankMeisei - Progress Run 2
Org: softbank-meisei
Type: automatic
Retrieval model: Pretrained embedding models, mostly from OpenCLIP.
External resources: LLM (GPT); image generation (Stable Diffusion).

Run: SoftbankMeisei - Progress Run 3
Org: softbank-meisei
Type: automatic
Retrieval model: Pretrained embedding models, mostly from OpenCLIP.
External resources: LLM (GPT); image generation (Stable Diffusion).

Run: SoftbankMeisei - Progress Run 4
Org: softbank-meisei
Type: automatic
Retrieval model: Pretrained embedding models, mostly from OpenCLIP.
External resources: LLM (GPT); image generation (Stable Diffusion).

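The four SoftbankMeisei runs above pair OpenCLIP-style embedding retrieval with an LLM and Stable Diffusion. A minimal sketch of one way a generated image can be folded into text-to-video search follows; the function names (embed_text, embed_image, generate_image) and the fusion weight alpha are placeholders of mine, not details from the SoftbankMeisei paper.

    import numpy as np

    def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Row-normalize, then take dot products.
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return a @ b.T

    def fused_scores(query: str, shot_embs: np.ndarray,
                     embed_text, embed_image, generate_image,
                     alpha: float = 0.7) -> np.ndarray:
        """Blend text->shot and generated-image->shot similarities.

        embed_text / embed_image: hypothetical OpenCLIP-style encoders.
        generate_image: hypothetical Stable Diffusion wrapper.
        """
        text_emb = embed_text(query)                   # shape (d,)
        img_emb = embed_image(generate_image(query))   # shape (d,)
        s_text = cosine_sim(text_emb[None, :], shot_embs)[0]
        s_img = cosine_sim(img_emb[None, :], shot_embs)[0]
        return alpha * s_text + (1.0 - alpha) * s_img  # higher = better
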
Run: Expansion_Fusion_Reranking_Progress
Org: NII_UIT
Type: automatic
Retrieval model: InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 (DataComp), OpenCLIP-H/14 (LAION-2B), OpenCLIP-H/14 (DFN-5B), OpenAI RN101, BLIP-2 (COCO), X-CLIP, VILA-1.5.
External resources: We used pretrained versions of all models.

Run: Expansion_Fusion_Rerank_Auto_Decompose_P
Org: NII_UIT
Type: automatic
Retrieval model: InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 (DataComp), OpenCLIP-H/14 (LAION-2B), OpenCLIP-H/14 (DFN-5B), OpenAI RN101, BLIP-2 (COCO), X-CLIP, VILA-1.5.
External resources: We used pretrained versions of all models.

Run: Expansion_Fusion_Rerank_Manual_Decompose_P
Org: NII_UIT
Type: manual
Retrieval model: InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 (DataComp), OpenCLIP-H/14 (LAION-2B), OpenCLIP-H/14 (DFN-5B), OpenAI RN101, BLIP-2 (COCO), X-CLIP, VILA-1.5.
External resources: We used pretrained versions of all models.

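The NII_UIT runs above fuse similarity scores from many heterogeneous pretrained models. Below is a minimal late-fusion sketch under my own assumptions (per-model z-normalization followed by a weighted average); the team's actual fusion and reranking pipeline is described in their notebook paper.

    import numpy as np

    def zscore(s: np.ndarray) -> np.ndarray:
        # Put each model's scores on a comparable scale.
        return (s - s.mean()) / (s.std() + 1e-8)

    def fuse(model_scores: list, weights: list = None) -> np.ndarray:
        """model_scores: one (num_shots,) similarity vector per model."""
        if weights is None:
            weights = [1.0] * len(model_scores)
        stacked = np.stack([w * zscore(s)
                            for w, s in zip(weights, model_scores)])
        return stacked.mean(axis=0)

    # Usage: ranked = np.argsort(-fuse([s_internvl, s_beit3, s_openclip]))
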
Run: certh.iti.avs.24.progress.run.1
Org: CERTH-ITI
Type: automatic
Retrieval model: A trainable network learns to combine the text-video similarities produced by several cross-modal networks. The similarities were normalized using the queries from 2022, 2023, and 2024.
External resources: The model was trained on MSR-VTT, TGIF, VATEX, and ActivityNet Captions.

Run: certh.iti.avs.24.progress.run.2
Org: CERTH-ITI
Type: automatic
Retrieval model: A trainable network learns to combine the text-video similarities produced by several cross-modal networks. The similarities were normalized using only this year's queries.
External resources: The model was trained on MSR-VTT, TGIF, VATEX, and ActivityNet Captions.

Run: certh.iti.avs.24.progress.run.3
Org: CERTH-ITI
Type: automatic
Retrieval model: A trainable network learns to combine the text-video similarities produced by several cross-modal networks. No normalization of the similarities was performed.
External resources: The model was trained on MSR-VTT, TGIF, VATEX, and ActivityNet Captions.

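As a rough illustration of the CERTH-ITI setup, the sketch below combines per-model similarity scores with a small trainable network and rescales each model's scores against a pool of queries (runs 1 and 2 differ only in which query pool is used). The layer sizes, framework, and normalization details are my assumptions, not the team's published architecture.

    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        """Map one similarity score per cross-modal model to a fused score."""
        def __init__(self, num_models: int):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(num_models, 32), nn.ReLU(), nn.Linear(32, 1))

        def forward(self, sims: torch.Tensor) -> torch.Tensor:
            # sims: (batch, num_models) similarities for (query, shot) pairs.
            return self.mlp(sims).squeeze(-1)

    def normalize_against_pool(sims: torch.Tensor,
                               pool: torch.Tensor) -> torch.Tensor:
        # Rescale each model's scores with statistics gathered over a pool
        # of queries (e.g., the 2022-2024 AVS queries).
        return (sims - pool.mean(dim=0)) / (pool.std(dim=0) + 1e-8)
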
Run: Expan_Fu_Rerank_M_Decompose_P_CRerank
Org: NII_UIT
Type: manual
Retrieval model: InternVL-G, BEiT-3 (COCO), OpenCLIP-L/14 (DataComp), OpenCLIP-H/14 (LAION-2B), OpenCLIP-H/14 (DFN-5B), OpenAI RN101, BLIP-2 (COCO), X-CLIP, VILA-1.5.
External resources: We used pretrained versions of all models.

Run: rucmm_avs_P_run1
Org: RUCMM
Type: automatic
Retrieval model: An average ensemble of 7 LAFF models trained on ChinaOpen-100k, V3C1-PC, and TGIF-MSRVTT10K-VATEX. Models are selected based on infAP and Spearman's rank correlation coefficient.
External resources: None.

Run: rucmm_avs_P_run3
Org: RUCMM
Type: automatic
Retrieval model: An ensemble of 6 LAFF models trained on ChinaOpen-100k, V3C1-PC, and TGIF-MSRVTT10K-VATEX. The ensemble weights are learned with gradient descent and greedy search to maximize infAP on a mixed set of TV22 and TV23 queries.
External resources: None.

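Run 3 above learns its ensemble weights with gradient descent plus greedy search against infAP. A bare-bones greedy forward-selection loop of the kind that could drive such a search is sketched below; metric stands in for infAP computed against TV22/TV23 judgments, and everything here is an assumption rather than RUCMM's actual code.

    import numpy as np

    def greedy_select(candidates: dict, metric, max_size: int = 6):
        """Greedily add the model whose inclusion most improves `metric`
        on the averaged score matrix of the selected subset."""
        chosen, best = [], -np.inf
        while len(chosen) < max_size:
            gains = {
                name: metric(np.mean([candidates[n]
                                      for n in chosen + [name]], axis=0))
                for name in candidates if name not in chosen
            }
            if not gains:
                break
            name, score = max(gains.items(), key=lambda kv: kv[1])
            if score <= best:
                break  # no remaining model improves the ensemble
            chosen.append(name)
            best = score
        return chosen, best
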
Run: rucmm_avs_P_run2
Org: RUCMM
Type: automatic
Retrieval model: A LAFF model tuned to maximize performance on TV22-23, with CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 as its text features, and CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 + irCSN + BEiT + WSL + Video-LLaMA + DINOv2 as its video features. It is pre-trained on V3C1-PC and fine-tuned on TGIF-MSRVTT10K-VATEX, with reranking by BLIP-2, OpenCLIP, and the open-vocabulary detection model YOLOv8x-worldv2.
External resources: None.

Run: rucmm_avs_P_run4
Org: RUCMM
Type: automatic
Retrieval model: A LAFF model tuned to maximize performance on TV22-23, with CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 as its text features, and CLIP-ViT-L-14/336px + BLIP-base + CLIP-ViT-B-32 + irCSN + BEiT + WSL + Video-LLaMA + DINOv2 as its video features. It is pre-trained on V3C1-PC and fine-tuned on TGIF-MSRVTT10K-VATEX, with reranking by BLIP-2 and OpenCLIP.
External resources: None.

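Runs 2 and 4 rerank an initial LAFF ranking with heavier models (BLIP-2 and OpenCLIP, plus an open-vocabulary detector in run 2). A generic top-K reranking sketch follows; the blending scheme, the cutoff k, and the rerank_score callable are placeholders of mine, not RUCMM's published procedure.

    import numpy as np

    def zscore(s: np.ndarray) -> np.ndarray:
        return (s - s.mean()) / (s.std() + 1e-8)

    def rerank_top_k(base_scores: np.ndarray, query: str, shots: list,
                     rerank_score, k: int = 1000,
                     beta: float = 0.5) -> np.ndarray:
        """Blend base and reranker scores on the top-k shots;
        keep the tail of the base ranking unchanged."""
        order = np.argsort(-base_scores)
        top = order[:k]
        second = np.array([rerank_score(query, shots[i]) for i in top])
        blend = (beta * zscore(base_scores[top])
                 + (1 - beta) * zscore(second))
        return np.concatenate([top[np.argsort(-blend)], order[k:]])
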
Run: Fusion_Query_No_Reranking_P
Org: NII_UIT
Type: automatic
Retrieval model: InternVL-G, BEiT-3, OpenCLIP-L/14 (DataComp), OpenCLIP-H/14 (LAION-2B), OpenCLIP-H/14 (DFN-5B), OpenAI RN101, BLIP-2 (COCO), X-CLIP.
External resources: We used pretrained versions of all models.

Run: Expansion_Fusion_Rerank_Auto_Decompose_P_Pij
Org: NII_UIT
Type: automatic
Retrieval model: InternVL-G, BEiT-3, OpenCLIP-L/14 (DataComp), OpenCLIP-H/14 (LAION-2B), OpenCLIP-H/14 (DFN-5B), OpenAI RN101, BLIP-2 (COCO), X-CLIP, VILA-1.5.
External resources: We used pretrained versions of all models.

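Several NII_UIT run tags mention query decomposition. One plausible reading (my assumption; the run descriptions do not spell it out) is to split a complex query into sub-queries, score each independently, and aggregate per shot:

    import numpy as np

    def decomposed_scores(sub_queries: list, score_fn,
                          how: str = "mean") -> np.ndarray:
        """score_fn(q) -> (num_shots,) similarity vector per sub-query."""
        per_part = np.stack([score_fn(q) for q in sub_queries])
        # Mean rewards shots matching most parts; min requires all parts.
        return (per_part.mean(axis=0) if how == "mean"
                else per_part.min(axis=0))

    # e.g. "a man in a red hat riding a bicycle" might decompose into
    # ["a man wearing a red hat", "a man riding a bicycle"].
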
Run: PolySmartAndVIREO_progress_run4
Org: PolySmart
Type: automatic
Retrieval model: Original query; progress run.
External resources: Improved-ITV model.

Run: progress_manual_run4
Org: PolySmart
Type: manual
Retrieval model: Progress manual run 4.
External resources: Improved-ITV model.

Run: PolySmartAndVIREO_progressrun_manual_run3
Org: PolySmart
Type: manual
Retrieval model: Manually selected generated images.
External resources: Improved-ITV features.

Run: PolySmartAndVIREO_progressrun_manual_run2
Org: PolySmart
Type: manual
Retrieval model: run2
External resources: run2