Runtag | Org | Is this a manual (human intervention) or automatic run? | Does this run leverage neural networks? | Does this run leverage proprietary models in any step of the retrieval pipeline? | Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline? | Does this run leverage smaller open-weight language models in any step of the retrieval pipeline? | Was this run padded with results from a baseline run? | What would you categorize this run as? | Please provide a short description of this run | Please give this run a priority for inclusion in manual assessments. |
---|---|---|---|---|---|---|---|---|---|---|
fs4_bm25+rocchio_snowael_snowaem_gtel+monot5_rrf+rz_rrf.rag24.test (trec_eval) (llm_eval) (paper) | coordinators | automatic | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RRF(Second Stage, RankZephyr) | 1 (top) |
neu (trec_eval) (llm_eval) | neu | automatic | yes | yes | no | yes | no | Learned Dense Only | We use a retriever we trained ourselves: a dense retriever fine-tuned from MiniCPM-2.4B. | 2 |
neurerank (trec_eval) (llm_eval) | neu | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We first retrieve with a retriever we trained and then rerank with a pairwise reranker. The retriever is fine-tuned from MiniCPM, and the reranker is bge-reranker-v2-minicpm-layerwise. | 1 (top) |
rtask-bm25-colbert_faiss (trec_eval) (llm_eval) (paper) | softbank-meisei | automatic | yes | no | no | yes | no | Ensemble/Fusion of First Stages | The retrieval process of this run is as follows: (1) Topic list preprocessing: (a) used GPT-4o to correct grammar, spelling mistakes, and text incompletions; (b) manual checking to make sure no errors remain. (2) BM25 to retrieve the top-100 segments. (3) Vector embedding generation: (a) used castorini/tct_colbert-v2-hnp-msmarco to generate embeddings for the segment corpus; (b) used FAISS to create an index at the document level (containing segment embeddings); (c) used castorini/tct_colbert-v2-msmarco-cqe to generate embeddings for the preprocessed topics. (4) For each topic, filtered the set of documents to search based on the BM25 top-100 retrieval results. (5) Retrieved the top-100 segments from each filtered document for the query. (6) Grouped all retrieved segments and sorted them in descending order of score. (7) The top-100 from the sorted list were submitted as the result. | 1 (top) |
rtask-bm25-rank_zephyr (trec_eval) (llm_eval) (paper) | softbank-meisei | automatic | yes | yes | no | no | no | Ensemble/Fusion of First Stages | The retrieval process of this run is as follows: (1) Topic list preprocessing: (a) used GPT-4o to preprocess the queries, correcting grammar, spelling errors, and text incompletions; (b) manual checking of all 301 queries to correct any remaining errors. (2) BM25 to retrieve the top-100 relevant segments. (3) RankZephyr to rerank the retrieved top-100 segments. | 2 |
LAS_ENN_T5_RERANKED_MXBAI (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Learned Dense Only | T5 exact nearest neighbors reranked by mxbai | 3 |
sim_and_rerank_v1 (trec_eval) (llm_eval) | KML | manual | yes | yes | no | no | yes | Multi-Stage Pipeline pointwise+pair/listwise | cohere embeddings + re-rank | 1 (top) |
monster (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | yes | Generation-in-the-loop Pipeline | RRF of uwc1+uwc2 | 1 (top) |
uwc1 (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | Runs contributing to uwc0 were pooled to depth 20 and relevance judged by GPT-4o (both graded and preferences). Results were ranked based on these judgments, with uwc0 used to break ties and pad runs. | 2 |
uwc2 (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | yes | Generation-in-the-loop Pipeline | Top-25 documents of the track baseline run were judged by GPT-4o on a graded scale and then re-ranked with the grade forming the primary key and the original score as the secondary key. | 3 |
uwc0 (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | RRF of 15 different runs with 6 different ranking stacks, starting from two different query sets (original and query2doc expanded), along with optional final re-rankings | 4 |
uwcCQAR (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | uwcCQ with query2doc expansion and re-ranking | 5 |
uwcCQA (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | no | no | no | Generation-in-the-loop Pipeline | uwcCQ with query2doc expansion | 6 |
uwcCQR (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | uwcCQ with re-ranking | 7 |
uwcCQ (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | no | no | no | Learned Dense Only | Cohere baseline | 8 |
uwcBA (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | no | no | no | Generation-in-the-loop Pipeline | BM25 with query2doc queries | 9 |
uwcBQ (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | no | no | no | no | no | Traditional Only | BM25 with PRF | 10 (bottom) |
LAS-splade-mxbai-rrf (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | yes | no | no | no | Generation-in-the-loop Pipeline | Topic decomposition with GPT4o, SPLADE, rerank with mxbai sentence transformer, RRF to consolidate topic+subtopic results for final ranking | 1 (top) |
LAS-splade-mxbai (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Multi-Stage Pipeline pointwise | SPLADE retrieval plus mxbai embedding rerank | 2 |
grill_fine_grained_rel_2_full_doc_cohere (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses Cohere embeddings for initial retrieval, followed by gpt4o-mini to filter out irrelevant passages. The top-scoring relevant passages are then used to retrieve similar documents (using Cohere embeddings), which are pooled together with the relevant ones. Finally, a MonoT5 reranker is applied to re-rank the combined passage set. | 1 (top) |
grill_fine_grained_rel_2_full_doc (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses BM25 for initial retrieval, followed by gpt4o-mini to filter out irrelevant passages. The top-scoring relevant passages are then used to retrieve similar documents (using BM25), which are pooled together with the relevant ones. Finally, a MonoT5 reranker is applied to re-rank the combined passage set. | 1 (top) |
grill_fine_grained_summaries (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run starts with BM25 for initial retrieval, followed by gpt4o-mini to filter out irrelevant passages. GPT4o-mini then generates concise, query-relevant summaries for each top-scoring passage. These summaries are used to retrieve similar documents via BM25. The combined set of retrieved passages is then pooled and re-ranked using a MonoT5 reranker. | 2 |
grill_fine_grained_rel_2_summaries_cohere (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses Cohere embeddings for initial retrieval, followed by GPT4o-mini to filter out irrelevant passages. GPT4o-mini then generates concise, query-relevant summaries for each top-scoring passage. These summaries are used to retrieve similar documents via cohere embeddings. The combined set of retrieved passages is then pooled and re-ranked using a MonoT5 reranker. | 2 |
grill_fine_grained_rel_2_summaries_rrf_bm25_cohere (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Ensemble/Fusion of First Stages | This run fuses the BM25 and Cohere retrievals after using GPT4o-mini to filter out irrelevant results from both ranked lists, together with similar documents found using a subset of the relevant results. The fused results are reranked with MonoT5. | 1 (top) |
LAS_enn_t5 (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Learned Dense Only | exact nearest neighbor search with sentence t5xxl | 5 |
LAS_ann_t5_qdrant (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Learned Dense Only | approximate nearest neighbor search on t5 embeddings via qdrant | 4 |
sim_and_rerank_200_docs (trec_eval) (llm_eval) | KML | automatic | no | yes | no | no | no | Multi-Stage Pipeline pointwise+pair/listwise | Cohere embeddings retrieving 200 docs, reranked to produce the final 100 docs. | 1 (top) |
ASCITI_co_gpt (trec_eval) (llm_eval) | citi | automatic | yes | yes | no | no | no | Multi-Stage Pipeline pointwise+pair/listwise | This run uses the Cohere embedding model to encode documents and queries and obtain the retrieval results, then reranks the top-100 results using an LLM (GPT-3.5 Turbo). | 1 (top) |
ASCITI_co_bge (trec_eval) (llm_eval) | citi | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses a Cohere model to retrieve documents and then applies a bge reranker, downloaded from Hugging Face, to rerank the results. | 3 |
ASCITI_co_co (trec_eval) (llm_eval) | citi | automatic | yes | yes | no | no | no | Multi-Stage Pipeline pointwise | This run involves using Cohere's embedding to retrieve the documents and then using Cohere's reranker to rerank these results. | 2 |
weaviate_dense_base (trec_eval) (llm_eval) | buw | automatic | yes | no | no | yes | no | Learned Dense Only | This is a baseline dense retrieval pipeline that performed quite well. The segments were vectorized with the "multi-qa-MiniLM-L6-cos-v1" embedding model at FP16 precision into a local sharded Weaviate instance, and retrieval is based on cosine similarity between the query embedding and the segment embeddings. | 1 (top) |
zeph_test_rag_rrf_expand_Rtask (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | The pipeline consists of three stages. The first stage leverages BM25 combined with dense retrieval. The second stage employs the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. Before dense retrieval, the query is expanded to generate a small passage on its central theme; retrieval is then performed using both the raw query and the generated passage. RRF is performed at the first stage. | 1 (top) |
zeph_test_rag_rrf_raw_query_Rtrack (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | This is a three-stage pipeline: the first stage leverages BM25 + dense retrieval, the second stage uses the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. First-stage retrieval is performed using the raw query; RRF is applied at the first stage itself. | 2 |
zeph_test_rag24_doc_query_expansion+rrf_Rtask (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | The pipeline consists of three stages. The first stage leverages BM25 combined with dense retrieval. The second stage employs the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. Before dense retrieval, the query is expanded to generate a small passage on its central theme; retrieval is then performed using both the raw query and the generated passage. RRF is performed at the first stage. | 3 |
qdrant_bge_small (trec_eval) (llm_eval) | SGU | manual | yes | yes | no | yes | yes | Traditional Only | Due to hardware issues and limitations, we only used 80% of the organizers' data. We used Qdrant (cosine) as storage and the public bge embedding model. | 1 (top) |
SpladeV3_only (trec_eval) (llm_eval) | TLSE3 | manual | yes | no | no | yes | no | Learned Sparse Only | Simple SPLADE sparse retrieval, no quantization, used Qdrant | 5 |
SPLADE+Jina (trec_eval) (llm_eval) | TLSE3 | manual | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | SPLADE first stage with jina-reranker-v2-base-multilingual reranker | 4 |
SPLADE+BGEv2m3 (trec_eval) (llm_eval) | TLSE3 | manual | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | SPLADE first stage then BAAI/bge-reranker-v2-m3 reranker | 3 |
UDInfolab.bge (trec_eval) (llm_eval) | InfoLab | manual | yes | no | yes | yes | no | Learned Dense Only | This run uses BGE | 3 |
SPLADE+Gemini (trec_eval) (llm_eval) | TLSE3 | manual | yes | yes | no | yes | no | Generation-in-the-loop Pipeline | SPLADE first stage + RankGPT with Gemini, one context window, RankGPT almost unmodified besides Gemini support. | 1 (top) |
webis-01 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We use multiple systems to create a re-ranking pool for MonoT5 and MonoElectra that are subsequently fused and re-ranked with RankZephyr. The re-ranking pool was created by fusing the results of traditional retrieval systems with a learned dense model and automatically created boolean query variants retrieved against traditional retrieval systems, additionally enriched by corpus graph retrieval. For the traditional retrieval, we submitted the original queries against Anserini (BM25, INL2, QLD) and ChatNoir (BM25F with a boost for Wikipedia). For the dense retrieval, we used Weaviate. We created boolean query variants with GPT-4o-mini and Llama 3.1 by first extracting potential aspects of the query and then generating boolean queries with the LLMs to capture those aspects; the boolean queries were retrieved against ChatNoir. We re-ranked the pools with monoT5-3B and MonoElectra and used the top results for adaptive re-ranking against ChatNoir (i.e., the corpus graph concept). The top-100 monoT5 and MonoElectra documents were re-ranked with RankZephyr, yielding two runs that we fused with reciprocal rank fusion. On this run, we again re-ranked the top-100 results with RankZephyr, using cascading re-ranking (i.e., re-ranking the results of RankZephyr multiple times; we stopped after three iterations). For retrieval, we used the segment, headings, and titles as text. For re-ranking (i.e., with MonoT5, MonoElectra, and RankZephyr), we used only the segment text, i.e., not the title and headings. | 1 (top) |
UDInfolab.bge.AnsAi (trec_eval) (llm_eval) | InfoLab | manual | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | Implementation using doc2query with BGE | 1 (top) |
UDInfolab.bge.query (trec_eval) (llm_eval) | InfoLab | manual | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | Implementation rewriting the query with BGE | 2 |
UDInfolab.bge.ranker (trec_eval) (llm_eval) | InfoLab | manual | yes | no | no | yes | no | Ensemble/Fusion of First Stages | Implementation with reranking | 3 |
UDInfolab.bm25.ro.tuned (trec_eval) (llm_eval) | InfoLab | manual | no | no | no | no | no | Traditional Only | BM25 + Rocchio, tuned | 5 |
UDInfolab.bm25.ro (trec_eval) (llm_eval) | InfoLab | manual | no | no | no | no | no | Traditional Only | BM25+Rocchio | 6 |
UDInfolab.bm25 (trec_eval) (llm_eval) | InfoLab | manual | no | no | no | no | no | Traditional Only | BM25 | 7 |
webis-02 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | yes | Multi-Stage Pipeline pointwise+pair/listwise | This run aims to increase the recall base; therefore, it only consists of documents that are not retrieved within the top-1000 of BM25, QLD, and INL2 as implemented in Anserini, BM25F as implemented in ChatNoir, or the top-1000 of our Weaviate implementation (dense retrieval). The documents were retrieved via adaptive re-ranking (i.e., the corpus graph) of the top results of RankZephyr and our boolean query formulation as used in the run webis-01. To not waste judgment budget, we only include documents that make it into the top-75 of our webis-01 run (which incorporated cascading re-ranking). For some topics that did not retrieve new documents, we pad with the baseline. | 2 |
webis-03 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | This is our run webis-01, but diversified so that a segment is removed when a neighbouring segment was already retrieved. This aims to ensure that an LLM (for the retrieval-augmented generation) sees more diverse retrieval content. | 3 |
webis-04 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | This is our run webis-01, but diversified so that only the top segment per page is retrieved. This aims to ensure that an LLM (for the retrieval-augmented generation) sees more diverse retrieval content. | 10 (bottom) |
ldilab_repllama_listt5_pass2 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 3 |
ldilab_repllama_listt5_pass1 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 4 |
iiia_standard (trec_eval) (llm_eval) | IIIA-UNIPD | automatic | yes | no | no | yes | no | Learned Dense Only | This run was generated using sentence-transformers/msmarco-distilbert-base-tas-b | 10 (bottom) |
iiia_dedup (trec_eval) (llm_eval) | IIIA-UNIPD | automatic | yes | no | no | yes | no | Learned Dense Only | This run was created using sentence-transformers/msmarco-distilbert-base-tas-b and filtering the duplicated documents | 10 (bottom) |
dense_on_sparse (trec_eval) (llm_eval) | buw | automatic | yes | no | no | no | no | Ensemble/Fusion of First Stages | This run contains hybrid retrieval results built on the baseline Pyserini retrieval. The top-1000 segments per query were retrieved using the Pyserini indices, vectorized with "multi-qa-MiniLM-L6-cos-v1", and the top-100 were then retrieved using hybrid search. | 1 (top) |
webis-05 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise | We use multiple systems to create a re-ranking pool for MonoElectra. The re-ranking pool was created by fusing the results of traditional retrieval systems with a learned dense model and automatically created boolean query variants retrieved against traditional retrieval systems, additionally enriched by corpus graph retrieval. For the traditional retrieval, we submitted the original queries against Anserini (BM25, INL2, QLD) and ChatNoir (BM25F with a boost for Wikipedia). For the dense retrieval, we used Weaviate. We created boolean query variants with GPT-4o-mini and Llama 3.1 by first extracting potential aspects of the query and then generating boolean queries with the LLMs to capture those aspects; the boolean queries were retrieved against ChatNoir. We re-ranked the pools with MonoElectra. | 4 |
test.rag24.rrf.expanded.BM25.MiniLM (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | This is a two-stage pipeline: the first stage is a combination of BM25 and dense retrieval, followed by a reranking step. The first stage is performed with similar queries generated by a small LLM, and all retrieved passages are rank-fused. | 4 |
test.rag24.no.rrf.no.expansion (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | This is a two-stage pipeline: dense retrieval in the first stage followed by a reranking step. The first stage is performed with the raw queries. This is the classical two-stage pipeline of dense retrieval + reranking. | 6 |
anserini_bm25+rocchio.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | no | no | no | no | no | Traditional Only | Anserini BM25 + Rocchio | 4 |
anserini_bm25.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | no | no | no | no | no | Traditional Only | Anserini BM25 | 5 |
fs4+monot5_rz.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankZephyr | 2 |
test.rag24.rrf.raw.query.MiniLM+BM25 (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | This is a two-stage pipeline: the first stage is a combination of BM25 and dense retrieval, followed by a reranking step. The first stage is performed with the raw queries, and all retrieved passages are rank-fused. | 5 |
fs4+monot5.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) | 3 |
fs4+monot5_listgalore.rag24.test_top100 (trec_eval) (llm_eval) | h2oloo | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RRF(RankGPT4-o, RankLLaMA3.1-70B, RankZephyr) | 1 (top) |
ldilab_repllama_listt5_pass3 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 1 (top) |
fs4+monot5_rg4o.rag24.test_top100 (trec_eval) (llm_eval) | h2oloo | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankGPT4o | 2 |
ldilab_repllama_listt5_pass4 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 2 |
fs4+monot5_rl31-70b.rag24.test_top100 (trec_eval) (llm_eval) | h2oloo | automatic | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankLLaMA3.1-70B | 3 |
ielab-blender-llama70b-filtered (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline with BM25 + Stella hybrid base retriever and LLama3.1 70B answer generator. | 1 (top) |
ielab-blender-llama70b (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline, BM25 + Stella hybrid base retriever, llama3.1 70B answer generation. No filter at the end. | 2 |
ielab-blender-llama8b (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline, BM25 + Stella hybrid base retriever, llama3.1 8B answer generator. No filter at the end. | 4 |
ielab-blender-llama8b-filtered (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline, BM25 + Stella hybrid base retriever, llama3.1 8B answer generator. | 3 |
ielab-bm25-stella-hybrid (trec_eval) (llm_eval) | ielab | automatic | yes | no | no | yes | no | Ensemble/Fusion of First Stages | BM25 + Stella dense retriever hybrid run | 5 |
ielab-blender-llama70b-external-only (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | The external knowledge (rag) component only of the Blender pipeline with BM25 + Stella hybrid base retriever and LLama3.1 70B answer generator. | 6 |
ielab-blender-llama70b-internal-only (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | The internal knowledge (direct LLM answer generation) component only of the Blender pipeline with BM25 + Stella hybrid base retriever and LLama3.1 70B answer generator. | 7 |
ISIR-IRIT-GEN (trec_eval) (llm_eval) | IRIT | automatic | yes | no | no | no | no | Generation-in-the-loop Pipeline | We use the Zephyr model to generate 4 sub-questions with few-shot prompting (4 examples). We then run LuceneSearcher with the initial query and the generated sub-queries as individual searches, rerank each list with MonoT5, and keep the top-20 for each query. Finally, we concatenate all the documents, rerank everything with MonoT5, and keep the top-100. | 1 (top) |
ISIR-IRIT-Vanilla (trec_eval) (llm_eval) | IRIT | automatic | yes | no | no | no | no | Multi-Stage Pipeline pointwise | We perform a basic search for documents with LuceneSearcher and then rerank the 100 returned documents with MonoT5. | 1 (top) |
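
Many of the runs above combine multiple ranked lists with reciprocal rank fusion (RRF), for example the coordinators' fs4 pipelines, the WaterlooClarke fusions, and the IITD-IRL pipelines. The sketch below is a minimal, generic illustration of such a fusion step, not code from any submission; the function name and the smoothing constant k=60 are assumptions chosen for illustration.

```python
# Minimal sketch of reciprocal rank fusion (RRF) over per-topic rankings.
# Each document's fused score is the sum over runs of 1 / (k + rank).

def reciprocal_rank_fusion(runs, k=60, depth=100):
    """Fuse several ranked lists of document ids into one ranking.

    runs  -- list of rankings, each an ordered list of document ids
    k     -- smoothing constant; larger k reduces the impact of top ranks
    depth -- number of fused results to return
    """
    scores = {}
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in fused[:depth]]


# Example: fuse a lexical and a dense ranking for one topic (hypothetical ids).
bm25_run = ["d3", "d1", "d7", "d2"]
dense_run = ["d1", "d4", "d3", "d9"]
print(reciprocal_rank_fusion([bm25_run, dense_run], depth=5))
```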