The Thirty-Third Text REtrieval Conference
(TREC 2024)

Retrieval-Augmented Generation Retrieval Task Appendix

Each run entry below reports the following fields, in order:
- Run tag and organization
- Is this a manual (human intervention) or automatic run?
- Does this run leverage neural networks?
- Does this run leverage proprietary models in any step of the retrieval pipeline?
- Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline?
- Does this run leverage smaller open-weight language models in any step of the retrieval pipeline?
- Was this run padded with results from a baseline run?
- What would you categorize this run as?
- Please provide a short description of this run
- Please give this run a priority for inclusion in manual assessments
fs4_bm25+rocchio_snowael_snowaem_gtel+monot5_rrf+rz_rrf.rag24.test (trec_eval) (llm_eval) (paper) coordinators
automatic
yes
no
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RRF(Second Stage, RankZephyr) (A minimal RRF sketch follows this entry.)
1 (top)
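Several runs in this appendix fuse ranked lists with reciprocal rank fusion (RRF). A minimal sketch of the technique, assuming plain ranked lists of document ids and the conventional k = 60 constant (document ids are toy values):

```python
# Reciprocal rank fusion: score(d) = sum over runs of 1 / (k + rank_run(d)).
from collections import defaultdict

def rrf(runs, k=60, depth=None):
    scores = defaultdict(float)
    for run in runs:                          # each run: doc ids in rank order
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:depth] if depth else fused

# Fuse a lexical and a dense run, keeping the top 3.
print(rrf([["d1", "d2", "d3"], ["d3", "d1", "d4"]], depth=3))  # ['d1', 'd3', 'd2']
```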
neu (trec_eval) (llm_eval) neu
automatic
yes
yes
no
yes
no
Learned Dense Only
We use a dense retriever that we trained ourselves; it is fine-tuned from MiniCPM-2.4B.
2
neurerank (trec_eval) (llm_eval) neu
automatic
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
We first retrieve with a dense retriever we trained, then rerank with a pairwise reranker. The retriever is fine-tuned from MiniCPM, and the reranker is bge-reranker-v2-minicpm-layerwise.
1 (top)
rtask-bm25-colbert_faiss (trec_eval) (llm_eval) (paper) softbank-meisei
automatic
yes
no
no
yes
no
Ensemble/Fusion of First Stages
The retrieval process of this run is as follows:
1. Topic list preprocessing stage:
   a. Used GPT-4o to correct grammar, spelling mistakes, and text incompletions.
   b. Manual checking to make sure no errors remain.
2. BM25 to retrieve the top-100 segments.
3. Vector embedding generation stage (see the FAISS sketch after this entry):
   a. Used castorini/tct_colbert-v2-hnp-msmarco to generate embeddings for the segment corpus.
   b. Used FAISS indexing to create an index at the document level (containing segment embeddings).
   c. Used castorini/tct_colbert-v2-msmarco-cqe to generate embeddings for the preprocessed topics.
4. For each topic, filtered the set of documents to search based on the BM25 top-100 retrieval results.
5. Retrieved the top-100 segments from each filtered document for the query.
6. Grouped all retrieved segments and sorted them in descending order of score.
7. The top-100 from the sorted list is submitted as the result.
1 (top)
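A minimal sketch of the embedding-search stage in step 3 above: build a FAISS inner-product index over normalized segment vectors and query it. The random vectors stand in for the TCT-ColBERT embeddings, and a single flat index is an assumption (the run built document-level indices):

```python
import faiss
import numpy as np

dim = 768                                     # TCT-ColBERT embedding size
seg_vecs = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(seg_vecs)                  # cosine via normalized dot product

index = faiss.IndexFlatIP(dim)
index.add(seg_vecs)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 100)        # top-100 segments for the query
print(ids[0][:5], scores[0][:5])
```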
rtask-bm25-rank_zephyr (trec_eval) (llm_eval) (paper) softbank-meisei
automatic
yes
yes
no
no
no
Ensemble/Fusion of First Stages
The retrieval process of this run is as follows:
1. Topic list preprocessing stage:
   a. Used GPT-4o to preprocess the query, correcting grammar, spelling errors, and text incompletions.
   b. Manual checking of all 301 queries to correct any remaining errors.
2. BM25 to retrieve the top-100 relevant segments.
3. RankZephyr to rerank the retrieved top-100 segments.
2
LAS_ENN_T5_RERANKED_MXBAI (trec_eval) (llm_eval) (paper) ncsu-las
automatic
yes
no
no
no
no
Learned Dense Only
T5 exact nearest neighbors, reranked by mxbai
3
sim_and_rerank_v1 (trec_eval) (llm_eval) KML
manual
yes
yes
no
no
yes
Multi-Stage Pipeline pointwise+pair/listwise
cohere embeddings + re-rank
1 (top)
monster (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
yes
yes
yes
Generation-in-the-loop Pipeline
RRF of uwc1+uwc2
1 (top)
uwc1 (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
yes
yes
no
Generation-in-the-loop Pipeline
Runs contributing to uwc0 were pooled to depth 20 and relevance-judged by GPT-4o (both graded and preference judgments). Results were ranked based on these judgments, with uwc0 used to break ties and pad runs. (A pooling sketch follows this entry.)
2
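Depth-20 pooling as described above is simply the union of each contributing run's top 20 results; a minimal sketch with toy doc ids:

```python
def pool(runs, depth=20):
    pooled = set()
    for run in runs:                 # each run: a ranked list of doc ids
        pooled.update(run[:depth])
    return pooled

print(pool([["d1", "d2", "d3"], ["d2", "d4"]], depth=2))  # {'d1', 'd2', 'd4'}
```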
uwc2 (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
yes
yes
yes
Generation-in-the-loop Pipeline
Top-25 documents of the track baseline run were judged by GPT-4o on a graded scale and then re-ranked with the grade as the primary sort key and the original score as the secondary key (a sketch of this sort follows this entry).
3
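The uwc2 rule above is a lexicographic sort: the LLM grade is the primary key and the original retrieval score breaks ties. A minimal sketch with invented grades and scores:

```python
results = [
    {"doc": "d1", "orig_score": 12.3, "grade": 2},
    {"doc": "d2", "orig_score": 15.1, "grade": 1},
    {"doc": "d3", "orig_score": 11.0, "grade": 2},
]
# Sort by (grade, original score), both descending.
results.sort(key=lambda r: (r["grade"], r["orig_score"]), reverse=True)
print([r["doc"] for r in results])  # ['d1', 'd3', 'd2']
```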
uwc0 (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
yes
yes
no
Generation-in-the-loop Pipeline
RRF of 15 different runs with 6 different ranking stacks, starting from two different query sets (original and query2doc expanded), along with optional final re-rankings
4
uwcCQAR (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
yes
yes
no
Generation-in-the-loop Pipeline
uwcCQ with query2doc expansion and re-ranking
5
uwcCQA (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
no
no
no
Generation-in-the-loop Pipeline
uwcCQ with query2doc expansion (a query2doc sketch follows this entry)
6
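query2doc expansion asks an LLM to draft a short pseudo-document answering the query and appends it to the query before retrieval. A hedged sketch using the OpenAI client; the model choice, prompt wording, and query-repetition factor are illustrative assumptions, not necessarily the team's setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def query2doc(query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any capable LLM works
        messages=[{"role": "user",
                   "content": f"Write a short passage that answers: {query}"}],
    )
    pseudo_doc = resp.choices[0].message.content
    # Common recipe: repeat the query so the (longer) pseudo-document
    # does not swamp it, then retrieve with BM25 as usual.
    return " ".join([query] * 3 + [pseudo_doc])

print(query2doc("why is the sky blue"))
```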
uwcCQR (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
yes
yes
no
Generation-in-the-loop Pipeline
uwcCQ with re-ranking
7
uwcCQ (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
no
no
no
Learned Dense Only
Cohere baseline
8
uwcBA (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
yes
yes
no
no
no
Generation-in-the-loop Pipeline
BM25 with query2doc queries
9
uwcBQ (trec_eval) (llm_eval) (paper) WaterlooClarke
automatic
no
no
no
no
no
Traditional Only
BM25 with PRF (a BM25+RM3 sketch follows this entry)
10 (bottom)
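BM25 with pseudo-relevance feedback can be reproduced with Pyserini's RM3 implementation. A sketch assuming the prebuilt segmented MS MARCO v2.1 index distributed for this track and typical parameter values:

```python
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher.from_prebuilt_index("msmarco-v2.1-doc-segmented")
searcher.set_bm25(k1=0.9, b=0.4)
# RM3 pseudo-relevance feedback: expand with 10 terms from the top 10 docs.
searcher.set_rm3(fb_terms=10, fb_docs=10, original_query_weight=0.5)

for hit in searcher.search("effects of caffeine on sleep", k=5):
    print(f"{hit.score:.2f} {hit.docid}")
```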
LAS-splade-mxbai-rrf (trec_eval) (llm_eval) (paper) ncsu-las
automatic
yes
yes
no
no
no
Generation-in-the-loop Pipeline
Topic decomposition with GPT4o, SPLADE, rerank with mxbai sentence transformer, RRF to consolidate topic+subtopic results for final ranking
1 (top)
LAS-splade-mxbai (trec_eval) (llm_eval) (paper) ncsu-las
automatic
yes
no
no
no
no
Multi-Stage Pipeline pointwise
SPLADE retrieval plus mxbai embedding rerank (an embedding-rerank sketch follows this entry)
2
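An embedding rerank like the one above scores each first-stage candidate by cosine similarity between query and document embeddings. A sketch assuming the public mxbai-embed-large-v1 checkpoint (the exact mxbai model used is not stated) and toy candidates:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query = "how do solar panels work"
candidates = ["Photovoltaic cells convert sunlight into electricity.",
              "The stock market closed higher today."]

q_emb = model.encode(query, convert_to_tensor=True)
d_emb = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(q_emb, d_emb)[0]        # one score per candidate
for text, score in sorted(zip(candidates, scores.tolist()),
                          key=lambda x: -x[1]):
    print(f"{score:.3f}  {text}")
```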
grill_fine_grained_rel_2_full_doc_cohere (trec_eval) (llm_eval) grilllab
automatic
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise
This run uses Cohere embeddings for initial retrieval, followed by GPT-4o-mini to filter out irrelevant passages. The top-scoring relevant passages are then used to retrieve similar documents (using Cohere embeddings), which are pooled together with the relevant ones. Finally, a MonoT5 reranker is applied to re-rank the combined passage set.
1 (top)
grill_fine_grained_rel_2_full_doc (trec_eval) (llm_eval) grilllab
automatic
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise
This run uses BM25 for initial retrieval, followed by GPT-4o-mini to filter out irrelevant passages. The top-scoring relevant passages are then used to retrieve similar documents (using BM25), which are pooled together with the relevant ones. Finally, a MonoT5 reranker is applied to re-rank the combined passage set.
1 (top)
grill_fine_grained_summaries (trec_eval) (llm_eval) grilllab
automatic
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise
This run starts with BM25 for initial retrieval, followed by GPT-4o-mini to filter out irrelevant passages. GPT-4o-mini then generates concise, query-relevant summaries for each top-scoring passage. These summaries are used to retrieve similar documents via BM25. The combined set of retrieved passages is then pooled and re-ranked using a MonoT5 reranker.
2
grill_fine_grained_rel_2_summaries_cohere (trec_eval) (llm_eval) grilllab
automatic
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise
This run uses Cohere embeddings for initial retrieval, followed by GPT-4o-mini to filter out irrelevant passages. GPT-4o-mini then generates concise, query-relevant summaries for each top-scoring passage. These summaries are used to retrieve similar documents via Cohere embeddings. The combined set of retrieved passages is then pooled and re-ranked using a MonoT5 reranker.
2
grill_fine_grained_rel_2_summaries_rrf_bm25_cohere (trec_eval) (llm_eval) grilllab
automatic
yes
yes
no
yes
no
Ensemble/Fusion of First Stages
This run fuses the BM25 and Cohere retrieval results after using GPT-4o-mini to filter out irrelevant results from both ranked lists, together with similar documents found using a subset of the relevant results. The fused results are reranked with MonoT5.
1 (top)
LAS_enn_t5 (trec_eval) (llm_eval) (paper) ncsu-las
automatic
yes
no
no
no
no
Learned Dense Only
exact nearest neighbor search with sentence t5xxl
5
LAS_ann_t5_qdrant (trec_eval) (llm_eval) (paper) ncsu-las
automatic
yes
no
no
no
no
Learned Dense Only
approximate nearest neighbor search on t5 embeddings via qdrant
4
sim_and_rerank_200_docs (trec_eval) (llm_eval) KML
automatic
no
yes
no
no
no
Multi-Stage Pipeline pointwise+pair/listwise
Cohere embeddings, retrieving 200 docs and reranking to 100 docs.
1 (top)
ASCITI_co_gpt (trec_eval) (llm_eval) citi
automatic
yes
yes
no
no
no
Multi-Stage Pipeline pointwise+pair/listwise
This run uses the Cohere embedding model to encode documents and queries and obtain the retrieval results, then reranks the top 100 results using an LLM (GPT-3.5 Turbo).
1 (top)
ASCITI_co_bge (trec_eval) (llm_eval) citi
automatic
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise
This run uses the Cohere model to retrieve documents and then applies a BGE reranker, downloaded from Hugging Face, to rerank the results.
3
ASCITI_co_co (trec_eval) (llm_eval) citi
automatic
yes
yes
no
no
no
Multi-Stage Pipeline pointwise
This run involves using Cohere's embedding to retrieve the documents and then using Cohere's reranker to rerank these results.
2
weaviate_dense_base (trec_eval) (llm_eval) buw
automatic
yes
no
no
yes
no
Learned Dense Only
This is a baseline dense retrieval pipeline that performed quite well. The segments were vectorized with the "multi-qa-MiniLM-L6-cos-v1" embedding model at FP16 precision into a local sharded Weaviate instance, and retrieval is based on cosine similarity between the query embedding and the segment embeddings (a minimal sketch follows this entry).
1 (top)
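A minimal sketch of this kind of dense baseline, with sentence-transformers' in-memory semantic search standing in for the sharded Weaviate instance:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
segments = ["Segment about climate policy in the EU.",
            "Segment about pasta recipes."]
seg_emb = model.encode(segments, convert_to_tensor=True)

q_emb = model.encode("What are current EU climate policies?", convert_to_tensor=True)
for hit in util.semantic_search(q_emb, seg_emb, top_k=2)[0]:
    print(f"{hit['score']:.3f}  {segments[hit['corpus_id']]}")
```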
zeph_test_rag_rrf_expand_Rtask (trec_eval) (llm_eval) IITD-IRL
automatic
yes
no
no
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
The pipeline consists of three stages. The first stage combines BM25 with dense retrieval. The second stage employs the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. Before dense retrieval, the query is expanded to generate a small passage on its central theme; retrieval is then performed using both the raw query and the generated passage. RRF is performed at the first stage.
1 (top)
zeph_test_rag_rrf_raw_query_Rtrack (trec_eval) (llm_eval) IITD-IRL
automatic
yes
no
no
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
This is a three-stage pipeline: the first stage leverages BM25 + dense retrieval, the second stage uses the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. First-stage retrieval is performed using the raw query; RRF is applied at the first stage itself.
2
zeph_test_rag24_doc_query_expansion+rrf_Rtask (trec_eval) (llm_eval) IITD-IRL
automatic
yes
no
no
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
The pipeline consists of three stages. The first stage combines BM25 with dense retrieval. The second stage employs the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. Before dense retrieval, the query is expanded to generate a small passage on its central theme; retrieval is then performed using both the raw query and the generated passage. RRF is performed at the first stage.
3
qdrant_bge_small (trec_eval) (llm_eval) SGU
manual
yes
yes
no
yes
yes
Traditional Only
Due to hardware issues and limitations, we used only 80% of the organizers' data. We used Qdrant (cosine) for storage and the public BGE embedding model.
1 (top)
SpladeV3_only (trec_eval) (llm_eval) TLSE3
manual
yes
no
no
yes
no
Learned Sparse Only
Simple SPLADE sparse retrieval, no quantization, used Qdrant
5
SPLADE+Jina (trec_eval) (llm_eval) TLSE3
manual
yes
no
no
yes
no
Multi-Stage Pipeline pointwise
SPLADE first stage with jina-reranker-v2-base-multilingual reranker
4
SPLADE+BGEv2m3 (trec_eval) (llm_eval) TLSE3
manual
yes
no
no
yes
no
Multi-Stage Pipeline pointwise
SPLADE first stage then BAAI/bge-reranker-v2-m3 reranker
3
UDInfolab.bge (trec_eval) (llm_eval) InfoLab
manual
yes
no
yes
yes
no
Learned Dense Only
This run uses BGE
3
SPLADE+Gemini (trec_eval) (llm_eval) TLSE3
manual
yes
yes
no
yes
no
Generation-in-the-loop Pipeline
SPLADE first stage + RankGPT with Gemini, one context window, RankGPT almost unmodified besides Gemini support.
1 (top)
webis-01 (trec_eval) (llm_eval) (paper) webis
automatic
yes
yes
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
We use multiple systems to create a re-ranking pool for MonoT5 and MonoElectra, whose outputs are subsequently fused and re-ranked with RankZephyr. The re-ranking pool was created by fusing the results of traditional retrieval systems with a learned dense model and with automatically created boolean query variants retrieved against traditional retrieval systems, additionally enriched by corpus-graph retrieval. For the traditional retrieval, we submitted the original queries against Anserini (BM25, INL2, QLD) and ChatNoir (BM25F with a boost for Wikipedia). For the dense retrieval, we used Weaviate. We created boolean query variants using GPT-4o-mini and Llama 3.1 by first extracting potential aspects of the query and then generating boolean queries with the LLMs to capture those aspects; the boolean queries were retrieved against ChatNoir. We re-ranked the pools with monoT5-3b and MonoElectra, and used the top results for adaptive re-ranking against ChatNoir (i.e., the corpus-graph concept). The top-100 monoT5 and MonoElectra documents were re-ranked with RankZephyr, yielding two runs that we fused with reciprocal rank fusion. On this run, we again re-ranked the top-100 results with RankZephyr, using cascading re-ranking (i.e., re-ranking the results of RankZephyr multiple times; we stopped after three iterations). For retrieval, we used the segment, headings, and title as text. For re-ranking (i.e., with MonoT5, MonoElectra, and RankZephyr), we used only the segment text, i.e., not the title and headings. (A corpus-graph sketch follows this entry.)
1 (top)
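The "corpus graph" adaptive re-ranking mentioned above (GAR-style) alternates between scoring documents from the initial ranking and unscored neighbours of documents already scored. A simplified sketch: the FIFO frontier is a simplification (the original prioritizes neighbours of the highest-scoring documents), and all names and values are illustrative:

```python
def adaptive_rerank(initial, neighbours, score, budget=100):
    """initial: ranked doc ids; neighbours: doc id -> similar doc ids;
    score: doc id -> relevance score (the re-ranker); budget: docs to score."""
    scored, frontier = {}, []
    pools = [list(initial), frontier]   # alternate: initial ranking, graph frontier
    turn = 0
    while len(scored) < budget and any(pools):
        pool = pools[turn % 2] or pools[(turn + 1) % 2]  # fall back if one is empty
        doc = pool.pop(0)
        if doc not in scored:
            scored[doc] = score(doc)
            frontier.extend(n for n in neighbours.get(doc, []) if n not in scored)
        turn += 1
    return sorted(scored, key=scored.get, reverse=True)

toy = {"d1": 0.9, "d2": 0.5, "d3": 0.4, "d8": 0.1, "d9": 0.8}
print(adaptive_rerank(["d1", "d2", "d3"], {"d1": ["d9"], "d2": ["d8"]},
                      toy.get, budget=4))  # ['d1', 'd9', 'd2', 'd8']
```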
UDInfolab.bge.AnsAi (trec_eval) (llm_eval) InfoLab
manual
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise
Implementation using doc2query with BGE
1 (top)
UDInfolab.bge.query (trec_eval) (llm_eval) InfoLab
manual
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise
Implementation rewriting the query with BGE
2
UDInfolab.bge.ranker (trec_eval) (llm_eval) InfoLab
manual
yes
no
no
yes
no
Ensemble/Fusion of First Stages
Implementation with reranking
3
UDInfolab.bm25.ro.tuned (trec_eval) (llm_eval) InfoLab
manual
no
no
no
no
no
Traditional Only
BM25+Rocchio (tuned)
5
UDInfolab.bm25.ro (trec_eval) (llm_eval) InfoLab
manual
no
no
no
no
no
Traditional Only
BM25+Rocchio
6
UDInfolab.bm25 (trec_eval) (llm_eval) InfoLab
manual
no
no
no
no
no
Traditional Only
BM25
7
webis-02 (trec_eval) (llm_eval) (paper) webis
automatic
yes
yes
yes
yes
yes
Multi-Stage Pipeline pointwise+pair/listwise
This run aims to increase the recall base; it therefore consists only of documents that are not retrieved within the top-1000 of BM25, QLD, or INL2 as implemented in Anserini, of BM25F as implemented in ChatNoir, or within the top-1000 of our Weaviate implementation (dense retrieval). The documents were retrieved via adaptive re-ranking (i.e., the corpus graph) of the top results of RankZephyr and of our boolean query formulation as used in the run webis-01. To not waste judgment budget, we only include documents that make it into the top-75 of our webis-01 run (which incorporated cascading re-ranking). For topics where no new documents were retrieved, we pad with the baseline.
2
webis-03 (trec_eval) (llm_eval) (paper) webis
automatic
yes
yes
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
This is our run webis-01, but diversified so that every segment for which a neighbouring segment was already retrieved is removed. This aims to ensure that an LLM (for the retrieval-augmented generation) sees more diverse retrieval content.
3
webis-04 (trec_eval) (llm_eval) (paper) webis
automatic
yes
yes
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
This is our run webis-01, but diversified so that only the top segment per page is retrieved (a sketch follows this entry). This aims to ensure that an LLM (for the retrieval-augmented generation) sees more diverse retrieval content.
10 (bottom)
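The per-page filter keeps only the best-ranked segment of each page. A sketch assuming segment ids of the form `<docid>#<segment>`, as in the segmented MS MARCO v2.1 corpus:

```python
def top_segment_per_page(ranked_segment_ids):
    seen_pages, diversified = set(), []
    for seg_id in ranked_segment_ids:       # already in rank order
        page_id = seg_id.split("#")[0]      # strip the segment suffix
        if page_id not in seen_pages:
            seen_pages.add(page_id)
            diversified.append(seg_id)
    return diversified

run = ["doc1#3", "doc1#4", "doc2#0", "doc1#1", "doc3#2"]
print(top_segment_per_page(run))  # ['doc1#3', 'doc2#0', 'doc3#2']
```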
ldilab_repllama_listt5_pass2 (trec_eval) (llm_eval) ldisnu
manual
yes
no
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort (a tournament-sort sketch follows this entry). The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times (the number of passes is indicated in the run tag).
3
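Tournament sort in the ListT5 style ranks fixed-size groups and promotes the top r of each group until one group remains. A simplified sketch; `rank_group` stands in for the listwise model, and the group size of 5 is an assumption:

```python
def tournament_rerank(candidates, rank_group, group_size=5, r=2):
    pool = list(candidates)
    while len(pool) > group_size:
        next_pool = []
        for i in range(0, len(pool), group_size):
            group = pool[i:i + group_size]
            next_pool.extend(rank_group(group)[:r])   # keep top-r per group
        pool = next_pool
    return rank_group(pool)                           # final ranking of survivors

# Toy scorer in place of ListT5: higher index = more relevant.
toy_scores = {f"d{i}": i for i in range(20)}
ranked = tournament_rerank(toy_scores,
                           lambda g: sorted(g, key=toy_scores.get, reverse=True))
print(ranked[:3])  # ['d19', 'd18', 'd14']: reliable at the top, coarse below
```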
ldilab_repllama_listt5_pass1 (trec_eval) (llm_eval) ldisnu
manual
yes
no
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times (the number of passes is indicated in the run tag).
4
iiia_standard (trec_eval) (llm_eval) IIIA-UNIPD
automatic
yes
no
no
yes
no
Learned Dense Only
This run was generated using sentence-transformers/msmarco-distilbert-base-tas-b
10 (bottom)
iiia_dedup (trec_eval) (llm_eval) IIIA-UNIPD
automatic
yes
no
no
yes
no
Learned Dense Only
This run was created using sentence-transformers/msmarco-distilbert-base-tas-b, filtering out duplicated documents.
10 (bottom)
dense_on_sparse (trec_eval) (llm_eval) buw
automatic
yes
no
no
no
no
Ensemble/Fusion of First Stages
This run contains hybrid retrieval results based on the baseline Pyserini retrieval: the top 1000 segments per query were retrieved using the Pyserini indices, vectorized with "multi-qa-MiniLM-L6-cos-v1", and the top 100 were then retrieved using hybrid search.
1 (top)
webis-05 (trec_eval) (llm_eval) (paper) webis
automatic
yes
yes
yes
yes
no
Multi-Stage Pipeline pointwise
We use multiple systems to create a re-ranking pool for MonoElectra. The re-ranking pool was created by fusing the results of traditional retrieval systems with a learned dense model and with automatically created boolean query variants retrieved against traditional retrieval systems, additionally enriched by corpus-graph retrieval. For the traditional retrieval, we submitted the original queries against Anserini (BM25, INL2, QLD) and ChatNoir (BM25F with a boost for Wikipedia). For the dense retrieval, we used Weaviate. We created boolean query variants using GPT-4o-mini and Llama 3.1 by first extracting potential aspects of the query and then generating boolean queries with the LLMs to capture those aspects; the boolean queries were retrieved against ChatNoir. We re-ranked the pools with MonoElectra.
4
test.rag24.rrf.expanded.BM25.MiniLM (trec_eval) (llm_eval) IITD-IRL
automatic
yes
no
no
yes
no
Multi-Stage Pipeline pointwise
This is a two-stage pipeline: the first stage is a combination of BM25 + dense retrieval, followed by a reranking step. The first stage is performed with similar queries generated by a small LLM; all retrieved passages are rank-fused.
4
test.rag24.no.rrf.no.expansion (trec_eval) (llm_eval) IITD-IRL
automatic
yes
no
no
yes
no
Multi-Stage Pipeline pointwise
This is a two-stage pipeline: dense retrieval in the first stage, followed by a reranking step. The first stage is performed with the raw queries. This is the classical two-stage pipeline of dense retrieval + reranking.
6
anserini_bm25+rocchio.rag24.test_top100 (trec_eval) (llm_eval) (paper) coordinators
automatic
no
no
no
no
no
Traditional Only
Anserini BM25 + Rocchio
4
anserini_bm25.rag24.test_top100 (trec_eval) (llm_eval) (paper) coordinators
automatic
no
no
no
no
no
Traditional Only
Anserini BM25
5
fs4+monot5_rz.rag24.test_top100 (trec_eval) (llm_eval) (paper) coordinators
automatic
yes
no
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankZephyr
2
test.rag24.rrf.raw.query.MiniLM+BM25 (trec_eval) (llm_eval) IITD-IRL
automatic
yes
no
no
yes
no
Multi-Stage Pipeline pointwise
This is a two-stage pipeline: the first stage is a combination of BM25 + dense retrieval, followed by a reranking step. The first stage is performed with the raw queries; all retrieved passages are rank-fused.
5
fs4+monot5.rag24.test_top100 (trec_eval) (llm_eval) (paper) coordinators
automatic
yes
no
no
yes
no
Multi-Stage Pipeline pointwise
First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B)
3
fs4+monot5_listgalore.rag24.test_top100 (trec_eval) (llm_eval) h2oloo
automatic
yes
yes
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RRF(RankGPT4o, RankLLaMA3.1-70B, RankZephyr)
1 (top)
ldilab_repllama_listt5_pass3 (trec_eval) (llm_eval) ldisnu
manual
yes
no
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times (the number of passes is indicated in the run tag).
1 (top)
fs4+monot5_rg4o.rag24.test_top100 (trec_eval) (llm_eval) h2oloo
automatic
yes
yes
no
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankGPT4o
2
ldilab_repllama_listt5_pass4 (trec_eval) (llm_eval) ldisnu
manual
yes
no
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times (the number of passes is indicated in the run tag).
2
fs4+monot5_rl31-70b.rag24.test_top100 (trec_eval) (llm_eval) h2oloo
automatic
yes
no
yes
yes
no
Multi-Stage Pipeline pointwise+pair/listwise
First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankLLaMA3.1-70B
3
ielab-blender-llama70b-filtered (trec_eval) (llm_eval) ielab
automatic
yes
no
yes
yes
no
Generation-in-the-loop Pipeline
Blender pipeline with BM25 + Stella hybrid base retriever and Llama 3.1 70B answer generator.
1 (top)
ielab-blender-llama70b (trec_eval) (llm_eval) ielab
automatic
yes
no
yes
yes
no
Generation-in-the-loop Pipeline
Blender pipeline, BM25 + Stella hybrid base retriever, Llama 3.1 70B answer generation. No filter at the end.
2
ielab-blender-llama8b (trec_eval) (llm_eval) ielab
automatic
yes
no
yes
yes
no
Generation-in-the-loop Pipeline
Blender pipeline, BM25 + Stella hybrid base retriever, Llama 3.1 8B answer generator. No filter at the end.
4
ielab-blender-llama8b-filtered (trec_eval) (llm_eval) ielab
automatic
yes
no
yes
yes
no
Generation-in-the-loop Pipeline
Blender pipeline, BM25 + Stella hybrid base retriever, Llama 3.1 8B answer generator.
3
ielab-bm25-stella-hybrid (trec_eval) (llm_eval) ielab
automatic
yes
no
no
yes
no
Ensemble/Fusion of First Stages
BM25 + Stella dense retriever hybrid run (a hybrid-fusion sketch follows this entry)
5
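One common way to build such a hybrid is min-max normalized score interpolation; whether ielab used interpolation or rank fusion is not stated above, so treat this as illustrative:

```python
def minmax(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / ((hi - lo) or 1.0) for d, s in scores.items()}

def hybrid(bm25, dense, alpha=0.5):
    b, d = minmax(bm25), minmax(dense)
    docs = set(b) | set(d)                    # union of both candidate sets
    return sorted(docs, key=lambda x: alpha * b.get(x, 0.0)
                                      + (1 - alpha) * d.get(x, 0.0),
                  reverse=True)

print(hybrid({"d1": 12.0, "d2": 8.0}, {"d2": 0.9, "d3": 0.7}))
```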
ielab-blender-llama70b-external-only (trec_eval) (llm_eval) ielab
automatic
yes
no
yes
yes
no
Generation-in-the-loop Pipeline
The external knowledge (RAG) component only of the Blender pipeline with BM25 + Stella hybrid base retriever and Llama 3.1 70B answer generator.
6
ielab-blender-llama70b-internal-only (trec_eval) (llm_eval) ielab
automatic
yes
no
yes
yes
no
Generation-in-the-loop Pipeline
The internal knowledge (direct LLM answer generation) component only of the Blender pipeline with BM25 + Stella hybrid base retriever and Llama 3.1 70B answer generator.
7
ISIR-IRIT-GEN (trec_eval) (llm_eval) IRIT
automatic
yes
no
no
no
no
Generation-in-the-loop Pipeline
We use the Zephyr model to generate four sub-questions with few-shot prompting (four examples). We then use LuceneSearcher with the initial query and the generated sub-queries to run individual searches, rerank each list with MonoT5, and keep the top 20 per query. Finally, we concatenate all the documents, rerank everything with MonoT5, and keep the top 100.
1 (top)
ISIR-IRIT-Vanilla (trec_eval) (llm_eval) IRIT
automatic
yes
no
no
no
no
Multi-Stage Pipeline pointwise
We perform a basic search with LuceneSearcher and then re-rank the 100 returned docs with MonoT5 (a MonoT5 scoring sketch follows this entry).
1 (top)
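MonoT5 scores each query-document pair pointwise by prompting "Query: ... Document: ... Relevant:" and reading the probability of the token "true". A sketch with the public castorini/monot5-base-msmarco checkpoint (the exact model size used in these runs is not stated):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

name = "castorini/monot5-base-msmarco"
tok = T5Tokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name).eval()
true_id, false_id = tok.encode("true")[0], tok.encode("false")[0]

def monot5_score(query: str, doc: str) -> float:
    inp = tok(f"Query: {query} Document: {doc} Relevant:", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inp, max_new_tokens=1,
                             output_scores=True, return_dict_in_generate=True)
    logits = out.scores[0][0]                          # logits of the first step
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()                             # P("true") = relevance

docs = ["Paris is the capital of France.", "Bananas are yellow."]
print(sorted(docs, key=lambda d: monot5_score("capital of France", d),
             reverse=True))
```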