Runtag | Org | Is this a manual (human intervention) or automatic run? | Does this run leverage neural networks? | Does this run leverage proprietary models in any step of the retrieval pipeline? | Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline? | Does this run leverage smaller open-weight language models in any step of the retrieval pipeline? | Was this run padded with results from a baseline run? | What would you categorize this run as? | Please provide a short description of this run | Please give this run a priority for inclusion in manual assessments. |
---|---|---|---|---|---|---|---|---|---|---|
fs4_bm25+rocchio_snowael_snowaem_gtel+monot5_rrf+rz_rrf.rag24.test (trec_eval) (llm_eval) (paper) | coordinators | automatic | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RRF(Second Stage, RankZephyr) | 1 (top) |
neu (trec_eval) (llm_eval) | neu | automatic | yes | yes | no | yes | no | Learned Dense Only | We use a retriever we trained ourselves: a dense retriever fine-tuned from MiniCPM-2.4B. | 2 |
neurerank (trec_eval) (llm_eval) | neu | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We first retrieve with a retriever we trained and then rerank with a pairwise reranker. The retriever is fine-tuned from MiniCPM, and the reranker is bge-reranker-v2-minicpm-layerwise. | 1 (top) |
rtask-bm25-colbert_faiss (trec_eval) (llm_eval) (paper) | softbank-meisei | automatic | yes | no | no | yes | no | Ensemble/Fusion of First Stages | The retrieval process of this run is as follows: (1) Topic list preprocessing: (a) used GPT-4o to correct grammar, spelling mistakes, and text incompletions; (b) manual checking to make sure no errors remain. (2) BM25 to retrieve the top-100 segments. (3) Vector embedding generation: (a) used castorini/tct_colbert-v2-hnp-msmarco to generate embeddings for the segment corpus; (b) used FAISS to create an index at the document level (containing segment embeddings); (c) used castorini/tct_colbert-v2-msmarco-cqe to generate embeddings for the preprocessed topics. (4) For each topic, filtered the set of documents to search based on the BM25 top-100 retrieval results. (5) Retrieved the top-100 segments from each filtered document for the query. (6) Grouped all retrieved segments and sorted them in descending order of score. (7) The top-100 from the sorted list were submitted as the result. | 1 (top) |
rtask-bm25-rank_zephyr (trec_eval) (llm_eval) (paper) | softbank-meisei | automatic | yes | yes | no | no | no | Ensemble/Fusion of First Stages | The retrieval process of this run is as follows: (1) Topic list preprocessing: (a) used GPT-4o to preprocess the queries, correcting grammar, spelling errors, and text incompletions; (b) manual checking of all 301 queries to correct any remaining errors. (2) BM25 to retrieve the top-100 relevant segments. (3) RankZephyr to rerank the retrieved top-100 segments. | 2 |
LAS_ENN_T5_RERANKED_MXBAI (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Learned Dense Only | T5 exact nearest neighbors reranked by mxbai | 3 |
sim_and_rerank_v1 (trec_eval) (llm_eval) | KML | manual | yes | yes | no | no | yes | Multi-Stage Pipeline pointwise+pair/listwise | cohere embeddings + re-rank | 1 (top) |
monster (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | yes | Generation-in-the-loop Pipeline | RRF of uwc1+uwc2 | 1 (top) |
uwc1 (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | Runs contributing to uwc0 were pooled to depth 20 and relevance judged by GPT-4o (both graded and preferences). Results were ranked based on these judgments, with uwc0 used to break ties and pad runs. | 2 |
uwc2 (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | yes | Generation-in-the-loop Pipeline | Top-25 documents of the track baseline run were judged by GPT-4o on a graded scale and then re-ranked with the grade forming the primary key and the original score as the secondary key. | 3 |
uwc0 (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | RRF of 15 different runs with 6 different ranking stacks, starting from two different query sets (original and query2doc expanded), along with optional final re-rankings | 4 |
uwcCQAR (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | uwcCQ with query2doc expansion and re-ranking | 5 |
uwcCQA (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | no | no | no | Generation-in-the-loop Pipeline | uwcCQ with query2doc expansion | 6 |
uwcCQR (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | yes | yes | no | Generation-in-the-loop Pipeline | uwcCQ with re-ranking | 7 |
uwcCQ (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | no | no | no | Learned Dense Only | Cohere baseline | 8 |
uwcBA (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | yes | yes | no | no | no | Generation-in-the-loop Pipeline | BM25 with query2doc queries | 9 |
uwcBQ (trec_eval) (llm_eval) (paper) | WaterlooClarke | automatic | no | no | no | no | no | Traditional Only | BM25 with PRF | 10 (bottom) |
LAS-splade-mxbai-rrf (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | yes | no | no | no | Generation-in-the-loop Pipeline | Topic decomposition with GPT4o, SPLADE, rerank with mxbai sentence transformer, RRF to consolidate topic+subtopic results for final ranking | 1 (top) |
LAS-splade-mxbai (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Multi-Stage Pipeline pointwise | SPLADE retrieval plus mxbai embedding rerank | 2 |
grill_fine_grained_rel_2_full_doc_cohere (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses Cohere embeddings for initial retrieval, followed by gpt4o-mini to filter out irrelevant passages. The top-scoring relevant passages are then used to retrieve similar documents (using Cohere embeddings), which are pooled together with the relevant ones. Finally, a MonoT5 reranker is applied to re-rank the combined passage set. | 1 (top) |
grill_fine_grained_rel_2_full_doc (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses BM25 for initial retrieval, followed by gpt4o-mini to filter out irrelevant passages. The top-scoring relevant passages are then used to retrieve similar documents (using BM25), which are pooled together with the relevant ones. Finally, a MonoT5 reranker is applied to re-rank the combined passage set. | 1 (top) |
grill_fine_grained_summaries (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run starts with BM25 for initial retrieval, followed by gpt4o-mini to filter out irrelevant passages. GPT4o-mini then generates concise, query-relevant summaries for each top-scoring passage. These summaries are used to retrieve similar documents via BM25. The combined set of retrieved passages is then pooled and re-ranked using a MonoT5 reranker. | 2 |
grill_fine_grained_rel_2_summaries_cohere (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses Cohere embeddings for initial retrieval, followed by GPT4o-mini to filter out irrelevant passages. GPT4o-mini then generates concise, query-relevant summaries for each top-scoring passage. These summaries are used to retrieve similar documents via cohere embeddings. The combined set of retrieved passages is then pooled and re-ranked using a MonoT5 reranker. | 2 |
grill_fine_grained_rel_2_summaries_rrf_bm25_cohere (trec_eval) (llm_eval) | grilllab | automatic | yes | yes | no | yes | no | Ensemble/Fusion of First Stages | This run fuses the BM25 and Cohere retrievals after using GPT4o-mini to filter out irrelevant results from both ranked lists, together with similar documents found using a subset of the relevant results. The fused results are reranked with MonoT5. | 1 (top) |
LAS_enn_t5 (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Learned Dense Only | exact nearest neighbor search with sentence t5xxl | 5 |
LAS_ann_t5_qdrant (trec_eval) (llm_eval) (paper) | ncsu-las | automatic | yes | no | no | no | no | Learned Dense Only | approximate nearest neighbor search on t5 embeddings via qdrant | 4 |
sim_and_rerank_200_docs (trec_eval) (llm_eval) | KML | automatic | no | yes | no | no | no | Multi-Stage Pipeline pointwise+pair/listwise | Cohere embeddings retrieving 200 docs, reranked to produce the final 100 docs. | 1 (top) |
ASCITI_co_gpt (trec_eval) (llm_eval) | citi | automatic | yes | yes | no | no | no | Multi-Stage Pipeline pointwise+pair/listwise | This run uses the Cohere embedding model to encode documents and queries and obtain the retrieval results, then reranks the top-100 results using an LLM (GPT-3.5 Turbo). | 1 (top) |
ASCITI_co_bge (trec_eval) (llm_eval) | citi | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | This run uses a Cohere model to retrieve documents and then applies a bge reranker, downloaded from Hugging Face, to rerank the results. | 3 |
ASCITI_co_co (trec_eval) (llm_eval) | citi | automatic | yes | yes | no | no | no | Multi-Stage Pipeline pointwise | This run involves using Cohere's embedding to retrieve the documents and then using Cohere's reranker to rerank these results. | 2 |
weaviate_dense_base (trec_eval) (llm_eval) | buw | automatic | yes | no | no | yes | no | Learned Dense Only | This is a baseline dense retrieval pipeline that performed quite well. The segments were vectorized with the "multi-qa-MiniLM-L6-cos-v1" embedding model at FP16 precision into a local sharded Weaviate instance, and retrieval is based on cosine similarity between the query embedding and the segment embeddings. | 1 (top) |
zeph_test_rag_rrf_expand_Rtask (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | The pipeline consists of three stages. The first stage leverages BM25 combined with dense retrieval. The second stage employs the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. Before dense retrieval, the query is expanded to generate a small passage on its central theme; retrieval is then performed using both the raw query and the generated passage. RRF is performed at the first stage. | 1 (top) |
zeph_test_rag_rrf_raw_query_Rtrack (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | This is a three-stage pipeline: the first stage leverages BM25 + dense retrieval, the second stage uses the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. First-stage retrieval is performed using the raw query; RRF is applied at the first stage itself. | 2 |
zeph_test_rag24_doc_query_expansion+rrf_Rtask (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | The pipeline consists of three stages. The first stage leverages BM25 combined with dense retrieval. The second stage employs the Stella model for reranking, and the final stage uses Zephyr for list-wise sorting. Before dense retrieval, the query is expanded to generate a small passage on its central theme; retrieval is then performed using both the raw query and the generated passage. RRF is performed at the first stage. | 3 |
qdrant_bge_small (trec_eval) (llm_eval) | SGU | manual | yes | yes | no | yes | yes | Traditional Only | Due to hardware issues and limitations, we only used 80% of the organizers' data. We used Qdrant (cosine) as storage and the public bge embedding model. | 1 (top) |
SpladeV3_only (trec_eval) (llm_eval) | TLSE3 | manual | yes | no | no | yes | no | Learned Sparse Only | Simple SPLADE sparse retrieval, no quantization, used Qdrant | 5 |
SPLADE+Jina (trec_eval) (llm_eval) | TLSE3 | manual | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | SPLADE first stage with jina-reranker-v2-base-multilingual reranker | 4 |
SPLADE+BGEv2m3 (trec_eval) (llm_eval) | TLSE3 | manual | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | SPLADE first stage then BAAI/bge-reranker-v2-m3 reranker | 3 |
UDInfolab.bge (trec_eval) (llm_eval) | InfoLab | manual | yes | no | yes | yes | no | Learned Dense Only | This run uses BGE | 3 |
SPLADE+Gemini (trec_eval) (llm_eval) | TLSE3 | manual | yes | yes | no | yes | no | Generation-in-the-loop Pipeline | SPLADE first stage + RankGPT with Gemini, one context window, RankGPT almost unmodified besides Gemini support. | 1 (top) |
webis-01 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We use multiple systems to create a re-ranking pool for MonoT5 and MonoElectra that are subsequently fused and re-ranked with RankZephyr. The re-ranking pool was created by fusing the results of traditional retrieval systems with a learned dense model and automatically created boolean query variants retrieved against traditional retrieval systems, additionally enriched by corpus graph retrieval. For the traditional retrieval, we submitted the original queries against Anserini (BM25, INL2, QLD) and ChatNoir (BM25F with a boost for Wikipedia). For the dense retrieval, we used Weaviate. We created boolean query variants with GPT-4o-mini and Llama 3.1 by first extracting potential aspects of the query and then generating boolean queries with the LLMs to capture those aspects; the boolean queries were retrieved against ChatNoir. We re-ranked the pools with monoT5-3B and MonoElectra and used the top results for adaptive re-ranking against ChatNoir (i.e., the corpus graph concept). The top-100 monoT5 and MonoElectra documents were re-ranked with RankZephyr, yielding two runs that we fused with reciprocal rank fusion. On this run, we again re-ranked the top-100 results with RankZephyr, using cascading re-ranking (i.e., re-ranking the results of RankZephyr multiple times; we stopped after three iterations). For retrieval, we used the segment, headings, and titles as text. For re-ranking (i.e., with MonoT5, MonoElectra, and RankZephyr), we used only the segment text, i.e., not the title and headings. | 1 (top) |
UDInfolab.bge.AnsAi (trec_eval) (llm_eval) | InfoLab | manual | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | Implementation using doc2query with BGE | 1 (top) |
UDInfolab.bge.query (trec_eval) (llm_eval) | InfoLab | manual | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise | Implementation rewriting the query with BGE | 2 |
UDInfolab.bge.ranker (trec_eval) (llm_eval) | InfoLab | manual | yes | no | no | yes | no | Ensemble/Fusion of First Stages | Implementation with reranking | 3 |
UDInfolab.bm25.ro.tuned (trec_eval) (llm_eval) | InfoLab | manual | no | no | no | no | no | Traditional Only | BM25 + Rocchio, tuned | 5 |
UDInfolab.bm25.ro (trec_eval) (llm_eval) | InfoLab | manual | no | no | no | no | no | Traditional Only | BM25+Rocchio | 6 |
UDInfolab.bm25 (trec_eval) (llm_eval) | InfoLab | manual | no | no | no | no | no | Traditional Only | BM25 | 7 |
webis-02 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | yes | Multi-Stage Pipeline pointwise+pair/listwise | This run aims to increase the recall base; therefore, it only consists of documents that are not retrieved within the top-1000 of BM25, QLD, and INL2 as implemented in Anserini, BM25F as implemented in ChatNoir, or the top-1000 of our Weaviate implementation (dense retrieval). The documents were retrieved via adaptive re-ranking (i.e., the corpus graph) of the top results of RankZephyr and our boolean query formulation as used in the run webis-01. To not waste judgment budget, we only include documents that make it into the top-75 of our webis-01 run (which incorporated cascading re-ranking). For some topics that did not retrieve new documents, we pad with the baseline. | 2 |
webis-03 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | This is our run webis-01, but diversified so that a segment is removed when a neighbouring segment was already retrieved. This aims to ensure that an LLM (for the retrieval-augmented generation) sees more diverse retrieval content. | 3 |
webis-04 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | This is our run webis-01, but diversified so that only the top segment per page is retrieved. This aims to ensure that an LLM (for the retrieval-augmented generation) sees more diverse retrieval content. | 10 (bottom) |
ldilab_repllama_listt5_pass2 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 3 |
ldilab_repllama_listt5_pass1 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 4 |
iiia_standard (trec_eval) (llm_eval) | IIIA-UNIPD | automatic | yes | no | no | yes | no | Learned Dense Only | This run was generated using sentence-transformers/msmarco-distilbert-base-tas-b | 10 (bottom) |
iiia_dedup (trec_eval) (llm_eval) | IIIA-UNIPD | automatic | yes | no | no | yes | no | Learned Dense Only | This run was created using sentence-transformers/msmarco-distilbert-base-tas-b and filtering the duplicated documents | 10 (bottom) |
dense_on_sparse (trec_eval) (llm_eval) | buw | automatic | yes | no | no | no | no | Ensemble/Fusion of First Stages | This run contains hybrid retrieval results built on the baseline Pyserini retrieval. The top-1000 segments per query were retrieved using the Pyserini indices, vectorized with "multi-qa-MiniLM-L6-cos-v1", and the top-100 were then retrieved using hybrid search. | 1 (top) |
webis-05 (trec_eval) (llm_eval) (paper) | webis | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise | We use multiple systems to create a re-ranking pool for MonoElectra. The re-ranking pool was created by fusing the results of traditional retrieval systems with a learned dense model and automatically created boolean query variants retrieved against traditional retrieval systems, additionally enriched by corpus graph retrieval. For the traditional retrieval, we submitted the original queries against Anserini (BM25, INL2, QLD) and ChatNoir (BM25F with a boost for Wikipedia). For the dense retrieval, we used Weaviate. We created boolean query variants with GPT-4o-mini and Llama 3.1 by first extracting potential aspects of the query and then generating boolean queries with the LLMs to capture those aspects; the boolean queries were retrieved against ChatNoir. We re-ranked the pools with MonoElectra. | 4 |
test.rag24.rrf.expanded.BM25.MiniLM (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | This is a two-stage pipeline: the first stage is a combination of BM25 and dense retrieval, followed by a reranking step. The first stage is performed with similar queries generated by a small LLM, and all retrieved passages are rank-fused. | 4 |
test.rag24.no.rrf.no.expansion (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | This is a two-stage pipeline: dense retrieval in the first stage followed by a reranking step. The first stage is performed with the raw queries. This is the classical two-stage pipeline of dense retrieval + reranking. | 6 |
anserini_bm25+rocchio.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | no | no | no | no | no | Traditional Only | Anserini BM25 + Rocchio | 4 |
anserini_bm25.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | no | no | no | no | no | Traditional Only | Anserini BM25 | 5 |
fs4+monot5_rz.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankZephyr | 2 |
test.rag24.rrf.raw.query.MiniLM+BM25 (trec_eval) (llm_eval) | IITD-IRL | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | This is a two-stage pipeline: the first stage is a combination of BM25 and dense retrieval, followed by a reranking step. The first stage is performed with the raw queries, and all retrieved passages are rank-fused. | 5 |
fs4+monot5.rag24.test_top100 (trec_eval) (llm_eval) (paper) | coordinators | automatic | yes | no | no | yes | no | Multi-Stage Pipeline pointwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) | 3 |
fs4+monot5_listgalore.rag24.test_top100 (trec_eval) (llm_eval) | h2oloo | automatic | yes | yes | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RRF(RankGPT4-o, RankLLaMA3.1-70B, RankZephyr) | 1 (top) |
ldilab_repllama_listt5_pass3 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 1 (top) |
fs4+monot5_rg4o.rag24.test_top100 (trec_eval) (llm_eval) | h2oloo | automatic | yes | yes | no | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankGPT4o | 2 |
ldilab_repllama_listt5_pass4 (trec_eval) (llm_eval) | ldisnu | manual | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | We used RepLLaMA-7B as the bi-encoder for first-stage retrieval and reranked the top-100 using ListT5-3B with r=2 and tournament sort. The first reranking pass was done on the RepLLaMA top-1000, and subsequent passes on the previous top-100 results. We run reranking multiple times; the number of passes is indicated in the run name. | 2 |
fs4+monot5_rl31-70b.rag24.test_top100 (trec_eval) (llm_eval) | h2oloo | automatic | yes | no | yes | yes | no | Multi-Stage Pipeline pointwise+pair/listwise | First Stage (top-3K): RRF(BM25 + Rocchio, Snowflake Embed L, Snowflake Embed M, GTE Large) Second Stage (top-3K): RRF(First Stage, monoT5-3B) Third Stage (top-100): RankLLaMA3.1-70B | 3 |
ielab-blender-llama70b-filtered (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline with BM25 + Stella hybrid base retriever and LLama3.1 70B answer generator. | 1 (top) |
ielab-blender-llama70b (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline, BM25 + Stella hybrid base retriever, llama3.1 70B answer generation. No filter at the end. | 2 |
ielab-blender-llama8b (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline, BM25 + Stella hybrid base retriever, llama3.1 8B answer generator. No filter at the end. | 4 |
ielab-blender-llama8b-filtered (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | Blender pipeline, BM25 + Stella hybrid base retriever, llama3.1 8B answer generator. | 3 |
ielab-bm25-stella-hybrid (trec_eval) (llm_eval) | ielab | automatic | yes | no | no | yes | no | Ensemble/Fusion of First Stages | BM25 + Stella dense retriever hybrid run | 5 |
ielab-blender-llama70b-external-only (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | The external knowledge (rag) component only of the Blender pipeline with BM25 + Stella hybrid base retriever and LLama3.1 70B answer generator. | 6 |
ielab-blender-llama70b-internal-only (trec_eval) (llm_eval) | ielab | automatic | yes | no | yes | yes | no | Generation-in-the-loop Pipeline | The internal knowledge (direct LLM answer generation) component only of the Blender pipeline with BM25 + Stella hybrid base retriever and LLama3.1 70B answer generator. | 7 |
ISIR-IRIT-GEN (trec_eval) (llm_eval) | IRIT | automatic | yes | no | no | no | no | Generation-in-the-loop Pipeline | We use the Zephyr model to generate 4 sub-questions with few-shot prompting (4 examples). We then run LuceneSearcher with the initial query and the generated sub-queries as individual searches, rerank each list with MonoT5, and keep the top-20 for each query. Finally, we concatenate all the documents, rerank everything with MonoT5, and keep the top-100. | 1 (top) |
ISIR-IRIT-Vanilla (trec_eval) (llm_eval) | IRIT | automatic | yes | no | no | no | no | Multi-Stage Pipeline pointwise | We perform a basic search for documents with LuceneSearcher and then rerank the 100 returned documents with MonoT5. | 1 (top) |
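
Many of the runs above combine multiple ranked lists with reciprocal rank fusion (RRF), for example the coordinators' fs4 pipelines, the WaterlooClarke fusions, and the IITD-IRL pipelines. The sketch below is a minimal, generic illustration of such a fusion step, not code from any submission; the function name and the smoothing constant k=60 are assumptions chosen for illustration.

```python
# Minimal sketch of reciprocal rank fusion (RRF) over per-topic rankings.
# Each document's fused score is the sum over runs of 1 / (k + rank).

def reciprocal_rank_fusion(runs, k=60, depth=100):
    """Fuse several ranked lists of document ids into one ranking.

    runs  -- list of rankings, each an ordered list of document ids
    k     -- smoothing constant; larger k reduces the impact of top ranks
    depth -- number of fused results to return
    """
    scores = {}
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in fused[:depth]]


# Example: fuse a lexical and a dense ranking for one topic (hypothetical ids).
bm25_run = ["d3", "d1", "d7", "d2"]
dense_run = ["d1", "d4", "d3", "d9"]
print(reciprocal_rank_fusion([bm25_run, dense_run], depth=5))
```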