The Thirty-Third Text REtrieval Conference
(TREC 2024)

Lateral Reading Document Retrieval Task Appendix

Each run is listed with the fields from the submission form: runtag, organization, whether the run is manual or automatic, a brief description of the run, whether it is a reranking of a baseline or a full ranking, other datasets used in producing the run, whether the news articles were used, the LLMs used (an optional field), and the requested priority for inclusion in manual assessments (1 = top, 5 = bottom).

Runtag: UWClarke_rerank (trec_eval) (paper)
Organization: WaterlooClarke
Manual or automatic: automatic
Description: MonoT5 reranks the top 100 documents; DuoT5 then reranks the top 10 from the MonoT5 ordering (the MonoT5 stage is sketched below).
Rerank or full ranking: rerank
Other datasets: None
Used news articles: no
LLMs used: T5
Priority: 1 (top)

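As a concrete illustration of the pointwise stage above, here is a minimal sketch of monoT5 scoring, assuming the public castorini/monot5-base-msmarco checkpoint; the run's exact model size, and the pairwise DuoT5 stage, are not specified here.

    # Minimal monoT5 scoring sketch; the checkpoint choice is an assumption.
    import torch
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tok = T5Tokenizer.from_pretrained("castorini/monot5-base-msmarco")
    model = T5ForConditionalGeneration.from_pretrained("castorini/monot5-base-msmarco")
    model.eval()

    # monoT5 scores a (query, document) pair by the probability of decoding
    # "true" rather than "false" after the prompt below.
    TRUE_ID = tok("true", add_special_tokens=False).input_ids[0]
    FALSE_ID = tok("false", add_special_tokens=False).input_ids[0]

    def monot5_score(query: str, doc: str) -> float:
        inputs = tok(f"Query: {query} Document: {doc} Relevant:",
                     return_tensors="pt", truncation=True, max_length=512)
        start = torch.full((1, 1), model.config.decoder_start_token_id)
        with torch.no_grad():
            logits = model(**inputs, decoder_input_ids=start).logits[0, 0]
        return torch.softmax(logits[[FALSE_ID, TRUE_ID]], dim=0)[1].item()

    # Rerank: sort the 100 baseline candidates by monot5_score, then apply
    # DuoT5 pairwise comparisons to the top 10 of that ordering.
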
Runtag: Organizers-Baseline-BM25RM3 (trec_eval) (paper)
Organization: coordinators
Manual or automatic: automatic
Description: BM25 (k1=0.9, b=0.4) with RM3 (fb_terms=10, fb_docs=10, original_query_weight=0.5) as implemented in Pyserini (see the sketch below).
Rerank or full ranking: rerank
Other datasets: None.
Used news articles: no
LLMs used: Not used.
Priority: 1 (top)

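These parameters map directly onto Pyserini's searcher configuration; a minimal sketch, assuming a local Lucene index of the task collection (the index path and example query are hypothetical):

    from pyserini.search.lucene import LuceneSearcher

    # Hypothetical index path; build an index of the collection first.
    searcher = LuceneSearcher("indexes/lateral-reading")
    searcher.set_bm25(k1=0.9, b=0.4)
    searcher.set_rm3(fb_terms=10, fb_docs=10, original_query_weight=0.5)

    hits = searcher.search("who funded the study described in the article", k=100)
    for rank, hit in enumerate(hits, start=1):
        print(f"{rank:3d} {hit.docid} {hit.score:.4f}")
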
Runtag: h2oloo-fused-gpt4o-zephyr-llama31_70b (trec_eval)
Organization: h2oloo
Manual or automatic: automatic
Description: Segments (10/5 sliding window) from the top 300 BM25 + Rocchio documents -> MonoT5 -> top 100 segments -> RRF(RankGPT4o, RankL3.1-70B, RankZephyr) (RRF is sketched below).
Rerank or full ranking: full
Other datasets: MS MARCO
Used news articles: no
Priority: 1 (top)

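The final fusion step is reciprocal rank fusion (RRF); a minimal sketch over the three listwise rerankers' outputs, assuming the conventional constant k=60 (not confirmed for this run):

    from collections import defaultdict

    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        """Fuse rankings (docid lists, best first) by summing 1 / (k + rank)."""
        scores: dict[str, float] = defaultdict(float)
        for ranking in rankings:
            for rank, docid in enumerate(ranking, start=1):
                scores[docid] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # fused = rrf([rankgpt4o_list, rankllama_list, rankzephyr_list])
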
Runtag: h2oloo-bm25-rocchio-monot5-gpt4o (trec_eval)
Organization: h2oloo
Manual or automatic: automatic
Description: Segments (10/5 sliding window) from the top 300 BM25 + Rocchio documents -> MonoT5 -> top 100 segments -> RankGPT4o (the segmentation is sketched below).
Rerank or full ranking: full
Other datasets: MS MARCO
Used news articles: no
Priority: 2

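All of the h2oloo runs score overlapping document segments produced by a 10/5 sliding window; a minimal sketch, assuming the window and stride are measured in sentences (the unit is not stated in these descriptions):

    def sliding_segments(sentences: list[str], size: int = 10,
                         stride: int = 5) -> list[str]:
        """Overlapping windows over a document: units [0:10], [5:15], ..."""
        if not sentences:
            return []
        segments = []
        start = 0
        while True:
            segments.append(" ".join(sentences[start:start + size]))
            if start + size >= len(sentences):
                break
            start += stride
        return segments

    # Each segment is retrieved and scored in place of its document; how the
    # runs aggregate segment scores back to documents is not specified here.
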
Runtag: h2oloo-bm25-rocchio-monot5-zephyr (trec_eval)
Organization: h2oloo
Manual or automatic: automatic
Description: Segments from the top 300 BM25 + Rocchio documents -> MonoT5 -> top 100 segments -> RankZephyr.
Rerank or full ranking: full
Other datasets: MS MARCO
Used news articles: no
Priority: 4

Runtag: h2oloo-bm25-rocchio-monot5-lit5_xl_v2 (trec_eval)
Organization: h2oloo
Manual or automatic: automatic
Description: Segments from the top 300 BM25 + Rocchio documents -> MonoT5 -> top 100 segments -> LiT5-V2-XL (single pass).
Rerank or full ranking: full
Other datasets: MS MARCO
Used news articles: no
Priority: 5 (bottom)

Runtag: h2oloo-bm25-rocchio-monot5-lit5_large_v2 (trec_eval)
Organization: h2oloo
Manual or automatic: automatic
Description: Segments from the top 300 BM25 + Rocchio documents -> MonoT5 -> top 100 segments -> LiT5-V2-Large (single pass).
Rerank or full ranking: full
Other datasets: MS MARCO
Used news articles: no
Priority: 5 (bottom)

Runtag: h2oloo-bm25-rocchio-monot5 (trec_eval)
Organization: h2oloo
Manual or automatic: automatic
Description: Segments from the top 300 BM25 + Rocchio documents -> MonoT5.
Rerank or full ranking: full
Other datasets: MS MARCO
Used news articles: no
Priority: 5 (bottom)

Runtag: h2oloo-bm25-rocchio (trec_eval)
Organization: h2oloo
Manual or automatic: automatic
Description: Segments from the top 300 BM25 + Rocchio documents.
Rerank or full ranking: full
Other datasets: -
Used news articles: no
Priority: 5 (bottom)

Runtag: Organizers-LLM-Assessor (trec_eval) (paper)
Organization: coordinators
Manual or automatic: automatic
Description: GPT-4o generated 10 candidate queries for the original question. BM25 then retrieved 30 documents for each candidate query. Llama 3.1 8B Instruct assessed the relevance of the retrieved documents as very useful, useful, or not useful. The 10 documents ranked highest by these usefulness assessments were selected for this run (the selection step is sketched below).
Rerank or full ranking: full
Other datasets: None.
Used news articles: yes
LLMs used: GPT-4o for query generation; Llama 3.1 8B Instruct for assessments.
Priority: 1 (top)

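A minimal sketch of the final selection step, pooling the per-query retrievals and ranking by the Llama grades; the grade-to-score mapping and tie handling shown here are assumptions, not details from the run description:

    # Assumed mapping of the three grades onto scores.
    GRADE = {"very useful": 2, "useful": 1, "not useful": 0}

    def select_top10(assessments: dict[str, str]) -> list[str]:
        """assessments: docid -> Llama 3.1 8B Instruct grade, pooled across
        the 10 candidate queries (30 BM25 hits each, duplicates merged)."""
        ranked = sorted(assessments, key=lambda d: GRADE[assessments[d]],
                        reverse=True)
        return ranked[:10]  # ties within a grade are broken arbitrarily here
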
Runtag: TMU_V_BERTSim3 (trec_eval)
Organization: TMU_Toronto
Manual or automatic: automatic
Description: This run processes the 600 questions of TREC 2024 Lateral Reading Task 2, reranking the top 100 documents retrieved by the baseline BM25-RM3 run. BERT (Bidirectional Encoder Representations from Transformers) computes semantic similarity between each question and each document: the pre-trained model produces embeddings for the question and the document, and the cosine similarity between the two embeddings serves as the score (see the sketch below). Documents are then ranked by their similarity to the question, and results for question IDs 1 through 600 are saved in the required format.
Rerank or full ranking: rerank
Other datasets: ClueWeb22-B (for Task 2 reranking); trec-2024-lateral-reading-task2-questions.txt; Organizers-Baseline-BM25RM3; trec-2024-lateral-reading-task2-baseline-documents.jsonl
Used news articles: yes
Priority: 1 (top)

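A minimal sketch of the BERT similarity scoring, assuming the bert-base-uncased checkpoint and mean pooling over token embeddings; the run's exact checkpoint and pooling strategy are not specified.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    bert.eval()

    def embed(text: str) -> torch.Tensor:
        """Mean-pooled BERT embedding, ignoring padding positions."""
        inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = bert(**inputs).last_hidden_state      # (1, seq_len, 768)
        mask = inputs["attention_mask"].unsqueeze(-1)      # (1, seq_len, 1)
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    def similarity(question: str, document: str) -> float:
        return torch.cosine_similarity(embed(question), embed(document)).item()

    # For each question, score the 100 baseline documents and sort descending.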