The Thirty-Third Text REtrieval Conference
(TREC 2024)

Product Search Main task Appendix

Each run entry below gives, in order: the run tag and organization; whether the run is manual or automatic; whether it is text-only, image-only, or multi-modal; a brief description of the run; any other datasets used in producing the run; the LLMs used for the run (optional); and the priority assigned for inclusion in manual assessments (1 = top, 5 = bottom).
BM25 (trec_eval) Lowes-DS
automatic
text-only
BM25
None
1 (top)
BM25-QE (trec_eval) Lowes-DS
automatic
text-only
BM25 with Query Expansion
None
1 (top)
Rerank (trec_eval) Lowes-DS
automatic
text-only
Top 1000 BM25 reranked with TAS-B
None
1 (top)
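As an illustration of the retrieve-then-rerank pattern in the Rerank run above (BM25 candidates rescored with TAS-B), the following is a minimal sketch assuming the rank_bm25 package and the public sentence-transformers TAS-B checkpoint; the toy corpus and candidate handling are assumptions, not details of the submitted run.

```python
# Hedged sketch: BM25 first stage followed by TAS-B bi-encoder rescoring.
# The corpus, query, and candidate depth below are placeholders; the submitted
# run used the official product collection and standard trec_eval formatting.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = {"P1": "cordless drill 20V with battery",
          "P2": "wood screws assorted pack",
          "P3": "20V battery charger for power tools"}
query = "cordless power drill"

# Stage 1: BM25 over whitespace-tokenized product text, keep the top 1000.
doc_ids = list(corpus)
bm25 = BM25Okapi([corpus[d].lower().split() for d in doc_ids])
scores = bm25.get_scores(query.lower().split())
candidates = [doc_ids[i] for i in np.argsort(scores)[::-1][:1000]]

# Stage 2: rescore the BM25 candidates with the TAS-B dense encoder (dot product).
tasb = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")
q_emb = tasb.encode(query, convert_to_tensor=True)
d_emb = tasb.encode([corpus[d] for d in candidates], convert_to_tensor=True)
rerank_scores = util.dot_score(q_emb, d_emb)[0]
print(sorted(zip(candidates, rerank_scores.tolist()), key=lambda x: -x[1]))
```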
TAS-B (trec_eval) Lowes-DS
automatic
text-only
Single-representation bi-encoder dense retrieval method TAS-B
None
1 (top)
SPLADE++ (trec_eval) Lowes-DS
automatic
text-only
Learned Sparse Vector method SPLADE++
None
1 (top)
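The SPLADE++ run above relies on learned sparse vectors. A minimal sketch of how a SPLADE-style encoder produces such vectors is given below; the checkpoint name is an assumption, since the run does not say which SPLADE++ release was used.

```python
# Hedged sketch of SPLADE-style learned sparse encoding. The checkpoint name is
# an assumption: naver/splade-cocondenser-ensembledistil is a public SPLADE++
# release, not necessarily the one used in this run.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "naver/splade-cocondenser-ensembledistil"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_vector(text: str) -> torch.Tensor:
    """Return a vocabulary-sized term-weight vector for the text."""
    inputs = tok(text, return_tensors="pt", truncation=True)
    logits = model(**inputs).logits                       # (1, seq_len, vocab)
    # SPLADE activation: log(1 + ReLU(logits)), masked by the attention mask,
    # then max-pooled over the sequence positions.
    weights = torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)           # (vocab,)

q = splade_vector("cordless power drill")
d = splade_vector("20V cordless drill with battery and charger")
print(float(q @ d))   # sparse dot-product relevance score
```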
BM25-TAS-B-fusion (trec_eval) Lowes-DS
automatic
text-only
TAS-B and BM25 fusion
None
1 (top)
BM25-SPLADE++-fusion (trec_eval) Lowes-DS
automatic
text-only
BM25 and SPLADE++ fusion
None
1 (top)
BM25QE-TAS-B-fusion (trec_eval) Lowes-DS
automatic
text-only
BM25 with Query Expansion and TAS-B fusion
None
1 (top)
BM25QE-SPLADE++-fusion (trec_eval) Lowes-DS
automatic
text-only
BM25 with Query Expansion and SPLADE++ fusion
None
1 (top)
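The four fusion runs above combine a lexical and a neural ranking, but the fusion method itself is not stated. Reciprocal rank fusion (RRF) is sketched below purely as one common way such lists can be merged; it is not confirmed as the method actually used.

```python
# Illustrative reciprocal rank fusion (RRF) of two ranked lists; the run
# descriptions above do not name their fusion method, so this is only one
# plausible choice, not the confirmed one.
def rrf(runs, k=60):
    """runs: list of dicts mapping doc_id -> 1-based rank. Returns fused scores."""
    fused = {}
    for run in runs:
        for doc_id, rank in run.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return dict(sorted(fused.items(), key=lambda x: -x[1]))

bm25_ranks = {"P1": 1, "P2": 2, "P3": 3}
tasb_ranks = {"P3": 1, "P1": 2, "P4": 3}
print(rrf([bm25_ranks, tasb_ranks]))   # documents ranked well by both rise to the top
```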
jbnu08 (trec_eval) (paper) jbnu
automatic
text-only
Fusion of jbnu02 and jbnu04 using the ranx library.
No other datasets were used.
1 (top)
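Several jbnu runs, such as jbnu08 above, fuse two earlier runs with the ranx library. A minimal sketch of that call is given below; the query/document IDs, scores, and the choice of RRF as the fusion method are illustrative assumptions.

```python
# Hedged sketch of run fusion with the ranx library, as described for jbnu08.
# Query IDs, document IDs, scores, and the RRF method are illustrative only.
from ranx import Run, fuse

run_a = Run({"q1": {"d1": 12.3, "d2": 9.1, "d3": 4.2}}, name="jbnu02")
run_b = Run({"q1": {"d3": 0.92, "d1": 0.88, "d4": 0.75}}, name="jbnu04")

# ranx supports several fusion methods; RRF is used here for illustration since
# the appendix does not state which method the jbnu runs selected.
fused = fuse(runs=[run_a, run_b], method="rrf")
print(fused.to_dict())
```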
jbnu04 (trec_eval) (paper) jbnu
automatic
text-only
Using the ColBERT model and overcoming the maximum token limit by utilizing document summaries generated by T5.
No other datasets were used.
3
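jbnu04 above shortens long product documents with T5-generated summaries so they fit within the retriever's token limit. A sketch of that summarization step is shown below; the t5-base checkpoint and the generation lengths are assumptions, since the run does not specify them.

```python
# Hedged sketch of shortening a long product document with T5 before indexing
# or encoding, as jbnu04 describes. The checkpoint and generation settings are
# assumptions for illustration only.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-base")

long_product_text = ("Heavy-duty cordless drill with brushless motor, two 20V "
                     "batteries, fast charger, belt clip, and carrying case. ") * 50

summary = summarizer(long_product_text, max_length=64, min_length=16,
                     truncation=True)[0]["summary_text"]
print(summary)   # the summary, not the full document, is what gets retrieved over
```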
jbnu09 (trec_eval) (paper) jbnu
automatic
text-only
Modifying the SPLADE model to calculate negative scores using the GELU function, and overcoming the maximum token limit by summarizing documents using T5 for retrieval.
No other datasets were used.
4
jbnu01 (trec_eval) (paper) jbnu
automatic
text-only
Modifying the SPLADE model to calculate negative scores using the GELU activation function.
No other datasets were used.
5 (bottom)
jbnu07 (trec_eval) (paper) jbnu
automatic
text-only
Fusion of jbnu02 and jbnu03 using the ranx library.
No other datasets were used.
5 (bottom)
jbnu10 (trec_eval) (paper) jbnu
automatic
text-only
Translating queries and applying Pseudo Relevance Feedback (PRF) and ASIN-title conversion, followed by retrieval using BM25, and fusion of the results with jbnu04 using the ranx library.
No other datasets were used.
5 (bottom)
jbnu03 (trec_eval) (paper) jbnu
automatic
text-only
Using the TAS-B model with title and T5 (document) summary data.
No other datasets were used.
5 (bottom)
jbnu02 (trec_eval) (paper) jbnu
automatic
text-only
Fine-tuning the base SPLADE model, then overcoming the maximum token limit by summarizing documents using T5 for retrieval.
No other datasets were used.
1 (top)
jbnu11 (trec_eval) (paper) jbnu
automatic
text-only
Fusion of jbnu09 and jbnu03 using the ranx library.
No other datasets were used.
5 (bottom)
bm25-simple-collection (trec_eval) stktest
manual
text-only
BM25
none
none
4
jbnu12 (trec_eval) (paper) jbnu
automatic
text-only
Fusion of jbnu09 and jbnu04 using the ranx library.
No other datasets were used.
5 (bottom)
jbnu05 (trec_eval) (paper) jbnu
automatic
text-only
Fusion of jbnu01 and jbnu03 using the ranx library.
No other datasets were used.
5 (bottom)
jbnu06 (trec_eval) (paper) jbnu
automatic
text-only
Fusion of jbnu01 and jbnu04 using the ranx library.
No other datasets were used.
5 (bottom)
res_img_splade_bm25_rerank (trec_eval) wish
automatic
multi-modal
We trained a dual-tower model that maps queries to product texts and images, retrieving the K nearest neighbors of the query vector from product embeddings. We also incorporated additional candidates from SPLADE++ and BM25, then reranked all candidates using a cross-encoder model.
This run utilized heuristically labeled relevance judgments for (query, product) pairs from Wish.com’s search data.
2
res_splade_bm25_rerank (trec_eval) wish
automatic
text-only
We trained a dual-tower model that maps queries to product texts, retrieving the K nearest neighbors of the query vector from product embeddings. We also incorporated additional candidates from SPLADE++ and BM25, then reranked all candidates using a cross-encoder model.
This run utilized heuristically labeled relevance judgments for (query, product) pairs from Wish.com’s search data.
2
long_res_img_splade_bm25 (trec_eval) wish
automatic
multi-modal
We trained a dual-tower model with a longer sequence length that maps queries to product texts and images, retrieving the K nearest neighbors of the query vector from product embeddings. We also incorporated additional candidates from SPLADE++ and BM25, then reranked all candidates using a cross-encoder model.
This run utilized heuristically labeled relevance judgments for (query, product) pairs from Wish.com’s search data.
1 (top)
long_res_splade_bm25 (trec_eval) wish
automatic
text-only
We trained a dual-tower model with a longer sequence length that maps queries to product texts, retrieving the K nearest neighbors of the query vector from product embeddings. We also incorporated additional candidates from SPLADE++ and BM25, then reranked all candidates using a cross-encoder model.
This run utilized heuristically labeled relevance judgments for (query, product) pairs from Wish.com’s search data.
1 (top)
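The four wish runs above share one pipeline shape: dense candidates from a trained dual-tower model, merged with SPLADE++ and BM25 candidates, then reranked by a cross-encoder. The sketch below mirrors that shape with public MS MARCO checkpoints as stand-ins; the submitted towers and reranker were trained in-house on Wish.com data and are not reproduced here.

```python
# Schematic of the wish pipeline shape: dense K-nearest-neighbor candidates from
# a dual-tower (bi-encoder) model, merged with lexically retrieved candidates,
# then reranked by a cross-encoder. The public checkpoints below are stand-ins.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

products = {"P1": "wireless earbuds bluetooth 5.3",
            "P2": "phone case with card holder",
            "P3": "bluetooth over-ear headphones"}
query = "bluetooth earbuds"

# Dual-tower retrieval: embed the query and products, take the K nearest neighbors.
bi = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")
ids = list(products)
hits = util.semantic_search(
    bi.encode(query, convert_to_tensor=True),
    bi.encode([products[i] for i in ids], convert_to_tensor=True),
    top_k=2)[0]
dense_candidates = {ids[h["corpus_id"]] for h in hits}

# Merge with candidates from other first-stage retrievers (a precomputed set
# stands in for the SPLADE++ and BM25 candidate pools here).
lexical_candidates = {"P1", "P2"}
candidates = sorted(dense_candidates | lexical_candidates)

# Cross-encoder reranking over the merged candidate pool.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = ce.predict([(query, products[c]) for c in candidates])
print(sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]))
```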
kd_bm25_100_listwise_20_10 (trec_eval) kd
automatic
text-only
This run uses BM25 to retrieve top-100 candidates and then applies a listwise sliding window strategy to rerank the top-100 candidates.
The item description, ratings, and all other content (except for images) were used.
GPT4o
3
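The kd runs rerank a BM25 candidate list with an LLM using a listwise sliding window. The sketch below shows the window-and-stride mechanics only; `llm_rank_window` is a hypothetical stand-in for the GPT4o call, whose prompt and output parsing are not described in this appendix.

```python
# Hedged sketch of listwise sliding-window reranking: an LLM reorders a window of
# candidates, the window slides from the tail of the list toward the head, and
# successive windows overlap by (window - stride) items. `llm_rank_window` is a
# hypothetical stand-in for the actual LLM call.
from typing import Callable, List

def sliding_window_rerank(candidates: List[str],
                          llm_rank_window: Callable[[List[str]], List[str]],
                          window: int = 20, stride: int = 10) -> List[str]:
    """Rerank `candidates` (best-first) by repeatedly asking the LLM to reorder
    overlapping windows, moving from the bottom of the list to the top."""
    ranked = list(candidates)
    start = max(len(ranked) - window, 0)
    while True:
        ranked[start:start + window] = llm_rank_window(ranked[start:start + window])
        if start == 0:
            break
        start = max(start - stride, 0)
    return ranked

# Toy stand-in "LLM" that sorts a window by a fake relevance score, for demo only.
fake_relevance = {f"P{i}": (i * 37) % 100 for i in range(100)}
reranked = sliding_window_rerank([f"P{i}" for i in range(100)],
                                 lambda w: sorted(w, key=lambda d: -fake_relevance[d]))
print(reranked[:10])
```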
kd_bm25_100_listwise_40_15 (trec_eval) kd
automatic
text-only
This run retrieves the top-100 items using BM25 and then applies a listwise sliding window strategy with window size 40 and stride 15 to rerank them.
All fields (ratings, item information, title, reviews, etc.) were used, except for image information.
GPT4o
3
kd_bm25_100_listwise_20_10_twice (trec_eval) kd
automatic
text-only
This run uses BM25 to retrieve top-100 candidates and then applies a listwise sliding window strategy twice to rerank the top-100 candidates. The ordered top-k from the first sliding window pass is reranked again in a second pass.
The item description, ratings, and all other content (except for images) were used.
GPT4o
1 (top)
kd_bm25_100_listwise_30_15 (trec_eval) kd
manual
text-only
This run uses BM25 to retrieve top-100 candidates and then applies a listwise sliding window strategy to rerank the top-100 candidates. Sliding window of width 30 and stride of 15.
The item description, ratings, and all other content (except for images) were used.
GPT4o
3
snowflake arctic medium model (trec_eval) stktest
manual
text-only
Snowflake Arctic medium model
none
Snowflake Arctic medium model
2
snowflake arctic large model (trec_eval) stktest
manual
text-only
Snowflake Arctic large model
none
Snowflake Arctic large model
1 (top)
GTE Large (trec_eval) stktest
manual
text-only
GTE Large general text embeddings (https://huggingface.co/thenlper/gte-large)
none
GTE Large general text embeddings (https://huggingface.co/thenlper/gte-large)
1 (top)
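The GTE Large run above scores products with general text embeddings from https://huggingface.co/thenlper/gte-large. A minimal sketch of that scoring via sentence-transformers is shown below; the toy product texts are illustrative only.

```python
# Minimal sketch of cosine-similarity scoring with the thenlper/gte-large
# embeddings named above; the products and query are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thenlper/gte-large")
products = ["stainless steel water bottle 32 oz",
            "insulated travel mug with lid",
            "usb-c fast charging cable"]
query = "insulated water bottle"

scores = util.cos_sim(model.encode(query, convert_to_tensor=True),
                      model.encode(products, convert_to_tensor=True))[0]
for text, score in sorted(zip(products, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {text}")
```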
kd_bm25_100_listwise_20_10_llama_spark (trec_eval) kd
automatic
text-only
This run uses BM25 to retrieve top-100 candidates and then applies a listwise sliding window strategy with the Llama Spark model to rerank the top-100 candidates.
The item description, ratings, and all other content (except for images) were used.
Llama Spark: Llama-Spark is built upon the Llama-3.1-8B base model, fine-tuned using the Tome Dataset, and merged with Llama-3.1-8B-Instruct.
2
kd_linear_combo_1_100 (trec_eval) kd
automatic
text-only
This run computes rankings through three sliding window ranking approaches, each of which uses a paraphrase of some base instructions; the resulting scores are then combined for the final ranking.
All fields (ratings, item information, title, reviews, etc.) were used, except for image information.
GPT4o
2
run_bm25_1000_listwise_50_20 (trec_eval) kd
automatic
text-only
This run uses BM25 to retrieve top-1000 candidates and then applies a listwise sliding window strategy with window size 50 and stride 20 to rerank the top-1000 candidates.
The item description, ratings, and all other content (except for images) were used.
GPT4o
3
run_bm25_1000_listwise_50_30 (trec_eval) kd
automatic
text-only
This run uses BM25 to retrieve top-1000 candidates and then applies a listwise sliding window strategy with window size 50 and stride 30 to rerank the top-1000 candidates.
The item description, ratings, and all other content (except for images) were used.
GPT4o
3