The Thirty-Third Text REtrieval Conference
(TREC 2024)

Interactive Knowledge Assistance Track (iKAT) Manual Task Appendix

Each entry below records the run tag and organization, followed by the submitted questionnaire answers, in this order:

- What type of manually annotated information does the system use?
- How is conversation understanding (NLP/rewriting) performed in this run? (check all that apply)
- What data is used for conversational query understanding in this run? (check all that apply)
- How is ranking performed in this run? (check all that apply)
- What data is used to develop the ranking method in this run? (check all that apply)
- Methods used to handle feedback or clarification responses from the user (check all that apply)
- Method used to generate the final conversational responses from one or more retrieved passages (check all that apply)
- External resources used by this run, if applicable
- A short description of this run
- Assessment priority (if resources do not allow all runs to be assessed, NIST will work in priority order, resolving ties arbitrarily)
Run: RALI_manual_monot5 (trec_eval) (ptkb.trec_eval) (paper)
Organization: rali lab
Manual information used: manual: system uses only manually rewritten utterances
Conversation understanding: method uses other query understanding method (please describe below)
Query understanding data: method uses iKAT provided manually rewritten utterances (note: this makes it a manual run)
Ranking: method uses traditional unsupervised sparse retrieval (e.g., QL, BM25, etc.); method performs re-ranking with a pre-trained neural language model (BERT, RoBERTa, T5, etc.) (please describe specifics in the description field below)
Ranking training data: method is trained on other datasets (please describe below)
Feedback/clarification handling: method does not treat them specially
Response generation: method uses other approaches (please specify in description below)
External resources: We used the Pyserini implementation of BM25 and the pretrained monoT5 reranker available at https://huggingface.co/castorini/monot5-base-msmarco-10k.
Description: This run is retrieval-only, i.e., it does not participate in the response evaluation. It consists of two key steps. (1) Retrieval: BM25 retrieves the top 1000 documents with respect to the manual rewrite. (2) Reranking: the top 50 documents from the previous step are reranked by the monoT5 model, again with respect to the manual rewrite.
Priority: 5 (bottom)
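The two-stage retrieve-then-rerank pipeline described in this run (BM25 candidate retrieval, then neural reranking of the top candidates) can be sketched as follows. This is an illustrative sketch, not the submitted code: it uses a self-contained BM25 implementation in place of Pyserini, a simple term-overlap scorer standing in for the monoT5 reranker, and a hypothetical toy corpus and query.

```python
import math
from collections import Counter

# Illustrative sketch of a two-stage pipeline: BM25 retrieval over a toy
# corpus, then reranking of the top candidates with a stand-in scorer.
# (The actual run used Pyserini's BM25 and the monoT5 reranker; the corpus,
# query, and all function names here are hypothetical.)

def bm25_scores(query_terms, docs, k1=0.9, b=0.4):
    """Score every document against the query terms with classic BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    avgdl = sum(len(toks) for toks in tokenized) / len(tokenized)
    n = len(docs)
    # document frequency of each query term
    df = {t: sum(1 for toks in tokenized if t in toks) for t in set(query_terms)}
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for t in query_terms:
            if df.get(t, 0) == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

def rerank_stub(query, doc):
    """Placeholder for a neural reranker (e.g. monoT5): plain term overlap."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve_then_rerank(query, docs, k_retrieve=1000, k_rerank=50):
    """Stage 1: BM25 top-k_retrieve; stage 2: rerank only the top k_rerank."""
    scores = bm25_scores(query.lower().split(), docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:k_retrieve]
    head, tail = candidates[:k_rerank], candidates[k_rerank:]
    head = sorted(head, key=lambda i: rerank_stub(query, docs[i]), reverse=True)
    return head + tail  # reranked head, BM25-ordered tail

corpus = [
    "bm25 is a sparse retrieval model",
    "monot5 is a sequence to sequence reranker",
    "conversational search rewrites user utterances",
]
print(retrieve_then_rerank("sparse retrieval with bm25", corpus, k_rerank=2))
```

Note that only the head of the BM25 ranking is rescored (50 of the 1000 candidates in the run above), which keeps the expensive neural pass small while leaving the BM25 tail available to fill out the ranked list.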
Run: RALI_manual_rankllama (trec_eval) (ptkb.trec_eval) (paper)
Organization: rali lab
Manual information used: manual: system uses only manually rewritten utterances
Conversation understanding: method uses other query understanding method (please describe below)
Query understanding data: method uses other external data (please specify in the external resources field below)
Ranking: method uses traditional unsupervised sparse retrieval (e.g., QL, BM25, etc.); method performs re-ranking with large language models (LLaMA, GPT-x, etc.) (please describe specifics in the description field below)
Ranking training data: method is trained on other datasets (please describe below)
Feedback/clarification handling: method does not treat them specially
Response generation: method uses other approaches (please specify in description below)
External resources: We used the Pyserini implementation of BM25 and the pretrained RankLLaMA reranker available at https://huggingface.co/castorini/rankllama-v1-7b-lora-passage.
Description: This run is retrieval-only, i.e., it does not participate in the response evaluation. It consists of two key steps. (1) Retrieval: BM25 retrieves the top 1000 documents with respect to the manual rewrite. (2) Reranking: the top 50 documents from the previous step are reranked by the RankLLaMA model, again with respect to the manual rewrite.
Priority: 5 (bottom)
Run: manual-splade-fusion (trec_eval) (ptkb.trec_eval) (paper)
Organization: uva
Manual information used: manual: system uses only manually rewritten utterances
Conversation understanding: method uses other query understanding method (please describe below)
Query understanding data: method uses iKAT provided manually rewritten utterances (note: this makes it a manual run)
Ranking: method uses other ranking method (please describe below)
Ranking training data: method is trained on other datasets (please describe below)
Feedback/clarification handling: method does not treat them specially
Response generation: method uses multiple sources (multiple passages)
External resources: manual-splade-fusion
Description: manual-splade-fusion
Priority: 2
Run: manual-splade-debertav3 (trec_eval) (ptkb.trec_eval) (paper)
Organization: uva
Manual information used: manual: system uses only manually rewritten utterances
Conversation understanding: method uses other query understanding method (please describe below)
Query understanding data: method uses iKAT provided manually rewritten utterances (note: this makes it a manual run)
Ranking: method uses other ranking method (please describe below)
Ranking training data: method is trained on other datasets (please describe below)
Feedback/clarification handling: method does not treat them specially
Response generation: method uses multiple sources (multiple passages)
External resources: manual-splade-debertav3
Description: manual-splade-debertav3
Priority: 3
Run: baseline-manual-bm25-minilm (trec_eval) (ptkb.trec_eval) (paper)
Organization: coordinators
Manual information used: manual: system uses only manually rewritten utterances
Conversation understanding: method uses large language models like LLaMA and GPT-x
Query understanding data: method uses other external data (please specify in the external resources field below)
Ranking: method uses other ranking method (please describe below)
Ranking training data: method is trained on other datasets (please describe below)
Feedback/clarification handling: method does not treat them specially
Response generation: method uses multiple sources (multiple passages)
External resources: baseline-manual-bm25-minilm
Description: baseline-manual-bm25-minilm
Priority: 1 (top)
Run: baseline-manual-splade-minilm (trec_eval) (ptkb.trec_eval) (paper)
Organization: coordinators
Manual information used: manual: system uses only manually rewritten utterances
Conversation understanding: method uses large language models like LLaMA and GPT-x
Query understanding data: method uses other external data (please specify in the external resources field below)
Ranking: method uses other ranking method (please describe below)
Ranking training data: method uses provided manual baseline; method is trained on other datasets (please describe below)
Feedback/clarification handling: method does not treat them specially
Response generation: method uses multiple sources (multiple passages)
External resources: baseline-manual-splade-minilm
Description: baseline-manual-splade-minilm
Priority: 1 (top)