TREC 2025 Proceedings

NITA_R_JH_HY

Submission Details

Organization
NITATREC
Track
Retrieval-Augmented Generation
Task
Retrieval Only Task
Date
2025-08-17

Run Description

Is this a manual (human intervention) or automatic run?
automatic
Does this run leverage neural networks?
yes
Does this run leverage proprietary models in any step of the retrieval pipeline?
no
Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline?
no
Does this run leverage smaller open-weight language models in any step of the retrieval pipeline?
yes
Was this run padded with results from a baseline run?
no
What would you categorize this run as?
Multi-Stage Pipeline pointwise
Please provide a short description of this run
The system implements a hybrid retrieval and reranking pipeline for TREC 2025 RAG. In the first stage, candidate documents are retrieved with both sparse and dense methods: BM25 (via Pyserini) retrieves the top 1000 documents per query, while Dense Passage Retrieval (DPR) uses facebook/dpr-question_encoder-single-nq-base for queries and facebook/dpr-ctx_encoder-single-nq-base for documents to retrieve the top 500 candidates. The two candidate sets are then fused into a single de-duplicated pool. In the second stage, a cross-encoder (cross-encoder/ms-marco-MiniLM-L-12-v2) scores each query-document pair pointwise to produce fine-grained relevance rankings. Documents are sorted by cross-encoder score, and the top 100 per query are written in the TREC run file format. The pipeline uses GPU acceleration for DPR embedding generation and cross-encoder inference, which keeps reranking tractable across the large fused candidate sets.
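The pipeline logic described above (fuse the two candidate sets, score pairs pointwise, emit a TREC run file) can be sketched as follows. This is a minimal illustration, not the submitted code: the placeholder scorer and the toy document IDs stand in for the actual BM25/DPR retrievers and the cross-encoder model, and only the run tag (NITA_R_JH_HY) comes from the submission itself.

```python
from typing import Callable, List, Tuple


def fuse_candidates(bm25_docs: List[str], dpr_docs: List[str]) -> List[str]:
    """Union of the sparse and dense candidate lists, preserving
    first-seen order and de-duplicating documents found by both."""
    seen, fused = set(), []
    for docid in bm25_docs + dpr_docs:
        if docid not in seen:
            seen.add(docid)
            fused.append(docid)
    return fused


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score each (query, doc) pair pointwise and sort descending,
    mirroring the cross-encoder reranking stage."""
    scored = [(docid, score_fn(query, docid)) for docid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def to_trec_run(qid: str, ranked: List[Tuple[str, float]],
                run_tag: str = "NITA_R_JH_HY", k: int = 100) -> List[str]:
    """Emit standard six-column TREC run lines: qid Q0 docid rank score tag."""
    return [f"{qid} Q0 {docid} {rank} {score:.4f} {run_tag}"
            for rank, (docid, score) in enumerate(ranked[:k], start=1)]


# Toy demonstration with a placeholder scorer in place of the cross-encoder.
bm25_top = ["d1", "d2", "d3"]
dpr_top = ["d3", "d4"]
fused = fuse_candidates(bm25_top, dpr_top)
ranked = rerank("example query", fused, lambda q, d: float(d[1:]))
run_lines = to_trec_run("q1", ranked, k=100)
```

In the real run, `score_fn` would wrap batched cross-encoder inference on GPU, and `k=100` matches the cutoff stated in the description.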
Please give this run a priority for inclusion in manual assessments.
1 (top)

Evaluation Files