TREC 2025 Proceedings
NITA_R_JH_HY
Submission Details
- Organization
- NITATREC
- Track
- Retrieval-Augmented Generation
- Task
- Retrieval Only Task
- Date
- 2025-08-17
Run Description
- Is this a manual (human intervention) or automatic run?
- automatic
- Does this run leverage neural networks?
- yes
- Does this run leverage proprietary models in any step of the retrieval pipeline?
- no
- Does this run leverage open-weight LLMs (> 5B parameters) in any step of the retrieval pipeline?
- no
- Does this run leverage smaller open-weight language models in any step of the retrieval pipeline?
- yes
- Was this run padded with results from a baseline run?
- no
- What would you categorize this run as?
- Multi-Stage Pipeline pointwise
- Please provide a short description of this run
- The system implements a hybrid retrieval and reranking pipeline for TREC 2025 RAG. In the first stage, candidate documents are retrieved with both sparse and dense methods: BM25 (via Pyserini) retrieves the top 1000 documents per query, while Dense Passage Retrieval (DPR) uses facebook/dpr-question_encoder-single-nq-base to encode queries and facebook/dpr-ctx_encoder-single-nq-base to encode documents, retrieving the top 500 candidates. The two candidate sets are fused with deduplication, so each document appears at most once. In the second stage, a cross-encoder (cross-encoder/ms-marco-MiniLM-L-12-v2) scores each query-document pair to produce fine-grained relevance rankings. Documents are sorted by cross-encoder score, and the top 100 per query are written in TREC run file format. The pipeline uses GPU acceleration for DPR embedding generation and cross-encoder inference, keeping reranking tractable over large candidate sets.
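- The fusion and output-formatting steps described above can be sketched as follows. This is a minimal illustration, not the submitted code: the doc IDs and relevance scores are toy stand-ins for actual BM25/DPR/cross-encoder output, and the helper names are hypothetical.

```python
def fuse_candidates(bm25_docids, dpr_docids):
    """Union the BM25 and DPR candidate lists, keeping each docid once.
    BM25 order is preserved first; DPR-only docs are appended after."""
    seen = set()
    fused = []
    for docid in bm25_docids + dpr_docids:
        if docid not in seen:
            seen.add(docid)
            fused.append(docid)
    return fused

def write_trec_lines(qid, scored_docs, run_tag="NITA_R_JH_HY", k=100):
    """Format the top-k (docid, score) pairs as TREC run-file lines:
    <qid> Q0 <docid> <rank> <score> <run_tag>"""
    ranked = sorted(scored_docs, key=lambda x: x[1], reverse=True)[:k]
    return [
        f"{qid} Q0 {docid} {rank} {score:.4f} {run_tag}"
        for rank, (docid, score) in enumerate(ranked, start=1)
    ]

# Toy example: fuse two candidate lists, then rank by
# hypothetical cross-encoder scores and emit TREC lines.
bm25 = ["d1", "d2", "d3"]
dpr = ["d2", "d4"]
fused = fuse_candidates(bm25, dpr)  # ['d1', 'd2', 'd3', 'd4']
scores = [("d1", 0.2), ("d2", 0.9), ("d3", 0.5), ("d4", 0.7)]
lines = write_trec_lines("q1", scores, k=3)
```

In the actual run, `fused` would hold up to ~1500 unique candidates per query (1000 from BM25, 500 from DPR, minus overlap), and the scores would come from cross-encoder inference before truncating to the top 100.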
- Please give this run a priority for inclusion in manual assessments.
- 1 (top)
Evaluation Files