Ad-hoc Video Search (AVS)
This track will evaluate video search engines on retrieving relevant video shots that satisfy textual queries combining different facets such as people, actions, locations, and objects.
The testing dataset is V3C2 (Vimeo Creative Commons), with 1.3 million video shots in total and an average video duration of about 9 minutes. The task will test systems on the same 20 queries used in 2024.
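Purely as an illustration of the task setup (not an official track baseline): assuming shot keyframes and the query text have already been embedded with a cross-modal model such as CLIP, a minimal system ranks shots by cosine similarity and writes results in the standard TREC run format. All IDs and embeddings below are toy placeholders.

```python
# Minimal AVS-style baseline sketch (illustrative, not an official baseline).
# Assumes shot embeddings were precomputed offline with a cross-modal model
# such as CLIP; the shot IDs and embeddings here are hypothetical.
import sys
import numpy as np

def write_trec_run(topic_id, query_emb, shot_ids, shot_embs, run_tag, out, k=1000):
    """Rank shots by cosine similarity to the query and emit TREC run lines."""
    q = query_emb / np.linalg.norm(query_emb)
    s = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    scores = s @ q
    for rank, i in enumerate(np.argsort(-scores)[:k], start=1):
        # Standard TREC run format: topic Q0 doc_id rank score run_tag
        out.write(f"{topic_id} Q0 {shot_ids[i]} {rank} {scores[i]:.6f} {run_tag}\n")

# Toy usage: random vectors stand in for real CLIP features.
rng = np.random.default_rng(0)
write_trec_run("701", rng.normal(size=512),
               [f"shot{n}_1" for n in range(10)],
               rng.normal(size=(10, 512)), "myAVSrun", sys.stdout, k=5)
```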
Track Coordinators:
George Awad, NIST
Track Web Page: https://www-nlpir.nist.gov/projects/tv2025/avs.html
V3C1 and V3C2 data agreement form: Click here to download, fill, sign, and submit. After submitting the data agreement form and receiving back the access information, you can find the V3C1 training data using this LINK, and the V3C2 testing data via this link. Individual video URLs for the V3C2 data are available in this file.
Testing topics (queries) are available HERE, and a description file for the testing topics is available HERE.
Biomedical Generative Retrieval (BioGen)
The track will evaluate technologies for biomedical document retrieval, specifically those with generative retrieval capabilities.
The document collection consists of literature abstracts from the U.S. National Library of Medicine's MEDLINE database, which contains over 30 million abstracts.
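The track defines its own topics and collection; as a hedged sketch of working with MEDLINE-scale data, the snippet below pulls candidate abstracts through NCBI's public E-utilities API (a real service; the query string is a made-up example). A generative system would then ground its answer in the fetched abstracts.

```python
# Sketch: fetch candidate MEDLINE abstracts for a topic via NCBI E-utilities.
# Illustrative only; the track supplies its own topics and document access.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def fetch_abstracts(query, retmax=5):
    # esearch: find PubMed IDs matching the query
    ids = requests.get(f"{EUTILS}/esearch.fcgi",
                       params={"db": "pubmed", "term": query,
                               "retmax": retmax, "retmode": "json"},
                       timeout=30).json()["esearchresult"]["idlist"]
    if not ids:
        return ""
    # efetch: retrieve the matching abstracts as plain text
    return requests.get(f"{EUTILS}/efetch.fcgi",
                        params={"db": "pubmed", "id": ",".join(ids),
                                "rettype": "abstract", "retmode": "text"},
                        timeout=30).text

print(fetch_abstracts("metformin type 2 diabetes first-line therapy")[:500])
```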
Track Coordinators:
Deepak Gupta, NIH
Dina Demner-Fushman, NIH
Steven Bedrick, Oregon Health & Science University
Bill Hersh, Oregon Health & Science University
Kirk Roberts, UTHealth
Resources:
- Track Web Page: https://trec-biogen.github.io/docs/
- Task A topics
- Task B topics
Change Detection
This track models an expert user following a topic of interest over time. The interaction model follows an "inbox" or reading queue, with the goal of maximizing the importance and novelty of the items in the queue.
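Guidelines are still to be posted, but one naive reading of that objective, using simple set-of-words overlap as a stand-in for real novelty detection, might look like the sketch below (the threshold and importance scores are hypothetical):

```python
# Naive sketch of a novelty-aware reading queue: admit a new document only if
# it is sufficiently dissimilar to everything already queued. The threshold
# and the importance scores are hypothetical placeholders.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

class ReadingQueue:
    def __init__(self, novelty_threshold=0.6):
        self.novelty_threshold = novelty_threshold
        self.queued = []  # list of (importance, token-set, text)

    def offer(self, text: str, importance: float) -> bool:
        tokens = set(text.lower().split())
        # Reject near-duplicates of anything already in the queue.
        if any(jaccard(tokens, t) >= self.novelty_threshold
               for _, t, _ in self.queued):
            return False
        self.queued.append((importance, tokens, text))
        self.queued.sort(key=lambda x: -x[0])  # most important first
        return True

q = ReadingQueue()
q.offer("volcano erupts near city, thousands evacuated", importance=0.9)
print(q.offer("volcano erupts near city, thousands evacuated", 0.8))  # False
```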
Anticipated timeline: runs due end of September
Track Coordinators:
Kristine Rogers
David Grossman
John Frank
Peter Gantz
Megan Niemczyk
Track Web Page: TBD
Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
The goal of this track is to support people's trustworthiness assessment of online news. There are two tasks: (1) Question Generation and (2) Report Generation. The Question Generation task focuses on identifying the critical questions readers should consider during their trustworthiness assessment. Those questions should guide readers' investigation toward, for example, the bias or motivations of the source and the narratives offered by other sources. Meanwhile, the Report Generation task involves creating a well-attributed and comprehensive report that gives readers the background and context they need to perform a more informed trustworthiness evaluation. Both tasks run in parallel, with the same submission due date. This track differs from traditional fact-checking in that it aims to assist readers in making their own trustworthiness assessments from a neutral perspective, helping them form their own judgments rather than dictating conclusions.
Deadline: runs due August 15
Track Coordinators:
Mark Smucker, University of Waterloo
Charlie Clarke, University of Waterloo
Dake Zhang, University of Waterloo
Resources:
- Track Web Page: https://trec-dragun.github.io
- Topics on track webpage or here
Interactive Knowledge Assistance Track (iKAT)
iKAT is about conversational information-seeking search. It is the successor to the Conversational Assistance Track (CAsT). TREC iKAT evolves CAsT to focus on supporting multi-path, multi-turn, multi-perspective conversations: for a given topic, the direction in which the conversation evolves depends not only on the prior responses but also on the user. Users are modeled with a knowledge base of prior knowledge, preferences, and constraints.
Deadline: runs due July 23
Track Coordinators:
Mohammed Aliannejadi, University of Amsterdam
Zahra Abbasiantaeb, University of Amsterdam
Simon Lupart, University of Amsterdam
Nailia Mirzakhmedova, Bauhaus-Universität Weimar
Marcel Gohsen, Bauhaus-Universität Weimar
Johannes Kiesel, GESIS - Leibniz Institute for the Social Sciences, Cologne
Resources:
- Track Web Page: https://trecikat.com
- Mailing list: Google group, name: trec_ikat
Million LLM
Imagine that in the future LLM-powered generative tools abound, specialized for every kind of use. Given a user's query and a set of LLMs, rank the LLMs on the basis of their ability to answer the query correctly.
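The track page carries the authoritative task definition; purely to make the ranking setup concrete, the sketch below scores each LLM by the similarity between the incoming query's embedding and a per-model "expertise" embedding (e.g., a mean over queries the model previously answered correctly). All model names and vectors are hypothetical.

```python
# Hypothetical sketch: rank LLMs for a query by cosine similarity between the
# query embedding and a per-LLM "expertise" profile embedding.
import numpy as np

def rank_llms(query_emb, llm_profiles):
    q = query_emb / np.linalg.norm(query_emb)
    scored = []
    for name, profile in llm_profiles.items():
        p = profile / np.linalg.norm(profile)
        scored.append((float(p @ q), name))
    return [name for _, name in sorted(scored, reverse=True)]

# Toy usage with random profiles standing in for learned ones.
rng = np.random.default_rng(1)
profiles = {f"llm-{i:04d}": rng.normal(size=256) for i in range(5)}
print(rank_llms(rng.normal(size=256), profiles))
```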
Anticipated timeline: runs due end of September
Track Coordinators:
Evangelos Kanoulas, University of Amsterdam
Jamie Callan, Carnegie Mellon University
Panagiotis Eustratiadis, University of Amsterdam
Mark Sanderson, RMIT University
Track Web Page: https://trec-mllm.github.io/
Product Search and Recommendation Track
The product search track focuses on IR tasks in the world of product search and discovery. This track seeks to understand what methods work best for product search, improve evaluation methodology, and provide a reusable dataset which allows easy benchmarking in a public forum.
This year the track is expanding to include a recommendation task.
Anticipated timeline: runs due end of August
Track Coordinators:
Daniel Campos, University of Illinois at Urbana-Champaign
Corby Rosset, Microsoft
Surya Kallumadi, Lowe's
ChengXiang Zhai, University of Illinois at Urbana-Champaign
Sahiti Labhishetty, University of Illinois at Urbana-Champaign
Alessandro Magnani, Walmart
Resources
- Track Web Page: https://trec-product-search.github.io/.
- Collection and topics for the recommendation task (HuggingFace): here
Retrieval Augmented Generation (RAG)
The RAG track aims to enhance retrieval and generation effectiveness with a focus on varied information needs in an evolving world. Data sources will include a large corpus and topics that capture long-form definitional, list, and ambiguous information needs.
The track will involve two subtasks:
1. Retrieval Task: rank passages for a given query.
2. RAG Task: generate answers with supporting passage attributions.
The second task is the primary focus of the track.
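As a hedged, end-to-end sketch of the two subtasks chained together (not the track's baseline): BM25 retrieval over a toy corpus via the rank_bm25 package, followed by an answer whose citations attribute the supporting passages. The generation step is stubbed, since the real interface is defined in the track guidelines.

```python
# Sketch of retrieve-then-generate with passage attributions (illustrative).
# pip install rank_bm25
from rank_bm25 import BM25Okapi

corpus = {
    "p1": "A solar eclipse occurs when the Moon passes between the Sun and Earth.",
    "p2": "Total solar eclipses are visible only within the Moon's umbral shadow.",
    "p3": "Lunar eclipses occur when Earth casts its shadow on the Moon.",
}
ids = list(corpus)
bm25 = BM25Okapi([corpus[i].lower().split() for i in ids])

def retrieve(query, k=2):
    """Subtask 1: rank passages for the query."""
    scores = bm25.get_scores(query.lower().split())
    return [i for i, _ in sorted(zip(ids, scores), key=lambda x: -x[1])[:k]]

def answer_with_attribution(query):
    """Subtask 2: generate an answer citing its supporting passages."""
    support = retrieve(query)
    # A real system would generate text here; this stub just cites evidence.
    citations = "".join(f"[{i}]" for i in support)
    return f"(generated answer for: {query}) {citations}", support

print(answer_with_attribution("why do solar eclipses happen"))
```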
Anticipated timeline: runs due end of July
Track Coordinators:
Shivani Upadhyay, University of Waterloo
Ronak Pradeep, University of Waterloo
Nandan Thakur, University of Waterloo
Jimmy Lin, University of Waterloo
Nick Craswell, Microsoft
Track Web Page: https://trec-rag.github.io/.
RAG TREC Instrument for Multilingual Evaluation (RAGTIME) Track
In 2024, the NeuCLIR track piloted a Report Generation task, which you might think of as RAG for expert users and information analysts rather than web searchers. The main task will be generating a report with citations, based on documents retrieved from a trilingual corpus of Russian, Chinese, and Arabic web news.
Anticipated timeline: document collection available in February; dry run submissions due July 8, final submissions due August 14.
Track Coordinators:
Dawn Lawrie, Johns Hopkins University
Sean MacAvaney, University of Glasgow
James Mayfield, Johns Hopkins University
Paul McNamee, Johns Hopkins University
Andrew Yates, Johns Hopkins University
Luca Soldaini, Allen Institute for AI
Eugene Yang, Johns Hopkins University
Resources:
- Track Web Page: TREC RAGTIME.
- Mailing list: Google group, ragtime-participants
- Task guidelines: Google Doc
- Dry run topics: English, Multilingual
- Search API at JHU for searching the collection.
Tip-of-the-Tongue Track
The Tip-of-the-Tongue (ToT) Track focuses on the known-item identification task in which the searcher has previously experienced or consumed the item (e.g., a movie) but cannot recall a reliable identifier (i.e., "It's on the tip of my tongue..."). Unlike traditional ad-hoc keyword-based search, these information requests tend to be natural-language, verbose, and complex, containing a wide variety of search strategies such as multi-hop reasoning; they frequently express uncertainty and suffer from false memories.
The primary task is ToT known-item search for new domains with ToT query elicitation (domains include landmarks, celebrities, movies, etc.).
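Not from the track guidelines, but to make concrete why verbose ToT requests favor semantic matching over keywords, here is a minimal dense-retrieval sketch with a sentence-embedding model (the model choice, corpus, and query are illustrative):

```python
# Minimal dense-retrieval sketch for verbose ToT-style queries (illustrative).
# pip install sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

items = {
    "movie_001": "A clownfish crosses the ocean to find his son, helped by a forgetful blue fish.",
    "movie_002": "A rat in Paris dreams of becoming a chef and guides a young cook.",
}
query = ("I saw this animated film as a kid, maybe early 2000s, about a fish "
         "dad searching everywhere for his kid; one character kept forgetting things.")

item_ids = list(items)
doc_embs = model.encode([items[i] for i in item_ids], normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)
scores = doc_embs @ q_emb  # cosine similarity, since embeddings are normalized
print(item_ids[int(np.argmax(scores))])  # expected: movie_001
```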
Anticipated timeline: runs due August 31
Track Coordinators:
Jaime Arguello, University of North Carolina
Bhaskar Mitra, Microsoft Research
Fernando Diaz, Carnegie Mellon University
To Eun Kim, Carnegie Mellon University
Maik Fröbe, Friedrich-Schiller-Universität Jena
Track Web Page: https://trec-tot.github.io/.
Twitter: @TREC_ToT
Mastodon: @[email protected]
Video Question Answering (VQA)
The Video Question Answering (VQA) Challenge aims to rigorously assess the capabilities of state-of-the-art multimodal models in understanding and reasoning about video content. Participants in this challenge will develop and test models that answer a diverse set of questions based on video segments, covering various levels of complexity, from factual retrieval to complex reasoning.
The testing data will comprise between 1,500 and 2,000 YouTube links.
Anticipated timeline: runs due mid September
Track Coordinators:
George Awad, NIST
Sanjay Purushotham, UMBC
Yvette Graham, UCD
Afzal Godil, NIST
Track Web Page: https://www-nlpir.nist.gov/projects/tv2025/vqa.html
VQA Training dataset: please refer to the external data resources HERE
VQA Testing dataset: to be released according to the schedule