Ad-hoc Video Search (AVS)
This track will evaluate video search engines on retrieving relevant video shots that satisfy textual queries combining different facets such as people, actions, locations, and objects.
The testing dataset is V3C2 (Vimeo Creative Commons), with 1.3 million video shots in total and an average video duration of about 9 minutes. The task will test systems on the same 20 queries used in 2024.
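Purely as an illustration of the task setup (not an official track baseline): assuming shot keyframes and the query text have already been embedded with a cross-modal model such as CLIP, a minimal system ranks shots by cosine similarity and writes results in the standard TREC run format. All IDs and embeddings below are toy placeholders.

```python
# Minimal AVS-style baseline sketch (illustrative, not an official baseline).
# Assumes shot embeddings were precomputed offline with a cross-modal model
# such as CLIP; the shot IDs and embeddings here are hypothetical.
import sys
import numpy as np

def write_trec_run(topic_id, query_emb, shot_ids, shot_embs, run_tag, out, k=1000):
    """Rank shots by cosine similarity to the query and emit TREC run lines."""
    q = query_emb / np.linalg.norm(query_emb)
    s = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    scores = s @ q
    for rank, i in enumerate(np.argsort(-scores)[:k], start=1):
        # Standard TREC run format: topic Q0 doc_id rank score run_tag
        out.write(f"{topic_id} Q0 {shot_ids[i]} {rank} {scores[i]:.6f} {run_tag}\n")

# Toy usage: random vectors stand in for real CLIP features.
rng = np.random.default_rng(0)
write_trec_run("701", rng.normal(size=512),
               [f"shot{n}_1" for n in range(10)],
               rng.normal(size=(10, 512)), "myAVSrun", sys.stdout, k=5)
```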
Track Coordinators:
George Awad, NIST
Track Web Page: https://www-nlpir.nist.gov/projects/tv2025/avs.html
V3C1 and V3C2 data agreement form: Click here to download, fill, sign, and submit. After submitting the data agreement form and receiving back the access information, you can find the V3C1 training data using this LINK, and the V3C2 testing data via this link. Individual video URLs for the V3C2 data are available in this file.
Testing topics (queries) are available HERE, and a description file for the testing topics is available HERE.
Biomedical Generative Retrieval (BioGen)
The track will evaluate technologies for biomedical document retrieval, specifically those with generative retrieval capabilities.
The document collection consists of literature abstracts from the U.S. National Library of Medicine's MEDLINE database, which contains over 30 million abstracts.
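The track defines its own topics and collection; as a hedged sketch of working with MEDLINE-scale data, the snippet below pulls candidate abstracts through NCBI's public E-utilities API (a real service; the query string is a made-up example). A generative system would then ground its answer in the fetched abstracts.

```python
# Sketch: fetch candidate MEDLINE abstracts for a topic via NCBI E-utilities.
# Illustrative only; the track supplies its own topics and document access.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def fetch_abstracts(query, retmax=5):
    # esearch: find PubMed IDs matching the query
    ids = requests.get(f"{EUTILS}/esearch.fcgi",
                       params={"db": "pubmed", "term": query,
                               "retmax": retmax, "retmode": "json"},
                       timeout=30).json()["esearchresult"]["idlist"]
    if not ids:
        return ""
    # efetch: retrieve the matching abstracts as plain text
    return requests.get(f"{EUTILS}/efetch.fcgi",
                        params={"db": "pubmed", "id": ",".join(ids),
                                "rettype": "abstract", "retmode": "text"},
                        timeout=30).text

print(fetch_abstracts("metformin type 2 diabetes first-line therapy")[:500])
```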
Track Coordinators:
Deepak Gupta, NIH
Dina Demner-Fushman, NIH
Steven Bedrick, Oregon Health & Science University
Bill Hersh, Oregon Health & Science University
Kirk Roberts, UTHealth
Resources:
- Track Web Page: https://trec-biogen.github.io/docs/
- Task A topics
- Task B topics
Change Detection
This track models an expert user following a topic of interest over time. The interaction model follows an "inbox" or reading queue, with the goal of maximizing the importance and novelty of the items in the queue.
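Guidelines are still to be posted, but one naive reading of that objective, using simple set-of-words overlap as a stand-in for real novelty detection, might look like the sketch below (the threshold and importance scores are hypothetical):

```python
# Naive sketch of a novelty-aware reading queue: admit a new document only if
# it is sufficiently dissimilar to everything already queued. The threshold
# and the importance scores are hypothetical placeholders.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

class ReadingQueue:
    def __init__(self, novelty_threshold=0.6):
        self.novelty_threshold = novelty_threshold
        self.queued = []  # list of (importance, token-set, text)

    def offer(self, text: str, importance: float) -> bool:
        tokens = set(text.lower().split())
        # Reject near-duplicates of anything already in the queue.
        if any(jaccard(tokens, t) >= self.novelty_threshold
               for _, t, _ in self.queued):
            return False
        self.queued.append((importance, tokens, text))
        self.queued.sort(key=lambda x: -x[0])  # most important first
        return True

q = ReadingQueue()
q.offer("volcano erupts near city, thousands evacuated", importance=0.9)
print(q.offer("volcano erupts near city, thousands evacuated", 0.8))  # False
```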
Anticipated timeline: runs due end of September
Track Coordinators:
Kristine Rogers
David Grossman
John Frank
Peter Gantz
Megan Niemczyk
Track Web Page: TBD
Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
The goal of this track is to support people's trustworthiness assessment of online news. There are two tasks: (1) Question Generation and (2) Report Generation. The Question Generation task focuses on identifying the critical questions readers should consider during their trustworthiness assessment. Those questions should guide readers' investigation toward, for example, the bias or motivations of the source and the narratives offered by other sources. Meanwhile, the Report Generation task involves creating a well-attributed and comprehensive report that gives readers the background and context they need to perform a more informed trustworthiness evaluation. Both tasks run in parallel, with the same submission due date. This track differs from traditional fact-checking in that it aims to assist readers in making their own trustworthiness assessments from a neutral perspective, helping them form their own judgments rather than dictating conclusions.
Deadline: runs due August 15
Track Coordinators:
Mark Smucker, University of Waterloo
Charlie Clarke, University of Waterloo
Dake Zhang, University of Waterloo
Resources:
- Track Web Page: https://trec-dragun.github.io
- Topics on track webpage or here
Interactive Knowledge Assistance Track (iKAT)
iKAT is about conversational information-seeking search. It is the successor to the Conversational Assistance Track (CAsT). TREC iKAT evolves CAsT to focus on supporting multi-path, multi-turn, multi-perspective conversations: for a given topic, the direction in which the conversation evolves depends not only on the prior responses but also on the user. Users are modeled with a knowledge base of prior knowledge, preferences, and constraints.
Deadline: runs due July 23
Track Coordinators:
Mohammed Aliannejadi, University of Amsterdam
Zahra Abbasiantaeb, University of Amsterdam
Simon Lupart, University of Amsterdam
Nailia Mirzakhmedova, Bauhaus-Universität Weimar
Marcel Gohsen, Bauhaus-Universität Weimar
Johannes Kiesel, GESIS - Leibniz Institute for the Social Sciences, Cologne
Resources:
- Track Web Page: https://trecikat.com
- Mailing list: Google group, name: trec_ikat
Million LLM
Imagine that in the future LLM-powered generative tools abound, specialized for every kind of use. Given a user's query and a set of LLMs, rank the LLMs on the basis of their ability to answer the query correctly.
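The track page carries the authoritative task definition; purely to make the ranking setup concrete, the sketch below scores each LLM by the similarity between the incoming query's embedding and a per-model "expertise" embedding (e.g., a mean over queries the model previously answered correctly). All model names and vectors are hypothetical.

```python
# Hypothetical sketch: rank LLMs for a query by cosine similarity between the
# query embedding and a per-LLM "expertise" profile embedding.
import numpy as np

def rank_llms(query_emb, llm_profiles):
    q = query_emb / np.linalg.norm(query_emb)
    scored = []
    for name, profile in llm_profiles.items():
        p = profile / np.linalg.norm(profile)
        scored.append((float(p @ q), name))
    return [name for _, name in sorted(scored, reverse=True)]

# Toy usage with random profiles standing in for learned ones.
rng = np.random.default_rng(1)
profiles = {f"llm-{i:04d}": rng.normal(size=256) for i in range(5)}
print(rank_llms(rng.normal(size=256), profiles))
```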
Anticipated timeline: runs due end of September
Track Coordinators:
Evangelos Kanoulas, University of Amsterdam
Jamie Callan, Carnegie Mellon University
Panagiotis Eustratiadis, University of Amsterdam
Mark Sanderson, RMIT University
Track Web Page: https://trec-mllm.github.io/
Product Search and Recommendation Track
The product search track focuses on IR tasks in the world of product search and discovery. This track seeks to understand what methods work best for product search, improve evaluation methodology, and provide a reusable dataset which allows easy benchmarking in a public forum.
This year the track is expanding to include a recommendation task.
Anticipated timeline: runs due end of August
Track Coordinators:
Daniel Campos, University of Illinois at Urbana-Champaign
Corby Rosset, Microsoft
Surya Kallumadi, Lowe's
ChengXiang Zhai, University of Illinois at Urbana-Champaign
Sahiti Labhishetty, University of Illinois at Urbana-Champaign
Alessandro Magnani, Walmart
Resources
- Track Web Page: https://trec-product-search.github.io/.
- Collection and topics for the recommendation task (HuggingFace): here
Retrieval Augmented Generation (RAG)
The RAG track aims to enhance retrieval and generation effectiveness with a focus on varied information needs in an evolving world. Data sources will include a large corpus and topics that capture long-form definitional, list, and ambiguous information needs.
The track will involve two subtasks:
1. Retrieval Task: rank passages for a given query.
2. RAG Task: generate answers with supporting passage attributions.
The second task is the primary focus of the track.
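As a hedged, end-to-end sketch of the two subtasks chained together (not the track's baseline): BM25 retrieval over a toy corpus via the rank_bm25 package, followed by an answer whose citations attribute the supporting passages. The generation step is stubbed, since the real interface is defined in the track guidelines.

```python
# Sketch of retrieve-then-generate with passage attributions (illustrative).
# pip install rank_bm25
from rank_bm25 import BM25Okapi

corpus = {
    "p1": "A solar eclipse occurs when the Moon passes between the Sun and Earth.",
    "p2": "Total solar eclipses are visible only within the Moon's umbral shadow.",
    "p3": "Lunar eclipses occur when Earth casts its shadow on the Moon.",
}
ids = list(corpus)
bm25 = BM25Okapi([corpus[i].lower().split() for i in ids])

def retrieve(query, k=2):
    """Subtask 1: rank passages for the query."""
    scores = bm25.get_scores(query.lower().split())
    return [i for i, _ in sorted(zip(ids, scores), key=lambda x: -x[1])[:k]]

def answer_with_attribution(query):
    """Subtask 2: generate an answer citing its supporting passages."""
    support = retrieve(query)
    # A real system would generate text here; this stub just cites evidence.
    citations = "".join(f"[{i}]" for i in support)
    return f"(generated answer for: {query}) {citations}", support

print(answer_with_attribution("why do solar eclipses happen"))
```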
Anticipated timeline: runs due end of July
Track Coordinators:
Shivani Upadhyay, University of Waterloo
Ronak Pradeep, University of Waterloo
Nandan Thakur, University of Waterloo
Jimmy Lin, University of Waterloo
Nick Craswell, Microsoft
Track Web Page: https://trec-rag.github.io/.
RAG TREC Instrument for Multilingual Evaluation (RAGTIME) Track
In 2024, the NeuCLIR track piloted a Report Generation task, which you might think of as RAG for expert users and information analysts rather than web searchers. The main task will be generating a report with citations, based on documents retrieved from a trilingual corpus of Russian, Chinese, and Arabic web news.
Anticipated timeline: document collection available in February; dry run submissions due July 8, final submissions due August 14.
Track Coordinators:
Dawn Lawrie, Johns Hopkins University
Sean MacAvaney, University of Glasgow
James Mayfield, Johns Hopkins University
Paul McNamee, Johns Hopkins University
Andrew Yates, Johns Hopkins University
Luca Soldaini, Allen Institute for AI
Eugene Yang, Johns Hopkins University
Resources:
- Track Web Page: TREC RAGTIME.
- Mailing list: Google group, ragtime-participants
- Task guidelines: Google Doc
- Dry run topics: English, Multilingual
- Search API at JHU for searching the collection.
Tip-of-the-Tongue Track
The Tip-of-the-Tongue (ToT) Track focuses on the known-item identification task in which the searcher has previously experienced or consumed the item (e.g., a movie) but cannot recall a reliable identifier (i.e., "It's on the tip of my tongue..."). Unlike traditional ad-hoc keyword-based search, these information requests tend to be natural-language, verbose, and complex, containing a wide variety of search strategies such as multi-hop reasoning; they frequently express uncertainty and suffer from false memories.
The primary task is ToT known-item search for new domains with ToT query elicitation (domains include landmarks, celebrities, movies, etc.).
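Not from the track guidelines, but to make concrete why verbose ToT requests favor semantic matching over keywords, here is a minimal dense-retrieval sketch with a sentence-embedding model (the model choice, corpus, and query are illustrative):

```python
# Minimal dense-retrieval sketch for verbose ToT-style queries (illustrative).
# pip install sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

items = {
    "movie_001": "A clownfish crosses the ocean to find his son, helped by a forgetful blue fish.",
    "movie_002": "A rat in Paris dreams of becoming a chef and guides a young cook.",
}
query = ("I saw this animated film as a kid, maybe early 2000s, about a fish "
         "dad searching everywhere for his kid; one character kept forgetting things.")

item_ids = list(items)
doc_embs = model.encode([items[i] for i in item_ids], normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)
scores = doc_embs @ q_emb  # cosine similarity, since embeddings are normalized
print(item_ids[int(np.argmax(scores))])  # expected: movie_001
```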
Anticipated timeline: runs due August 31
Track Coordinators:
Jaime Arguello, University of North Carolina
Bhaskar Mitra, Microsoft Research
Fernando Diaz, Carnegie Mellon University
To Eun Kim, Carnegie Mellon University
Maik Fröbe, Friedrich-Schiller-Universität Jena
Track Web Page: https://trec-tot.github.io/.
Twitter: @TREC_ToT
Mastodon: @[email protected]
Video Question Answering (VQA)
The Video Question Answering (VQA) Challenge aims to rigorously assess the capabilities of state-of-the-art multimodal models in understanding and reasoning about video content. Participants in this challenge will develop and test models that answer a diverse set of questions based on video segments, covering various levels of complexity, from factual retrieval to complex reasoning.
The testing data will comprise between 1,500 and 2,000 YouTube links.
Anticipated timeline: runs due mid September
Track Coordinators:
George Awad, NIST
Sanjay Purushotham, UMBC
Yvette Graham, UCD
Afzal Godil, NIST
Track Web Page: https://www-nlpir.nist.gov/projects/tv2025/vqa.html
VQA Training dataset: please refer to the external data resources HERE
VQA Testing dataset: to be released according to the schedule