IDACCS_nugget_tb4.1 — Report Generation Task

Document collection: ['English subset', 'Arabic subset', 'Chinese subset', 'Russian subset']
Machine translation of documents: ['Yes, we used the organizer-provided machine translations']
Write a short description of your retrieval process: \item The organizers' search service was used to retrieve the top 30 documents, using the background and problem statement as the query. \item We reranked these documents to obtain the top 10 using \texttt{mxbai-rerank-large-v1} on 10-sentence chunks with an overlap of 5 sentences, with a query generated by GPT-4o from the title, background, and problem statement.
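The chunking step in the retrieval description (10-sentence chunks with an overlap of 5) can be sketched as follows. This is a minimal illustration, not the submitted system: `chunk_sentences` and `rerank` are hypothetical names, `score_chunk` stands in for a call to the reranker with the GPT-4o-generated query, and scoring a document by its best chunk is an assumption, since the description does not say how chunk scores were combined.

```python
from typing import Callable, Dict, List


def chunk_sentences(sentences: List[str], size: int = 10, overlap: int = 5) -> List[List[str]]:
    """Split a sentence list into windows of `size` sentences sharing `overlap` sentences."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start:start + size])
        # Stop once a window has reached the end of the document.
        if start + size >= len(sentences):
            break
    return chunks


def rerank(docs: Dict[str, List[str]],
           score_chunk: Callable[[str], float],
           top_k: int = 10) -> List[str]:
    """Rank documents by their best-scoring chunk (assumed aggregation) and keep top_k."""
    doc_scores: Dict[str, float] = {}
    for doc_id, sentences in docs.items():
        chunks = chunk_sentences(sentences)
        doc_scores[doc_id] = max(
            (score_chunk(" ".join(chunk)) for chunk in chunks),
            default=float("-inf"),  # documents with no sentences sort last
        )
    return sorted(doc_scores, key=lambda d: doc_scores[d], reverse=True)[:top_k]
```

With these defaults, a 22-sentence document yields four chunks starting at sentences 0, 5, 10, and 15, the last one holding the remaining 7 sentences.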
Write a short description of your generation process: \item An \texttt{occams} extractive summary of twice the target length was produced, where the target length was 2500 for the 10000-long summaries (as the generation was done per language) and 4000 for the 2000-long summaries. \item GPT-4.1, prompted to form ``nuggets'' not exceeding the target length, was used to generate the report. \item Attribution was done using our ``blame'' semantic similarity method.
Which LLM(s) were used by your system?: GPT-4o, GPT-4.1
Open repository link: na
Assessing priority: 3