IDACCS_extract_4.1 — Report Generation Task

Document collection: ['English subset', 'Arabic subset', 'Chinese subset', 'Russian subset']
Machine translation of documents: ['Yes we used the organizer-provided machine translations']
Write a short description of your retrieval process: \item The organizers' retrieval service was used to retrieve the top 30 documents, with the background and problem statement as the query. \item We reranked these documents to select the top 10 using \texttt{mxbai-rerank-large-v1} on 10-sentence chunks with an overlap of 5 sentences, using a query generated by GPT-4o from the title, background, and problem statement.
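The chunking step described in the retrieval process can be sketched as follows. This is a minimal illustration, not the submitted system's code: it assumes the document has already been split into sentences, and the function name `chunk_sentences` and its parameters are hypothetical. How chunk-level reranker scores were combined into a document score is not stated in the description, so that step is left out.

```python
from typing import List


def chunk_sentences(sentences: List[str], size: int = 10, overlap: int = 5) -> List[List[str]]:
    """Split a sentence list into windows of `size` sentences that share `overlap` sentences.

    Hypothetical helper: the defaults mirror the 10-sentence chunks with an
    overlap of 5 mentioned in the description.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new window starts this many sentences later
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start:start + size])
        if start + size >= len(sentences):
            break  # this window already reaches the end of the document
    return chunks
```

Each chunk would then be paired with the generated query and passed to the reranker; the final chunk may be shorter than 10 sentences when the document length is not a multiple of the step.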
Write a short description of your generation process: \item An \texttt{occams} extractive summary of twice the target length was produced, where the target length was 2500 for the 10000-long summaries (as generation was done per language) and 4000 for the 2000-long summaries. \item GPT-4.1 was used to generate the report, with a prompt to either \begin{enumerate} \item form ``extracts'' of the extracts not to exceed the target length. \end{enumerate} \item Attribution was done using our ``blame'' method. We tested and found that the \texttt{t5-base} model did as well as the \texttt{t5-large} model.
Which LLM(s) were used by your system?: GPT-4o, GPT-4.1
Open repository link: na
Assessing priority: 5