AMU1ENG — Report Generation Task

Document collection: ['English subset', 'Arabic subset', 'Chinese subset', 'Russian subset']
Machine translation of documents: ['Yes, we used the organizer-provided machine translations']
Write a short description of your retrieval process: The entire retrieval phase was performed on a locally hosted vector database. Before insertion into the database, texts were split into chunks of approximately 5 to 10 sentences, depending on their length, with a hard limit of 8k characters per chunk to handle edge cases. Each chunk was then embedded with the base BAAI/BGE-m3 model using GPU acceleration. Retrieval used a standard similarity search ranked by cosine similarity.
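The chunking and search steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual code: the regex sentence splitter, the soft length threshold `SOFT_CHARS`, and all helper names are hypothetical, and the embedding step is omitted. The vectors passed to `top_k` stand in for BGE-m3 embeddings, which in practice would be computed by the model and stored in the vector database.

```python
import math
import re
from typing import List, Sequence, Tuple

# MAX_CHARS and the 5-10 sentence range come from the description;
# SOFT_CHARS is a hypothetical rule for "depending on their length".
MAX_CHARS = 8000       # hard cap per chunk ("8k characters")
MIN_SENTENCES = 5      # lower end of "approximately 5 to 10 sentences"
MAX_SENTENCES = 10     # upper end
SOFT_CHARS = 1500      # hypothetical: close a chunk early once it is long enough

# Naive splitter (assumption): break after ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def chunk_text(text: str) -> List[str]:
    """Group sentences into chunks of at most MAX_SENTENCES sentences and MAX_CHARS characters."""
    chunks: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        # A single sentence longer than MAX_CHARS is cut into fixed-size pieces.
        pieces = [sentence[i:i + MAX_CHARS] for i in range(0, len(sentence), MAX_CHARS)]
        for piece in pieces:
            # Flush first if adding this piece would break the hard cap.
            if current and len(" ".join(current + [piece])) > MAX_CHARS:
                chunks.append(" ".join(current))
                current = []
            current.append(piece)
            size = len(" ".join(current))
            # Close the chunk at 10 sentences, or earlier if it is already long.
            if len(current) >= MAX_SENTENCES or (
                len(current) >= MIN_SENTENCES and size >= SOFT_CHARS
            ):
                chunks.append(" ".join(current))
                current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: Sequence[Tuple[str, Sequence[float]]],
    k: int = 15,
) -> List[Tuple[float, str]]:
    """Brute-force search: score every (chunk, vector) pair by cosine similarity, keep the k best."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The brute-force loop only illustrates the ranking rule; a vector database typically answers the same query with an index instead of scoring every chunk. The default `k=15` mirrors the number of chunks later passed to the generator.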
Write a short description of your generation process: Generation used an external closed-source model accessed via an API. For the report generation task, we wrote a dedicated prompt that makes the model return the report in JSON format, and the top 15 retrieved chunks were supplied as input. We also evaluated open-source models ranging from 32B to 235B parameters, which showed promising results; however, due to time constraints, we completed the task with the closed-source model.
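The prompt-assembly and JSON-parsing steps around the model call might look like the sketch below. The actual prompt and JSON schema are not published, so the instruction text, the `sentences`/`text`/`citations` fields, and the helper names are all hypothetical; the API call to the model itself is deliberately left out.

```python
import json
from typing import Dict, List, Sequence

TOP_K = 15  # number of retrieved chunks given to the model, per the description


def build_prompt(request: str, chunks: Sequence[str], k: int = TOP_K) -> str:
    """Assemble a hypothetical report prompt from the top-k retrieved chunks."""
    # Number the passages so the model can cite them by index.
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks[:k], start=1))
    return (
        "Write a report grounded only in the numbered passages below.\n"
        'Return a JSON object with a list "sentences"; each item has '
        '"text" and "citations" (passage numbers).\n\n'
        f"Report request:\n{request}\n\nPassages:\n{context}\n"
    )


def parse_report(raw: str, num_passages: int) -> List[Dict]:
    """Parse the model's JSON output, dropping citations to passages that were not supplied."""
    data = json.loads(raw)
    report = []
    for item in data.get("sentences", []):
        cites = [
            c for c in item.get("citations", [])
            if isinstance(c, int) and 1 <= c <= num_passages
        ]
        report.append({"text": str(item.get("text", "")), "citations": cites})
    return report
```

Asking for structured JSON makes the output machine-checkable: in this sketch, citations that point outside the 15 supplied passages are filtered out rather than trusted.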
Which LLM(s) were used by your system?: GPT5-mini
Open repository link: not-public
Assessing priority: 2