TREC 2025 Proceedings
cru-ablR-conf_
Submission Details
- Organization
- HLTCOE
- Track
- Detection, Retrieval, and Generation for Understanding News (DRAGUN)
- Task
- Report Generation Task
- Date
- 2025-08-20
Run Description
- Is this run manual or automatic?
- automatic
- Is this run based on the provided starter kit?
- yes
- Briefly describe this run
- Crucible@dragun
Original run tag: strict-filtered-crucible-retrieved_docs-most_common-retrieved-reranker.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest
Answerability prompt: only check citation support, then rely on extraction confidence.
Crucible report generation.
Guiding nuggets: most_common
Document source: nugget citations.
Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection 'ragtime-mt'.
LLM: llama3.3-70b-instruct
Sentences are retained when their citations are supported according to argue_eval.
Abstractive summarization is used.
Only retain sentences that have an extraction confidence value >= 0.5, are not already selected (according to a stopped-and-stemmed match), and do not contain the expression 'source document'.
For each nugget, among the remaining sentence candidates, select the sentence with the highest extraction confidence (a minimal sketch of this selection logic follows this description).
Truncate the report to 250 words.
Created on 2025-08-20
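A minimal sketch of the sentence filtering and per-nugget selection steps above, assuming hypothetical Candidate records carrying an extraction confidence and a citation-support verdict (the stopword list and stemmer are crude stand-ins; this is not the actual Crucible code):

    from dataclasses import dataclass

    STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are"}

    def stem(token: str) -> str:
        # Crude stand-in for a real stemmer (e.g. Porter).
        for suffix in ("ing", "ed", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    def sentence_key(sentence: str) -> frozenset:
        # Stopped-and-stemmed bag of words, used to detect duplicates.
        tokens = (t.lower().strip(".,;:!?\"'") for t in sentence.split())
        return frozenset(stem(t) for t in tokens if t and t not in STOPWORDS)

    @dataclass
    class Candidate:
        text: str
        confidence: float          # extraction confidence from the LLM
        citations_supported: bool  # citation-support verdict (argue_eval)

    def build_report(nuggets: list[list[Candidate]], max_words: int = 250) -> str:
        selected: list[str] = []
        seen: set[frozenset] = set()
        for candidates in nuggets:  # one candidate list per guiding nugget
            viable = [
                c for c in candidates
                if c.citations_supported
                and c.confidence >= 0.5
                and "source document" not in c.text.lower()
                and sentence_key(c.text) not in seen
            ]
            if not viable:
                continue
            best = max(viable, key=lambda c: c.confidence)
            selected.append(best.text)
            seen.add(sentence_key(best.text))
        # Chop the concatenated report to the word budget.
        words = " ".join(selected).split()
        return " ".join(words[:max_words])

The dedup key deliberately ignores word order and stopwords, so near-identical sentences extracted from different documents map to the same key and are selected only once.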
- What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
- No external dataset was used, except that Claude was accessed via an LLM API. From the starter kit we used only the document ranking provided in the internal data file. We used up to 20 documents from the input document ranking 'llm_selected'.
- Briefly describe LLMs used for this run (optional)
- We primarily used Llama-3.3-70B-Instruct. Runs named 'clod' or 'cloch' used Claude 4 Sonnet for nugget generation.
- Please give this run a priority for inclusion in manual assessments.
- 2