TREC 2025 Proceedings

CUET-qwen14B-v3

Submission Details

Organization
CUET
Track
Detection, Retrieval, and Generation for Understanding News
Task
Question Generation Task
Date
2025-08-08

Run Description

Is this run manual or automatic?
automatic
Is this run based on the provided starter kit?
no
Briefly describe this run
This run utilizes the unsloth/Qwen3-14B-unsloth-bnb-4bit model to generate 10 investigative and critical questions per topic from the TREC 2025 dataset. The questions are designed to help readers assess the credibility and bias of each article. The prompt includes two detailed few-shot examples modeled after PolitiFact and MBFC, guiding the model to focus on:

- Evidence and factual integrity
- Bias and one-sided reporting
- Missing viewpoints or counterarguments
- Language framing and sensationalism
- Conflicts of interest or affiliations

LangChain's LLMChain is used to wrap a HuggingFace text-generation pipeline with settings that enable diverse outputs (temperature=0.6, top_p=0.9, do_sample=True, max_new_tokens=600). Each article's body is truncated to its first 2000 characters to fit within the model's 2048-token context window. A regex extracts properly formatted numbered questions up to 300 characters long. The model attempts up to 3 retries per topic to obtain at least 10 valid questions, padding with "N/A" if not enough are generated. The final output is saved as a tab-separated file named CUET_run8.tsv, with columns: topic ID, team ID, run ID, question rank, and cleaned question.
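The extraction and output steps above can be sketched in plain Python. This is an illustrative reconstruction, not the run's actual code: the regex pattern and the helper names (extract_questions, pad_questions, write_run_rows) are assumptions based on the description of numbered questions up to 300 characters, "N/A" padding, and the five-column TSV layout.

```python
import csv
import re

# Hypothetical regex mirroring the described extraction step: a line that
# starts with a number like "1." or "2)", followed by up to 300 characters
# of question text.
QUESTION_RE = re.compile(r"^\s*(\d{1,2})[.)]\s*(.{1,300}?)\s*$", re.MULTILINE)

def extract_questions(generated: str, wanted: int = 10) -> list[str]:
    """Pull properly formatted numbered questions from model output,
    keeping at most `wanted` of them."""
    questions = [text.strip() for _, text in QUESTION_RE.findall(generated)]
    return questions[:wanted]

def pad_questions(questions: list[str], wanted: int = 10) -> list[str]:
    """Pad with "N/A" when fewer than `wanted` valid questions survive,
    as the run does after exhausting its 3 retries."""
    return questions + ["N/A"] * (wanted - len(questions))

def write_run_rows(path: str, topic_id: str, team_id: str,
                   run_id: str, questions: list[str]) -> None:
    """Append one tab-separated row per question, ranked from 1,
    matching the CUET_run8.tsv column order."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for rank, question in enumerate(questions, start=1):
            writer.writerow([topic_id, team_id, run_id, rank, question])
```

Because the 300-character cap is enforced inside the regex, over-long lines simply fail to match and are dropped rather than truncated, which keeps only cleanly formatted questions.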
What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
No external datasets or services such as Google, Bing, ChatGPT, or Perplexity were used. The prompt template does, however, include two curated few-shot examples: one informed by PolitiFact's approach to evidence-based fact-checking, and another inspired by MBFC-style media bias analysis. These examples simulate the reasoning of expert fact-checkers and guide the model's generation internally.
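A minimal sketch of how such a prompt could be assembled is shown below. The few-shot example texts here are placeholders standing in for the run's actual PolitiFact- and MBFC-style examples, and build_prompt is a hypothetical helper; only the 2000-character truncation and the request for 10 numbered questions come from the run description.

```python
# Placeholder few-shot block; the real run uses two detailed curated
# examples (PolitiFact-style and MBFC-style), not these stubs.
FEW_SHOT_EXAMPLES = """\
Example 1 (evidence-focused, PolitiFact-style):
Article: <example article 1>
Questions:
1. What primary sources back the central claim?

Example 2 (bias-focused, MBFC-style):
Article: <example article 2>
Questions:
1. Whose perspective is absent from this report?
"""

def build_prompt(article_body: str, max_chars: int = 2000) -> str:
    """Truncate the article to its first `max_chars` characters (to fit
    the 2048-token window) and prepend the few-shot examples."""
    body = article_body[:max_chars]
    return (
        f"{FEW_SHOT_EXAMPLES}\n"
        "Now generate 10 investigative questions about the article below, "
        "covering evidence, bias, missing viewpoints, framing, and "
        "conflicts of interest.\n\n"
        f"Article: {body}\nQuestions:\n"
    )
```

Truncating by characters rather than tokens is a coarse but cheap safeguard; it leaves headroom in the 2048-token window for the few-shot examples and the generated questions.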
Briefly describe LLMs used for this run (optional)
The model used is unsloth/Qwen3-14B-unsloth-bnb-4bit, a 4-bit quantized variant of Alibaba's Qwen3-14B model optimized by Unsloth for efficient inference (using bitsandbytes). It supports RoPE scaling, enabling long input sequences, and is auto-configured with a suitable data type (dtype=None enables auto-detection for hardware such as Tesla T4, V100, or Ampere GPUs). The HuggingFace pipeline applies nucleus sampling (top_p=0.9) and a moderate temperature (0.6) for balanced, non-repetitive question generation.
Please give this run a priority for inclusion in manual assessments.
2

Evaluation Files

Paper