TREC 2025 Proceedings

CUET-qwen14B-v1

Submission Details

Organization
CUET
Track
Detection, Retrieval, and Generation for Understanding News
Task
Question Generation Task
Date
2025-08-08

Run Description

Is this run manual or automatic?
automatic
Is this run based on the provided starter kit?
no
Briefly describe this run
This run uses the unsloth/Qwen3-14B-unsloth-bnb-4bit large language model to generate ten concise, ranked, and critical questions for each topic in the TREC 2025 dataset. The prompt contains two curated few-shot examples—one inspired by PolitiFact and the other by MBFC-style analysis—which guide the model to emulate high-quality fact-checking strategies. The questions assess news credibility, focusing on source bias, factual accuracy, omissions, and framing. LangChain's LLMChain handles inference through a HuggingFace pipeline with sampling enabled. Each article's title and truncated body are passed through this chain, and the output is cleaned using regular expressions. A retry mechanism guarantees at least 10 questions per topic, with deduplication and padding applied as needed. Results are saved in the TREC-compatible TSV file CUET_run6.tsv.
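The post-processing steps above (regex cleaning, deduplication, padding to 10 questions, TSV output) could be sketched as follows. This is a minimal illustration, not the submitted code; the helper names, the placeholder question, and the TSV column layout are assumptions.

```python
import csv
import re

def clean_questions(raw_output, n_required=10):
    """Extract question lines from raw LLM output, deduplicate, and pad to n_required.

    Hypothetical helper illustrating the cleanup described in the run description.
    """
    questions = []
    for line in raw_output.splitlines():
        # Strip leading enumeration such as "1.", "2)", or "-" from each line.
        q = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        # Keep only question-like lines, skipping duplicates.
        if q.endswith("?") and q not in questions:
            questions.append(q)
    # Pad with a generic placeholder question if too few survive cleaning
    # (the actual padding text used in the run is not reported).
    while len(questions) < n_required:
        questions.append("What additional evidence would verify this claim?")
    return questions[:n_required]

def write_trec_tsv(path, topic_id, questions):
    """Write one row per question as topic_id, rank, question (assumed column order)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for rank, q in enumerate(questions, start=1):
            writer.writerow([topic_id, rank, q])
```

A retry loop around the model call would re-invoke generation until `clean_questions` yields the required count.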
What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
No external datasets or services such as Google, Bing, ChatGPT, or Perplexity were used. However, the prompt template contains two curated few-shot examples—one informed by PolitiFact's approach to evidence-based fact-checking, and another inspired by MBFC-style media bias analysis. These examples simulate the reasoning process of expert fact-checkers and guide the model's generation internally.
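A few-shot prompt of the kind described above might be structured as follows. The example articles, example questions, and the 4000-character truncation limit are all placeholders, not the curated examples actually used in the run.

```python
# Hypothetical sketch of the few-shot prompt template; example content is invented.
FACT_CHECK_PROMPT = """You are an expert fact-checker. Generate 10 concise, critical
questions that probe the credibility of the news article below, covering source bias,
factual accuracy, omissions, and framing.

Example 1 (evidence-focused, PolitiFact-style):
Article: "Senator claims unemployment is at a record low."
Questions:
1. What official data source supports the unemployment figure?
2. Does the figure use the same methodology as prior records?

Example 2 (bias-focused, MBFC-style):
Article: "Outlet X reports exclusively on one party's scandals."
Questions:
1. Does the outlet's coverage show a consistent partisan slant?
2. Which relevant facts are omitted from the framing?

Now analyze this article:
Title: {title}
Body: {body}
Questions:"""

def build_prompt(title, body, max_body_chars=4000):
    """Truncate the article body and fill the template (truncation length is assumed)."""
    return FACT_CHECK_PROMPT.format(title=title, body=body[:max_body_chars])
```

In the actual run, a template like this would be wrapped in a LangChain PromptTemplate and fed to the LLMChain.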
Briefly describe LLMs used for this run (optional)
The language model used is unsloth/Qwen3-14B-unsloth-bnb-4bit, a 14-billion parameter variant of Qwen optimized by Unsloth for 4-bit inference. It supports RoPE scaling up to 2048 tokens, making it suitable for longer prompt contexts. HuggingFace’s transformers pipeline and LangChain’s LLMChain are used for interfacing. The model uses temperature-based sampling to enhance question diversity while keeping outputs well-structured. This model balances capacity (14B parameters) with efficient memory usage (bnb-4bit), enabling powerful yet cost-effective generation.
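Since the run enables temperature-based sampling but does not report its decoding settings, a plausible generation configuration might look like the following. All values are assumptions, not the submitted hyperparameters.

```python
# Hypothetical sampling configuration for the HuggingFace text-generation pipeline;
# the run's actual values are not reported, so these are illustrative placeholders.
generation_kwargs = {
    "do_sample": True,      # enable temperature-based sampling for question diversity
    "temperature": 0.7,     # assumed: moderate randomness
    "top_p": 0.9,           # assumed: nucleus-sampling cutoff
    "max_new_tokens": 512,  # assumed: enough room for 10 questions
}

# In the actual run, a dict like this would be forwarded to
# transformers.pipeline("text-generation", model="unsloth/Qwen3-14B-unsloth-bnb-4bit", ...)
# and the resulting pipeline wrapped for LangChain's LLMChain.
```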
Please give this run a priority for inclusion in manual assessments.
4

Evaluation Files

Paper