TREC 2025 Proceedings

CUET-qwen14B-v2

Submission Details

Organization
CUET
Track
Detection, Retrieval, and Generation for Understanding News
Task
Question Generation Task
Date
2025-08-08

Run Description

Is this run manual or automatic?
automatic
Is this run based on the provided starter kit?
no
Briefly describe this run
This run uses the unsloth/Qwen3-14B-unsloth-bnb-4bit model to generate 10 concise, investigative questions per topic from the TREC 2025 dataset. The prompt is augmented with two few-shot examples that guide the model toward fact-check-style questions in the spirit of PolitiFact or Media Bias/Fact Check (MBFC). The questions probe the trustworthiness and factual quality of a news article based on its title and truncated body (first 2000 characters). The model is invoked through LangChain’s LLMChain and a HuggingFace pipeline with sampling enabled (temperature=0.7, do_sample=True) for diversity. A regex filter accepts only properly numbered, unique questions of at most 300 characters, and a retry loop allows up to three attempts for quality control. If fewer than 10 valid questions are returned, the output is padded with "N/A". The final structured submission is saved as a TSV file named CUET_run7.tsv with the fields: topic ID, team ID, run ID, question rank, and cleaned question text.
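The filtering, retry, and padding steps described above can be sketched as follows. This is a minimal, self-contained illustration, not the submitted code: the regex pattern and the helper names (parse_questions, generate_with_retries) are assumptions, and the model call is abstracted behind a generate_fn callable.

```python
import re

MAX_Q_LEN = 300      # maximum accepted question length
NUM_QUESTIONS = 10   # questions required per topic
MAX_ATTEMPTS = 3     # retry budget for quality control

# Matches lines like "1. Who funded the study cited in the article?"
NUMBERED_Q = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")

def parse_questions(raw_output):
    """Keep only properly numbered, unique questions of at most 300 characters."""
    seen, questions = set(), []
    for line in raw_output.splitlines():
        m = NUMBERED_Q.match(line)
        if not m:
            continue
        q = m.group(1).strip()
        if q and len(q) <= MAX_Q_LEN and q.lower() not in seen:
            seen.add(q.lower())
            questions.append(q)
    return questions[:NUM_QUESTIONS]

def generate_with_retries(generate_fn, max_attempts=MAX_ATTEMPTS):
    """Re-invoke the model up to 3 times; pad with "N/A" if still short of 10."""
    questions = []
    for _ in range(max_attempts):
        questions = parse_questions(generate_fn())
        if len(questions) == NUM_QUESTIONS:
            break
    return questions + ["N/A"] * (NUM_QUESTIONS - len(questions))
```

In the actual run, generate_fn would wrap the LangChain LLMChain call; each returned list of 10 strings is then written as one TSV row per question.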
What other datasets or services (e.g. Google/Bing web search, ChatGPT, Perplexity, etc.) were used in producing the run?
No external datasets or services such as Google, Bing, ChatGPT, or Perplexity were used. Internally, the model was prompted with two embedded few-shot examples, modeled after PolitiFact and MBFC, to simulate expert fact-checking behavior. These are part of the prompt template and not loaded from external APIs or files.
Briefly describe LLMs used for this run (optional)
The model used is unsloth/Qwen3-14B-unsloth-bnb-4bit, a 14B-parameter variant of Alibaba’s Qwen family, optimized by Unsloth for efficient inference via 4-bit quantization with bitsandbytes. The model is loaded via FastLanguageModel.from_pretrained() with automatic RoPE scaling and a 2048-token maximum sequence length. This configuration is tuned for speed and memory, making it feasible to run on limited hardware while retaining the expressive power of a large transformer. Generation uses sampling (temperature=0.7) to introduce diversity within a structured few-shot prompt.
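The loading and sampling settings above can be captured as plain configuration dictionaries. The keyword names (model_name, max_seq_length, load_in_4bit) follow the public unsloth FastLanguageModel.from_pretrained() signature but are assumptions, not the submitted code; the model call itself appears only in a docstring so the sketch runs without a GPU.

```python
def load_config():
    """Arguments intended for FastLanguageModel.from_pretrained(**load_config())."""
    return {
        "model_name": "unsloth/Qwen3-14B-unsloth-bnb-4bit",
        "max_seq_length": 2048,  # 2048-token max length (automatic RoPE scaling)
        "load_in_4bit": True,    # bitsandbytes 4-bit quantization
    }

def sampling_config():
    """Sampling settings passed to the HuggingFace text-generation pipeline."""
    return {
        "temperature": 0.7,  # mild randomness for question diversity
        "do_sample": True,
    }
```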
Please give this run a priority for inclusion in manual assessments.
3

Evaluation Files

Paper