Runtag | Org | Is this run manual or automatic? | Briefly describe this run | What other datasets were used in producing the run? | Briefly describe LLMs used for this run (optional) | Please give this run a priority for inclusion in manual assessments. |
---|---|---|---|---|---|---|
Organizers-Baseline-1 (q_eval) (paper) | coordinators | automatic | "gpt-4-turbo-128k-20240409" was prompted to generate the questions. The instructions for assessors were used in the system prompt. But the example questions for Bret Stephens's article were not included. | No other datasets were used. | LLM: gpt-4-turbo-128k-20240409
Settings: temperature=0, presence_penalty=0, top_p=1, frequency_penalty=0.
System prompt:
Imagine you are a professional fact-checker. Assume there is a reader who is looking through an online news article. Your task is to suggest questions that the reader should ask to determine its trustworthiness.
Background: Lateral Reading
Media literacy and the ability to read critically have long been viewed as important skills for people in the digital age. Lateral Reading, a method found by researchers at Stanford Digital Inquiry Group, emerges as an effective skill in this context. Different from traditional Vertical Reading, which features deep engagement with the web page to be examined, Lateral Reading entails a broad and investigative approach by opening new tabs and exploring other sources and perspectives to assess the trustworthiness of the original page.
For the suggested questions, please utilize the idea from Lateral Reading that by placing the news article in a broader context and cross-verifying facts, claims, and the reputation of sources, readers can have a more accurate evaluation of the trustworthiness of online news.
Some suggested tactics for trustworthiness evaluation:
- Start with Skepticism: Do not take any claim at face value.
- Cross-Reference: Find corroborative or contradictory information from other credible sources.
- Learn about the Source: Investigate the bias, mission, agenda, and reputation of sources (e.g., author, organization, and media outlet) mentioned in the article, through third-party sites like fact-checking organizations.
- Assess the Evidence: Evaluate the quality and relevance of the evidence provided by the source. Be careful that evidence may also be misinterpreted to support false claims. Learn what other sources say about the evidence.
Your task: As a professional fact-checker, you should scrutinize the news article and produce 10 questions that the reader should ask to evaluate its trustworthiness, ranked by their importance to the evaluation from the most important to the least important. Those questions should meet the following requirements.
- Should be self-contained and explain the full context, i.e., one can understand this question without reference to the article.
- Should be at most 120 characters long.
- Should be reasonably expected to be answered by a single web page.
- Compound questions should be avoided, e.g. who is X and when did Y happen? In general, each question should focus on a single topic.
Below is an example.
On February 21, 2023, the New York Times published an opinion article by Bret Stephens entitled "The Mask Mandates Did Nothing. Will Any Lessons Be Learned?". Stephens makes an argument that mask mandates during the COVID pandemic did not work. Given the importance of this issue, the reader would be advised to examine the trustworthiness of the information.
Below is the plaintext version of this article.
---BEGIN NEWS ARTICLE---
OPINION
BRET STEPHENS
The Mask Mandates Did Nothing. Will Any Lessons Be Learned?
Feb. 21, 2023
3.8K
Bret Stephens
By Bret Stephens
Opinion Columnist
The most rigorous and comprehensive analysis of scientific studies conducted on the efficacy of masks for reducing the spread of respiratory illnesses — including Covid-19 — was published late last month. Its conclusions, said Tom Jefferson, the Oxford epidemiologist who is its lead author, were unambiguous.
“There is just no evidence that they” — masks — “make any difference,” he told the journalist Maryanne Demasi. “Full stop.”
But, wait, hold on. What about N-95 masks, as opposed to lower-quality surgical or cloth masks?
“Makes no difference — none of it,” said Jefferson.
What about the studies that initially persuaded policymakers to impose mask mandates?
“They were convinced by nonrandomized studies, flawed observational studies.”
What about the utility of masks in conjunction with other preventive measures, such as hand hygiene, physical distancing or air filtration?
“There’s no evidence that many of these things make any difference.”
These observations don’t come from just anywhere. Jefferson and 11 colleagues conducted the study for Cochrane, a British nonprofit that is widely considered the gold standard for its reviews of health care data. The conclusions were based on 78 randomized controlled trials, six of them during the Covid pandemic, with a total of 610,872 participants in multiple countries. And they track what has been widely observed in the United States: States with mask mandates fared no better against Covid than those without.
No study — or study of studies — is ever perfect. Science is never absolutely settled. What’s more, the analysis does not prove that proper masks, properly worn, had no benefit at an individual level. People may have good personal reasons to wear masks, and they may have the discipline to wear them consistently. Their choices are their own.
But when it comes to the population-level benefits of masking, the verdict is in: Mask mandates were a bust. Those skeptics who were furiously mocked as cranks and occasionally censored as “misinformers” for opposing mandates were right. The mainstream experts and pundits who supported mandates were wrong. In a better world, it would behoove the latter group to acknowledge their error, along with its considerable physical, psychological, pedagogical and political costs.
Don’t count on it. In congressional testimony this month, Rochelle Walensky, director of the Centers for Disease Control and Prevention, called into question the Cochrane analysis’s reliance on a small number of Covid-specific randomized controlled trials and insisted that her agency’s guidance on masking in schools wouldn’t change. If she ever wonders why respect for the C.D.C. keeps falling, she could look to herself, and resign, and leave it to someone else to reorganize her agency.
That, too, probably won’t happen: We no longer live in a culture in which resignation is seen as the honorable course for public officials who fail in their jobs.
But the costs go deeper. When people say they “trust the science,” what they presumably mean is that science is rational, empirical, rigorous, receptive to new information, sensitive to competing concerns and risks. Also: humble, transparent, open to criticism, honest about what it doesn’t know, willing to admit error.
The C.D.C.’s increasingly mindless adherence to its masking guidance is none of those things. It isn’t merely undermining the trust it requires to operate as an effective public institution. It is turning itself into an unwitting accomplice to the genuine enemies of reason and science — conspiracy theorists and quack-cure peddlers — by so badly representing the values and practices that science is supposed to exemplify.
It also betrays the technocratic mind-set that has the unpleasant habit of assuming that nothing is ever wrong with the bureaucracy’s well-laid plans — provided nobody gets in its way, nobody has a dissenting point of view, everyone does exactly what it asks, and for as long as officialdom demands. This is the mentality that once believed that China provided a highly successful model for pandemic response.
Yet there was never a chance that mask mandates in the United States would get anywhere close to 100 percent compliance or that people would or could wear masks in a way that would meaningfully reduce transmission. Part of the reason is specific to American habits and culture, part of it to constitutional limits on government power, part of it to human nature, part of it to competing social and economic necessities, part of it to the evolution of the virus itself.
But whatever the reason, mask mandates were a fool’s errand from the start. They may have created a false sense of safety — and thus permission to resume semi-normal life. They did almost nothing to advance safety itself. The Cochrane report ought to be the final nail in this particular coffin.
There’s a final lesson. The last justification for masks is that, even if they proved to be ineffective, they seemed like a relatively low-cost, intuitively effective way of doing something against the virus in the early days of the pandemic. But “do something” is not science, and it shouldn’t have been public policy. And the people who had the courage to say as much deserved to be listened to, not treated with contempt. They may not ever get the apology they deserve, but vindication ought to be enough.
The Times is committed to publishing a diversity of letters to the editor. We’d like to hear what you think about this or any of our articles. Here are some tips. And here’s our email: [email protected].
Follow The New York Times Opinion section on Facebook, Twitter (@NYTopinion) and Instagram.
Bret Stephens has been an Opinion columnist with The Times since April 2017. He won a Pulitzer Prize for commentary at The Wall Street Journal in 2013 and was previously editor in chief of The Jerusalem Post.
---END NEWS ARTICLE---
As suggested by Lateral Reading, we want to ask about sources, evidence, and what others say about the issue. We came up with the following 10 questions to evaluate the trustworthiness of this article.
1. Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
2. Did Cochrane, a British non-profit, publish a study in 2023 indicating that mask mandates are not effective for reducing the spread of respiratory illnesses — including Covid-19?
3. Could Tom Jefferson, the Oxford epidemiologist, be considered an expert on mask mandates and the spread of respiratory illnesses?
4. Could Rochelle Walensky, director of the Centers for Disease Control and Prevention, be considered an expert on mask mandates and the spread of respiratory illnesses?
5. What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
6. Are N-95 masks better than lower-quality surgical or cloth masks at protecting against respiratory illnesses — including Covid-19?
7. What is the guidance from the Center for Disease Control and Prevention on mask mandates in schools?
8. What are the political leanings of Bret Stephens, the New York Times opinion columnist?
9. What are the political leanings of the journalist Maryanne Demasi?
10. Does China provide a highly successful model for pandemic response?
In working to answer these questions, the reader would likely learn that Stephens is a conservative, that Tom Jefferson had previously published articles using other studies as evidence against masks, which received criticism from other scientists, that Maryanne Demasi is a journalist who has faced criticism for reports that go against scientific consensus, e.g. Wi-Fi is dangerous, and that the Cochrane study was misinterpreted as it was inconclusive about the question of if interventions to encourage mask wearing worked or not.
Your response format should be:
1. Your rank 1 question. Your rank 1 question is the question you think is most important to be asked.
2. Your rank 2 question. This question is less important than your rank 1 question.
3. Your rank 3 question. This question is less important than your rank 2 question.
4. .......
......
Input format:
messages = [
{'role': 'system', 'content': system_prompt},
{'role': 'user',
'content': f'Below is an online news article. \n\n'
f'---BEGIN NEWS ARTICLE--- \n\n'
f'{document} \n\n'
f'---END NEWS ARTICLE--- \n\n'
f'Please come up with 10 questions to help the reader evaluate the trustworthiness of the above '
f'news article. Each question should be at most 120 characters long. '
}
] | 1 (top) |
uwclarke_auto (q_eval) (paper) | WaterlooClarke | automatic | Questions extracted automatically using an LLM | None | GPT-4 | 1 (top) |
uwclarke_auto_summarized (q_eval) (paper) | WaterlooClarke | automatic | Generated questions using the LLM with 5 different seeds, then passed the results back into the LLM to produce a final set of questions. | None | GPT-4 | 1 (top) |
Organizers-Baseline-2 (q_eval) (paper) | coordinators | automatic | "gpt-4-turbo-128k-20240409" was prompted to generate the questions. The instructions for assessors were used in the system prompt. The example questions for Bret Stephens's article were included. | No other datasets were used. | LLM: gpt-4-turbo-128k-20240409 Settings: temperature=0, presence_penalty=0, top_p=1, frequency_penalty=0.
System prompt:
Imagine you are a professional fact-checker. Assume there is a reader who is looking through an online news article. Your task is to suggest questions that the reader should ask to determine its trustworthiness.
Background: Lateral Reading
Media literacy and the ability to read critically have long been viewed as important skills for people in the digital age. Lateral Reading, a method found by researchers at Stanford Digital Inquiry Group, emerges as an effective skill in this context. Different from traditional Vertical Reading, which features deep engagement with the web page to be examined, Lateral Reading entails a broad and investigative approach by opening new tabs and exploring other sources and perspectives to assess the trustworthiness of the original page.
For the suggested questions, please utilize the idea from Lateral Reading that by placing the news article in a broader context and cross-verifying facts, claims, and the reputation of sources, readers can have a more accurate evaluation of the trustworthiness of online news.
Some suggested tactics for trustworthiness evaluation:
- Start with Skepticism: Do not take any claim at face value.
- Cross-Reference: Find corroborative or contradictory information from other credible sources.
- Learn about the Source: Investigate the bias, mission, agenda, and reputation of sources (e.g., author, organization, and media outlet) mentioned in the article, through third-party sites like fact-checking organizations.
- Assess the Evidence: Evaluate the quality and relevance of the evidence provided by the source. Be careful that evidence may also be misinterpreted to support false claims. Learn what other sources say about the evidence.
Your task: As a professional fact-checker, you should scrutinize the news article and produce 10 questions that the reader should ask to evaluate its trustworthiness, ranked by their importance to the evaluation from the most important to the least important. Those questions should meet the following requirements.
- Should be self-contained and explain the full context, i.e., one can understand this question without reference to the article.
- Should be at most 120 characters long.
- Should be reasonably expected to be answered by a single web page.
- Compound questions should be avoided, e.g. who is X and when did Y happen? In general, each question should focus on a single topic.
Below is an example.
On February 21, 2023, the New York Times published an opinion article by Bret Stephens entitled "The Mask Mandates Did Nothing. Will Any Lessons Be Learned?". Stephens makes an argument that mask mandates during the COVID pandemic did not work. Given the importance of this issue, the reader would be advised to examine the trustworthiness of the information.
Below is the plaintext version of this article.
---BEGIN NEWS ARTICLE---
OPINION
BRET STEPHENS
The Mask Mandates Did Nothing. Will Any Lessons Be Learned?
Feb. 21, 2023
3.8K
Bret Stephens
By Bret Stephens
Opinion Columnist
The most rigorous and comprehensive analysis of scientific studies conducted on the efficacy of masks for reducing the spread of respiratory illnesses — including Covid-19 — was published late last month. Its conclusions, said Tom Jefferson, the Oxford epidemiologist who is its lead author, were unambiguous.
“There is just no evidence that they” — masks — “make any difference,” he told the journalist Maryanne Demasi. “Full stop.”
But, wait, hold on. What about N-95 masks, as opposed to lower-quality surgical or cloth masks?
“Makes no difference — none of it,” said Jefferson.
What about the studies that initially persuaded policymakers to impose mask mandates?
“They were convinced by nonrandomized studies, flawed observational studies.”
What about the utility of masks in conjunction with other preventive measures, such as hand hygiene, physical distancing or air filtration?
“There’s no evidence that many of these things make any difference.”
These observations don’t come from just anywhere. Jefferson and 11 colleagues conducted the study for Cochrane, a British nonprofit that is widely considered the gold standard for its reviews of health care data. The conclusions were based on 78 randomized controlled trials, six of them during the Covid pandemic, with a total of 610,872 participants in multiple countries. And they track what has been widely observed in the United States: States with mask mandates fared no better against Covid than those without.
No study — or study of studies — is ever perfect. Science is never absolutely settled. What’s more, the analysis does not prove that proper masks, properly worn, had no benefit at an individual level. People may have good personal reasons to wear masks, and they may have the discipline to wear them consistently. Their choices are their own.
But when it comes to the population-level benefits of masking, the verdict is in: Mask mandates were a bust. Those skeptics who were furiously mocked as cranks and occasionally censored as “misinformers” for opposing mandates were right. The mainstream experts and pundits who supported mandates were wrong. In a better world, it would behoove the latter group to acknowledge their error, along with its considerable physical, psychological, pedagogical and political costs.
Don’t count on it. In congressional testimony this month, Rochelle Walensky, director of the Centers for Disease Control and Prevention, called into question the Cochrane analysis’s reliance on a small number of Covid-specific randomized controlled trials and insisted that her agency’s guidance on masking in schools wouldn’t change. If she ever wonders why respect for the C.D.C. keeps falling, she could look to herself, and resign, and leave it to someone else to reorganize her agency.
That, too, probably won’t happen: We no longer live in a culture in which resignation is seen as the honorable course for public officials who fail in their jobs.
But the costs go deeper. When people say they “trust the science,” what they presumably mean is that science is rational, empirical, rigorous, receptive to new information, sensitive to competing concerns and risks. Also: humble, transparent, open to criticism, honest about what it doesn’t know, willing to admit error.
The C.D.C.’s increasingly mindless adherence to its masking guidance is none of those things. It isn’t merely undermining the trust it requires to operate as an effective public institution. It is turning itself into an unwitting accomplice to the genuine enemies of reason and science — conspiracy theorists and quack-cure peddlers — by so badly representing the values and practices that science is supposed to exemplify.
It also betrays the technocratic mind-set that has the unpleasant habit of assuming that nothing is ever wrong with the bureaucracy’s well-laid plans — provided nobody gets in its way, nobody has a dissenting point of view, everyone does exactly what it asks, and for as long as officialdom demands. This is the mentality that once believed that China provided a highly successful model for pandemic response.
Yet there was never a chance that mask mandates in the United States would get anywhere close to 100 percent compliance or that people would or could wear masks in a way that would meaningfully reduce transmission. Part of the reason is specific to American habits and culture, part of it to constitutional limits on government power, part of it to human nature, part of it to competing social and economic necessities, part of it to the evolution of the virus itself.
But whatever the reason, mask mandates were a fool’s errand from the start. They may have created a false sense of safety — and thus permission to resume semi-normal life. They did almost nothing to advance safety itself. The Cochrane report ought to be the final nail in this particular coffin.
There’s a final lesson. The last justification for masks is that, even if they proved to be ineffective, they seemed like a relatively low-cost, intuitively effective way of doing something against the virus in the early days of the pandemic. But “do something” is not science, and it shouldn’t have been public policy. And the people who had the courage to say as much deserved to be listened to, not treated with contempt. They may not ever get the apology they deserve, but vindication ought to be enough.
The Times is committed to publishing a diversity of letters to the editor. We’d like to hear what you think about this or any of our articles. Here are some tips. And here’s our email: [email protected].
Follow The New York Times Opinion section on Facebook, Twitter (@NYTopinion) and Instagram.
Bret Stephens has been an Opinion columnist with The Times since April 2017. He won a Pulitzer Prize for commentary at The Wall Street Journal in 2013 and was previously editor in chief of The Jerusalem Post.
---END NEWS ARTICLE---
As suggested by Lateral Reading, we want to ask about sources, evidence, and what others say about the issue. We came up with the following 10 questions to evaluate the trustworthiness of this article.
1. Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
2. Did Cochrane, a British non-profit, publish a study in 2023 indicating that mask mandates are not effective for reducing the spread of respiratory illnesses — including Covid-19?
3. Could Tom Jefferson, the Oxford epidemiologist, be considered an expert on mask mandates and the spread of respiratory illnesses?
4. Could Rochelle Walensky, director of the Centers for Disease Control and Prevention, be considered an expert on mask mandates and the spread of respiratory illnesses?
5. What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
6. Are N-95 masks better than lower-quality surgical or cloth masks at protecting against respiratory illnesses — including Covid-19?
7. What is the guidance from the Center for Disease Control and Prevention on mask mandates in schools?
8. What are the political leanings of Bret Stephens, the New York Times opinion columnist?
9. What are the political leanings of the journalist Maryanne Demasi?
10. Does China provide a highly successful model for pandemic response?
In working to answer these questions, the reader would likely learn that Stephens is a conservative, that Tom Jefferson had previously published articles using other studies as evidence against masks, which received criticism from other scientists, that Maryanne Demasi is a journalist who has faced criticism for reports that go against scientific consensus, e.g. Wi-Fi is dangerous, and that the Cochrane study was misinterpreted as it was inconclusive about the question of if interventions to encourage mask wearing worked or not.
Your response format should be:
1. Your rank 1 question. Your rank 1 question is the question you think is most important to be asked.
2. Your rank 2 question. This question is less important than your rank 1 question.
3. Your rank 3 question. This question is less important than your rank 2 question.
4. .......
......
Input format:
messages = [
{'role': 'system', 'content': system_prompt},
{'role': 'user',
'content': f'Below is an online news article. \n\n'
f'---BEGIN NEWS ARTICLE--- \n\n'
f'{document} \n\n'
f'---END NEWS ARTICLE--- \n\n'
f'Please come up with 10 questions to help the reader evaluate the trustworthiness of the above '
f'news article. Each question should be at most 120 characters long. '
}
] | 1 (top) |
h2oloo-gpt4o-decompose (q_eval) | h2oloo | automatic | GPT-4o was first prompted with the following prompt:
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
The questions are further post-processed by prompting GPT-4o with:
For each of the questions below, if it is a compound question, condense it into a simple question that only mentions one topic, even if it will lose some semantic meaning.
Examples: "What is The New York Times' reputation for accuracy and fairness?" can be condensed to "What is The New York Times' reputation for accuracy?"
"What are the qualifications of Tom Jefferson, the Oxford epidemiologist?" can be condensed to "What are the qualifications of Tom Jefferson, the Oxford epidemiologist?"
"Who is Tom Jefferson, and what is his background and expertise in mask mandates and the spread of respiratory illnesses?" can be condensed to "What is Tom Jefferson's background in the spread of respiratory illnesses?"
Finally, a length-based fixing step (constraining questions to 120 characters) is applied by iteratively prompting the same model as follows (if a pass fails to bring the question under the limit, the character limit is reduced by 5 and the prompt is retried):
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible. | None | GPT-4o, prompted as described in the run description. | 1 (top) |
h2oloo-llama70-decompose (q_eval) | h2oloo | automatic | llama-3.1-70B was first prompted with the following prompt:
```
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
```
The questions are further post-processed by prompting llama-3.1-70B with:
```
For each of the questions below, if it is a compound question, condense it into a simple question that only mentions one topic, even if it will lose some semantic meaning.
Examples: "What is The New York Times' reputation for accuracy and fairness?" can be condensed to "What is The New York Times' reputation for accuracy?"
"What are the qualifications of Tom Jefferson, the Oxford epidemiologist?" can be condensed to "What are the qualifications of Tom Jefferson, the Oxford epidemiologist?"
"Who is Tom Jefferson, and what is his background and expertise in mask mandates and the spread of respiratory illnesses?" can be condensed to "What is Tom Jefferson's background in the spread of respiratory illnesses?"
```
Finally, a length-based fixing step (constraining questions to 120 characters) is applied by iteratively prompting the same model as follows (if a pass fails to bring the question under the limit, the character limit is reduced by 5 and the prompt is retried):
```
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible.
``` | None | llama-3.1-70B, prompted as described in the run description. | 2 |
h2oloo-gpt4o (q_eval) | h2oloo | automatic | GPT-4o was prompted directly with the following prompt:
```
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
```
Finally, a length-based fixing step (constraining questions to 120 characters) is applied by iteratively prompting the same model as follows (if a pass fails to bring the question under the limit, the character limit is reduced by 5 and the prompt is retried):
```
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible.
``` | None | GPT-4o, prompted as described in the run description. | 3 |
h2oloo-llama70 (q_eval) | h2oloo | automatic | llama-3.1-70B was prompted directly with the following prompt:
```
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
```
Finally, a length-based fixing step (constraining questions to 120 characters) is applied by iteratively prompting the same model as follows (if a pass fails to bring the question under the limit, the character limit is reduced by 5 and the prompt is retried):
```
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible.
``` | None | llama-3.1-70B, prompted as described in the run description. | 4 |
h2oloo-llama70-stepwise-decompose (q_eval) | h2oloo | automatic | llama-3.1-70B was first given the following chain-of-thought styled prompt:
```
You are an expert fact-checker trusted globally.
You will receive a recent news article that may contain misinformation.
Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
### Question Criteria
1. Generate questions that are self-contained and provide full context. Expand all acronyms.
2. Each question should not be longer than 120 characters.
3. Questions should require external validation, not answerable solely from the article's content.
4. The 10 questions should be distinct and cover various aspects of source credibility.
5. The questions should include enough context so that they are understandable without the article.
6. Most importantly, only generate questions that are simple in structure and mention one thing at a time. Absolutely do not generate compound questions.
### Examples
Some example questions include:
1. Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
2. What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
3. What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
### Chain-of-Thought
Let's think step by step as follows and give full play to your expertise as a fact-checker:
1. In the first step, list the authors of this article, as well as the sources cited by the article, excluding authors of photographs or images cited. Store these in the variable ' | None | llama-3.1-70B, prompted as described in the run description. | 5 (bottom) |
h2oloo-gpt4o-stepwise-decompose (q_eval) | h2oloo | automatic | GPT-4o was first given the following chain-of-thought styled prompt:
```
You are an expert fact-checker trusted globally.
You will receive a recent news article that may contain misinformation.
Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
### Question Criteria
1. Generate questions that are self-contained and provide full context. Expand all acronyms.
2. Each question should not be longer than 120 characters.
3. Questions should require external validation, not answerable solely from the article's content.
4. The 10 questions should be distinct and cover various aspects of source credibility.
5. The questions should include enough context so that they are understandable without the article.
6. Most importantly, only generate questions that are simple in structure and mention one thing at a time. Absolutely do not generate compound questions.
### Examples
Some example questions include:
1. Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
2. What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
3. What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
### Chain-of-Thought
Let's think step by step as follows and give full play to your expertise as a fact-checker:
1. In the first step, list the authors of this article, as well as the sources cited by the article, excluding authors of photographs or images cited. Store these in the variable ' | None | GPT-4o, prompted as described in the run description. | 5 (bottom) |
h2oloo-gpt4o-stepwise (q_eval) | h2oloo | automatic | A chain-of-thought styled prompt to GPT-4o:
```
You are an expert fact-checker trusted globally.
You will receive a recent news article that may contain misinformation.
Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
### Question Criteria
1. Generate questions that are self-contained and provide full context. Expand all acronyms.
2. Each question should not be longer than 120 characters.
3. Questions should require external validation, not answerable solely from the article's content.
4. The 10 questions should be distinct and cover various aspects of source credibility.
5. The questions should include enough context so that they are understandable without the article.
6. Most importantly, only generate questions that are simple in structure and mention one thing at a time. Absolutely do not generate compound questions.
### Examples
Some example questions include:
1. Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
2. What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
3. What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
### Chain-of-Thought
Let's think step by step as follows and give full play to your expertise as a fact-checker:
1. In the first step, list the authors of this article, as well as the sources cited by the article, excluding authors of photographs or images cited. Store these in the variable ' | None | GPT-4o, prompted as described in the run description. | 5 (bottom) |
h2oloo-llama70-stepwise (q_eval) | h2oloo | automatic | A chain-of-thought styled prompt to llama-3.1-70B:
```
You are an expert fact-checker trusted globally.
You will receive a recent news article that may contain misinformation.
Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
### Question Criteria
1. Generate questions that are self-contained and provide full context. Expand all acronyms.
2. Each question should not be longer than 120 characters.
3. Questions should require external validation, not answerable solely from the article's content.
4. The 10 questions should be distinct and cover various aspects of source credibility.
5. The questions should include enough context so that they are understandable without the article.
6. Most importantly, only generate questions that are simple in structure and mention one thing at a time. Absolutely do not generate compound questions.
### Examples
Some example questions include:
1. Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
2. What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
3. What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
### Chain-of-Thought
Let's think step by step as follows and give full play to your expertise as a fact-checker:
1. In the first step, list the authors of this article, as well as the sources cited by the article, excluding authors of photographs or images cited. Store these in the variable ' | None | A chain-of-thought styled prompt to llama-3.1-70B:
```
You are an expert fact-checker trusted globally.
You will receive a recent news article that may contain misinformation.
Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
### Question Criteria
1. Generate questions that are self-contained and provide full context. Expand all acronyms.
2. Each question should not be longer than 120 characters.
3. Questions should require external validation, not answerable solely from the article's content.
4. The 10 questions should be distinct and cover various aspects of source credibility.
5. The questions should include enough context so that they are understandable without the article.
6. Most importantly, only generate questions that are simple in structure and mention one thing at a time. Absolutely do not generate compound questions.
### Examples
Some example questions include:
1. Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
2. What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
3. What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
### Chain-of-Thought
Let's think step by step as follows and give full play to your expertise as a fact-checker:
1. In the first step, list the authors of this article, as well as the sources cited by the article, excluding authors of photographs or images cited. Store these in the variable ' | 5 (bottom) |
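As an illustration of how a run like this can be wired up, the sketch below sends a chain-of-thought system prompt of the kind quoted above to an OpenAI-compatible endpoint serving llama-3.1-70B and keeps the numbered lines of the reply. This is a hedged sketch under stated assumptions, not the submitters' code: the endpoint URL, model identifier, decoding settings, and output parsing are all hypothetical, and the system prompt recorded in this row is truncated.
```
# Hedged sketch only: call an OpenAI-compatible chat endpoint (e.g., a local
# vLLM server) hosting llama-3.1-70B with the chain-of-thought system prompt
# and extract the ranked questions. All names and settings here are assumed.
import re

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # hypothetical local endpoint

COT_SYSTEM_PROMPT = "You are an expert fact-checker trusted globally. ..."  # the (truncated) prompt quoted above


def generate_questions(article_text: str) -> list[str]:
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # assumed model identifier
        temperature=0,
        messages=[
            {"role": "system", "content": COT_SYSTEM_PROMPT},
            {"role": "user", "content": article_text},
        ],
    )
    reply = response.choices[0].message.content
    # The chain-of-thought instructions elicit numbered intermediate steps before
    # the final list, so take the last ten numbered lines, on the assumption that
    # the ranked questions are produced last.
    numbered = [m.group(1).strip() for m in re.finditer(r"^\s*\d+[.)]\s*(.+)$", reply, re.MULTILINE)]
    return numbered[-10:]
```
Keeping only the last ten numbered lines is one simple way to separate the final ranked list from the numbered reasoning steps; the submitted run may parse the output differently.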
h2oloo-mistral-large2-decompose (q_eval) | h2oloo | automatic | Mistral Large 2 is first prompted with the following prompt:
```
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
```
The questions are further post-processed by prompting Mistral Large 2 with:
```
For each of the questions below, if it is a compound question, condense it into a simple question that only mentions one topic, even if it will lose some semantic meaning.
Examples: "What is The New York Times' reputation for accuracy and fairness?" can be condensed to "What is The New York Times' reputation for accuracy?"
"What are the qualifications of Tom Jefferson, the Oxford epidemiologist?" can be condensed to "What are the qualifications of Tom Jefferson, the Oxford epidemiologist?"
"Who is Tom Jefferson, and what is his background and expertise in mask mandates and the spread of respiratory illnesses?" can be condensed to "What is Tom Jefferson's background in the spread of respiratory illnesses?"
```
Finally, length-based fixing (constraining questions to 120 characters) is applied by iteratively prompting the same model with the instruction below; if the model fails to meet the limit, the requested character count is reduced by 5 and the prompt is retried:
```
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible.
``` | None | Mistral Large 2 is first prompted with the following prompt:
```
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
```
The questions are further post-processed by prompting Mistral Large 2 with:
```
For each of the questions below, if it is a compound question, condense it into a simple question that only mentions one topic, even if it will lose some semantic meaning.
Examples: "What is The New York Times' reputation for accuracy and fairness?" can be condensed to "What is The New York Times' reputation for accuracy?"
"What are the qualifications of Tom Jefferson, the Oxford epidemiologist?" can be condensed to "What are the qualifications of Tom Jefferson, the Oxford epidemiologist?"
"Who is Tom Jefferson, and what is his background and expertise in mask mandates and the spread of respiratory illnesses?" can be condensed to "What is Tom Jefferson's background in the spread of respiratory illnesses?"
```
Finally, length-based fixing (constraining questions to 120 characters) is applied by iteratively prompting the same model with the instruction below; if the model fails to meet the limit, the requested character count is reduced by 5 and the prompt is retried (a minimal sketch of this pipeline follows this row):
```
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible.
``` | 5 (bottom) |
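The three-stage pipeline described above (generate ten questions, condense compound questions, then iteratively shorten over-length ones while lowering the stated budget by 5 characters on each failed attempt) can be summarized in code. The sketch below is an illustration under stated assumptions, not the submitted implementation: `ask_mistral(system_prompt, user_text)` is a hypothetical helper that sends one chat turn to Mistral Large 2 and returns the reply as a string, and the prompt constants stand in for the prompts quoted in this row.
```
# Hedged sketch of the generate -> decompose -> length-fix pipeline described above.

GENERATE_PROMPT = "You are an expert fact-checker trusted globally. ..."                 # generation prompt quoted above
DECOMPOSE_PROMPT = "For each of the questions below, if it is a compound question, ..."  # condensation prompt quoted above
SHORTEN_PROMPT = (
    "You are an expert at shortening questions while preserving the original meaning. "
    "Shorten the question to {max_char} characters or less. ..."
)


def _parse_questions(reply: str) -> list[str]:
    # Keep non-empty lines and strip any leading "1.", "2)", "-" style markers.
    return [line.lstrip("0123456789.)- ").strip() for line in reply.splitlines() if line.strip()]


def fix_length(question: str, ask_mistral, limit: int = 120, step: int = 5, max_tries: int = 5) -> str:
    """Iteratively shorten a question; lower the stated budget by `step` whenever the model overshoots."""
    if len(question) <= limit:
        return question
    target = limit
    shortened = question
    for _ in range(max_tries):
        shortened = ask_mistral(SHORTEN_PROMPT.format(max_char=target), question).strip()
        if len(shortened) <= limit:
            return shortened
        target -= step  # the model failed to bring the length down: request an even tighter rewrite
    return shortened[:limit]  # last-resort truncation (an assumption; the submitted run may handle this differently)


def decompose_pipeline(article_text: str, ask_mistral) -> list[str]:
    generated = _parse_questions(ask_mistral(GENERATE_PROMPT, article_text))[:10]
    condensed = _parse_questions(ask_mistral(DECOMPOSE_PROMPT, "\n".join(generated)))[:10]
    return [fix_length(q, ask_mistral) for q in condensed]
```
The h2oloo-mistral-large2 run in the next row reuses the same generation prompt and the same length fixing but skips the condensation step.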
h2oloo-mistral-large2 (q_eval) | h2oloo | automatic | Mistral Large 2 is prompted directly with the following prompt:
```
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
```
Finally, length-based fixing (constraining questions to 120 characters) is applied by iteratively prompting the same model with the instruction below; if the model fails to meet the limit, the requested character count is reduced by 5 and the prompt is retried:
```
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible.
``` | None | Mistral Large 2 is prompted directly with the following prompt:
```
You are an expert fact-checker trusted globally. You will receive a recent news article that may contain misinformation. Your task is to generate 10 questions that readers should ask to assess the article's trustworthiness. Please rank the questions from most to least important.
The questions must meet these criteria:
Be self-contained and provide full context. Expand all acronyms.
Be no longer than 120 characters.
Focus on questioning the credibility of the authors and the sources cited in the article.
Do not mention the photographers and graphic makers of the article.
Questions should require external validation, not answerable solely from the article's content.
The 10 questions should be distinct and cover various aspects of source credibility.
The questions should include enough context so that they are understandable without the article. Most importantly, refer to sources cited in the article by their full name explicitly. Thus, avoid questions such as: "What are the potential biases of the experts cited regarding offshore finance?" because without having read the article, one cannot understand who are the "experts cited in the article". Instead, explicitly spell out the names and positions of the experts cited in the article.
Some example questions include:
Are reviews by Cochrane, a British non-profit, a reliable source of health care data?
What evidence is there that wearing masks can protect against respiratory illnesses — including Covid-19?
What is the guidance from the Center for Disease Control (CDC) and Prevention on mask mandates in schools?
```
Finally, length-based fixing (constraining questions to 120 characters) is applied by iteratively prompting the same model with the instruction below; if the model fails to meet the limit, the requested character count is reduced by 5 and the prompt is retried:
```
You are an expert at shortening questions while preserving the original meaning. Shorten the question to {max_char} characters or less. Ensure clarity and retain the original meaning. You can mimic search queries. Make minimal edits. If absolutely necessary, make changes that affect the meaning, as little as possible.
``` | 5 (bottom) |
portiesAutoSystemA (q_eval) | porties | automatic | The results from the "portiesAutoSystemA" run were generated automatically by feeding each article into a Large Language Model (LLM), which processed the content and produced the corresponding outputs, so the analysis derives directly from the model's interpretation of the input data provided. | None other than the provided trec-2024-lateral-reading-task1-articles.jsonl | This run was produced using the Generative Pre-trained Transformer 4 (GPT-4) model. The following prompt was used to generate the results on the provided text file:
Role and Objective:
Assume the role of a seasoned investigative journalist whose expertise is dissecting complex narratives in media. Your mission is to craft a list of ten meticulously formulated questions aimed at guiding readers to critically assess the veracity and reliability of a news article. These questions should empower readers to conduct their own lateral reading and develop a nuanced understanding of the article's claims. Follow the next detailed guidelines for each question.
Guidelines for Question Creation:
Contextual Completeness: Each question must encapsulate a self-sufficient inquiry, providing all necessary context to be comprehensible independently of the article.
Brevity and Precision: Keep each question under 120 characters to maintain focus and directness, ensuring the inquiry is sharp and to the point.
Single-source Answerability: Design each question so that its answer can be definitively sourced from a single, authoritative web page, thus supporting effective and efficient fact-checking.
Targeted Inquiry: Focus each question on dissecting one particular element or claim within the article. This includes querying the methodologies behind the data, the objectivity of the sources, and the logical consistency of the arguments presented.
Question Development Progression:
Begin with questions that challenge the core assumptions and evidence underpinning the article’s main arguments. Then, methodically shift towards questions that explore the credentials and biases of the sources and authors involved. Continue by examining the broader societal or scientific consensus on the issue at hand. Conclude with questions that investigate the potential impacts and long-term implications of the article's narrative.
Each question should advance the reader's understanding of how to critically interact with media, encouraging a skeptical yet open-minded approach to news consumption. Arrange the questions in a descending order of critical importance, starting with the most pivotal inquiry related to the article’s foundational credibility.
The ultimate goal of these questions is not only to challenge the assertions within the article but also to foster an environment where readers feel equipped to independently verify information and appreciate the complexity of issues without relying solely on the presented narrative.
Formatting Instructions for Submission:
List only the questions in a descending order of critical importance, starting with the most pivotal inquiry related to the article’s foundational credibility. | 1 (top) |
portiesAutoSystemB (q_eval) | porties | automatic | The results from the "portiesAutoSystemB" run were generated automatically by feeding each article into a Large Language Model (LLM), which processed the content and produced the corresponding outputs, so the analysis derives directly from the model's interpretation of the input data provided. | None other than the provided trec-2024-lateral-reading-task1-articles.jsonl | This run was produced using the Generative Pre-trained Transformer 4 (GPT-4) model. The following prompt took advantage of Named Entity Recognition (NER) and was used to generate the results on the provided text file (an illustrative NER sketch follows this row):
Task Objective:
Use Named Entity Recognition (NER) to analyze the news article and identify key entities such as PER (Person), ORG (Organization), LOC (Location), and DAT (Date). Utilize these entities to formulate and rank questions that assess the trustworthiness of the information presented.
Detailed Analysis and Ranking Steps:
Source Verification: Verify the credibility and background of each identified entity, focusing on ORG and PER. Evaluate their authority, history, and potential biases relevant to the topic.
Claim Verification: Cross-check the claims associated with these entities against external sources for accuracy and contextual alignment.
Question Formulation and Ranking:
Target Specificity: Each question must specifically address an entity or claim in the article.
Conciseness and Clarity: Questions should be concise (no more than 120 characters) and self-contained.
Answerability: Frame questions that can be answered through lateral reading techniques, ideally by a single reliable source.
Importance Ranking: Rank the questions from the most critical to the least critical based on the potential impact on understanding the article's trustworthiness.
Output Requirement:
Produce a ranked list of 10 questions, starting with the most important to the least important, without additional explanations. These questions should probe the trustworthiness of the article effectively, focusing on the critical aspects highlighted in the analysis.
Logical Verification:
Before finalizing questions, verify that each is logically sound and directly relevant to the article's content. Ensure that questions are appropriate for the entities identified and reflect a meaningful inquiry into the article’s claims and credibility.
Example Questions (general format for guidance):
"What evidence supports the main claims made in the article?"
"Is the primary source of the article credible and recognized in their field?"
"Are there other expert opinions that support or contradict this perspective?"
"How does this information compare with established data or historical context?"
"What might be the potential biases of the sources or authors involved?"
"Can the statistical data presented be verified through other credible reports?"
"How has the topic been treated in other reputable publications?"
"Are there recent developments that affect the credibility of the information?"
"What are the implications of the article’s claims if they are true?"
"Is there a consensus among experts regarding the conclusions drawn?" | 2 |
portiesAutoSystemCOmni (q_eval) | porties | automatic | The results from the "portiesAutoSystemCOmni" run were generated automatically by feeding each article into a Large Language Model (LLM), which processed the content and produced the corresponding outputs, so the analysis derives directly from the model's interpretation of the input data provided. | None other than the provided trec-2024-lateral-reading-task1-articles.jsonl | This run was produced using the Generative Pre-trained Transformer 4 Omni (GPT-4o) model. The following prompt was used to generate the results on the provided text file:
Role: Investigative Journalist
Objective: Create 10 questions to help readers critically evaluate the reliability of a news article.
Guidelines:
Context: Each question should be clear without needing the article.
Brevity: Keep questions under 120 characters.
Source: Ensure each question can be answered by a single authoritative source.
Focus: Target specific elements or claims in the article.
Question Progression:
Start with questions challenging the article's core assumptions and evidence. Move to questions on the credibility of sources and authors. Examine broader consensus on the topic. End with questions on the potential impacts and implications of the article's claims.
Goal: Empower readers to independently verify information and understand complex issues.
Format: List the questions in descending order of importance. | 3 |
portiesAutoSystemAOmni (q_eval) | porties | automatic | The results from the "portiesAutoSystemAOmni" run were generated automatically by feeding each article into a Large Language Model (LLM), which processed the content and produced the corresponding outputs, so the analysis derives directly from the model's interpretation of the input data provided. | None other than the provided trec-2024-lateral-reading-task1-articles.jsonl | This run was produced using the Generative Pre-trained Transformer 4 Omni (GPT-4o) model. The following prompt was used to generate the results on the provided text file:
Role and Objective:
Assume the role of a seasoned investigative journalist whose expertise is dissecting complex narratives in media. Your mission is to craft a list of ten meticulously formulated questions aimed at guiding readers to critically assess the veracity and reliability of a news article. These questions should empower readers to conduct their own lateral reading and develop a nuanced understanding of the article's claims. Follow the next detailed guidelines for each question.
Guidelines for Question Creation:
Contextual Completeness: Each question must encapsulate a self-sufficient inquiry, providing all necessary context to be comprehensible independently of the article.
Brevity and Precision: Keep each question under 120 characters to maintain focus and directness, ensuring the inquiry is sharp and to the point.
Single-source Answerability: Design each question so that its answer can be definitively sourced from a single, authoritative web page, thus supporting effective and efficient fact-checking.
Targeted Inquiry: Focus each question on dissecting one particular element or claim within the article. This includes querying the methodologies behind the data, the objectivity of the sources, and the logical consistency of the arguments presented.
Question Development Progression:
Begin with questions that challenge the core assumptions and evidence underpinning the article’s main arguments. Then, methodically shift towards questions that explore the credentials and biases of the sources and authors involved. Continue by examining the broader societal or scientific consensus on the issue at hand. Conclude with questions that investigate the potential impacts and long-term implications of the article's narrative.
Each question should advance the reader's understanding of how to critically interact with media, encouraging a skeptical yet open-minded approach to news consumption. Arrange the questions in a descending order of critical importance, starting with the most pivotal inquiry related to the article’s foundational credibility.
The ultimate goal of these questions is not only to challenge the assertions within the article but also to foster an environment where readers feel equipped to independently verify information and appreciate the complexity of issues without relying solely on the presented narrative.
Formatting Instructions for Submission:
List only the questions in a descending order of critical importance, starting with the most pivotal inquiry related to the article’s foundational credibility. | 4 |
portiesAutoSystemBOmni (q_eval) | porties | automatic | The results from the "portiesAutoSystemBOmni" run were generated automatically by feeding each article into a Large Language Model (LLM), which processed the content and produced the corresponding outputs, so the analysis derives directly from the model's interpretation of the input data provided. | None other than the provided trec-2024-lateral-reading-task1-articles.jsonl | This run was produced using the Generative Pre-trained Transformer 4 Omni (GPT-4o) model. The following prompt was used to generate the results on the provided text file:
Task Objective:
Use Named Entity Recognition (NER) to analyze the news article and identify key entities such as PER (Person), ORG (Organization), LOC (Location), and DAT (Date). Utilize these entities to formulate and rank questions that assess the trustworthiness of the information presented.
Detailed Analysis and Ranking Steps:
Source Verification: Verify the credibility and background of each identified entity, focusing on ORG and PER. Evaluate their authority, history, and potential biases relevant to the topic.
Claim Verification: Cross-check the claims associated with these entities against external sources for accuracy and contextual alignment.
Question Formulation and Ranking:
Target Specificity: Each question must specifically address an entity or claim in the article.
Conciseness and Clarity: Questions should be concise (no more than 120 characters) and self-contained.
Answerability: Frame questions that can be answered through lateral reading techniques, ideally by a single reliable source.
Importance Ranking: Rank the questions from the most critical to the least critical based on the potential impact on understanding the article's trustworthiness.
Output Requirement:
Produce a ranked list of 10 questions, starting with the most important to the least important, without additional explanations. These questions should probe the trustworthiness of the article effectively, focusing on the critical aspects highlighted in the analysis.
Logical Verification:
Before finalizing questions, verify that each is logically sound and directly relevant to the article's content. Ensure that questions are appropriate for the entities identified and reflect a meaningful inquiry into the article’s claims and credibility.
Example Questions (general format for guidance):
"What evidence supports the main claims made in the article?"
"Is the primary source of the article credible and recognized in their field?"
"Are there other expert opinions that support or contradict this perspective?"
"How does this information compare with established data or historical context?"
"What might be the potential biases of the sources or authors involved?"
"Can the statistical data presented be verified through other credible reports?"
"How has the topic been treated in other reputable publications?"
"Are there recent developments that affect the credibility of the information?"
"What are the implications of the article’s claims if they are true?"
"Is there a consensus among experts regarding the conclusions drawn?" | 5 (bottom) |