TREC 2024 (33rd Text REtrieval Conference)

Runtag	Org	Describe what, if any, manual intervention that was done to produce the run in response to the test data	Which base models did this run use -- e.g., GPT-4, Llama 13B, etc.?	What data, if any, did this run use for training -- e.g., PLABA dataset (Attal 2023), etc.?	Briefly describe salient features of this run, including what distinguishes it from your other runs.	Please give this run a priority for inclusion in manual assessments.
UAms-ConBART-Cochrane (paper)	UAmsterdam	None.	Plan guided ConBART (Contextual BART) model trained on Cochrane plain English summaries.	Cochrane plain English summaries.	Contextual BART model trained as planned guided simplification model, using with a "rephrase" instruction for each sentence.	1 (top)
UAms-BART-Cochrane (paper)	UAmsterdam	None.	Plan guided BART (Sentence BART) model trained on Cochrane plain English summaries.	Cochrane plain English summaries.	Sentence BART model trained as planned guided simplification model, using with a "rephrase" instruction for each sentence.	2
gpt35_dspy	Yseop		gpt-35-turbo-16k (via azure)	PLABA dataset (Attal 2023)	prompt engg using dspy	1 (top)
bart_base_ft	Yseop		facebook/bart-base	PLABA dataset (Attal 2023)	high sari score but mediocre quality sentences	2
plaba_um_fhs_sub1 (paper)	um_fhs	No manual intervention was performed on the test data.	The run used GPT-4o mini, model version gpt-4o-mini-2024-07-18. We created two AI agents (instruction prompts for the two AI agents are below), a discussion (thread) was created where AI agent 1 created the adaptation, AI agents 2 was a "smart 13-14 year old student" who asked clarification questions and then the adaptation was modified using relevant answers to the questions (Merge prompt below). AI agent 1 instructions (same as user prompt in runs w/o AI agents): 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Use the adaptation guidelines provided below ##Adaptation guidelines##, which include rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.\nYou will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Your goal is to return a list of adapted sentences in the same format, where each adapted sentence corresponds to the original sentence at the same position in the list.\nPrioritize making the adaptation as complete as possible to ensure full understanding by a K8-level reader.\nSplit sentences when necessary to improve clarity.\nIf a sentence is already simple and understandable, you may carry it over without changes.\nIf a sentence is irrelevant to consumer understanding, you should omit it and replace it with "" in the output list.\nIf a sentence can be made simpler or clearer by replacing technical terms, do so according to the guidelines.\nEnsure that no information from other source sentences is merged into the adaptation of any given sentence.\nOutput only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.\n\n##Adaptation guidelines##\nThese are guidelines for plain text adaptation from medical texts. The guidelines also feature level of importance for specific concepts, if a word or multiple words are encased "", that means that this concept has the highest priority concept and should always be adhered to in plain language adaptations, if a word or multiple words are encased in \|\| that means a very high priority concept and should be adhered to in plain language adaptations except if it contradicts with a "" concept. Similarly word or multiple words encased between [] are high priority concepts and should be adhered to except if it contradicts "" or []. Examples sentences or example words for plain language adaptations are provide in the format // // -> // //, where the first in // // is the original and second sentence in // // the plain language adaptation.\n\nEducation level of audience for adapted (target) text: "K8 (8th grade level students, schooling age 13 to 14)" \n\n\|Splitting sentences\|: if a sentence is long and contains two or more complete thoughts, it should be split into multiple sentences that are simpler. All such sentences will be entered in the same cell to the right of the source sentence, separating them with periods as per usual. \n\n\|Carrying over sentences or phrases\|: a sentence or phrase need not be paraphrased if it is already understandable for consumers; it can simply be carried over as is. Similarly, some sentences may only need one or two terms to be substituted, but no syntactic changes made. \n\n\|Ignoring sentences\|: if a source sentence is not relevant to consumer understanding of the document, it should be ignored, and the cell to the right of it left blank, for example: \n1) Sentences that expound on experimental procedures not relevant to conclusions, such as \'Blood pressure of study participants was measured in mmHg using a sphygmomanometer.\', \n2) Adapt (do not ignore) sentences mentioning or implying that “Future studies are needed for this topic...” \n\n\|Resolving anaphora\|: if pronouns in the source sentence refer to something in the previous sentence that is necessary for understanding the current, replace them with their referents in the target sentence. For example: //Cardiovascular disease is the leading cause of mortality.// -> //Heart disease is the leading cause of death.//, //It is influenced by genetics as well as lifestyle.// -> //Heart disease is influenced by heredity and lifestyle.// \n\nGeneral guidelines: \n1) [Change passive voice to active voice when possible.] Example //A total of 24 papers were reviewed// -> //We reviewed a total of 24 papers//, \n2) [If a source sentence contains a subheading, such as Background:, Results:,] a) [And is followed by a complete sentence, omit the subheadings, such as Background:, Results: in the target text], example //Objective: Our aim is to evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our aim is to rate treatment of foreign objects stuck in the upper digestive tract.// b) [And is followed by an incomplete sentence, convert the partial or incomplete sentence to a complete target sentence by folding in the subheading based on context], examples //Objective: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our objective is to rate treatment of \nforeign objects stuck in the upper digestive tract.//, //Purpose of this review: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //This review’s purpose is to rate treatment of foreign objects stuck in the upper digestive tract.//, \n3) "Omit confidence intervals, p-values, and similar measurements." Example: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).// \n4) [If the current target sentence is partially entailed or implied by the previous target sentence, still create a adaptation for the current target sentence.] Examples: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).//, //The summary OR for clinical cure rate was 2.29 (95% CI, 1.61-3.28), significantly favoring cephalosporins (P<.00001).// -> //Results favored cephalosporins.// \n5) If the current target sentence can be written EXACTLY as the previous target sentence, just type “...” (no quotes) for the current target sentence Note: this is a rare scenario \n6) [Carry over words that are understandable for consumers OR words that consumers are exposed to constantly], such as metabolism. Metabolism does not need a substitution, synonym, or adjacent definition in the target sentence and can be carried over as is. \n7) [Substitute longer, more arcane words for shorter, more common synonyms.] Example: //inhibits// -> //blocks//, //assessed// -> //measured// \n8) "Replace professional jargon with common, consumer-friendly terms." a) Examples: //nighttime orthoses// -> //nighttime braces//, //interphalangeal joint// -> //finger knuckle//, b) [If there is ambiguity in how a term can be replaced, the full publication or other outside sources may be used to deduce the intent of the authors], c) [When substituting a term, ensure that it fits in with the sentence holistically, adjusting the term or sentence appropriately, e.g. to avoid redundancy. Where appropriate, pronouns like it or the general you in the adapted term can become more specific from the context.] \n9) "If the jargon or a named entity does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same named entity by (1) a PRONOUN or (2) its SPECIFIC NAME can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC NAME. Example: //Duloxetine is a combined serotonin/norepinephrine reuptake inhibitor currently under clinical investigation for the treatment of women with stress urinary incontinence.// -> //Duloxetine (a common antidepressant) blocks removal of serotonin/norepinephrine (chemical messengers) and is studied for treating women with bladder control loss from stress.//, \n10) "Treat abbreviations similarly as jargon or named entities. If an abbreviation does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same abbreviation by (1) a PRONOUN or (2) its SPECIFIC ABBREVIATION can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC ABBREVIATION. Example://This chapter covers antidepressants that fall into the class of serotonin (5HT) and norepinephrine (NE) reuptake inhibitors.// -> //This work covers antidepressants that block removal of the chemical messengers serotonin (5-HT) and norepinephrine (NE).//' AI agent 2 instructions: "You are a smart 13 to 14-year-old student. Your job is to carefully review plain language adaptations of medical text and ask questions that could make the adaptations better. The goal is to help the AI Assistant improve these texts so that they are easy to understand for everyone.\n\nWhen reviewing the text, focus on these five things:\n\nSimplicity: Is the text easy to understand?\nAccuracy: Is the information correct?\nCompleteness: Is there anything important missing?\nBrevity: Is the information as short as possible while still being clear?\nFluency: Does the text flow smoothly when read?\nAsk a question only if you are pretty sure it could help make the adaptation better. For example, ask if there’s a medical term that needs explaining, if something important is missing, or if something could be said more clearly.\n\nHere are up to 5 questions you might ask:\n\nCould this sentence be made simpler for someone who doesn't know any medical terms?\nIs there any important information left out that should be added to make this clearer?\nIs there a shorter way to say this without losing the important details?\nDoes this part make sense if someone has no background in health or medicine?\nIs there any medical jargon or abbreviation here that should be explained or replaced with simpler words?" Merge prompt (instruction to AI agent 1 to include response from AI agent 2): 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. After completing your initial adaptations, you will receive questions from a 13 to 14-year-old student (AI Assistant 2) designed to help improve the clarity and effectiveness of your work.\n\nHere’s how you should proceed:\n\nReview the Questions: Carefully read each question provided by AI Assistant 2, which aims to identify areas where the adaptations could be made simpler, more accurate, more complete, shorter, or more fluent.\n\nIncorporate Feedback: Based on the questions, revise the adaptations to improve them. This might involve simplifying language further, adding missing information, clarifying confusing parts, shortening sentences, or replacing medical jargon with simpler terms.\n\nMaintain Quality: Ensure that the revised adaptations remain as complete as possible for the understanding of a K8-level reader, while adhering to the original guidelines for simplicity, accuracy, completeness, brevity, and fluency.\n\nFinal Output: After incorporating the feedback, output the final list of adaptations in the same format, ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"], ensuring that each adaptation corresponds to the original sentence and has been improved based on the questions asked.'	This run did not use training data (PLABA dataset). Training data was only used for comparison with test set (Flesh-Kincaid score od the adaptations).	Two GPT-4o mini, model version gpt-4o-mini-2024-07-18, AI agent were created, where after creating the adaptation with the AI agent 1, AI agents 2 in the persona of a "smart 13-14 year old student" who asked clarification questions and then the adaptation was modified using relevant answers to the questions. The runs were was evaluated quantitively (comparing average Flesh-Kincaid (FK) grade level to the average FK-grade level in the training set) and qualitatively (manual evaluation on a sample, similar as in the competition - four 5 scale Likert scores). This run was evaluated highest qualitatively in a sample of n=40 abstract adaptations Simplicity M=4.23 (SD=0.95), Accuracy 4.25 (SD=0.87), Completeness 4.38(SD=0.70), Brevity M=4.08(SD=0.89) with a sum of average with a score M=16,17(SD=2.65). The average FK-grade level was M=8,93 (SD=1,74), where the average training data adaptations (ground truth) was 11.64 (SD=2.43).	1 (top)
plaba_um_fhs_sub2 (paper)	um_fhs	No manual intervention was performed on the test data.	The run used GPT-4o mini, model version gpt-4o-mini-2024-07-18. We used the following user prompt which included manually adapted guidelines for generating the plain language adaptation. User prompt: 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Use the adaptation guidelines provided below ##Adaptation guidelines##, which include rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.\nYou will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Your goal is to return a list of adapted sentences in the same format, where each adapted sentence corresponds to the original sentence at the same position in the list.\nPrioritize making the adaptation as complete as possible to ensure full understanding by a K8-level reader.\nSplit sentences when necessary to improve clarity.\nIf a sentence is already simple and understandable, you may carry it over without changes.\nIf a sentence is irrelevant to consumer understanding, you should omit it and replace it with "" in the output list.\nIf a sentence can be made simpler or clearer by replacing technical terms, do so according to the guidelines.\nEnsure that no information from other source sentences is merged into the adaptation of any given sentence.\nOutput only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.\n\n##Adaptation guidelines##\nThese are guidelines for plain text adaptation from medical texts. The guidelines also feature level of importance for specific concepts, if a word or multiple words are encased "", that means that this concept has the highest priority concept and should always be adhered to in plain language adaptations, if a word or multiple words are encased in \|\| that means a very high priority concept and should be adhered to in plain language adaptations except if it contradicts with a "" concept. Similarly word or multiple words encased between [] are high priority concepts and should be adhered to except if it contradicts "" or []. Examples sentences or example words for plain language adaptations are provide in the format // // -> // //, where the first in // // is the original and second sentence in // // the plain language adaptation.\n\nEducation level of audience for adapted (target) text: "K8 (8th grade level students, schooling age 13 to 14)" \n\n\|Splitting sentences\|: if a sentence is long and contains two or more complete thoughts, it should be split into multiple sentences that are simpler. All such sentences will be entered in the same cell to the right of the source sentence, separating them with periods as per usual. \n\n\|Carrying over sentences or phrases\|: a sentence or phrase need not be paraphrased if it is already understandable for consumers; it can simply be carried over as is. Similarly, some sentences may only need one or two terms to be substituted, but no syntactic changes made. \n\n\|Ignoring sentences\|: if a source sentence is not relevant to consumer understanding of the document, it should be ignored, and the cell to the right of it left blank, for example: \n1) Sentences that expound on experimental procedures not relevant to conclusions, such as \'Blood pressure of study participants was measured in mmHg using a sphygmomanometer.\', \n2) Adapt (do not ignore) sentences mentioning or implying that “Future studies are needed for this topic...” \n\n\|Resolving anaphora\|: if pronouns in the source sentence refer to something in the previous sentence that is necessary for understanding the current, replace them with their referents in the target sentence. For example: //Cardiovascular disease is the leading cause of mortality.// -> //Heart disease is the leading cause of death.//, //It is influenced by genetics as well as lifestyle.// -> //Heart disease is influenced by heredity and lifestyle.// \n\nGeneral guidelines: \n1) [Change passive voice to active voice when possible.] Example //A total of 24 papers were reviewed// -> //We reviewed a total of 24 papers//, \n2) [If a source sentence contains a subheading, such as Background:, Results:,] a) [And is followed by a complete sentence, omit the subheadings, such as Background:, Results: in the target text], example //Objective: Our aim is to evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our aim is to rate treatment of foreign objects stuck in the upper digestive tract.// b) [And is followed by an incomplete sentence, convert the partial or incomplete sentence to a complete target sentence by folding in the subheading based on context], examples //Objective: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our objective is to rate treatment of \nforeign objects stuck in the upper digestive tract.//, //Purpose of this review: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //This review’s purpose is to rate treatment of foreign objects stuck in the upper digestive tract.//, \n3) "Omit confidence intervals, p-values, and similar measurements." Example: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).// \n4) [If the current target sentence is partially entailed or implied by the previous target sentence, still create a adaptation for the current target sentence.] Examples: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).//, //The summary OR for clinical cure rate was 2.29 (95% CI, 1.61-3.28), significantly favoring cephalosporins (P<.00001).// -> //Results favored cephalosporins.// \n5) If the current target sentence can be written EXACTLY as the previous target sentence, just type “...” (no quotes) for the current target sentence Note: this is a rare scenario \n6) [Carry over words that are understandable for consumers OR words that consumers are exposed to constantly], such as metabolism. Metabolism does not need a substitution, synonym, or adjacent definition in the target sentence and can be carried over as is. \n7) [Substitute longer, more arcane words for shorter, more common synonyms.] Example: //inhibits// -> //blocks//, //assessed// -> //measured// \n8) "Replace professional jargon with common, consumer-friendly terms." a) Examples: //nighttime orthoses// -> //nighttime braces//, //interphalangeal joint// -> //finger knuckle//, b) [If there is ambiguity in how a term can be replaced, the full publication or other outside sources may be used to deduce the intent of the authors], c) [When substituting a term, ensure that it fits in with the sentence holistically, adjusting the term or sentence appropriately, e.g. to avoid redundancy. Where appropriate, pronouns like it or the general you in the adapted term can become more specific from the context.] \n9) "If the jargon or a named entity does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same named entity by (1) a PRONOUN or (2) its SPECIFIC NAME can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC NAME. Example: //Duloxetine is a combined serotonin/norepinephrine reuptake inhibitor currently under clinical investigation for the treatment of women with stress urinary incontinence.// -> //Duloxetine (a common antidepressant) blocks removal of serotonin/norepinephrine (chemical messengers) and is studied for treating women with bladder control loss from stress.//, \n10) "Treat abbreviations similarly as jargon or named entities. If an abbreviation does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same abbreviation by (1) a PRONOUN or (2) its SPECIFIC ABBREVIATION can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC ABBREVIATION. Example://This chapter covers antidepressants that fall into the class of serotonin (5HT) and norepinephrine (NE) reuptake inhibitors.// -> //This work covers antidepressants that block removal of the chemical messengers serotonin (5-HT) and norepinephrine (NE).//'	This run did not use training data (PLABA dataset). Training data was only used for comparison with test set (Flesh-Kincaid score od the adaptations).	Two GPT-4o mini, model version gpt-4o-mini-2024-07-18, with a user prompt including the manually adapted guidelines was used. The runs were was evaluated quantitively (comparing average Flesh-Kincaid (FK) grade level to the average FK-grade level in the training set) and qualitatively (manual evaluation on a sample, similar as in the competition - four 5 scale Likert scores). This run was evaluated second highest qualitatively in a sample of n=40 abstract adaptations with Simplicity M=4.08 (SD=1.02), Accuracy 4.20 (SD=0.88), Completeness 4.43(SD=0.75), Brevity M=4.03 (SD=0.77) with a sum of average with a score M=15,93 (SD=2.20). The average FK-grade level was M=8,94 (SD=1,79), where the average training data adaptations (ground truth) was 11.64 (SD=2.43).	2
plaba_um_fhs_sub3 (paper)	um_fhs	No manual intervention was performed on the test data.	The run used GPT-4o, model version gpt-4o-2024-08-06. It was fine-tuned on the PLABA training dataset. We used the following user prompt with manual adaptation of guidelines for generating the plain language adaptation. User prompt: 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Use the adaptation guidelines provided below ##Adaptation guidelines##, which include rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.\nYou will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Your goal is to return a list of adapted sentences in the same format, where each adapted sentence corresponds to the original sentence at the same position in the list.\nPrioritize making the adaptation as complete as possible to ensure full understanding by a K8-level reader.\nSplit sentences when necessary to improve clarity.\nIf a sentence is already simple and understandable, you may carry it over without changes.\nIf a sentence is irrelevant to consumer understanding, you should omit it and replace it with "" in the output list.\nIf a sentence can be made simpler or clearer by replacing technical terms, do so according to the guidelines.\nEnsure that no information from other source sentences is merged into the adaptation of any given sentence.\nOutput only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.\n\n##Adaptation guidelines##\nThese are guidelines for plain text adaptation from medical texts. The guidelines also feature level of importance for specific concepts, if a word or multiple words are encased "", that means that this concept has the highest priority concept and should always be adhered to in plain language adaptations, if a word or multiple words are encased in \|\| that means a very high priority concept and should be adhered to in plain language adaptations except if it contradicts with a "" concept. Similarly word or multiple words encased between [] are high priority concepts and should be adhered to except if it contradicts "" or []. Examples sentences or example words for plain language adaptations are provide in the format // // -> // //, where the first in // // is the original and second sentence in // // the plain language adaptation.\n\nEducation level of audience for adapted (target) text: "K8 (8th grade level students, schooling age 13 to 14)" \n\n\|Splitting sentences\|: if a sentence is long and contains two or more complete thoughts, it should be split into multiple sentences that are simpler. All such sentences will be entered in the same cell to the right of the source sentence, separating them with periods as per usual. \n\n\|Carrying over sentences or phrases\|: a sentence or phrase need not be paraphrased if it is already understandable for consumers; it can simply be carried over as is. Similarly, some sentences may only need one or two terms to be substituted, but no syntactic changes made. \n\n\|Ignoring sentences\|: if a source sentence is not relevant to consumer understanding of the document, it should be ignored, and the cell to the right of it left blank, for example: \n1) Sentences that expound on experimental procedures not relevant to conclusions, such as \'Blood pressure of study participants was measured in mmHg using a sphygmomanometer.\', \n2) Adapt (do not ignore) sentences mentioning or implying that “Future studies are needed for this topic...” \n\n\|Resolving anaphora\|: if pronouns in the source sentence refer to something in the previous sentence that is necessary for understanding the current, replace them with their referents in the target sentence. For example: //Cardiovascular disease is the leading cause of mortality.// -> //Heart disease is the leading cause of death.//, //It is influenced by genetics as well as lifestyle.// -> //Heart disease is influenced by heredity and lifestyle.// \n\nGeneral guidelines: \n1) [Change passive voice to active voice when possible.] Example //A total of 24 papers were reviewed// -> //We reviewed a total of 24 papers//, \n2) [If a source sentence contains a subheading, such as Background:, Results:,] a) [And is followed by a complete sentence, omit the subheadings, such as Background:, Results: in the target text], example //Objective: Our aim is to evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our aim is to rate treatment of foreign objects stuck in the upper digestive tract.// b) [And is followed by an incomplete sentence, convert the partial or incomplete sentence to a complete target sentence by folding in the subheading based on context], examples //Objective: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our objective is to rate treatment of \nforeign objects stuck in the upper digestive tract.//, //Purpose of this review: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //This review’s purpose is to rate treatment of foreign objects stuck in the upper digestive tract.//, \n3) "Omit confidence intervals, p-values, and similar measurements." Example: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).// \n4) [If the current target sentence is partially entailed or implied by the previous target sentence, still create a adaptation for the current target sentence.] Examples: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).//, //The summary OR for clinical cure rate was 2.29 (95% CI, 1.61-3.28), significantly favoring cephalosporins (P<.00001).// -> //Results favored cephalosporins.// \n5) If the current target sentence can be written EXACTLY as the previous target sentence, just type “...” (no quotes) for the current target sentence Note: this is a rare scenario \n6) [Carry over words that are understandable for consumers OR words that consumers are exposed to constantly], such as metabolism. Metabolism does not need a substitution, synonym, or adjacent definition in the target sentence and can be carried over as is. \n7) [Substitute longer, more arcane words for shorter, more common synonyms.] Example: //inhibits// -> //blocks//, //assessed// -> //measured// \n8) "Replace professional jargon with common, consumer-friendly terms." a) Examples: //nighttime orthoses// -> //nighttime braces//, //interphalangeal joint// -> //finger knuckle//, b) [If there is ambiguity in how a term can be replaced, the full publication or other outside sources may be used to deduce the intent of the authors], c) [When substituting a term, ensure that it fits in with the sentence holistically, adjusting the term or sentence appropriately, e.g. to avoid redundancy. Where appropriate, pronouns like it or the general you in the adapted term can become more specific from the context.] \n9) "If the jargon or a named entity does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same named entity by (1) a PRONOUN or (2) its SPECIFIC NAME can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC NAME. Example: //Duloxetine is a combined serotonin/norepinephrine reuptake inhibitor currently under clinical investigation for the treatment of women with stress urinary incontinence.// -> //Duloxetine (a common antidepressant) blocks removal of serotonin/norepinephrine (chemical messengers) and is studied for treating women with bladder control loss from stress.//, \n10) "Treat abbreviations similarly as jargon or named entities. If an abbreviation does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same abbreviation by (1) a PRONOUN or (2) its SPECIFIC ABBREVIATION can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC ABBREVIATION. Example://This chapter covers antidepressants that fall into the class of serotonin (5HT) and norepinephrine (NE) reuptake inhibitors.// -> //This work covers antidepressants that block removal of the chemical messengers serotonin (5-HT) and norepinephrine (NE).//'	This run used the PLABA training dataset for fine-tuning (FT) GPT-4o, model version gpt-4o-2024-08-06. We randomly split the data 80% for training (733 samples) and 20% for validation (184 samples) with respect to “pmid” since abstract could have more than one adaptation. Each sample encoded as .jsonl has the following system prompt included: “You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.You will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Return only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.”	GPT-4o, model version gpt-4o-2024-08-06, was fine-tuned on the PLABA dataset (80% for training and 20% for validation) was used for this run. The runs were was evaluated quantitively (comparing average Flesh-Kincaid (FK) grade level to the average FK-grade level in the training set) and qualitatively (manual evaluation on a sample, similar as in the competition - four 5-scale Likert scores). This run was evaluated fourth highest qualitatively in a sample of n=40 abstract adaptations Simplicity M=3.80 (SD=0.76), Accuracy 4.20 (SD=0.76), Completeness 4.30 (SD=0.72), Brevity M=3.48 (SD=0.72) with an average sum of scores M=15.43 (SD=1.89). The average FK-grade level was M=12.20 (SD=2.76), where the average training data adaptations (ground truth) was 11.64 (SD=2.43). We decided to include this run since the average FK-grade level was closer to the ground truth then the third highest qualitatively evaluated, which was gpt-4o, model version gpt-4o-2024-08-06, with simple prompt and had a FK-grade level of M=7.40 (SD=1.72).	3 (bottom)
LLaMA-8B-4bit-MedicalAbstract-seq-to-seq-v1	BU	Added a conversational template using a role, user, and content prompt. Implement two custom functions: one to break down complex medical jargon and another to generate clear, concise sentences. Another function implemented was layering a medical dictionary with the train data.	Llama-3.1-8B-Instruct The model was built using a modified version of the LLaMA algorithm, specifically the DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored model, which was then optimized for memory efficiency by using 4-bit precision	PLABA dataset for task 1 PLABA dataset (Attal 2023)- extract differences between abstract and adaption to create a list of 'difficult words' Plain Language Medical Dictionary: https://apps.lib.umich.edu/medical-dictionary https://github.com/mlibrary/medical-dictionary/blob/main/data.json	What sets this project apart is its use of 4-bit quantization, which makes the model more efficient in terms of memory usage. The model was also trained using a custom dictionary to help explain complex medical terms, which makes the output much clearer and more accessible to non-medical professionals. Also, introduced while configuring the tokenizer - padding side - to help sentence length match up i.e. input and output sentence structure similarity to optimize the output.	1 (top)
task2_moa_tier3_post (paper)	ntu_nlp	No manual intervention.	Model: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 Prompt for adaptation: zero_shot_prompt = """Below is an instruction that describes a task, paired with an input that provides biomedical abstracts and other information in json format. Write a lay language response in JSON format that adapts input_sentence for the general public using plain language. Response should be in the following format:{{"response": }} ### Instruction: {} ### Input: {} ### Response: {}""" Prompt for MoA: f"""I have a task to convert medical sentences into simpler, easy-to-understand sentences. Please follow the guidelines below: Guidelines: 1. Simplicity: Outputs should be easy to understand. 2. Accuracy: Outputs should contain accurate information. 3. Completeness: Outputs should seek to minimize information lost from the original text. 4. Brevity: Outputs should be concise. Original Sentence: {original_sentence} I have already provided several samples. {sample} Please generate sentences that best follow these guidelines. The output format should be in JSON: {{response: }}”””	PLABA dataset (Attal 2023)	Method: demonstration relevant 5 shot + NER 5 Adaptation: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 MoA: gemini-1.5-flash	3 (bottom)
task2_moa_tier1_post (paper)	ntu_nlp	No manual intervention	Model: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 Prompt for adaptation: zero_shot_prompt = """Below is an instruction that describes a task, paired with an input that provides biomedical abstracts and other information in json format. Write a lay language response in JSON format that adapts input_sentence for the general public using plain language. Response should be in the following format:{{"response": }} ### Instruction: {} ### Input: {} ### Response: {}""" Prompt for MoA: f"""I have a task to convert medical sentences into simpler, easy-to-understand sentences. Please follow the guidelines below: Guidelines: 1. Simplicity: Outputs should be easy to understand. 2. Accuracy: Outputs should contain accurate information. 3. Completeness: Outputs should seek to minimize information lost from the original text. 4. Brevity: Outputs should be concise. Original Sentence: {original_sentence} I have already provided several samples. {sample} Please generate sentences that best follow these guidelines. The output format should be in JSON: {{response: }}”””	PLABA dataset (Attal 2023)	Method: demonstration relevant 5 shot + NER None & Finetune 1epoch NER None Adaptation: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 Finetune: gemma-2-27b, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 MoA: gemini-1.5-flash	1 (top)
task2_moa_tier2_post (paper)	ntu_nlp	No manual intervention	Model: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 Prompt for adaptation: zero_shot_prompt = """Below is an instruction that describes a task, paired with an input that provides biomedical abstracts and other information in json format. Write a lay language response in JSON format that adapts input_sentence for the general public using plain language. Response should be in the following format:{{"response": }} ### Instruction: {} ### Input: {} ### Response: {}""" Prompt for MoA: f"""I have a task to convert medical sentences into simpler, easy-to-understand sentences. Please follow the guidelines below: Guidelines: 1. Simplicity: Outputs should be easy to understand. 2. Accuracy: Outputs should contain accurate information. 3. Completeness: Outputs should seek to minimize information lost from the original text. 4. Brevity: Outputs should be concise. Original Sentence: {original_sentence} I have already provided several samples. {sample} Please generate sentences that best follow these guidelines. The output format should be in JSON: {{response: }}”””	PLABA dataset (Attal 2023)	Method: demonstration relevant 5 shot + NER None Adaptation: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 MoA: gemini-1.5-flash	2
TREC2024_SIB_run1	SIB	None	Llama3 8B	PLABA dataset (Attal 2023)	This run is a baseline, with the original model (no special feature), to compare the performance of our other runs	2
TREC2024_SIB_run3	SIB	-	Llama3.1	-	Basic prompting with Llama3.1 instead of Llama3	1 (top)
mistral-fix	CLAC	Yes format delete \" and quick fix of q34	mistral-large-latest	Dataset task 1 PLABA 2024	mistral-large-latest 7-shot with temp 0.4	2
mistral-FINAL	CLAC	Delete /"	mistral-large-latest	Dataset of task 1 PLABA 2024	mistral-large-latest 7-shot with temperature 0.4	1 (top)
TREC2024_SIB_run4	SIB	None	Llama 3 - 8B	None	Retrieval Augmented Generation (RAG) based on documents from Wikipedia and Monash	3 (bottom)
GPT (paper)	UM	Self-prompted GPT-4 with manual adjustment for alignment	GPT-4o	PLABA	Self-prompt until the output is satisfied.	2
LLaMa 3.1 70B instruction (2nd run) (paper)	UM	Since LLaMa generated results include pre-words such as 'Here is the simplified version: [SIMPLIFIED SENTENCE]', we used regular express to replace such kind of pre-words.	LLaMa 3.1 70B instruction tuning	No training or fine tuning, only inference. We added one example into the prompt(1-shot).	We used transformers.pipeline, with all default args	1 (top)
gpt-final	CLAC	Manual delete of characters \"	gpt-3.5-turbo-0125	Task 1 Dataset PLABA 2024	gpt-3.5-turbo-0125 7-shot run	3 (bottom)

Runtag

Org

Describe what, if any, manual intervention that was done to produce the run in response to the test data

Which base models did this run use -- e.g., GPT-4, Llama 13B, etc.?

What data, if any, did this run use for training -- e.g., PLABA dataset (Attal 2023), etc.?

Briefly describe salient features of this run, including what distinguishes it from your other runs.

Please give this run a priority for inclusion in manual assessments.

UAms-ConBART-Cochrane (paper)

UAmsterdam

None.

Plan guided ConBART (Contextual BART) model trained on Cochrane plain English summaries.

Cochrane plain English summaries.

Contextual BART model trained as planned guided simplification model, using with a "rephrase" instruction for each sentence.

1 (top)

UAms-BART-Cochrane (paper)

UAmsterdam

None.

Plan guided BART (Sentence BART) model trained on Cochrane plain English summaries.

Cochrane plain English summaries.

Sentence BART model trained as planned guided simplification model, using with a "rephrase" instruction for each sentence.

gpt35_dspy

Yseop

gpt-35-turbo-16k (via azure)

PLABA dataset (Attal 2023)

prompt engg using dspy

1 (top)

bart_base_ft

Yseop

facebook/bart-base

PLABA dataset (Attal 2023)

high sari score but mediocre quality sentences

plaba_um_fhs_sub1 (paper)

um_fhs

No manual intervention was performed on the test data.

The run used GPT-4o mini, model version gpt-4o-mini-2024-07-18. We created two AI agents (instruction prompts for the two AI agents are below), a discussion (thread) was created where AI agent 1 created the adaptation, AI agents 2 was a "smart 13-14 year old student" who asked clarification questions and then the adaptation was modified using relevant answers to the questions (Merge prompt below). AI agent 1 instructions (same as user prompt in runs w/o AI agents): 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Use the adaptation guidelines provided below ##Adaptation guidelines##, which include rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.\nYou will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Your goal is to return a list of adapted sentences in the same format, where each adapted sentence corresponds to the original sentence at the same position in the list.\nPrioritize making the adaptation as complete as possible to ensure full understanding by a K8-level reader.\nSplit sentences when necessary to improve clarity.\nIf a sentence is already simple and understandable, you may carry it over without changes.\nIf a sentence is irrelevant to consumer understanding, you should omit it and replace it with "" in the output list.\nIf a sentence can be made simpler or clearer by replacing technical terms, do so according to the guidelines.\nEnsure that no information from other source sentences is merged into the adaptation of any given sentence.\nOutput only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.\n\n##Adaptation guidelines##\nThese are guidelines for plain text adaptation from medical texts. The guidelines also feature level of importance for specific concepts, if a word or multiple words are encased "", that means that this concept has the highest priority concept and should always be adhered to in plain language adaptations, if a word or multiple words are encased in || that means a very high priority concept and should be adhered to in plain language adaptations except if it contradicts with a "" concept. Similarly word or multiple words encased between [] are high priority concepts and should be adhered to except if it contradicts "" or []. Examples sentences or example words for plain language adaptations are provide in the format // // -> // //, where the first in // // is the original and second sentence in // // the plain language adaptation.\n\nEducation level of audience for adapted (target) text: "K8 (8th grade level students, schooling age 13 to 14)" \n\n|Splitting sentences|: if a sentence is long and contains two or more complete thoughts, it should be split into multiple sentences that are simpler. All such sentences will be entered in the same cell to the right of the source sentence, separating them with periods as per usual. \n\n|Carrying over sentences or phrases|: a sentence or phrase need not be paraphrased if it is already understandable for consumers; it can simply be carried over as is. Similarly, some sentences may only need one or two terms to be substituted, but no syntactic changes made. \n\n|Ignoring sentences|: if a source sentence is not relevant to consumer understanding of the document, it should be ignored, and the cell to the right of it left blank, for example: \n1) Sentences that expound on experimental procedures not relevant to conclusions, such as \'Blood pressure of study participants was measured in mmHg using a sphygmomanometer.\', \n2) Adapt (do not ignore) sentences mentioning or implying that “Future studies are needed for this topic...” \n\n|Resolving anaphora|: if pronouns in the source sentence refer to something in the previous sentence that is necessary for understanding the current, replace them with their referents in the target sentence. For example: //Cardiovascular disease is the leading cause of mortality.// -> //Heart disease is the leading cause of death.//, //It is influenced by genetics as well as lifestyle.// -> //Heart disease is influenced by heredity and lifestyle.// \n\nGeneral guidelines: \n1) [Change passive voice to active voice when possible.] Example //A total of 24 papers were reviewed// -> //We reviewed a total of 24 papers//, \n2) [If a source sentence contains a subheading, such as Background:, Results:,] a) [And is followed by a complete sentence, omit the subheadings, such as Background:, Results: in the target text], example //Objective: Our aim is to evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our aim is to rate treatment of foreign objects stuck in the upper digestive tract.// b) [And is followed by an incomplete sentence, convert the partial or incomplete sentence to a complete target sentence by folding in the subheading based on context], examples //Objective: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our objective is to rate treatment of \nforeign objects stuck in the upper digestive tract.//, //Purpose of this review: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //This review’s purpose is to rate treatment of foreign objects stuck in the upper digestive tract.//, \n3) "Omit confidence intervals, p-values, and similar measurements." Example: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).// \n4) [If the current target sentence is partially entailed or implied by the previous target sentence, still create a adaptation for the current target sentence.] Examples: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).//, //The summary OR for clinical cure rate was 2.29 (95% CI, 1.61-3.28), significantly favoring cephalosporins (P<.00001).// -> //Results favored cephalosporins.// \n5) If the current target sentence can be written EXACTLY as the previous target sentence, just type “...” (no quotes) for the current target sentence Note: this is a rare scenario \n6) [Carry over words that are understandable for consumers OR words that consumers are exposed to constantly], such as metabolism. Metabolism does not need a substitution, synonym, or adjacent definition in the target sentence and can be carried over as is. \n7) [Substitute longer, more arcane words for shorter, more common synonyms.] Example: //inhibits// -> //blocks//, //assessed// -> //measured// \n8) "Replace professional jargon with common, consumer-friendly terms." a) Examples: //nighttime orthoses// -> //nighttime braces//, //interphalangeal joint// -> //finger knuckle//, b) [If there is ambiguity in how a term can be replaced, the full publication or other outside sources may be used to deduce the intent of the authors], c) [When substituting a term, ensure that it fits in with the sentence holistically, adjusting the term or sentence appropriately, e.g. to avoid redundancy. Where appropriate, pronouns like it or the general you in the adapted term can become more specific from the context.] \n9) "If the jargon or a named entity does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same named entity by (1) a PRONOUN or (2) its SPECIFIC NAME can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC NAME. Example: //Duloxetine is a combined serotonin/norepinephrine reuptake inhibitor currently under clinical investigation for the treatment of women with stress urinary incontinence.// -> //Duloxetine (a common antidepressant) blocks removal of serotonin/norepinephrine (chemical messengers) and is studied for treating women with bladder control loss from stress.//, \n10) "Treat abbreviations similarly as jargon or named entities. If an abbreviation does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same abbreviation by (1) a PRONOUN or (2) its SPECIFIC ABBREVIATION can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC ABBREVIATION. Example://This chapter covers antidepressants that fall into the class of serotonin (5HT) and norepinephrine (NE) reuptake inhibitors.// -> //This work covers antidepressants that block removal of the chemical messengers serotonin (5-HT) and norepinephrine (NE).//' AI agent 2 instructions: "You are a smart 13 to 14-year-old student. Your job is to carefully review plain language adaptations of medical text and ask questions that could make the adaptations better. The goal is to help the AI Assistant improve these texts so that they are easy to understand for everyone.\n\nWhen reviewing the text, focus on these five things:\n\nSimplicity: Is the text easy to understand?\nAccuracy: Is the information correct?\nCompleteness: Is there anything important missing?\nBrevity: Is the information as short as possible while still being clear?\nFluency: Does the text flow smoothly when read?\nAsk a question only if you are pretty sure it could help make the adaptation better. For example, ask if there’s a medical term that needs explaining, if something important is missing, or if something could be said more clearly.\n\nHere are up to 5 questions you might ask:\n\nCould this sentence be made simpler for someone who doesn't know any medical terms?\nIs there any important information left out that should be added to make this clearer?\nIs there a shorter way to say this without losing the important details?\nDoes this part make sense if someone has no background in health or medicine?\nIs there any medical jargon or abbreviation here that should be explained or replaced with simpler words?" Merge prompt (instruction to AI agent 1 to include response from AI agent 2): 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. After completing your initial adaptations, you will receive questions from a 13 to 14-year-old student (AI Assistant 2) designed to help improve the clarity and effectiveness of your work.\n\nHere’s how you should proceed:\n\nReview the Questions: Carefully read each question provided by AI Assistant 2, which aims to identify areas where the adaptations could be made simpler, more accurate, more complete, shorter, or more fluent.\n\nIncorporate Feedback: Based on the questions, revise the adaptations to improve them. This might involve simplifying language further, adding missing information, clarifying confusing parts, shortening sentences, or replacing medical jargon with simpler terms.\n\nMaintain Quality: Ensure that the revised adaptations remain as complete as possible for the understanding of a K8-level reader, while adhering to the original guidelines for simplicity, accuracy, completeness, brevity, and fluency.\n\nFinal Output: After incorporating the feedback, output the final list of adaptations in the same format, ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"], ensuring that each adaptation corresponds to the original sentence and has been improved based on the questions asked.'

This run did not use training data (PLABA dataset). Training data was only used for comparison with test set (Flesh-Kincaid score od the adaptations).

Two GPT-4o mini, model version gpt-4o-mini-2024-07-18, AI agent were created, where after creating the adaptation with the AI agent 1, AI agents 2 in the persona of a "smart 13-14 year old student" who asked clarification questions and then the adaptation was modified using relevant answers to the questions. The runs were was evaluated quantitively (comparing average Flesh-Kincaid (FK) grade level to the average FK-grade level in the training set) and qualitatively (manual evaluation on a sample, similar as in the competition - four 5 scale Likert scores). This run was evaluated highest qualitatively in a sample of n=40 abstract adaptations Simplicity M=4.23 (SD=0.95), Accuracy 4.25 (SD=0.87), Completeness 4.38(SD=0.70), Brevity M=4.08(SD=0.89) with a sum of average with a score M=16,17(SD=2.65). The average FK-grade level was M=8,93 (SD=1,74), where the average training data adaptations (ground truth) was 11.64 (SD=2.43).

1 (top)

plaba_um_fhs_sub2 (paper)

um_fhs

No manual intervention was performed on the test data.

The run used GPT-4o mini, model version gpt-4o-mini-2024-07-18. We used the following user prompt which included manually adapted guidelines for generating the plain language adaptation. User prompt: 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Use the adaptation guidelines provided below ##Adaptation guidelines##, which include rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.\nYou will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Your goal is to return a list of adapted sentences in the same format, where each adapted sentence corresponds to the original sentence at the same position in the list.\nPrioritize making the adaptation as complete as possible to ensure full understanding by a K8-level reader.\nSplit sentences when necessary to improve clarity.\nIf a sentence is already simple and understandable, you may carry it over without changes.\nIf a sentence is irrelevant to consumer understanding, you should omit it and replace it with "" in the output list.\nIf a sentence can be made simpler or clearer by replacing technical terms, do so according to the guidelines.\nEnsure that no information from other source sentences is merged into the adaptation of any given sentence.\nOutput only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.\n\n##Adaptation guidelines##\nThese are guidelines for plain text adaptation from medical texts. The guidelines also feature level of importance for specific concepts, if a word or multiple words are encased "", that means that this concept has the highest priority concept and should always be adhered to in plain language adaptations, if a word or multiple words are encased in || that means a very high priority concept and should be adhered to in plain language adaptations except if it contradicts with a "" concept. Similarly word or multiple words encased between [] are high priority concepts and should be adhered to except if it contradicts "" or []. Examples sentences or example words for plain language adaptations are provide in the format // // -> // //, where the first in // // is the original and second sentence in // // the plain language adaptation.\n\nEducation level of audience for adapted (target) text: "K8 (8th grade level students, schooling age 13 to 14)" \n\n|Splitting sentences|: if a sentence is long and contains two or more complete thoughts, it should be split into multiple sentences that are simpler. All such sentences will be entered in the same cell to the right of the source sentence, separating them with periods as per usual. \n\n|Carrying over sentences or phrases|: a sentence or phrase need not be paraphrased if it is already understandable for consumers; it can simply be carried over as is. Similarly, some sentences may only need one or two terms to be substituted, but no syntactic changes made. \n\n|Ignoring sentences|: if a source sentence is not relevant to consumer understanding of the document, it should be ignored, and the cell to the right of it left blank, for example: \n1) Sentences that expound on experimental procedures not relevant to conclusions, such as \'Blood pressure of study participants was measured in mmHg using a sphygmomanometer.\', \n2) Adapt (do not ignore) sentences mentioning or implying that “Future studies are needed for this topic...” \n\n|Resolving anaphora|: if pronouns in the source sentence refer to something in the previous sentence that is necessary for understanding the current, replace them with their referents in the target sentence. For example: //Cardiovascular disease is the leading cause of mortality.// -> //Heart disease is the leading cause of death.//, //It is influenced by genetics as well as lifestyle.// -> //Heart disease is influenced by heredity and lifestyle.// \n\nGeneral guidelines: \n1) [Change passive voice to active voice when possible.] Example //A total of 24 papers were reviewed// -> //We reviewed a total of 24 papers//, \n2) [If a source sentence contains a subheading, such as Background:, Results:,] a) [And is followed by a complete sentence, omit the subheadings, such as Background:, Results: in the target text], example //Objective: Our aim is to evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our aim is to rate treatment of foreign objects stuck in the upper digestive tract.// b) [And is followed by an incomplete sentence, convert the partial or incomplete sentence to a complete target sentence by folding in the subheading based on context], examples //Objective: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our objective is to rate treatment of \nforeign objects stuck in the upper digestive tract.//, //Purpose of this review: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //This review’s purpose is to rate treatment of foreign objects stuck in the upper digestive tract.//, \n3) "Omit confidence intervals, p-values, and similar measurements." Example: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).// \n4) [If the current target sentence is partially entailed or implied by the previous target sentence, still create a adaptation for the current target sentence.] Examples: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).//, //The summary OR for clinical cure rate was 2.29 (95% CI, 1.61-3.28), significantly favoring cephalosporins (P<.00001).// -> //Results favored cephalosporins.// \n5) If the current target sentence can be written EXACTLY as the previous target sentence, just type “...” (no quotes) for the current target sentence Note: this is a rare scenario \n6) [Carry over words that are understandable for consumers OR words that consumers are exposed to constantly], such as metabolism. Metabolism does not need a substitution, synonym, or adjacent definition in the target sentence and can be carried over as is. \n7) [Substitute longer, more arcane words for shorter, more common synonyms.] Example: //inhibits// -> //blocks//, //assessed// -> //measured// \n8) "Replace professional jargon with common, consumer-friendly terms." a) Examples: //nighttime orthoses// -> //nighttime braces//, //interphalangeal joint// -> //finger knuckle//, b) [If there is ambiguity in how a term can be replaced, the full publication or other outside sources may be used to deduce the intent of the authors], c) [When substituting a term, ensure that it fits in with the sentence holistically, adjusting the term or sentence appropriately, e.g. to avoid redundancy. Where appropriate, pronouns like it or the general you in the adapted term can become more specific from the context.] \n9) "If the jargon or a named entity does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same named entity by (1) a PRONOUN or (2) its SPECIFIC NAME can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC NAME. Example: //Duloxetine is a combined serotonin/norepinephrine reuptake inhibitor currently under clinical investigation for the treatment of women with stress urinary incontinence.// -> //Duloxetine (a common antidepressant) blocks removal of serotonin/norepinephrine (chemical messengers) and is studied for treating women with bladder control loss from stress.//, \n10) "Treat abbreviations similarly as jargon or named entities. If an abbreviation does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same abbreviation by (1) a PRONOUN or (2) its SPECIFIC ABBREVIATION can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC ABBREVIATION. Example://This chapter covers antidepressants that fall into the class of serotonin (5HT) and norepinephrine (NE) reuptake inhibitors.// -> //This work covers antidepressants that block removal of the chemical messengers serotonin (5-HT) and norepinephrine (NE).//'

This run did not use training data (PLABA dataset). Training data was only used for comparison with test set (Flesh-Kincaid score od the adaptations).

Two GPT-4o mini, model version gpt-4o-mini-2024-07-18, with a user prompt including the manually adapted guidelines was used. The runs were was evaluated quantitively (comparing average Flesh-Kincaid (FK) grade level to the average FK-grade level in the training set) and qualitatively (manual evaluation on a sample, similar as in the competition - four 5 scale Likert scores). This run was evaluated second highest qualitatively in a sample of n=40 abstract adaptations with Simplicity M=4.08 (SD=1.02), Accuracy 4.20 (SD=0.88), Completeness 4.43(SD=0.75), Brevity M=4.03 (SD=0.77) with a sum of average with a score M=15,93 (SD=2.20). The average FK-grade level was M=8,94 (SD=1,79), where the average training data adaptations (ground truth) was 11.64 (SD=2.43).

plaba_um_fhs_sub3 (paper)

um_fhs

No manual intervention was performed on the test data.

The run used GPT-4o, model version gpt-4o-2024-08-06. It was fine-tuned on the PLABA training dataset. We used the following user prompt with manual adaptation of guidelines for generating the plain language adaptation. User prompt: 'You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Use the adaptation guidelines provided below ##Adaptation guidelines##, which include rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.\nYou will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Your goal is to return a list of adapted sentences in the same format, where each adapted sentence corresponds to the original sentence at the same position in the list.\nPrioritize making the adaptation as complete as possible to ensure full understanding by a K8-level reader.\nSplit sentences when necessary to improve clarity.\nIf a sentence is already simple and understandable, you may carry it over without changes.\nIf a sentence is irrelevant to consumer understanding, you should omit it and replace it with "" in the output list.\nIf a sentence can be made simpler or clearer by replacing technical terms, do so according to the guidelines.\nEnsure that no information from other source sentences is merged into the adaptation of any given sentence.\nOutput only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.\n\n##Adaptation guidelines##\nThese are guidelines for plain text adaptation from medical texts. The guidelines also feature level of importance for specific concepts, if a word or multiple words are encased "", that means that this concept has the highest priority concept and should always be adhered to in plain language adaptations, if a word or multiple words are encased in || that means a very high priority concept and should be adhered to in plain language adaptations except if it contradicts with a "" concept. Similarly word or multiple words encased between [] are high priority concepts and should be adhered to except if it contradicts "" or []. Examples sentences or example words for plain language adaptations are provide in the format // // -> // //, where the first in // // is the original and second sentence in // // the plain language adaptation.\n\nEducation level of audience for adapted (target) text: "K8 (8th grade level students, schooling age 13 to 14)" \n\n|Splitting sentences|: if a sentence is long and contains two or more complete thoughts, it should be split into multiple sentences that are simpler. All such sentences will be entered in the same cell to the right of the source sentence, separating them with periods as per usual. \n\n|Carrying over sentences or phrases|: a sentence or phrase need not be paraphrased if it is already understandable for consumers; it can simply be carried over as is. Similarly, some sentences may only need one or two terms to be substituted, but no syntactic changes made. \n\n|Ignoring sentences|: if a source sentence is not relevant to consumer understanding of the document, it should be ignored, and the cell to the right of it left blank, for example: \n1) Sentences that expound on experimental procedures not relevant to conclusions, such as \'Blood pressure of study participants was measured in mmHg using a sphygmomanometer.\', \n2) Adapt (do not ignore) sentences mentioning or implying that “Future studies are needed for this topic...” \n\n|Resolving anaphora|: if pronouns in the source sentence refer to something in the previous sentence that is necessary for understanding the current, replace them with their referents in the target sentence. For example: //Cardiovascular disease is the leading cause of mortality.// -> //Heart disease is the leading cause of death.//, //It is influenced by genetics as well as lifestyle.// -> //Heart disease is influenced by heredity and lifestyle.// \n\nGeneral guidelines: \n1) [Change passive voice to active voice when possible.] Example //A total of 24 papers were reviewed// -> //We reviewed a total of 24 papers//, \n2) [If a source sentence contains a subheading, such as Background:, Results:,] a) [And is followed by a complete sentence, omit the subheadings, such as Background:, Results: in the target text], example //Objective: Our aim is to evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our aim is to rate treatment of foreign objects stuck in the upper digestive tract.// b) [And is followed by an incomplete sentence, convert the partial or incomplete sentence to a complete target sentence by folding in the subheading based on context], examples //Objective: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //Our objective is to rate treatment of \nforeign objects stuck in the upper digestive tract.//, //Purpose of this review: To evaluate management of foreign bodies in the upper gastrointestinal tract.// -> //This review’s purpose is to rate treatment of foreign objects stuck in the upper digestive tract.//, \n3) "Omit confidence intervals, p-values, and similar measurements." Example: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).// \n4) [If the current target sentence is partially entailed or implied by the previous target sentence, still create a adaptation for the current target sentence.] Examples: //The summary odds ratio (OR) for bacteriologic cure rate significantly favored cephalosporins, compared with penicillin (OR,1.83; 95% confidence interval [CI], 1.37-2.44); the bacteriologic failure rate was nearly 2 times higher for penicillin therapy than it was for cephalosporin therapy (P=.00004).// -> //Results favored cephalosporins (antibacterial antibiotics) over penicillin (another antibiotic).//, //The summary OR for clinical cure rate was 2.29 (95% CI, 1.61-3.28), significantly favoring cephalosporins (P<.00001).// -> //Results favored cephalosporins.// \n5) If the current target sentence can be written EXACTLY as the previous target sentence, just type “...” (no quotes) for the current target sentence Note: this is a rare scenario \n6) [Carry over words that are understandable for consumers OR words that consumers are exposed to constantly], such as metabolism. Metabolism does not need a substitution, synonym, or adjacent definition in the target sentence and can be carried over as is. \n7) [Substitute longer, more arcane words for shorter, more common synonyms.] Example: //inhibits// -> //blocks//, //assessed// -> //measured// \n8) "Replace professional jargon with common, consumer-friendly terms." a) Examples: //nighttime orthoses// -> //nighttime braces//, //interphalangeal joint// -> //finger knuckle//, b) [If there is ambiguity in how a term can be replaced, the full publication or other outside sources may be used to deduce the intent of the authors], c) [When substituting a term, ensure that it fits in with the sentence holistically, adjusting the term or sentence appropriately, e.g. to avoid redundancy. Where appropriate, pronouns like it or the general you in the adapted term can become more specific from the context.] \n9) "If the jargon or a named entity does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same named entity by (1) a PRONOUN or (2) its SPECIFIC NAME can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC NAME. Example: //Duloxetine is a combined serotonin/norepinephrine reuptake inhibitor currently under clinical investigation for the treatment of women with stress urinary incontinence.// -> //Duloxetine (a common antidepressant) blocks removal of serotonin/norepinephrine (chemical messengers) and is studied for treating women with bladder control loss from stress.//, \n10) "Treat abbreviations similarly as jargon or named entities. If an abbreviation does not have plain synonyms, leave as is in the first mention but explain it with parentheses or nonrestrictive clauses." Subsequent mentions of the same abbreviation by (1) a PRONOUN or (2) its SPECIFIC ABBREVIATION can be replaced with either (1) a more GENERAL REFERENT or (2) its SPECIFIC ABBREVIATION. Example://This chapter covers antidepressants that fall into the class of serotonin (5HT) and norepinephrine (NE) reuptake inhibitors.// -> //This work covers antidepressants that block removal of the chemical messengers serotonin (5-HT) and norepinephrine (NE).//'

This run used the PLABA training dataset for fine-tuning (FT) GPT-4o, model version gpt-4o-2024-08-06. We randomly split the data 80% for training (733 samples) and 20% for validation (184 samples) with respect to “pmid” since abstract could have more than one adaptation. Each sample encoded as .jsonl has the following system prompt included: “You are tasked with adapting a list of sentences from a medical text into plain language, suitable for readers at a K8 level (13 to 14 years old). The adaptations must be simple, accurate, complete, brief, and fluent, ensuring that the reader fully understands the content. Rules for splitting complex sentences, substituting medical jargon with common alternatives, and explaining terms with no substitutions. The tone should be casual and understandable.You will be given a list of sentences, formatted as ["SENTENCE_1", "SENTENCE_2",...,"SENTENCE_N"]. Return only the list of adaptations as ["ADAPTATION_1", "ADAPTATION_2",...,"ADAPTATION_N"]. Double-check your work to ensure that the adaptations follow the guidelines, are as complete as possible, are the same length as the list of sentences, and omit any unnecessary information.”

GPT-4o, model version gpt-4o-2024-08-06, was fine-tuned on the PLABA dataset (80% for training and 20% for validation) was used for this run. The runs were was evaluated quantitively (comparing average Flesh-Kincaid (FK) grade level to the average FK-grade level in the training set) and qualitatively (manual evaluation on a sample, similar as in the competition - four 5-scale Likert scores). This run was evaluated fourth highest qualitatively in a sample of n=40 abstract adaptations Simplicity M=3.80 (SD=0.76), Accuracy 4.20 (SD=0.76), Completeness 4.30 (SD=0.72), Brevity M=3.48 (SD=0.72) with an average sum of scores M=15.43 (SD=1.89). The average FK-grade level was M=12.20 (SD=2.76), where the average training data adaptations (ground truth) was 11.64 (SD=2.43). We decided to include this run since the average FK-grade level was closer to the ground truth then the third highest qualitatively evaluated, which was gpt-4o, model version gpt-4o-2024-08-06, with simple prompt and had a FK-grade level of M=7.40 (SD=1.72).

3 (bottom)

LLaMA-8B-4bit-MedicalAbstract-seq-to-seq-v1

Added a conversational template using a role, user, and content prompt. Implement two custom functions: one to break down complex medical jargon and another to generate clear, concise sentences. Another function implemented was layering a medical dictionary with the train data.

Llama-3.1-8B-Instruct The model was built using a modified version of the LLaMA algorithm, specifically the DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored model, which was then optimized for memory efficiency by using 4-bit precision

PLABA dataset for task 1 PLABA dataset (Attal 2023)- extract differences between abstract and adaption to create a list of 'difficult words' Plain Language Medical Dictionary: https://apps.lib.umich.edu/medical-dictionary https://github.com/mlibrary/medical-dictionary/blob/main/data.json

What sets this project apart is its use of 4-bit quantization, which makes the model more efficient in terms of memory usage. The model was also trained using a custom dictionary to help explain complex medical terms, which makes the output much clearer and more accessible to non-medical professionals. Also, introduced while configuring the tokenizer - padding side - to help sentence length match up i.e. input and output sentence structure similarity to optimize the output.

1 (top)

task2_moa_tier3_post (paper)

ntu_nlp

No manual intervention.

Model: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 Prompt for adaptation: zero_shot_prompt = """Below is an instruction that describes a task, paired with an input that provides biomedical abstracts and other information in json format. Write a lay language response in JSON format that adapts input_sentence for the general public using plain language. Response should be in the following format:{{"response": }} ### Instruction: {} ### Input: {} ### Response: {}""" Prompt for MoA: f"""I have a task to convert medical sentences into simpler, easy-to-understand sentences. Please follow the guidelines below: **Guidelines**: 1. **Simplicity**: Outputs should be easy to understand. 2. **Accuracy**: Outputs should contain accurate information. 3. **Completeness**: Outputs should seek to minimize information lost from the original text. 4. **Brevity**: Outputs should be concise. **Original Sentence**: {original_sentence} I have already provided several samples. {sample} Please generate sentences that best follow these guidelines. The output format should be in JSON: {{response: }}”””

PLABA dataset (Attal 2023)

Method: demonstration relevant 5 shot + NER 5 Adaptation: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 MoA: gemini-1.5-flash

3 (bottom)

task2_moa_tier1_post (paper)

ntu_nlp

No manual intervention

PLABA dataset (Attal 2023)

Method: demonstration relevant 5 shot + NER None & Finetune 1epoch NER None Adaptation: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 Finetune: gemma-2-27b, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 MoA: gemini-1.5-flash

1 (top)

task2_moa_tier2_post (paper)

ntu_nlp

No manual intervention

PLABA dataset (Attal 2023)

Method: demonstration relevant 5 shot + NER None Adaptation: gemini-1.0-pro, gemini-1.5-flash, gemini-1.5-pro, gemma-2-27b, gpt-4o-mini, Meta-Llama-3.1-8B, Mistral-Nemo-Instruct-2407 MoA: gemini-1.5-flash

TREC2024_SIB_run1

SIB

None

Llama3 8B

PLABA dataset (Attal 2023)

This run is a baseline, with the original model (no special feature), to compare the performance of our other runs

TREC2024_SIB_run3

SIB

Llama3.1

Basic prompting with Llama3.1 instead of Llama3

1 (top)

mistral-fix

CLAC

Yes format delete \" and quick fix of q34

mistral-large-latest

Dataset task 1 PLABA 2024

mistral-large-latest 7-shot with temp 0.4

mistral-FINAL

CLAC

Delete /"

mistral-large-latest

Dataset of task 1 PLABA 2024

mistral-large-latest 7-shot with temperature 0.4

1 (top)

TREC2024_SIB_run4

SIB

None

Llama 3 - 8B

None

Retrieval Augmented Generation (RAG) based on documents from Wikipedia and Monash

3 (bottom)

GPT (paper)

Self-prompted GPT-4 with manual adjustment for alignment

GPT-4o

PLABA

Self-prompt until the output is satisfied.

LLaMa 3.1 70B instruction (2nd run) (paper)

Since LLaMa generated results include pre-words such as 'Here is the simplified version: [SIMPLIFIED SENTENCE]', we used regular express to replace such kind of pre-words.

LLaMa 3.1 70B instruction tuning

No training or fine tuning, only inference. We added one example into the prompt(1-shot).

We used transformers.pipeline, with all default args

1 (top)

gpt-final

CLAC

Manual delete of characters \"

gpt-3.5-turbo-0125

Task 1 Dataset PLABA 2024

gpt-3.5-turbo-0125 7-shot run

3 (bottom)

The Thirty-Third Text REtrieval Conference
(TREC 2024)

Plain Language Adaptation of Biomedical Abstracts Complete abstract adaptation task Appendix