Runtag | Org | Which subtasks did the run complete? | What, if any, manual intervention was done to produce the run in response to the test data? | Which base models did this run use, if any -- e.g., BERT, GPT-4, Llama 13B, etc.? | Did the run use other data besides the training set? If so, please describe. | Briefly describe salient features of this run, including what distinguishes it from your other runs. | Please give this run a priority for inclusion in manual assessments. |
---|---|---|---|---|---|---|---|
bad | plaba | Task 1A only (term identification) | test | test | test | test | 1 (top) |
good | plaba | Task 1A and 1B and 1C (term identification and classification and replacement text generation) | test | test | test | test | 3 (bottom) |
gpt | CLAC | Task 1A and 1B and 1C (term identification and classification and replacement text generation) | No | gpt-3.5-turbo-0125 | No, only the training set | gpt-3.5-turbo-0125, 7-shot run (see the few-shot prompting sketch below the table) | 2 |
mistral | CLAC | Task 1A only (term identification) | No | mistral-large-latest | No | mistral-large-latest, 7-shot, with a temperature of 0.4 | 1 (top) |
MLPClassifier-identify-classify-replace-v1 | BU | Task 1A and 1B and 1C (term identification and classification and replacement text generation) | Flattened the nested JSON structure to make it easier to work with, and automated that step. | A simple multi-layer perceptron (MLP) neural network was used as the classifier. | NA | This run explored different classifiers (XGBoost, LightGBM) and uses an MLP, which can capture non-linear patterns. Overall accuracy is about 65%, but a deeper look at per-action performance shows the F1 varies by class: SUBSTITUTE P=0.71/R=0.83/F1=0.76; EXPLAIN P=0.58/R=0.47/F1=0.52; GENERALIZE P=0.35/R=0.18/F1=0.24; EXEMPLIFY P=0.53/R=0.67/F1=0.59; OMIT P=0.26/R=0.12/F1=0.17. Another feature of this run is logic to handle cases where the top two predicted actions have very close probabilities (within 0.05 of each other), and cases where no matching description is found for a term-action pair (see the tie-handling sketch below the table). | 1 (top) |
gemini-1.5-pro_demon5_replace-demon5 (paper) | ntu_nlp | Task 1A and 1B and 1C (term identification and classification and replacement text generation) | Format editing: I converted the output from plain text to JSON when it did not meet the required submission format; no changes were made to the content itself. | gemini-1.5-pro | No | Uses gemini-1.5-pro as the base model. Step 1: entity extraction with 5 demonstrations. Step 2: entity replacement with 5 demonstrations (see the two-step pipeline sketch below the table). | 1 (top) |
gemini-1.5-flash_demon5_replace-demon5 (paper) | ntu_nlp | Task 1A and 1B and 1C (term identification and classification and replacement text generation) | Format editing: I converted the output from plain text to JSON when it did not meet the required submission format; no changes were made to the content itself. | gemini-1.5-flash | No | Uses gemini-1.5-flash as the base model. Step 1: entity extraction with 5 demonstrations. Step 2: entity replacement with 5 demonstrations. | 2 |
gpt-4o-mini_demon5_replace-demon5 (paper) | ntu_nlp | Task 1A and 1B and 1C (term identification and classification and replacement text generation) | Format editing: I converted the output from plain text to JSON when it did not meet the required submission format; no changes were made to the content itself. | gpt-4o-mini | No | Uses gpt-4o-mini as the base model. Step 1: entity extraction with 5 demonstrations. Step 2: entity replacement with 5 demonstrations. | 3 (bottom) |
First | IIITH | Task 1A and 1B (term identification and classification) | No manual intervention was done other than ensuring the data was in the specified format. | For task 1A, BioBERT was used for named entity recognition. For task 1B, BioBERT was again used to get term embeddings, and a Random Forest classifier was used to classify the embeddings into the appropriate simplification actions. | Besides the given PLABA dataset, a pre-processed version of the BC5CDR dataset (BioCreative V CDR task corpus: a resource for relation extraction; Li et al., 2016) was used. The two datasets were used in combination to train the BioBERT models. | This run is computationally cheap, as it does not require any LLMs, and the given data was complemented with other publicly available datasets (see the embedding-classifier sketch below the table). | 1 (top) |
Roberta-base (paper) | UM | Task 1A and 1B (term identification and classification) | | RoBERTa-base | Nothing | Multi-label token classification with roberta-base (see the multi-label sketch below the table). | 1 (top) |
roberta-gbc | Yseop | Task 1A and 1B (term identification and classification) | | pabRomero/BioMedRoBERTa-full-finetuned-ner-pablo and GradientBoostingClassifier | No | Cleaned the training corpus and tuned the hyperparameters of the two models. | 1 (top) |
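
For reference, a minimal sketch of the few-shot prompting style the CLAC runs describe, assuming the OpenAI Python SDK; the `DEMONSTRATIONS` pairs, the system prompt, and the `identify_terms` helper are hypothetical stand-ins, since the actual prompts are not given in the table.

```python
# Minimal sketch of a k-shot chat prompt in the style of the CLAC runs.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY in the
# environment; DEMONSTRATIONS is hypothetical and stands in for the seven
# worked examples the runs actually used.
from openai import OpenAI

client = OpenAI()

# Hypothetical (input, output) demonstration pairs for term identification.
DEMONSTRATIONS = [
    ("Patients received subcutaneous enoxaparin.", "subcutaneous, enoxaparin"),
    # ... six more pairs would follow in a true 7-shot setup ...
]

def identify_terms(abstract: str) -> str:
    messages = [{"role": "system",
                 "content": "Identify jargon terms a lay reader may not know. "
                            "Return a comma-separated list."}]
    # Each demonstration is replayed as a user/assistant exchange.
    for source, terms in DEMONSTRATIONS:
        messages.append({"role": "user", "content": source})
        messages.append({"role": "assistant", "content": terms})
    messages.append({"role": "user", "content": abstract})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        messages=messages,
        temperature=0.4,  # only the mistral run reports this value
    )
    return response.choices[0].message.content
```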
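The BU run's tie-handling rule is easy to make concrete. A minimal sketch, assuming scikit-learn; the hidden-layer size, training call, and feature extraction are assumptions, and only the 0.05 margin, the five action classes, and their 0-4 label encoding come from the table.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Label encoding as reported in the run description (SUBSTITUTE=0 ... OMIT=4),
# so predict_proba columns line up with this list.
ACTIONS = ["SUBSTITUTE", "EXPLAIN", "GENERALIZE", "EXEMPLIFY", "OMIT"]

# Hypothetical architecture; the run does not report layer sizes.
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
# clf.fit(X_train, y_train)  # feature matrix and labels are assumed, not shown

def predict_actions(x: np.ndarray, tie_margin: float = 0.05) -> list[str]:
    """Return the top action, or the top two when their predicted
    probabilities are within `tie_margin` of each other."""
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    order = np.argsort(probs)[::-1]
    best, runner_up = order[0], order[1]
    if probs[best] - probs[runner_up] <= tie_margin:
        return [ACTIONS[best], ACTIONS[runner_up]]
    return [ACTIONS[best]]
```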
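The ntu_nlp rows describe a two-step pipeline (extract, then replace, each with 5 demonstrations). A minimal sketch, assuming the google-generativeai SDK; the prompt wording and the demonstration placeholders are hypothetical, as the actual prompts are not published in this table.

```python
# Minimal sketch of the ntu_nlp two-step pipeline.
# Assumes the legacy google-generativeai SDK (pip install google-generativeai).
import google.generativeai as genai

genai.configure(api_key="...")  # key elided
model = genai.GenerativeModel("gemini-1.5-pro")  # or gemini-1.5-flash

EXTRACT_DEMOS = "..."  # 5 worked extraction examples (hypothetical placeholder)
REPLACE_DEMOS = "..."  # 5 worked replacement examples (hypothetical placeholder)

def simplify(abstract: str) -> str:
    # Step 1: entity extraction with 5 demonstrations.
    terms = model.generate_content(
        f"{EXTRACT_DEMOS}\nExtract the jargon terms:\n{abstract}"
    ).text
    # Step 2: entity replacement with 5 demonstrations.
    return model.generate_content(
        f"{REPLACE_DEMOS}\nReplace these terms with plain language.\n"
        f"Terms: {terms}\nAbstract: {abstract}"
    ).text
```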
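The IIITH task-1B setup (BioBERT term embeddings fed to a Random Forest) can be sketched as below, assuming Hugging Face transformers and scikit-learn; the checkpoint name, mean-pooling choice, and tree count are assumptions, and the training data is not shown.

```python
import torch
from sklearn.ensemble import RandomForestClassifier
from transformers import AutoModel, AutoTokenizer

# A commonly used BioBERT checkpoint; the run does not name its exact variant.
CHECKPOINT = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
encoder = AutoModel.from_pretrained(CHECKPOINT)

def embed(term: str) -> torch.Tensor:
    """Mean-pool the last hidden states as the term embedding."""
    inputs = tokenizer(term, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# terms, labels = ...  # assumed: jargon terms with simplification-action labels
# X = torch.stack([embed(t) for t in terms]).numpy()
# clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
```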
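Multi-label token classification, as in the UM run, is not a stock transformers task head; one common approach scores each label with an independent sigmoid so a token can carry several labels at once. A minimal sketch under that assumption; the label set and threshold are also assumptions, and training such a head would need a binary cross-entropy loss (not shown).

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

LABELS = ["SUBSTITUTE", "EXPLAIN", "GENERALIZE", "EXEMPLIFY", "OMIT"]  # assumed

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)

def tag(sentence: str, threshold: float = 0.5):
    """A token receives every label whose sigmoid probability clears
    the threshold, rather than a single argmax label."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, num_labels)
    probs = torch.sigmoid(logits)[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return [(tok, [lab for j, lab in enumerate(LABELS) if probs[i, j] > threshold])
            for i, tok in enumerate(tokens)]
```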