Extended Data Table 5: Comparison of the performance of Med-PaLM 540B and Flan-PaLM 540B with self-consistency (SC) across multiple-choice datasets

From: Large language models encode clinical knowledge

  1. Med-PaLM was not trained using any of these datasets. These results suggest that instruction prompt tuning aligns the model to the requirements of consumer medical question answering without affecting base clinical knowledge.