TY - JOUR
T1 - Investigating the impact of innovative AI chatbot on post-pandemic medical education and clinical assistance
T2 - a comprehensive analysis
AU - Xie, Yi
AU - Seth, Ishith
AU - Hunter-Smith, David J.
AU - Rozen, Warren M.
AU - Seifman, Marc A.
N1 - Publisher Copyright:
© 2023 The Authors. ANZ Journal of Surgery published by John Wiley & Sons Australia, Ltd on behalf of Royal Australasian College of Surgeons.
PY - 2024/2
Y1 - 2024/2
N2 - Background: The COVID-19 pandemic has significantly disrupted the clinical experience and exposure of medical students and junior doctors. The integration of Artificial Intelligence (AI) into medical education has the potential to enhance learning and improve patient care. This study aimed to evaluate the effectiveness of three popular large language models (LLMs) as clinical decision-making support tools for junior doctors. Methods: A series of increasingly complex clinical scenarios was presented to ChatGPT, Google's Bard and Bing's AI. Their responses were evaluated against standard guidelines, for readability using the Flesch Reading Ease Score, the Flesch–Kincaid Grade Level and the Coleman–Liau Index, and for reliability using the modified DISCERN score. Lastly, the LLMs' outputs were assessed by three experienced specialists using a Likert scale for accuracy, informativeness, and accessibility. Results: In terms of readability and reliability, ChatGPT stood out among the three LLMs, recording the highest scores in Flesch Reading Ease (31.2 ± 3.5), Flesch–Kincaid Grade Level (13.5 ± 0.7), Coleman–Liau Index (13) and DISCERN (62 ± 4.4). These results suggest statistically significant superior comprehensibility and alignment with clinical guidelines in the medical advice given by ChatGPT. Bard followed closely behind, with BingAI trailing in all categories. The only statistically non-significant differences (P > 0.05) were found between the readability indices of ChatGPT and Bard, and between the Flesch Reading Ease scores of ChatGPT/Bard and BingAI. Conclusion: This study demonstrates the potential utility of LLMs in fostering self-directed and personalized learning, as well as bolstering clinical decision-making support for junior doctors. However, further development is needed before their integration into education.
AB - Background: The COVID-19 pandemic has significantly disrupted the clinical experience and exposure of medical students and junior doctors. The integration of Artificial Intelligence (AI) into medical education has the potential to enhance learning and improve patient care. This study aimed to evaluate the effectiveness of three popular large language models (LLMs) as clinical decision-making support tools for junior doctors. Methods: A series of increasingly complex clinical scenarios was presented to ChatGPT, Google's Bard and Bing's AI. Their responses were evaluated against standard guidelines, for readability using the Flesch Reading Ease Score, the Flesch–Kincaid Grade Level and the Coleman–Liau Index, and for reliability using the modified DISCERN score. Lastly, the LLMs' outputs were assessed by three experienced specialists using a Likert scale for accuracy, informativeness, and accessibility. Results: In terms of readability and reliability, ChatGPT stood out among the three LLMs, recording the highest scores in Flesch Reading Ease (31.2 ± 3.5), Flesch–Kincaid Grade Level (13.5 ± 0.7), Coleman–Liau Index (13) and DISCERN (62 ± 4.4). These results suggest statistically significant superior comprehensibility and alignment with clinical guidelines in the medical advice given by ChatGPT. Bard followed closely behind, with BingAI trailing in all categories. The only statistically non-significant differences (P > 0.05) were found between the readability indices of ChatGPT and Bard, and between the Flesch Reading Ease scores of ChatGPT/Bard and BingAI. Conclusion: This study demonstrates the potential utility of LLMs in fostering self-directed and personalized learning, as well as bolstering clinical decision-making support for junior doctors. However, further development is needed before their integration into education.
KW - artificial intelligence
KW - ChatGPT
KW - junior doctor
KW - large language model
KW - surgical education
UR - http://www.scopus.com/inward/record.url?scp=85168612234&partnerID=8YFLogxK
U2 - 10.1111/ans.18666
DO - 10.1111/ans.18666
M3 - Article
C2 - 37602755
AN - SCOPUS:85168612234
SN - 1445-1433
VL - 94
SP - 68
EP - 77
JO - ANZ Journal of Surgery
JF - ANZ Journal of Surgery
IS - 1-2
ER -