TY - JOUR
T1 - Comparison of large language models in management advice for melanoma
T2 - Google's AI BARD, BingAI and ChatGPT
AU - Mu, Xin
AU - Lim, Bryan
AU - Seth, Ishith
AU - Xie, Yi
AU - Cevik, Jevan
AU - Sofiadellis, Foti
AU - Hunter-Smith, David J.
AU - Rozen, Warren M.
N1 - Publisher Copyright:
© 2023 The Authors. Skin Health and Disease published by John Wiley & Sons Ltd on behalf of British Association of Dermatologists.
PY - 2024/2
Y1 - 2024/2
N2 - Large language models (LLMs) are an emerging artificial intelligence (AI) technology that is refining research and healthcare. Their use in medicine has seen numerous recent applications. One area where LLMs have shown particular promise is in providing medical information and guidance to practitioners. This study aims to assess three prominent LLMs (Google's AI BARD, BingAI and ChatGPT-4) in providing management advice for melanoma by comparing their responses to current clinical guidelines and the existing literature. Five questions on melanoma pathology were posed to the three LLMs. A panel of three experienced board-certified plastic surgeons evaluated the responses for readability (Flesch Reading Ease Score, Flesch-Kincaid Grade Level and Coleman-Liau Index) and suitability (modified DISCERN score), and compared them against existing guidelines. A t-test was performed to assess differences in mean readability and reliability scores between the LLMs, with a p value <0.05 considered statistically significant. The mean readability scores were comparable across the three LLMs. ChatGPT performed best, with a Flesch Reading Ease Score of 35.42 (±21.02), a Flesch-Kincaid Grade Level of 11.98 (±4.49) and a Coleman-Liau Index of 12.00 (±5.10), although none of these differences was statistically significant (p > 0.05). For suitability, ChatGPT's modified DISCERN score of 58 (±6.44) was significantly higher than BARD's 36.2 (±34.06) (p = 0.04), while its advantage over BingAI's 49.8 (±22.28) was not statistically significant. This study demonstrates that ChatGPT marginally outperforms BARD and BingAI in providing reliable, evidence-based clinical advice, although all three models still face limitations in depth and specificity. Future research should improve LLM performance by integrating specialized databases and expert knowledge to support patient-centred care.
AB - Large language models (LLMs) are an emerging artificial intelligence (AI) technology that is refining research and healthcare. Their use in medicine has seen numerous recent applications. One area where LLMs have shown particular promise is in providing medical information and guidance to practitioners. This study aims to assess three prominent LLMs (Google's AI BARD, BingAI and ChatGPT-4) in providing management advice for melanoma by comparing their responses to current clinical guidelines and the existing literature. Five questions on melanoma pathology were posed to the three LLMs. A panel of three experienced board-certified plastic surgeons evaluated the responses for readability (Flesch Reading Ease Score, Flesch-Kincaid Grade Level and Coleman-Liau Index) and suitability (modified DISCERN score), and compared them against existing guidelines. A t-test was performed to assess differences in mean readability and reliability scores between the LLMs, with a p value <0.05 considered statistically significant. The mean readability scores were comparable across the three LLMs. ChatGPT performed best, with a Flesch Reading Ease Score of 35.42 (±21.02), a Flesch-Kincaid Grade Level of 11.98 (±4.49) and a Coleman-Liau Index of 12.00 (±5.10), although none of these differences was statistically significant (p > 0.05). For suitability, ChatGPT's modified DISCERN score of 58 (±6.44) was significantly higher than BARD's 36.2 (±34.06) (p = 0.04), while its advantage over BingAI's 49.8 (±22.28) was not statistically significant. This study demonstrates that ChatGPT marginally outperforms BARD and BingAI in providing reliable, evidence-based clinical advice, although all three models still face limitations in depth and specificity. Future research should improve LLM performance by integrating specialized databases and expert knowledge to support patient-centred care.
UR - http://www.scopus.com/inward/record.url?scp=85178230114&partnerID=8YFLogxK
U2 - 10.1002/ski2.313
DO - 10.1002/ski2.313
M3 - Article
C2 - 38312244
AN - SCOPUS:85178230114
SN - 2690-442X
VL - 4
JO - Skin Health and Disease
JF - Skin Health and Disease
IS - 1
ER -