Abstract
Objective: Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown. Methods: We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice Australasian College for Emergency Medicine (ACEM) primary examination. Results: All LLMs achieved a passing score, with GPT-4.0 outperforming the average candidate. Conclusion: By passing the ACEM primary examination, large language models show potential as tools for medical education and practice. However, limitations exist and are discussed.
| Original language | English |
|---|---|
| Pages (from-to) | 876-878 |
| Number of pages | 3 |
| Journal | EMA - Emergency Medicine Australasia |
| Volume | 35 |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - Oct 2023 |
Keywords
- artificial intelligence
- Bing
- ChatGPT
- emergency medicine
- medical education
- specialty examination