Will code one day run a code? Performance of language models on ACEM primary examinations and implications

Objective: Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown. Methods: We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice ACEM primary examination. Results: All LLMs achieved a passing score, with GPT 4.0 outperforming the average candidate. Conclusion: Large language models, by passing the ACEM primary examination, show potential as tools for medical education and practice. However, limitations exist and are discussed.

Original language: English
Pages (from-to): 876-878
Number of pages: 3
Journal: EMA - Emergency Medicine Australasia
Issue number: 5
Publication status: Published - Oct 2023


  • artificial intelligence
  • Bing
  • chat GPT
  • emergency medicine
  • medical education
  • specialty examination

