Can ChatGPT Write Effective Board-Style Dermatology Exam Questions?

by Haroon Ahmad, MD · 2025-01-01
Physician Practice Innovation

🔍 Key Finding ChatGPT shows clear limitations as a study tool for the American Board of Dermatology Applied Exam: only 40% of the questions it generated were judged accurate and appropriate, with the rest falling short on complexity, clarity, or domain-specific knowledge. While ChatGPT may assist in drafting simple questions, it cannot replace dermatologists' expertise in developing high-quality board-style questions.

🔬 Methodology Overview

  • Design: Qualitative analysis of ChatGPT-generated questions.
  • Data Source: Continuing medical education (CME) articles from the Journal of the American Academy of Dermatology (JAAD) (Volume 88, Issues 1-4).
  • Question Generation: Used ChatPDF, an application that connects uploaded PDF files to ChatGPT 3.5, to generate five American Board of Dermatology Applied Exam (ABD-AE)-style multiple-choice questions per CME article (a hedged reproduction sketch follows this list).
  • Evaluation: Two board-certified dermatologists independently assessed the generated questions.
  • Assessment Criteria: Accuracy, complexity, and clarity of questions and their suitability for ABD-AE study preparation.
  • Consensus: Dermatologists discussed discrepancies in their evaluations to reach a consensus.
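The paper does not publish its prompts, and ChatPDF's internals are proprietary, so the sketch below is only a rough illustration of how a comparable generation step could be reproduced with the OpenAI Python SDK and pypdf. The model name, prompt wording, file name, and helper functions are assumptions for illustration, not the authors' actual method.

```python
# Hypothetical re-creation of the question-generation step (assumes
# `pip install openai pypdf` and an OPENAI_API_KEY in the environment).
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # picks up OPENAI_API_KEY automatically

def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page of a CME article PDF."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def generate_questions(article_text: str, n: int = 5) -> str:
    """Ask a GPT-3.5-class model for n ABD-AE-style MCQs."""
    prompt = (
        f"Using only the article below, write {n} American Board of "
        "Dermatology Applied Exam-style multiple-choice questions, each "
        "with one correct answer, four plausible distractors, and a "
        "one-sentence explanation.\n\n" + article_text
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: stands in for ChatPDF's backend
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_questions(extract_text("jaad_cme_article.pdf")))
```

Note that ChatPDF also chunks and retrieves passages from long PDFs; a plain single prompt like this one may truncate articles that exceed the model's context window.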

📊 Results

  • ChatPDF generated 40 multiple-choice questions in total, five for each of the eight JAAD CME articles.
  • Only 16 of the 40 questions (40%) were deemed accurate and appropriate for ABD-AE study preparation; the breakdown that follows is tallied in the sketch after this list.
  • 10 questions (25%) were assessed as having low complexity.
  • 9 questions (22.5%) were categorized as vague or unclear.
  • 5 questions (12.5%) contained inaccuracies.
  • ChatGPT demonstrated limitations in understanding domain-specific dermatology knowledge, generating high-quality distractor options, and creating image-based questions.
  • The study concluded that ChatGPT cannot replace the expertise of dermatologists and medical educators in developing high-quality board-style questions.
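As a quick arithmetic check on the breakdown above, the short script below tallies the reported counts and reproduces the stated percentages. The category labels are paraphrases of the paper's rubric, not its exact wording.

```python
# Verify that the reported verdict counts sum to 40 questions
# (5 questions x 8 CME articles) and yield the stated percentages.
verdicts = {
    "accurate and appropriate": 16,
    "low complexity": 10,
    "vague or unclear": 9,
    "contained inaccuracies": 5,
}

total = sum(verdicts.values())
assert total == 40  # 5 questions per article x 8 articles

for label, count in verdicts.items():
    print(f"{label}: {count}/{total} = {count / total:.1%}")
# accurate and appropriate: 16/40 = 40.0%
# low complexity: 10/40 = 25.0%
# vague or unclear: 9/40 = 22.5%
# contained inaccuracies: 5/40 = 12.5%
```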

💡 Clinical Impact This study highlights the limitations of ChatGPT as a study tool for dermatology board exams: while it can generate simple questions, its output lacks the complexity and accuracy required for effective exam preparation. Medical educators and domain experts therefore remain essential for creating high-quality board-style questions that accurately assess a candidate’s knowledge.

🤔 Limitations

  • Low complexity of generated questions.
  • Vague or unclear phrasing of questions and answer choices.
  • Inaccurate medical information in questions and answer choices.
  • Lack of domain-specific knowledge in dermatology.
  • Inability to understand context and generate high-quality distractor options.
  • Inability to generate images, which rules out image-based questions.
  • Cannot replace the expertise of dermatologists and medical educators.

✨ What It Means For You This study reveals that while ChatGPT can assist in generating simple dermatology questions, it cannot replace the expertise of dermatologists in crafting high-quality, board-style questions for exam preparation. Doctors should therefore exercise caution when using AI-generated questions and prioritize expert-created materials for assessing knowledge and reasoning abilities.

Reference Ayub I, Hamann D, Hamann CR, Davis MJ. Exploring the Potential and Limitations of Chat Generative Pre-trained Transformer (ChatGPT) in Generating Board-Style Dermatology Questions: A Qualitative Analysis. Cureus. 2023;15:e43717. https://doi.org/10.7759/cureus.43717