Can Interpretable AI Enhance Trust and Accuracy in Skin Lesion Diagnosis?

by Haroon Ahmad, MD · 2025-01-01
Physician · Practice Innovation

🔍 Key Finding Explanations generated by the XAI method ABELE, using exemplars and counter-exemplars of skin lesions, increased trust and confidence in an AI diagnostic system, particularly among domain experts. In addition, latent space analysis of the AI model revealed distinct separations among skin lesion classes that could improve diagnostic accuracy.

🔬 Methodology Overview

  • Design: Case study evaluating an explainable AI (XAI) method for skin lesion classification.
  • Data Source: ISIC 2019 (International Skin Imaging Collaboration) challenge dataset.
  • Black Box Classifier Training: ResNet-50 architecture, pre-trained on ImageNet, fine-tuned on ISIC 2019 data (excluding UNK class), using transfer learning and binary cross-entropy loss.
  • Explanation Method: ABELE (Adversarial Black box Explainer generating Latent Exemplars) utilizing a Progressively Growing Adversarial Autoencoder (PGAAE) with 256 latent features, incorporating denoising and minibatch discrimination techniques.
  • Explanation Output: Exemplars, counter-exemplars, and saliency maps.
  • Evaluation: User study with domain experts, novices, and laypersons assessing trust, confidence, and explanation helpfulness, along with quantitative evaluation of saliency maps using deletion and insertion AUC.
  • Latent Space Analysis: Multidimensional scaling (MDS) to visualize and analyze the latent space for class separability and potential diagnostic insights.
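The latent space analysis step can be illustrated with a minimal sketch: project high-dimensional latent codes to 2D with MDS, then check class separability with a simple classifier. This is not the paper's pipeline; the synthetic 256-dimensional "latent" vectors below are assumptions standing in for the PGAAE encodings of two lesion classes.

```python
# Hypothetical sketch of latent-space class-separability analysis.
# Synthetic 256-dim latent vectors stand in for PGAAE encodings of
# two lesion classes (e.g., melanoma vs. benign keratosis).
import numpy as np
from sklearn.manifold import MDS
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 150, 256  # samples per class, latent dimensionality (assumed)
class_a = rng.normal(loc=0.5, scale=1.0, size=(n, d))   # "melanoma" cluster
class_b = rng.normal(loc=-0.5, scale=1.0, size=(n, d))  # "keratosis" cluster
X = np.vstack([class_a, class_b])
y = np.array([1] * n + [0] * n)

# Multidimensional scaling down to 2 components for visualization/analysis.
X2d = MDS(n_components=2, random_state=0).fit_transform(X)

# Random Forest on the 2D embedding, mirroring the pairwise experiments.
X_tr, X_te, y_tr, y_te = train_test_split(X2d, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"2D-MDS Random Forest accuracy: {accuracy:.2f}")
```

If the classes remain separable even after reduction to two dimensions, the latent representation itself encodes clinically relevant distinctions, which is the intuition behind the paper's melanoma-vs-benign experiments.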

📊 Results

  • Classifier Performance: ResNet-based deep learning model achieved 0.838 normalized multi-class accuracy on the ISIC 2019 skin lesion dataset test set.
  • ABELE Explanations & Trust: Explanations generated by ABELE generally increased user confidence in the AI’s classification, except in two instances (Q3 and Q9). Confidence increased from 67.69% to 77.12% after viewing explanations.
  • Incorrect Advice & Trust: Medical experts showed a 14% decrease in confidence after receiving incorrect advice from the AI, indicating that the correctness of the AI’s advice directly shapes user trust.
  • Exemplars vs. Counter-Exemplars: Exemplars were found to be more helpful than counter-exemplars in understanding the AI’s classification, particularly for medical experts.
  • Saliency Map Comparison: ABELE generated more detailed and effective saliency maps compared to LIME and LORE, as evidenced by lower deletion AUC scores (ABELE: 0.461, LIME: 0.736, LORE: 0.711) and higher insertion AUC scores (ABELE: 0.748, LIME: 0.417, LORE: 0.471).
  • Latent Space Analysis: Melanoma lesions occupied a distinct region in the latent space compared to other skin lesion types, suggesting potential for improved differentiation between melanoma and frequently misdiagnosed benign lesions. A Random Forest classifier trained on the 2D MDS-reduced latent space achieved 85.60% accuracy in distinguishing Melanoma from Benign Keratosis and 78.53% accuracy in distinguishing Melanoma from Melanocytic Nevus.
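The deletion and insertion AUC scores above can be understood with a toy sketch: pixels are removed (or revealed) in order of saliency, and the area under the resulting model-confidence curve is measured. A good saliency map yields a fast confidence drop under deletion (low AUC) and a fast rise under insertion (high AUC). The one-dimensional "image" and mean-intensity "model" below are illustrative assumptions, not the paper's classifier.

```python
# Hypothetical sketch of deletion/insertion AUC for saliency evaluation.
import numpy as np

def confidence_curve(image, order, mode="deletion"):
    """Model confidence as pixels are deleted/inserted in saliency order."""
    # Toy stand-in model: confidence = mean intensity of the current canvas.
    def model(x):
        return float(np.clip(x.mean(), 0.0, 1.0))

    canvas = image.copy() if mode == "deletion" else np.zeros_like(image)
    scores = [model(canvas)]
    for idx in order:
        canvas[idx] = 0.0 if mode == "deletion" else image[idx]
        scores.append(model(canvas))
    return np.array(scores)

def auc(scores):
    # Normalized area under the curve via the trapezoidal rule.
    steps = scores.size - 1
    return float((scores[:-1] + scores[1:]).sum() / (2 * steps))

image = np.array([0.9, 0.8, 0.1, 0.05, 0.7, 0.2])  # toy 1D "image"
saliency_order = np.argsort(-image)  # most "important" pixels first
deletion_auc = auc(confidence_curve(image, saliency_order, "deletion"))
insertion_auc = auc(confidence_curve(image, saliency_order, "insertion"))
print(f"deletion AUC: {deletion_auc:.3f}, insertion AUC: {insertion_auc:.3f}")
```

Under this metric, ABELE's lower deletion AUC and higher insertion AUC than LIME and LORE indicate its saliency maps better identify the pixels the classifier actually relies on.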

💡 Clinical Impact This research demonstrates that explainable AI (XAI), specifically the ABELE method, can improve clinician trust and confidence in AI-driven skin lesion diagnosis by providing visual exemplars, counter-exemplars, and saliency maps. This could support more accurate and reliable melanoma detection and differentiation from benign lesions. Furthermore, analysis of the AI’s latent space may offer new insights into distinguishing visually similar lesions, potentially reducing misdiagnosis rates.

🤔 Limitations

  • Reliance on a predefined dataset that may not fully represent the diversity of skin lesions and skin tones.
  • Potential biases in model training and explanation generation due to dataset limitations.
  • Limited focus on patient comprehension of the AI-generated explanations.
  • Current difficulty in generating high-fidelity explanations for complex or rare skin lesion cases with low model confidence.
  • Explanation extraction speed is dependent on image complexity, limiting real-time application.

✨ What It Means For You This research introduces an AI-powered tool, ABELE, that provides visual explanations (exemplars and counter-exemplars) for skin lesion classifications, aiming to increase dermatologist trust and diagnostic accuracy, particularly in differentiating challenging cases like melanoma vs. benign keratosis. The improved interpretability and latent space analysis offered by ABELE may lead to more confident diagnoses and potentially reduce misdiagnosis rates, though further real-world validation is needed.

Reference Metta C, Beretta A, Guidotti R, Yin Y, Gallinari P, Rinzivillo S, Giannotti F. Advancing Dermatological Diagnostics: Interpretable AI for Enhanced Skin Lesion Classification. Diagnostics. 2024;14:753. https://doi.org/10.3390/diagnostics14070753