Can Artificial Intelligence Improve Skin Lesion Diagnosis in Primary Care?
🔍 Key Finding
In a real-world primary care setting, an AI model for diagnosing 44 skin diseases had lower diagnostic accuracy than general practitioners and teledermatologists, highlighting the need for clinician involvement in AI training and model adaptation to real-world settings. However, the AI’s high specificity and usefulness in generating differential diagnoses suggest its potential as a clinical decision support tool, particularly after addressing current limitations and expanding the training dataset.
🔬 Methodology Overview
- Design: Prospective, multicentre observational feasibility study.
- Setting: Six primary care centers in central Catalonia, Spain.
- Participants: 100 consecutive patients aged ≥18 years consulting for a skin lesion. Patients were excluded if the lesion could not be photographed or if they had difficulty understanding the protocol.
- Intervention: GPs made an initial diagnosis (Top-3), then photographed the lesion and uploaded the image to the Autoderm AI model, which generated a Top-5 differential diagnosis. The same image was sent to a dermatologist via teledermatology (TD) for a Top-3 diagnosis. A second dermatologist reviewed all 100 images and gave a Top-3 diagnosis, and a third dermatologist adjudicated discrepancies to establish the gold-standard diagnosis.
- Data Collection: Demographics, phototype, diagnostic difficulty and certainty (GP and dermatologists), image quality, consultation time, and GP satisfaction with the AI tool.
- Outcome Measures: Diagnostic accuracy, sensitivity, and specificity of GP, TD, and AI assessments (Top-1, Top-3, and Top-5 for AI; Top-1 and Top-3 for GP and TD).
- Statistical Analysis: Descriptive statistics, confusion matrices, and 95% confidence intervals were calculated. Subgroup analysis was performed for the 82 cases where the gold standard diagnosis was among the 44 conditions included in the AI model.
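The Top-k accuracy and sensitivity/specificity measures listed above can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the study’s analysis code: the function names `top_k_accuracy` and `sensitivity_specificity` are hypothetical, the four toy cases are invented for demonstration, and it assumes a case counts as "predicted positive" for a label when that label appears anywhere in the first k guesses. The study’s exact counting rules may differ.

```python
from typing import Sequence, Tuple


def top_k_accuracy(gold: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Share of cases whose gold-standard diagnosis is among the first k ranked guesses."""
    if len(gold) != len(ranked) or not gold:
        raise ValueError("gold and ranked must be non-empty and the same length")
    hits = sum(1 for g, guesses in zip(gold, ranked) if g in guesses[:k])
    return hits / len(gold)


def sensitivity_specificity(
    gold: Sequence[str], ranked: Sequence[Sequence[str]], label: str, k: int = 1
) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for a single label.

    Assumption: a case is 'predicted positive' if `label` is among the first k guesses.
    """
    tp = fn = fp = tn = 0  # cells of a 2x2 confusion matrix: `label` vs. everything else
    for g, guesses in zip(gold, ranked):
        actual = g == label
        predicted = label in guesses[:k]
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: gold-standard diagnoses and ranked differentials for four cases.
gold = ["seborrheic keratosis", "nevus", "psoriasis", "nevus"]
ranked = [
    ["nevus", "seborrheic keratosis", "lentigo"],
    ["nevus", "lentigo", "melanoma"],
    ["eczema", "tinea", "psoriasis"],
    ["dermatofibroma", "nevus", "lentigo"],
]

print(top_k_accuracy(gold, ranked, k=1))  # 0.25
print(top_k_accuracy(gold, ranked, k=3))  # 1.0
print(sensitivity_specificity(gold, ranked, "nevus", k=1))  # (0.5, 0.5)
print(sensitivity_specificity(gold, ranked, "nevus", k=3))  # (1.0, 0.5)
```

The toy run shows why Top-3 and Top-5 figures climb quickly: widening k lets more correct diagnoses count as hits, and under this counting rule it can also raise sensitivity for a label at the cost of more false positives.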
📊 Results
- Overall Top-1 Accuracy: The AI model’s overall Top-1 accuracy was 39%, lower than that of GPs (64%) and dermatologists (72%).
- Top-3 and Top-5 Accuracy: The AI’s accuracy rose to 61% for Top-3 and 72% for Top-5. Its Top-5 accuracy matched the dermatologists’ Top-1 accuracy (72%) but remained below their Top-3 accuracy (90%).
- Accuracy on Trained Diagnoses: When limited to the 44 skin diseases the AI was trained on (82 out of 100 cases), its Top-1 accuracy increased to 48%, Top-3 to 75%, and Top-5 to 89%.
- Sensitivity for Benign Tumors: The AI demonstrated higher sensitivity (Top-3 87%, Top-5 96%) than clinicians (GPs Top-3 76%, dermatologists Top-3 84%) for benign tumors, the most prevalent category (53 cases).
- Specificity: Specificity was consistently high for all assessments (AI, TD, and GP), ranging from 96% to 99%.
- GP Satisfaction: 92% of GPs found the AI useful for differential diagnosis, and 60% found it helpful in reaching a final diagnosis. 34% believed it could have prevented a teledermatology consultation.
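With only 100 cases, each accuracy figure above carries substantial uncertainty. The sketch below attaches a 95% interval to the three Top-1 accuracies, assuming each percentage corresponds to a count out of 100 cases. The source does not say which confidence-interval method the authors used; the Wilson score interval here is an illustrative choice, and `wilson_ci` is a hypothetical helper name.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Top-1 accuracies from the Results section, each assumed to be out of 100 cases.
for label, correct in [("AI", 39), ("GPs", 64), ("Dermatologists", 72)]:
    lo, hi = wilson_ci(correct, 100)
    print(f"{label}: {correct}% (95% CI {lo:.0%} to {hi:.0%})")

# AI: 39% (95% CI 30% to 49%)
# GPs: 64% (95% CI 54% to 73%)
# Dermatologists: 72% (95% CI 63% to 80%)
```

Under this method, each interval spans roughly 17 to 19 percentage points, which is one way to see why the study is framed as a feasibility study rather than a definitive comparison.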
💡 Clinical Impact
The overall diagnostic accuracy of the AI model for skin lesions, under real-life conditions in primary care, was lower than that of both general practitioners and teledermatologists, emphasizing the need for clinician involvement in model training and data validation in real-world settings. However, the AI model showed promise as a diagnostic support tool, particularly for differential diagnosis, and could potentially improve primary care efficiency by reducing unnecessary referrals and expediting diagnoses.
🤔 Limitations
- The number of images used to evaluate the AI model (n=100) is limited, and because the diagnoses were unevenly distributed, some conditions could not be evaluated with sufficient confidence.
- No representative results were obtained for less frequent diseases due to the small sample size and consecutive case collection.
- The participating GPs volunteered and had an interest in dermatology, which may explain why their diagnostic accuracy was higher than that reported in the literature.
- Diagnoses made with a single image may have inherent limitations compared to diagnoses made in a clinical setting.
- Most patients in the study population had phototype II or III, which may have contributed to lower diagnostic accuracy.
- The number of patients invited to participate is unknown because GPs did not record those who declined.
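The sample-size limitations above can be put in numbers with the standard normal-approximation formula for estimating a proportion, n = z²·p(1 − p)/E², where E is the desired half-width of a 95% interval. This is an illustrative sketch, not a calculation from the study: `cases_needed` is a hypothetical helper, the target half-widths are arbitrary choices, and p = 0.87 borrows the AI’s Top-3 sensitivity for benign tumors from the Results. For a per-category sensitivity, n counts cases *of that category*.

```python
import math


def cases_needed(expected_p: float, half_width: float, z: float = 1.959964) -> int:
    """Normal-approximation sample size so a ~95% CI for a proportion has the given half-width."""
    if not 0 < expected_p < 1 or not 0 < half_width < 1:
        raise ValueError("expected_p and half_width must be between 0 and 1")
    return math.ceil(z**2 * expected_p * (1 - expected_p) / half_width**2)


print(cases_needed(0.5, 0.05))   # 385: worst case, +/-5 percentage points
print(cases_needed(0.5, 0.10))   # 97: about what 100 cases buys, +/-10 points
print(cases_needed(0.87, 0.05))  # 174 cases of one category for +/-5 points
```

Under these assumptions, pinning a sensitivity near 87% to within ±5 percentage points would take about 174 cases of that category, more than three times the 53 benign tumors in the most prevalent category. Rarer conditions, with only a handful of cases each, fall far shorter.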
✨ What It Means For You
This study suggests that AI can be a useful tool for differential diagnosis of skin lesions in primary care, potentially saving time and allowing dermatology specialists to focus on complex cases. However, the lower accuracy compared to clinicians highlights the need for further development and training of AI models on diverse datasets and real-world images before widespread implementation in primary care.
Reference
Escalé-Besa A, Yélamos O, Vidal-Alaball J, Fuster-Casanovas A, Miró Catalina Q, Börve A, Ander-Egg Aguilar R, Fustà-Novell X, Cubiró X, Esquius Rafat M, López-Sanchez C, Marin-Gomez FX. Exploring the potential of artificial intelligence in improving skin lesion diagnosis in primary care. Scientific Reports. 2023;13:4293. https://doi.org/10.1038/s41598-023-31340-1