Deep Learning for Medical Image Interpretation
Pranav Rajpurkar
PhD Candidate, Computer Science, Stanford ML Group
Research in the Stanford ML Group
High Performance AI+Healthcare
Gulshan et al, 2016
Esteva et al, 2017
Bejnordi et al, 2017
Rajpurkar et al, 2017
Systems already in clinical practice
Big application - Radiology
Detection of Breast Cancer with Mammography
View Classification of Echocardiograms
Madani et al, 2018
Pneumonia Detection
Work with Jeremy Irvin, Dr. Matt Lungren, Dr. Curt Langlotz, Dr. Robyn Ball, Dr. Bhavik Patel, and others at Stanford Medical School
CheXNet, Rajpurkar et al, 2017
Accelerating Field
2019 > 2018 > 2017
Diagnosis, Prognosis, Treatment
Mobile: smartphones, wearables (ECG, PPG data)
EHR: medications, lab tests, clinical notes
Genomics: DNA sequencing, RNA measurements
Medical Imaging
Inside a hospital room: Medical Imaging (Clinician, Ultrasound)
Across the hospital: EHR (Operations)
Beyond the hospital: Mobile (Patients)
Detecting middle ear fluid using smartphones + AI
Chan et al, Science Translational Medicine 2019
Detecting abnormal heart rhythms with AI and patches
Work with Masoumeh Haghpanahi at iRhythm Technologies, Dr. Geoffrey H. Tison at UCSF and others
Hannun & Rajpurkar et al., Nature Medicine, 2019
Lung cancer screening with deep learning on low-dose chest CT
Ardila et al, 2019
DL to detect appendicitis
Work with Dr. Bhavik Patel, Allison Park, Jeremy Irvin and others at Stanford Medical School
Ongoing work, Stanford ML Group
Zhang et al, 2019
Similar image search for histopathology:
SMILY
Hegde et al, 2019
Weakly supervised deep learning on whole slide images
Work with Bora Uyumazturk, Amir Kiani, Dr. Jeanne Shen, Dr. Robyn Ball, Dr. Curt Langlotz, and others at Stanford Medical School
Ongoing work, Stanford ML Group
DL to predict microsatellite instability directly from histology in gastrointestinal cancer
Kather et al, 2019
Weakly supervised deep learning on whole slide images
Campanella et al., 2019
RL Approach to Pick Pathology Patches
Work with Amir Kiani, Dr. Jeanne Shen, and others at Stanford Medical School
Ongoing work, Stanford ML Group
Performance of DL vs Manual Grading for Detecting Diabetic Retinopathy in India
Gulshan et al., 2019
We are investigating the performance of X-ray interpretation in the clinic
Work in Stanford ML Group (Anuj Pareek, Sharon Zhou, Mark Sabini, Minh Phu, Chris Wang, Dr. Lungren, and others)
Detection of Brain Activation in Unresponsive Patients with Acute Brain Injury
Claassen et al., NEJM, 2019
Detection of treatment response to antidepressant medications using EEG
Work in Stanford ML Group (Nathan Dass, Vinjai Vale, Jingbo Yang, Dr. Leanne Williams)
Continuous prediction of future acute kidney injury
Tomasev et al., 2019
Prediction of mortality to improve palliative care
Avati et al., 2018 (Stanford ML Group with Prof. Nigam Shah and others)
Applications of deep learning in fluorescence microscopy
Belthangady et al., 2019
Can we detect abnormalities with just a few drops of blood?
Shape: 10% cells sickled. See Sample 1.
Size: MCV mean 120, std 10.
Density: Distribution abnormal. Cells too light.
Ongoing work, Stanford ML Group (with Nathan Dass, Prof. Utkan Demirci, Prof. Matt Lungren and others)
Deep learning with
Li et al., 2019
Cell Sorting with Deep Learning
Ongoing work, Stanford ML Group (with Bora Uyumazturk, Jose Giron, Prof. Gozde Durmus, Prof. Utkan Demirci)
Christiano et al, 2019
Tracking the Leader: Gaze Behavior in Group Interactions
Capozzi et al, 2019
https://doctorpenguin.com
With Eric Topol
Paradigm shift of deep learning
What changed?
Machine Learning Framework
Paradigm shift
Feature extraction might include hand-written software to extract corners and edges. The bottleneck is human time and creativity!
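As a concrete contrast with learned features, a hand-crafted feature extractor looks like the sketch below: a human picks the edge kernels and the summary statistics, and each of those choices costs the human time and creativity the slide mentions. (Toy illustration, not code from the talk; the Sobel kernels and pooling statistics are illustrative assumptions.)

```python
import numpy as np

def sobel_edge_features(image):
    """Hand-crafted feature extraction: Sobel edge magnitudes, pooled into
    a fixed-length vector. Every design choice here (kernels, pooling) is
    a human decision -- the bottleneck of the classic ML pipeline."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = image.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    # Pool the edge map into three summary statistics.
    return np.array([magnitude.mean(), magnitude.std(), magnitude.max()])

# A toy 8x8 image with a vertical step edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
features = sobel_edge_features(img)
```

A deep network replaces this whole hand-designed pipeline with convolution kernels learned from data.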
Paradigm shift
What changed?
Next Paradigm shift?
ML engineers are still involved in designing neural network architectures and in data-processing decisions.
The trend is to automate that away.
https://pbs.twimg.com/media/CYHtC4WVAAEyGAL.png
To what extent are
DL detects abnormalities in knee MRIs at the level of radiologists
Work with Dr. Matt Lungren, Dr. Curt Langlotz, Dr. Robyn Ball, Dr. Bhavik Patel, and others at Stanford Medical School
Clinical Background
Magnetic resonance imaging of the knee is the standard of care imaging modality to evaluate knee disorders.
More musculoskeletal MRI examinations are performed on the knee than on any other region of the body.
Rajpurkar & Irvin et al., PLOS Medicine, 2018
DL can detect abnormalities in knee MRIs at the level of radiologists
Rajpurkar & Irvin et al., PLOS Medicine, 2018
Accuracy of experts w/ & w/o assistance can be compared with a crossover design
Bien & Rajpurkar et al, PLOS Medicine, 2018
Model assistance improved specificity of detecting ACL tears on knee MRIs
Model assistance resulted in a mean increase of 0.048 (4.8%) in ACL specificity.
No other significant improvements.
Bien & Rajpurkar et al, PLOS Medicine, 2018
DL detects cerebral aneurysms in head CTAs at the level of radiologists
Work with Dr. Kristen Yeom, Dr. Matt Lungren, Dr. Robyn Ball, Dr. Bhavik Patel, and others at Stanford Medical School
Clinical Background
Aneurysms occur in
CT angiography (CTA) is the primary imaging modality currently used for diagnosis and
Park & Chute & Rajpurkar et al., to appear
DL detects cerebral aneurysms in head CTAs at the level of radiologists
Park & Chute & Rajpurkar et al., to appear
Does model assistance with segmentation improve clinicians in detecting aneurysms?
Park & Chute & Rajpurkar et al, JAMA Network Open, 2019
Model assistance improved sensitivity in detecting aneurysms from head CTAs
Significant improvement in sensitivity and accuracy of finding aneurysms.
Specificity improvement is not significant.
Metric      | Without Augmentation (95% CI) | With Augmentation (95% CI) | p-value
Sensitivity | 0.831 (0.794, 0.862)          | 0.890 (0.858, 0.915)       | 0.01
Specificity | 0.960 (0.937, 0.974)          | 0.975 (0.957, 0.986)       | 0.16
Accuracy    | 0.893 (0.782, 0.912)          | 0.932 (0.913, 0.946)       | 0.02
Park & Chute & Rajpurkar et al, to appear
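Point estimates with 95% confidence intervals like the ones reported above are commonly obtained with a percentile bootstrap over cases. A minimal sketch with toy labels (this is not the study's analysis code):

```python
import random

def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels/predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CI: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        if 0 < sum(t) < n:  # skip resamples where the metric is undefined
            scores.append(diagnostic_metrics(t, p)[metric])
    scores.sort()
    return scores[int(0.025 * len(scores))], scores[int(0.975 * len(scores))]

# Toy reads: 1 = abnormality present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, "sensitivity")
```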
Clinician performance can be equalized with model assistance
Greatest improvement for clinician with lowest unaugmented score.
Smallest improvement for clinician with highest unaugmented score.
Park & Chute & Rajpurkar et al, to appear
Does model assistance improve cancer subtyping for pathologists?
Cancer Type A: 90%
Cancer Type B: 10%
Work with Dr. Jeanne Shen, Dr. Robyn Ball, Dr. Curt Langlotz, and others at Stanford Medical School
Rajpurkar & Uyumazturk & Kiani et al, in submission
Assistance in the form of likelihoods and heatmaps showing regions most consistent with pathology.
Assistance did not significantly increase the accuracy of the pathologists
-Improvement in accuracy of the pathologists as a group is not significant (p = 0.184).
Rajpurkar & Uyumazturk & Kiani et al, in submission
Accuracy of the diagnostic assistant is dependent on pathologist experience
-Accuracy in the hands of gastrointestinal (GI) pathology specialists was significantly higher compared with pathologists without GI specialization.
Rajpurkar & Uyumazturk & Kiani et al, in submission
Can DL improve the performance of physicians in South Africa on a TB task?
Work with Dr. Matt Lungren, Dr. Robyn Ball, and others at Stanford Medical School, and Dr. Tom Boyles at University of Cape Town, SA.
Rajpurkar & O’Connell et al, in submission
Clinicians Assisted are more accurate than Clinicians Unassisted (p = 0.002)
                      | Accuracy (95% CI)    | Sensitivity (95% CI) | Specificity (95% CI)
Clinicians Assisted   | 0.653 (0.602, 0.703) | 0.728 (0.660, 0.797) | 0.609 (0.517, 0.701)
Clinicians Unassisted | 0.602 (0.572, 0.632) | 0.704 (0.636, 0.772) | 0.521 (0.449, 0.593)
Rajpurkar & O’Connell et al, in submission
Autonomous Algorithm is more accurate than Clinicians Assisted (p < 0.001)
                      | Accuracy (95% CI)    | Sensitivity (95% CI) | Specificity (95% CI)
Clinicians Assisted   | 0.653 (0.602, 0.703) | 0.728 (0.660, 0.797) | 0.609 (0.517, 0.701)
Clinicians Unassisted | 0.602 (0.572, 0.632) | 0.704 (0.636, 0.772) | 0.521 (0.449, 0.593)
Autonomous Algorithm  | 0.794 (0.769, 0.818) | 0.671 (0.616, 0.725) | 0.871 (0.847, 0.895)
Rajpurkar & O’Connell et al, in submission
To what extent are expert level algorithms working in clinical practice?
Challenge with the deployment of these algorithms is that test != train: the test-time data may look very different from the training set.
○ Different properties of data (protocol, prevalences, …)
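One cheap way to surface such a shift before it hurts performance is to monitor basic cohort statistics at the deployment site, e.g., disease prevalence. A toy sketch with hypothetical cohorts (the 10%/30% numbers are made up for illustration):

```python
def prevalence_shift(train_labels, deploy_labels):
    """Compare disease prevalence between training and deployment cohorts.

    A large gap is one simple signal that test != train: the model's
    calibration and chosen operating point may no longer hold.
    """
    p_train = sum(train_labels) / len(train_labels)
    p_deploy = sum(deploy_labels) / len(deploy_labels)
    return p_train, p_deploy, p_deploy - p_train

# Hypothetical cohorts: 10% prevalence at training time, 30% at the new site.
train = [1] * 10 + [0] * 90
deploy = [1] * 30 + [0] * 70
p_tr, p_dep, gap = prevalence_shift(train, deploy)
```

In practice the same monitoring idea extends to input statistics (scanner protocol, image intensity distributions), not just label prevalence.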
Imagine snapping a photo of a chest X-ray
Clinical Background
The majority of chest
With Amir Kiani and others, Stanford ML Group
XRay4All: Building a deployable web app for automated medical imaging
Platform analyzes uploaded images on a secure cloud backend and provides a probabilistic interpretation for different medical conditions.
You can also "teach" our AI agents by providing labeled images or correcting their reported interpretations.
Work with Dr. Matt Lungren, Dr. Jeanne Shen,
Dr. Terry Fotre, and others at the Stanford Med
School and Hospital
With Amir Kiani and others, Stanford ML Group
We are building for fast inference time and secure storage
With Amir Kiani and others, Stanford ML Group
We are investigating the performance of CheXNet when taking photos in the clinic
Ongoing work, Stanford ML Group
We have released labeled datasets needed to train + validate models
200k chest X-rays | Expert Ground Truth Labels | Radiologist Benchmarks | Open Competition
1,370 knee MRI exams | Expert Ground Truth Labels
40k bone X-rays | Expert Ground Truth Labels
Work with Dr. Matt Lungren, Dr. Curt Langlotz and many others at the Stanford Med School Irvin & Rajpurkar et al. 2019, AAAI; Bien & Rajpurkar et al, PLOS Medicine; Rajpurkar & Irvin et al. 2018, MIDL
Accurate labels generated w/
Irvin & Rajpurkar et al. 2019, AAAI; Bien & Rajpurkar et al, PLOS Medicine; Rajpurkar & Irvin et al. 2018, MIDL
Open invite to the world to participate in our competitions w/ hidden test set
Irvin & Rajpurkar et al. 2019, AAAI; Bien & Rajpurkar et al, PLOS Medicine; Rajpurkar & Irvin et al. 2018, MIDL
AI For Healthcare Bootcamp
Training from PhD students and faculty in the medical school to work on AI+healthcare research projects.