Deep Learning for Medical Image Interpretation

Pranav Rajpurkar

PhD Candidate, Computer Science, Stanford ML Group

Cutting-edge of AI for Healthcare Research

Research in the Stanford ML Group

High Performance AI+Healthcare


Ophthalmologist-Level Performance in Dec 2016

Gulshan et al, 2016

Dermatologist-Level Performance in Feb 2017

Esteva et al, 2017

Pathologist-Level Performance in Dec 2017

Bejnordi et al, 2017

Cardiologist-Level Performance in July 2017

Rajpurkar et al, 2017

Systems already in clinical practice

Big application - Radiology

Detection of Breast Cancer with Mammography

Rodríguez-Ruiz et al, 2018

View Classification of Echocardiograms

Madani et al, 2018

Pneumonia Detection

Work with Jeremy Irvin, Dr. Matt Lungren, Dr. Curt Langlotz, Dr. Robyn Ball, Dr. Bhavik Patel, and others at Stanford Medical School

CheXNet, Rajpurkar et al, 2017

Accelerating Field

2019 > 2018 > 2017

Diagnosis, Prognosis, Treatment

Mobile: smartphones, wearables (ECG, PPG data)

EHR: medications, lab tests, clinical notes

Genomics: DNA sequencing, RNA measurements

Medical Imaging: X-ray, CT, MRI, ultrasound

Clinician

Operations

Inside a hospital room: Medical Imaging

Across the hospital: EHR

Beyond the hospital: Mobile (Patients)

Detecting middle ear fluid using smartphones + AI

Chan et al, Science Translational Medicine 2019

Detecting abnormal heart rhythms with AI and patches

Work with Masoumeh Haghpanahi at iRhythm Technologies, Dr. Geoffrey H. Tison at UCSF, and others

Hannun & Rajpurkar et al., Nature Medicine, 2019

Lung cancer screening with deep learning on low-dose chest computed tomography

Ardila et al, 2019

DL to detect appendicitis

Work with Dr. Bhavik Patel, Allison Park, Jeremy Irvin, and others at Stanford Medical School

Ongoing work, Stanford ML Group

Pathologist-level interpretable whole-slide cancer diagnosis with deep learning

Zhang et al, 2019

Similar image search for histopathology: SMILY

Hegde et al, 2019

Weakly supervised deep learning on whole slide images

Work with Bora Uyumazturk, Amir Kiani, Dr. Jeanne Shen, Dr. Robyn Ball, Dr. Curt Langlotz, and others at Stanford Medical School

Ongoing work, Stanford ML Group

DL to predict microsatellite instability directly from histology in gastrointestinal cancer

Kather et al, 2019

Weakly supervised deep learning on whole slide images

Campanella et al., 2019

RL Approach to Pick Pathology Patches

Work with Amir Kiani, Dr. Jeanne Shen, and others at Stanford Medical School

Ongoing work, Stanford ML Group

Performance of DL vs Manual Grading for Detecting Diabetic Retinopathy in India

Gulshan et al., 2019

We are investigating the performance of X-ray interpretation in the clinic

Work in Stanford ML Group (Anuj Pareek, Sharon Zhou, Mark Sabini, Minh Phu, Chris Wang, Dr. Lungren, and others)

Detection of Brain Activation in Unresponsive Patients with Acute Brain Injury

Claassen et al., NEJM, 2019

Detection of treatment response to antidepressant medications using EEG

Work in Stanford ML Group (Nathan Dass, Vinjai Vale, Jingbo Yang, Dr. Leanne Williams)

Continuous prediction of future acute kidney injury

Tomasev et al., 2019

Prediction of 6-month mortality for palliative care

Avati et al., 2018 (Stanford ML Group with Prof. Nigam Shah and others)

Applications of deep learning in fluorescence microscopy

Belthangady et al., 2019

Can we detect abnormalities with just a few drops of blood?

Shape: 10% cells sickled. See Sample 1.

Size: MCV mean 120, std 10.

Density: Distribution abnormal. Cells too light.

Ongoing work, Stanford ML Group (with Nathan Dass, Prof. Utkan Demirci, Prof. Matt Lungren, and others)

Deep learning with Real-time Inference in Cell Sorting and Flow Cytometry

Li et al., 2019

Cell Sorting with Deep Learning

Ongoing work, Stanford ML Group (with Bora Uyumazturk, Jose Giron, Prof. Gozde Durmus, Prof. Utkan Demirci)

Genome-wide cell-free DNA fragmentation in patients with cancer

Cristiano et al, 2019

Tracking the Leader: Gaze Behavior in Group Interactions

Capozzi et al, 2019

https://doctorpenguin.com

With Eric Topol

Paradigm shift of deep learning

What changed?

Machine Learning Framework

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction

Paradigm shift

Feature extraction might include software to extract corners and edges. The bottleneck is human time and creativity!

https://cdn-images-1.medium.com/max/1600/0*XdmPfRUMnLUHsGJp.png
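The shift the slides describe, from hand-engineered features to features learned directly from raw pixels, can be illustrated with a toy sketch. Everything below is hypothetical (made-up 3x3 "images", a single-neuron logistic model in pure Python), not any of the systems mentioned in this talk:

```python
import math
import random

# Toy "images": 3x3 grids flattened to 9 pixels in [0, 1].
# Class 1 = mostly bright, class 0 = mostly dark.
def make_image(bright, rng):
    base = 0.8 if bright else 0.2
    return [min(1.0, max(0.0, base + rng.uniform(-0.15, 0.15)))
            for _ in range(9)]

rng = random.Random(0)
data = [(make_image(b, rng), int(b))
        for b in [True] * 20 + [False] * 20]

# Old paradigm: a human picks the feature (mean intensity) and the rule.
def handcrafted_predict(img):
    mean_intensity = sum(img) / len(img)
    return 1 if mean_intensity > 0.5 else 0

# New paradigm: learn weights on the raw pixels directly
# (logistic regression trained by gradient descent).
w, b = [0.0] * 9, 0.0
lr = 0.5
for _ in range(200):
    for img, y in data:
        z = sum(wi * xi for wi, xi in zip(w, img)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y                      # gradient of log-loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, img)]
        b -= lr * g

def learned_predict(img):
    z = sum(wi * xi for wi, xi in zip(w, img)) + b
    return 1 if z > 0 else 0

hand_acc = sum(handcrafted_predict(x) == y for x, y in data) / len(data)
learn_acc = sum(learned_predict(x) == y for x, y in data) / len(data)
```

On this easy toy task both paradigms reach the same accuracy; the point is that the learned model required no human to decide which pixel statistic matters.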


What changed?

https://www.slideshare.net/LuMa921/deep-learning-a-visual-introduction/8

Next Paradigm shift?

ML engineers are still involved in designing neural network architectures and making data-processing decisions. The trend is to automate that away.

https://pbs.twimg.com/media/CYHtC4WVAAEyGAL.png


To what extent are expert-level algorithms helping clinicians make better decisions?

DL detects abnormalities in knee MRIs at the level of radiologists

Work with Dr. Matt Lungren, Dr. Curt Langlotz, Dr. Robyn Ball, Dr. Bhavik Patel, and others at Stanford Medical School

Clinical Background

Magnetic resonance imaging of the knee is the standard of care imaging modality to evaluate knee disorders.

More musculoskeletal MRI examinations are performed on the knee than on any other region of the body.

Rajpurkar & Irvin et al., PLOS Medicine, 2018

DL can detect abnormalities in knee MRIs at the level of radiologists

Rajpurkar & Irvin et al., PLOS Medicine, 2018

Accuracy of experts w/ & w/o assistance can be compared with a crossover design

Bien & Rajpurkar et al, PLOS Medicine, 2018

Model assistance improved specificity of detecting ACL tears on knee MRIs

Model assistance resulted in a mean increase of 0.048 (4.8%) in ACL specificity. No other significant improvements.

Bien & Rajpurkar et al, PLOS Medicine, 2018

DL detects cerebral aneurysms in head CTAs at the level of radiologists

Work with Dr. Kristen Yeom, Dr. Matt Lungren, Dr. Robyn Ball, Dr. Bhavik Patel, and others at Stanford Medical School

Clinical Background

Aneurysms occur in 1-3% of the population (a bulging area in the wall of a brain artery that results in abnormal widening or ballooning and is at risk of rupture).

CT angiography (CTA) is the primary imaging modality currently used for diagnosis and pre-surgical planning of intracranial aneurysms.

Park & Chute & Rajpurkar et al., to appear

DL detects cerebral aneurysms in head CTAs at the level of radiologists

Park & Chute & Rajpurkar et al., to appear

Does model assistance with segmentation improve clinicians in detecting aneurysms?

Park & Chute & Rajpurkar et al, JAMA Network Open, 2019

Model assistance improved sensitivity in detecting aneurysms from head CTAs

Significant improvement in sensitivity and accuracy of finding aneurysms.

Specificity improvement is not significant.

 

 

 

Metric       Without Augmentation (95% CI)   With Augmentation (95% CI)   P-value
Sensitivity  0.831 (0.794, 0.862)            0.890 (0.858, 0.915)         0.01
Specificity  0.960 (0.937, 0.974)            0.975 (0.957, 0.986)         0.16
Accuracy     0.893 (0.782, 0.912)            0.932 (0.913, 0.946)         0.02

Park & Chute & Rajpurkar et al, to appear
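The sensitivity, specificity, and accuracy figures reported throughout these reader studies come from a standard 2x2 confusion matrix. A minimal sketch of the formulas, with made-up counts that are not from any study in this talk:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)            # fraction of true positives detected
    specificity = tn / (tn + fp)            # fraction of true negatives correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts, chosen only to exercise the formulas:
sens, spec, acc = diagnostic_metrics(tp=89, fp=4, tn=96, fn=11)
```

Note that accuracy mixes the two error types, which is why the tables report sensitivity and specificity separately: an assistive model can move one without moving the other.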

Clinician performance can be equalized with model assistance

Greatest improvement for clinician with lowest unaugmented score.

Smallest improvement for clinician with highest unaugmented score.

Park & Chute & Rajpurkar et al, to appear

Does model assistance improve cancer subtyping for pathologists?

Cancer Type A: 90%

Cancer Type B: 10%

Work with Dr. Jeanne Shen, Dr. Robyn Ball, Dr. Curt Langlotz, and others at Stanford Medical School

Rajpurkar & Uyumazturk & Kiani et al, in submission

Assistance in the form of likelihoods and heatmaps showing regions most consistent with pathology.

Assistance did not significantly increase the accuracy of the pathologists

- Improvement in accuracy of the pathologists as a group is not significant (p = 0.184).

Rajpurkar & Uyumazturk & Kiani et al, in submission

Accuracy with the diagnostic assistant is dependent on pathologist experience

- Accuracy in the hands of gastrointestinal (GI) pathology specialists was significantly higher compared with non-GI specialists and pathology trainees (p = 0.032).

Rajpurkar & Uyumazturk & Kiani et al, in submission

Can DL improve the performance of physicians in South Africa on a TB diagnosis task?

Work with Dr. Matt Lungren, Dr. Robyn Ball, and others at Stanford Medical School, and Dr. Tom Boyles at University of Cape Town, SA.

Rajpurkar & O’Connell et al, in submission

Assisted clinicians are more accurate than unassisted clinicians (p = 0.002)

 

                       Accuracy (95% CI)      Sensitivity (95% CI)   Specificity (95% CI)
Clinicians Assisted    0.653 (0.602, 0.703)   0.728 (0.660, 0.797)   0.609 (0.517, 0.701)
Clinicians Unassisted  0.602 (0.572, 0.632)   0.704 (0.636, 0.772)   0.521 (0.449, 0.593)

Rajpurkar & O’Connell et al, in submission

The autonomous algorithm is more accurate than assisted clinicians (p < 0.001)

 

                       Accuracy (95% CI)      Sensitivity (95% CI)   Specificity (95% CI)
Clinicians Assisted    0.653 (0.602, 0.703)   0.728 (0.660, 0.797)   0.609 (0.517, 0.701)
Clinicians Unassisted  0.602 (0.572, 0.632)   0.704 (0.636, 0.772)   0.521 (0.449, 0.593)
Autonomous Algorithm   0.794 (0.769, 0.818)   0.671 (0.616, 0.725)   0.871 (0.847, 0.895)

Rajpurkar & O’Connell et al, in submission
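The 95% confidence intervals in tables like the ones above are commonly obtained by resampling over cases. A percentile-bootstrap sketch on hypothetical per-case outcomes (not data from this study):

```python
import random

def bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a proportion
    (e.g. accuracy over a set of reads)."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the cases with replacement and recompute the proportion.
    stats = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical outcomes (1 = correct read, 0 = incorrect): 160/200 correct.
outcomes = [1] * 160 + [0] * 40
lo, hi = bootstrap_ci(outcomes)
print(f"accuracy 0.800, 95% CI ({lo:.3f}, {hi:.3f})")
```

The interval width shrinks roughly with the square root of the number of cases, which is why studies with a few hundred reads report CIs several points wide.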

To what extent are expert level algorithms working in clinical practice?

A challenge with the deployment of these algorithms is that test != train.

Doctor-level performances are achieved in a lab setting, where the test set is often drawn from the same distribution as the training set (internal validation).

In deployment, the testing set may look very different from the training set.

Different properties of data (protocol, prevalences, non-stationarity)
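One of the simplest distribution-shift checks is to compare label prevalences between the training data and an external site. A toy sketch with hypothetical label lists (the datasets and numbers are illustrative only):

```python
from collections import Counter

def prevalence(labels):
    """Fraction of each label in a dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical label lists: an internal training set vs. data from an
# external hospital with much lower disease prevalence.
train_labels = ["pneumonia"] * 30 + ["normal"] * 70
external_labels = ["pneumonia"] * 5 + ["normal"] * 95

p_train = prevalence(train_labels)
p_external = prevalence(external_labels)
shift = abs(p_train["pneumonia"] - p_external["pneumonia"])
```

A large prevalence gap alone can change a model's positive predictive value at the new site even when sensitivity and specificity are unchanged.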

Imagine snapping a photo of a chest x-ray for immediate analysis

Clinical Background

The majority of chest x-rays in the world are done on film.

With Amir Kiani and others, Stanford ML Group

XRay4All: Building a deployable web app for automated medical imaging

The platform analyzes uploaded images on a secure cloud backend and provides a probabilistic interpretation for different medical conditions.

You can also “teach” our AI agents by providing labeled images or correcting their reported interpretations.

Work with Dr. Matt Lungren, Dr. Jeanne Shen, Dr. Terry Fotre, and others at the Stanford Med School and Hospital

With Amir Kiani and others, Stanford ML Group

We are building for fast inference time and secure storage

With Amir Kiani and others, Stanford ML Group

We are investigating the performance of CheXNet when taking photos in the clinic

Ongoing work, Stanford ML Group

XRay4All: Live demo

Link

With Amir Kiani and others, Stanford ML Group

We have released labeled datasets needed to train + validate models

200k Chest X-Rays: Expert Ground Truth Labels, Radiologist Benchmarks

1,370 knee MRI exams: Expert Ground Truth Labels, Open Competition

40k bone x-rays: Expert Ground Truth Labels

Work with Dr. Matt Lungren, Dr. Curt Langlotz, and many others at the Stanford Med School. Irvin & Rajpurkar et al. 2019, AAAI; Bien & Rajpurkar et al, PLOS Medicine; Rajpurkar & Irvin et al. 2018, MIDL

Accurate labels generated w/ open-sourced labeler for radiology reports

200k Chest X-Rays

Expert Ground Truth Labels

Radiologist Benchmarks

Irvin & Rajpurkar et al. 2019, AAAI; Bien & Rajpurkar et al, PLOS Medicine; Rajpurkar & Irvin et al. 2018, MIDL
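Rule-based report labelers of this kind typically extract condition mentions from free-text radiology reports and then classify each mention as positive, negative, or absent. A toy sketch of that idea (the `CONDITIONS` and `NEGATIONS` patterns below are illustrative; the actual open-sourced labeler uses a far richer rule set):

```python
import re

# Toy rule-based report labeler: find a condition mention, then check
# whether it is negated within the same sentence.
CONDITIONS = {
    "pneumonia": r"pneumonia",
    "effusion": r"(pleural )?effusion",
}
NEGATIONS = r"\b(no|without|negative for|free of)\b"

def label_report(report):
    text = report.lower()
    labels = {}
    for name, pattern in CONDITIONS.items():
        if not re.search(pattern, text):
            labels[name] = 0        # not mentioned
        elif re.search(NEGATIONS + r"[^.]*" + pattern, text):
            labels[name] = -1       # mentioned but negated
        else:
            labels[name] = 1        # positive mention
    return labels

labels = label_report("No pleural effusion. Findings consistent with pneumonia.")
```

Restricting the negation search to the same sentence (`[^.]*`) keeps "No pleural effusion." from negating the pneumonia finding in the next sentence; handling uncertainty ("cannot rule out ...") requires additional rules.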

Open invite to the world to participate in our competitions w/ hidden test set

200k Chest X-Rays: Expert Ground Truth Labels, Radiologist Benchmarks

1,370 knee MRI exams: Expert Ground Truth Labels, Open Competition

40k bone x-rays: Expert Ground Truth Labels

Irvin & Rajpurkar et al. 2019, AAAI; Bien & Rajpurkar et al, PLOS Medicine; Rajpurkar & Irvin et al. 2018, MIDL

AI For Healthcare Bootcamp


The Bootcamp is a two-quarter program that provides Stanford students an opportunity to do cutting-edge research at the intersection of AI and healthcare.


Students receive training from PhD students and faculty in the medical school, and work on high-impact research in small interdisciplinary teams.

AI For Healthcare Bootcamp

https://stanfordmlgroup.github.io/programs/aihc-bootcamp/