What characterises a patient who is predicted to have high variation in thrombolysis use across different hospitals?#

We categorise patients based on the proportion of hospitals that are predicted to give them thrombolysis

  • Non-thrombolysed: 70% or more units would not thrombolyse this patient

  • Contentious: 30-70% of units would thrombolyse this patient

  • Thrombolysed: 70% or more units would thrombolyse this patient

We provide descriptive statistics for those three groups. We then go onto to train a Random Forest model to distinguish Contentious from Thrombolysed, and examine feture importance and Shapley values to identify features most predictive of the distinction between those two categories.

# Turn warnings off to keep notebook tidy
import warnings
warnings.filterwarnings("ignore")

Import libraries#

import os
import pandas as pd
import numpy as np
import pickle as pkl
import matplotlib.pyplot as plt
%matplotlib inline

import sklearn as sk
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_curve, auc

from scipy.spatial.distance import hamming
from seriate import seriate

import shap

Load data#

train = pd.read_csv('./../data/10k_training_test/cohort_10000_train.csv')
test = pd.read_csv('./../data/10k_training_test/cohort_10000_test.csv')
# Combine train and test data
data = pd.concat([train, test], axis=0)

Load pre-trained models#

These models are Random Forest models trained for each hopsital, calibrated so that each hopsital has a unique threshold for classification, that gives the expected thrombolysis use at that hopsital.

with open ('./models/trained_hospital_models_for _cohort.pkl', 'rb') as f:
    
    hospital2model = pkl.load(f)

Pass 10K test cohort through all hospital models#

cohort = test

hospitals = list(set(data['StrokeTeam'].values))

cohort_results = pd.DataFrame(columns = hospitals, index = cohort.index.values)
cohort_results_prob = pd.DataFrame(columns = hospitals, index = cohort.index.values)
    
for hospital_train in hospitals:
   
    test_patients = cohort
    
    y = test_patients['S2Thrombolysis']
    X = test_patients.drop(['StrokeTeam','S2Thrombolysis'], axis=1)

    model = hospital2model[hospital_train][0]
    threshold = hospital2model[hospital_train][1]
    #threshold=0.5

    y_prob = model.predict_proba(X)[:,1]
    
    y_pred = [1 if p>=threshold else 0 for p in y_prob]
    
    new_column = pd.Series(y_pred, name=hospital_train, index=test_patients.index.values)
        
    cohort_results.update(new_column)
    
    new_column_prob = pd.Series(y_prob, name=hospital_train, index=test_patients.index.values)
        
    cohort_results_prob.update(new_column_prob)

Identify contentious patients#

We categorise patients based on the proportion of hospitals that are predicted to give them thrombolysis

  • Non-thrombolysed: 70% or more units would not thrombolyse this patient

  • Contentious: 30-70% of units would thrombolyse this patient

  • Thrombolysed: 70% or more units would thrombolyse this patient

# Get proportion of hopsitals thrombolysing, or agreeing
results = cohort_results 
results['sum'] = results.sum(axis=1)
results['percent'] = results['sum']/len(hospitals)
results['percent_agree'] = [max(p, 1-p) for p in results['percent']]

# Extract cohort patients that only 30-70% of hospitals would thrombolyse
contentious = results[results['percent_agree']<=0.7]
contentious = contentious.drop(['sum', 'percent', 'percent_agree'], axis=1)

Characterise contentious patients#

Onset to arrival#

content = (cohort.loc[contentious.index]['S1OnsetToArrival_min'].mean(),\
           cohort.loc[contentious.index]['S1OnsetToArrival_min'].std())
throm = (cohort.loc[cohort_results['percent']>0.7]['S1OnsetToArrival_min'].mean(),\
         cohort.loc[cohort_results['percent']>0.7]['S1OnsetToArrival_min'].std())
no_throm = (cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S1OnsetToArrival_min'].mean(),\
         cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S1OnsetToArrival_min'].std())

print(f'Contentious: Mean = {content[0]:.0f}, StdDev = {content[1]:.0f}')
print(f'Thrombolysed: Mean = {throm[0]:.0f}, StdDev = {throm[1]:.0f}')
print(f'Not Thrombolysed: Mean = {no_throm[0]:.0f}, StdDev = {no_throm[1]:.0f}')
Contentious: Mean = 102, StdDev = 49
Thrombolysed: Mean = 88, StdDev = 40
Not Thrombolysed: Mean = 124, StdDev = 55

NIHSS#

content = (cohort.loc[contentious.index]['S2NihssArrival'].mean(),\
           cohort.loc[contentious.index]['S2NihssArrival'].std())
throm = (cohort.loc[cohort_results['percent']>0.7]['S2NihssArrival'].mean(),\
         cohort.loc[cohort_results['percent']>0.7]['S2NihssArrival'].std())
no_throm = (cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S2NihssArrival'].mean(),\
         cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S2NihssArrival'].std())

print(f'Contentious: Mean = {content[0]:.1f}, StdDev = {content[1]:.1f}')
print(f'Thrombolysed: Mean = {throm[0]:.1f}, StdDev = {throm[1]:.1f}')
print(f'Not Thrombolysed: Mean = {no_throm[0]:.1f}, StdDev = {no_throm[1]:.1f}')
Contentious: Mean = 11.8, StdDev = 7.9
Thrombolysed: Mean = 13.3, StdDev = 6.5
Not Thrombolysed: Mean = 5.9, StdDev = 6.8

NIHSS components#

nihss = ['Loc', 'LocQuestions', 'LocCommands',
       'BestGaze', 'Visual', 'FacialPalsy', 'MotorArmLeft', 'MotorArmRight',
       'MotorLegLeft', 'MotorLegRight', 'LimbAtaxia', 'Sensory',
       'BestLanguage', 'Dysarthria', 'ExtinctionInattention']
       #'S2NihssArrival']
content = [cohort.loc[contentious.index][n].mean() for n in nihss]
content_err_min = [cohort.loc[contentious.index][n].quantile(0.25) for n in nihss]
content_err_max = [cohort.loc[contentious.index][n].quantile(0.75) for n in nihss]

content_err = [content_err_min, content_err_max]

throm = [cohort.loc[cohort_results['percent']>0.7][n].mean() for n in nihss]
throm_err_min = [cohort.loc[cohort_results['percent']>0.7][n].quantile(0.25) for n in nihss]
throm_err_max = [cohort.loc[cohort_results['percent']>0.7][n].quantile(0.75) for n in nihss]

throm_err = [throm_err_min, throm_err_max]

no_throm = [cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2Thrombolysis']==1][n].mean() for n in nihss]
no_throm_err_min = [cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2Thrombolysis']==1][n].quantile(0.25) for n in nihss]
no_throm_err_max = [cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2Thrombolysis']==1][n].quantile(0.75) for n in nihss]

no_throm_err = [no_throm_err_min, no_throm_err_max]
X = np.arange(len(nihss))
fig,ax = plt.subplots(figsize=(8,5))

ax.bar(X - 0.2, throm, color = 'b', width = 0.2, label='Thrombolysed', yerr= throm_err)
#ax.bar(X + 0.1, throm, color = 'r', width = 0.2, label = 'True Thrombolysed')
#ax.bar(X - 0.1, all_, color = 'g', width = 0.2, label = 'Cohort')
ax.bar(X, content, color = 'y', width = 0.2,  label = 'Contentious',yerr= content_err)
ax.bar(X + 0.2, no_throm, color = 'r', width = 0.2, label = 'Not Thrombolysed',yerr= no_throm_err)
ax.set_xticks(X)
ax.set_xticklabels(nihss,  rotation=90)
plt.legend(loc='best')
plt.tight_layout()
plt.grid()
plt.savefig('./output/contentious_nihss.jpg', dpi=300)
plt.show()
../_images/characterise_contentious_patients_20_0.png

Histograms of NIHSS

fig = plt.figure(figsize = (15,25))

bins = np.arange(6)
names = ['Thrombolysed','Contentious','Not Thrombolysed']

hist_list = []

for i in range(len(nihss)):
    
    ax = fig.add_subplot(5,3,i+1)
    
    content = cohort.loc[contentious.index][nihss[i]]
    throm = cohort.loc[cohort_results['percent']>0.7][nihss[i]]
    nothrom = cohort.loc[cohort_results['percent']<0.3][nihss[i]]
    
    dta = [throm,content,nothrom]
    x, bins, _ = ax.hist(dta,  histtype='bar', density=True, label=names, bins=bins, align='left')
    #ax.hist(nothrom,bins=bins, label = f'Not Thrombolysed', histtype='bar',stacked=True)
    #ax.hist(throm, bins=bins,label = f'Thrombolysed', histtype='bar',stacked=True)
    #ax.hist(content, bins=bins, label = f'Contentious', histtype='bar',stacked=True)
    hist_list.append(x)
    feat = nihss[i]
    
    plt.title(feat, fontsize=15)
    
    ax.legend(loc='best')

fig.savefig('./output/contentious_nihss_hist.jpg', dpi=300)
../_images/characterise_contentious_patients_22_0.png

Use Kolmogorov-Smirnov test to check for differenecnes in distrubtions.

from scipy.stats import kstest

for i in range(len(nihss)):
    
    content = cohort.loc[contentious.index][nihss[i]]
    throm = cohort.loc[cohort_results['percent']>0.7][nihss[i]]
    nothrom = cohort.loc[cohort_results['percent']<0.3][nihss[i]]
    
    print(nihss[i], '\n')
    
    print('Throm vs contentious')
    
    D,p = kstest(throm,content)
    print(f'D = {round(D,2)}, p = {round(p,3)}')
    
    D,p = kstest(nothrom,content)
    print('NoThrom vs contentious')
    print(f'D = {round(D,2)}, p = {round(p,3)}', '\n')
Loc 

Throm vs contentious
D = 0.02, p = 0.747
NoThrom vs contentious
D = 0.03, p = 0.136 

LocQuestions 

Throm vs contentious
D = 0.03, p = 0.542
NoThrom vs contentious
D = 0.18, p = 0.0 

LocCommands 

Throm vs contentious
D = 0.03, p = 0.552
NoThrom vs contentious
D = 0.11, p = 0.0 

BestGaze 

Throm vs contentious
D = 0.07, p = 0.0
NoThrom vs contentious
D = 0.14, p = 0.0 

Visual 

Throm vs contentious
D = 0.12, p = 0.0
NoThrom vs contentious
D = 0.14, p = 0.0 

FacialPalsy 

Throm vs contentious
D = 0.14, p = 0.0
NoThrom vs contentious
D = 0.23, p = 0.0 

MotorArmLeft 

Throm vs contentious
D = 0.09, p = 0.0
NoThrom vs contentious
D = 0.1, p = 0.0 

MotorArmRight 

Throm vs contentious
D = 0.03, p = 0.453
NoThrom vs contentious
D = 0.16, p = 0.0 

MotorLegLeft 

Throm vs contentious
D = 0.06, p = 0.003
NoThrom vs contentious
D = 0.11, p = 0.0 

MotorLegRight 

Throm vs contentious
D = 0.02, p = 0.71
NoThrom vs contentious
D = 0.15, p = 0.0 

LimbAtaxia 

Throm vs contentious
D = 0.01, p = 1.0
NoThrom vs contentious
D = 0.04, p = 0.02 

Sensory 

Throm vs contentious
D = 0.09, p = 0.0
NoThrom vs contentious
D = 0.19, p = 0.0 

BestLanguage 

Throm vs contentious
D = 0.02, p = 0.745
NoThrom vs contentious
D = 0.2, p = 0.0 

Dysarthria 

Throm vs contentious
D = 0.1, p = 0.0
NoThrom vs contentious
D = 0.21, p = 0.0 

ExtinctionInattention 

Throm vs contentious
D = 0.12, p = 0.0
NoThrom vs contentious
D = 0.19, p = 0.0 

NIHSS binned by value#

content = cohort.loc[cohort_results['percent_agree']<=0.7]#['S2NihssArrival']
throm = cohort.loc[cohort_results['percent']>0.7]#['S2NihssArrival']
no_throm = cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]#['S2NihssArrival']

bars = []
numbers = []
numbers_stratified = []
for i in range(40):
    
    n_content = len(content.loc[content['S2NihssArrival']==i])
    n_throm = len(throm.loc[throm['S2NihssArrival']==i])
    n_nothrom = len(no_throm.loc[no_throm['S2NihssArrival']==i])
    
    all_=np.array([n_content,n_throm,n_nothrom])
    
    numbers_stratified.append(all_)
    
    numbers.append(np.sum(all_))
    
    bars.append(all_/sum(all_))
    
c,t,nt = zip(*bars)
fig,ax = plt.subplots(figsize=(6,6))

locs = np.arange(40)

ax.bar(locs, t, width = 0.5, label='Thrombolysed')
ax.bar(locs, c, width=0.5, bottom=t, label='Contentious')
ax.bar(locs, nt, width = 0.5, bottom = np.vstack((t,c)).sum(axis=0), label='Not Thrombolysed')

plt.legend(loc='best', fontsize=12)
#plt.ylim(0,1.2)
plt.xlabel('NIHSS')
plt.savefig('./output/nihss_proportions_1.jpg', dpi=300)
plt.show()
../_images/characterise_contentious_patients_27_0.png

Add errors using bernoulli distribution

\[\sigma = \sqrt{\frac{pq}{n}}\]
  • p = proportion of NIHSS value agreed

  • q = proportion of NIHSS value contentious

  • n = number of people with NIHSS value

fig,ax = plt.subplots(figsize=(5,5))

locs = np.arange(40)

a = np.vstack((t,nt)).sum(axis=0)

ax.bar(locs, a, width = 0.5, label='Agreed')
ax.bar(locs, c, width=0.5, bottom=a, label='Contentious')


err = np.sqrt(a*c/np.array(numbers))

#ax.bar(locs, err, width = 0.5, bottom = a-err/2)

ax.plot(locs, a-err/2, 'k--')
ax.plot(locs, a+err/2, 'k--')

plt.fill_between(locs, a-err/2, a+err/2, 'g', alpha=1)

plt.legend(loc='best', fontsize=12)
#plt.ylim(0,1.2)
plt.xlabel('NIHSS')
plt.savefig('./output/nihss_proportions_2.jpg', dpi=300)
plt.show()
../_images/characterise_contentious_patients_29_0.png
fig,ax = plt.subplots(figsize=(5,5))

n_c,n_t,n_nt = zip(*numbers_stratified)

plt.plot(n_t, 'o-', label = 'Thrombolysed')
plt.plot(n_c, 'o-', label = 'Contentious')
plt.plot(n_nt, 'o-', label = 'Not Thrombolysed')
plt.plot(numbers, '+-', label = 'Total')

plt.legend(loc='best')
plt.grid()
plt.savefig('./output/nihss_numbers.jpg', dpi=300)
plt.show()
../_images/characterise_contentious_patients_30_0.png
fig = plt.figure(figsize=(10,10))

ax0 = fig.add_subplot(2,2,1)

content = 2*cohort.loc[cohort_results['percent_agree']<=0.7]['S1OnsetTimeType_Precise'] + cohort.loc[cohort_results['percent_agree']<=0.7]['S1OnsetTimeType_Best estimate']
throm = 2*cohort.loc[cohort_results['percent']>0.7]['S1OnsetTimeType_Precise'] + cohort.loc[cohort_results['percent']>0.7]['S1OnsetTimeType_Best estimate']
no_throm = 2*cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S1OnsetTimeType_Precise']\
+ cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S1OnsetTimeType_Best estimate']

throm_h,edges = np.histogram(throm, bins = [1,2,3], density=True)
nothrom_h,edges = np.histogram(no_throm, bins = [1,2,3], density=True)
content_h,edges = np.histogram(content, bins = [1,2,3], density=True)

ax0.bar([0.85,1.15],throm_h, width=0.3, color = ['yellow','navy'], alpha=0.7)

ax0.bar([1.85,2.15],content_h, width=0.3, color = ['yellow','navy'], alpha=0.7)

ax0.bar([2.85,3.15],nothrom_h, width=0.3, color = ['yellow','navy'], alpha=0.7)

ax0.set_xticks([1,2,3])

ax0.set_xticklabels(names, rotation=90)

ax0.text(0.05, 0.95, 'a', horizontalalignment='center', weight='bold', fontsize=14,
      verticalalignment='center', transform=ax0.transAxes)

plt.ylabel('Proportion of group', fontsize=15)

colors = {'Best estimate':'yellow', 'Precise':'navy'}         
labels = list(colors.keys())
handles = [plt.Rectangle((0,0),1,1, color=colors[label], alpha=0.7) for label in labels]
plt.legend(handles, labels)

ax = fig.add_subplot(2,2,2)

locs = np.arange(40)

ax.bar(locs, t, width = 0.5, label='Thrombolysed')
ax.bar(locs, c, width=0.5, bottom=t, label='Contentious')
ax.bar(locs, nt, width = 0.5, bottom = np.vstack((t,c)).sum(axis=0), label='Not Thrombolysed')

plt.xlabel('NIHSS', fontsize=15)
plt.ylabel('Proportion of patients', fontsize=15 )
plt.legend(loc='best')
plt.ylim(0,1.2)

ax.text(0.05, 0.95, 'b', horizontalalignment='center', weight='bold', fontsize=14,
      verticalalignment='center', transform=ax.transAxes)

ax2 = fig.add_subplot(2,2,3)


locs = np.arange(40)

a = np.vstack((t,nt)).sum(axis=0)

ax2.bar(locs, a, width = 0.5, color = 'tab:purple', label='Agreed')
ax2.bar(locs, c, width=0.5, bottom=a, color = 'tab:orange', label='Contentious')
err = np.sqrt(a*c/np.array(numbers))

#ax.bar(locs, err, width = 0.5, bottom = a-err/2)

ax2.plot(locs, a-err/2, 'k--')
ax2.plot(locs, a+err/2, 'k--')

plt.fill_between(locs, a-err/2, a+err/2, color = 'tab:gray', alpha=1)



plt.legend(loc='best')
plt.ylim(0,1.2)
plt.xlabel('NIHSS', fontsize=15)
plt.ylabel('Proportion of patients', fontsize=15 )
ax2.grid()
ax2.text(0.05, 0.95, 'c', horizontalalignment='center', weight='bold', fontsize=14,
      verticalalignment='center', transform=ax2.transAxes)

ax1 = fig.add_subplot(2,2,4)

n_c,n_t,n_nt = zip(*numbers_stratified)

plt.plot(n_t, 'o-', label = 'Thrombolysed')
plt.plot(n_c, 'o-', label = 'Contentious')
plt.plot(n_nt, 'o-', label = 'Not Thrombolysed')
plt.plot(numbers, '+-', label = 'Total')

plt.legend(loc='best')
plt.xlabel('NIHSS', fontsize=15)
plt.ylabel('N patients', fontsize=15 )
plt.grid()

ax1.text(0.05, 0.95, 'd', horizontalalignment='center', weight='bold', fontsize=14,
      verticalalignment='center', transform=ax1.transAxes)

plt.tight_layout()

plt.savefig('./output/nihss_4.jpg', dpi=300)

plt.show()
../_images/characterise_contentious_patients_31_0.png

Rankin before stroke#

content = (cohort.loc[contentious.index]['S2RankinBeforeStroke'].mean(), \
           cohort.loc[contentious.index]['S2RankinBeforeStroke'].std())
throm = (cohort.loc[cohort_results['percent']>0.7]['S2RankinBeforeStroke'].mean(),\
         cohort.loc[cohort_results['percent']>0.7]['S2RankinBeforeStroke'].std())
no_throm = (cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S2RankinBeforeStroke'].mean(),\
         cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S2RankinBeforeStroke'].std())

print(f'Contentious: Mean = {content[0]:.2f}, StdDev = {content[1]:.2f}')
print(f'Thrombolysed: Mean = {throm[0]:.2f}, StdDev = {throm[1]:.2f}')
print(f'Not Thrombolysed: Mean = {no_throm[0]:.2f}, StdDev = {no_throm[1]:.2f}')
Contentious: Mean = 1.12, StdDev = 1.44
Thrombolysed: Mean = 0.58, StdDev = 1.05
Not Thrombolysed: Mean = 1.21, StdDev = 1.49

Rankin distributions#

content = cohort.loc[cohort_results['percent_agree']<=0.7]#['S2NihssArrival']
throm = cohort.loc[cohort_results['percent']>0.7]#['S2NihssArrival']
no_throm = cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]#['S2NihssArrival']

bars = []
numbers = []
numbers_stratified = []
for i in range(6):
    
    n_content = len(content.loc[content['S2RankinBeforeStroke']==i])
    n_throm = len(throm.loc[throm['S2RankinBeforeStroke']==i])
    n_nothrom = len(no_throm.loc[no_throm['S2RankinBeforeStroke']==i])
    
    all_=np.array([n_content,n_throm,n_nothrom])
    
    numbers_stratified.append(all_)
    
    numbers.append(np.sum(all_))
    
    bars.append(all_/sum(all_))
    
c,t,nt = zip(*bars)
fig,ax = plt.subplots(figsize=(5,5))

locs = np.arange(6)

ax.bar(locs, t, width = 0.5, label='Thrombolysed')
ax.bar(locs, c, width=0.5, bottom=t, label='Contentious')
ax.bar(locs, nt, width = 0.5, bottom = np.vstack((t,c)).sum(axis=0), label='Not Thrombolysed')

plt.legend(loc='best', fontsize=12)
#plt.ylim(0,1.2)
plt.xlabel('Rankin')
plt.savefig('./output/rankin_proportions_1.jpg', dpi=300)
plt.show()
../_images/characterise_contentious_patients_36_0.png
fig,ax = plt.subplots(figsize=(5,5))

n_c,n_t,n_nt = zip(*numbers_stratified)

plt.plot(n_t, 'o-', label = 'Thrombolysed')
plt.plot(n_c, 'o-', label = 'Contentious')
plt.plot(n_nt, 'o-', label = 'Not Thrombolysed')
plt.plot(numbers, '+-', label = 'Total')

plt.legend(loc='best')
plt.grid()
plt.savefig('./output/rankin_numbers.jpg', dpi=300)
plt.show()
../_images/characterise_contentious_patients_37_0.png

Age#

content = (cohort.loc[contentious.index]['S1AgeOnArrival'].mean(), \
           cohort.loc[contentious.index]['S1AgeOnArrival'].std())
throm = (cohort.loc[cohort_results.sum(axis=1)>92]['S1AgeOnArrival'].mean(),\
         cohort.loc[cohort_results.sum(axis=1)>92]['S1AgeOnArrival'].std())
no_throm = (cohort.loc[cohort_results.sum(axis=1)<40].loc[cohort['S2StrokeType_Infarction']==1]['S1AgeOnArrival'].mean(),\
         cohort.loc[cohort_results.sum(axis=1)<40].loc[cohort['S2StrokeType_Infarction']==1]['S1AgeOnArrival'].std())

print(f'Contentious: Mean = {content[0]:.1f}, StdDev = {content[1]:.1f}')
print(f'Thrombolysed: Mean = {throm[0]:.1f}, StdDev = {throm[1]:.1f}')
print(f'Not Thrombolysed: Mean = {no_throm[0]:.1f}, StdDev = {no_throm[1]:.1f}')
Contentious: Mean = 75.1, StdDev = 13.9
Thrombolysed: Mean = 73.3, StdDev = 13.7
Not Thrombolysed: Mean = 76.2, StdDev = 14.1

Scan time#

content = (cohort.loc[contentious.index]['S2BrainImagingTime_min'].mean(), \
           cohort.loc[contentious.index]['S2BrainImagingTime_min'].std())
throm = (cohort.loc[cohort_results['percent']>0.7]['S2BrainImagingTime_min'].mean(),\
         cohort.loc[cohort_results['percent']>0.7]['S2BrainImagingTime_min'].std())
no_throm = (cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S2BrainImagingTime_min'].mean(),\
         cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S2BrainImagingTime_min'].std())

print(f'Contentious: Mean = {content[0]:.1f}, StdDev = {content[1]:.1f}')
print(f'Thrombolysed: Mean = {throm[0]:.1f}, StdDev = {throm[1]:.1f}')
print(f'Not Thrombolysed: Mean = {no_throm[0]:.1f}, StdDev = {no_throm[1]:.1f}')
Contentious: Mean = 26.3, StdDev = 22.8
Thrombolysed: Mean = 18.8, StdDev = 12.2
Not Thrombolysed: Mean = 153.0, StdDev = 901.6

Onset known type#

content = 2*cohort.loc[cohort_results['percent_agree']<=0.7]['S1OnsetTimeType_Precise'] + cohort.loc[cohort_results['percent_agree']<=0.7]['S1OnsetTimeType_Best estimate']
throm = 2*cohort.loc[cohort_results['percent']>0.7]['S1OnsetTimeType_Precise'] + cohort.loc[cohort_results['percent']>0.7]['S1OnsetTimeType_Best estimate']
no_throm = 2*cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S1OnsetTimeType_Precise']\
+ cohort.loc[cohort_results['percent']<0.3].loc[cohort['S2StrokeType_Infarction']==1]['S1OnsetTimeType_Best estimate']

#throm, content, no throm on x axis
fig, ax = plt.subplots(figsize=(5,5))

throm_h,edges = np.histogram(throm, bins = [1,2,3], density=True)
nothrom_h,edges = np.histogram(no_throm, bins = [1,2,3], density=True)
content_h,edges = np.histogram(content, bins = [1,2,3], density=True)

ax.bar([0.85,1.15],throm_h, width=0.3, color = ['g','r'], alpha=0.7)

ax.bar([1.85,2.15],content_h, width=0.3, color = ['g','r'], alpha=0.7)

ax.bar([2.85,3.15],nothrom_h, width=0.3, color = ['g','r'], alpha=0.7)

ax.set_xticks([1,2,3])

ax.set_xticklabels(names, rotation=90)

colors = {'Best estimate':'g', 'Precise':'r'}         
labels = list(colors.keys())
handles = [plt.Rectangle((0,0),1,1, color=colors[label], alpha=0.7) for label in labels]
plt.legend(handles, labels)
plt.savefig('./output/onset_time_type_histogram.jpg', dpi=300)
plt.tight_layout()
plt.show()
../_images/characterise_contentious_patients_43_0.png

Predicting which patients will be contentious, or agreed thrombolyse/no-thrombolyse#

We fit a Random Forest classifier to predict which patients will be in each category, and then can use feature importances and Shapley values to understand which features drive categorisation.

Classify into groups#

# Define the band considered contentious
width = 0.4 # Patients with 30% to 70% hopsitals thrombolysing will be consider contentious

T = cohort.loc[cohort_results['percent']>(0.5 + width/2)]
NT = cohort.loc[cohort_results['percent']<(0.5 - width/2)].loc[cohort['S2StrokeType_Infarction']==1]
C = cohort.loc[cohort_results['percent_agree']<=(0.5 + width/2)]
# Proportion in each group
t_count = len(T)
nt_count = len(NT)
c_count = len(C)
total = sum([t_count, nt_count, c_count])

prop_t = t_count / total
prop_nt = nt_count / total
prop_c = c_count / total

print(f'Prop agree thrombolyse: {prop_t:0.3f}')
print(f'Prop contentious: {prop_c:0.3f}')
print(f'Prop agree not thrombolyse: {prop_nt:0.3f}')
Prop agree thrombolyse: 0.240
Prop contentious: 0.162
Prop agree not thrombolyse: 0.598

Accuracy measurement#

def calculate_accuracy(observed, predicted):
    
    """
    Calculates a range of accuracy scores from observed and predicted classes.
    
    Takes two list or NumPy arrays (observed class values, and predicted class 
    values), and returns a dictionary of results.
    
     1) observed positive rate: proportion of observed cases that are +ve
     2) Predicted positive rate: proportion of predicted cases that are +ve
     3) observed negative rate: proportion of observed cases that are -ve
     4) Predicted negative rate: proportion of predicted cases that are -ve  
     5) accuracy: proportion of predicted results that are correct    
     6) precision: proportion of predicted +ve that are correct
     7) recall: proportion of true +ve correctly identified
     8) f1: harmonic mean of precision and recall
     9) sensitivity: Same as recall
    10) specificity: Proportion of true -ve identified:        
    11) positive likelihood: increased probability of true +ve if test +ve
    12) negative likelihood: reduced probability of true +ve if test -ve
    13) false positive rate: proportion of false +ves in true -ve patients
    14) false negative rate: proportion of false -ves in true +ve patients
    15) true positive rate: Same as recall
    16) true negative rate: Same as specificity
    17) positive predictive value: chance of true +ve if test +ve
    18) negative predictive value: chance of true -ve if test -ve
    
    """
    
    # Converts list to NumPy arrays
    if type(observed) == list:
        observed = np.array(observed)
    if type(predicted) == list:
        predicted = np.array(predicted)
    
    # Calculate accuracy scores
    observed_positives = observed == 1
    observed_negatives = observed == 0
    predicted_positives = predicted == 1
    predicted_negatives = predicted == 0
    
    true_positives = (predicted_positives == 1) & (observed_positives == 1)
    
    false_positives = (predicted_positives == 1) & (observed_positives == 0)
    
    true_negatives = (predicted_negatives == 1) & (observed_negatives == 1)
    
    false_negatives = (predicted_negatives == 1) & (observed_negatives == 0)
    
    accuracy = np.mean(predicted == observed)
    
    precision = (np.sum(true_positives) /
                 (np.sum(true_positives) + np.sum(false_positives)))
        
    recall = np.sum(true_positives) / np.sum(observed_positives)
    
    sensitivity = recall
    
    f1 = 2 * ((precision * recall) / (precision + recall))
    
    specificity = np.sum(true_negatives) / np.sum(observed_negatives)
    
    positive_likelihood = sensitivity / (1 - specificity)
    
    negative_likelihood = (1 - sensitivity) / specificity
    
    false_positive_rate = 1 - specificity
    
    false_negative_rate = 1 - sensitivity
    
    true_positive_rate = sensitivity
    
    true_negative_rate = specificity
    
    positive_predictive_value = (np.sum(true_positives) / 
                            (np.sum(true_positives) + np.sum(false_positives)))
    
    negative_predictive_value = (np.sum(true_negatives) / 
                            (np.sum(true_negatives) + np.sum(false_negatives)))
    
    # Create dictionary for results, and add results
    results = dict()
    
    results['observed_positive_rate'] = np.mean(observed_positives)
    results['observed_negative_rate'] = np.mean(observed_negatives)
    results['predicted_positive_rate'] = np.mean(predicted_positives)
    results['predicted_negative_rate'] = np.mean(predicted_negatives)
    results['accuracy'] = accuracy
    results['precision'] = precision
    results['recall'] = recall
    results['f1'] = f1
    results['sensitivity'] = sensitivity
    results['specificity'] = specificity
    results['positive_likelihood'] = positive_likelihood
    results['negative_likelihood'] = negative_likelihood
    results['false_positive_rate'] = false_positive_rate
    results['false_negative_rate'] = false_negative_rate
    results['true_positive_rate'] = true_positive_rate
    results['true_negative_rate'] = true_negative_rate
    results['positive_predictive_value'] = positive_predictive_value
    results['negative_predictive_value'] = negative_predictive_value
    
    return results

Define funtion to fit classification predictor model#

from sklearn.model_selection import train_test_split

def classify_group(g1, g2):
    
    data = pd.concat([g1,g2])
    
    reduced = data.drop(['S2Thrombolysis', 'StrokeTeam'], axis=1)
    
    yvals = np.concatenate((np.ones(len(g1)),np.zeros(len(g2))))
    
    X_train, X_test, y_train, y_test = train_test_split(
    reduced.values, yvals, test_size=0.33, random_state=42)
    
    forest = RandomForestClassifier(n_estimators=100, n_jobs=-1,
                                    class_weight='balanced', random_state=0)
    
    forest.fit(X_train,y_train)
    
    y_pred = forest.predict(X_test)
    
    y_prob = forest.predict_proba(X_test)[:,1]
    
    imp = forest.feature_importances_
    
    return forest, y_test, y_pred, y_prob, imp, reduced.columns.values

Model that distinguishes agreed thrombolysed vs contentious#

forest, y_test, y_pred, y_prob, imp, feat = classify_group(T,C)
accuracy = pd.Series(calculate_accuracy(y_test,y_pred))
accuracy
observed_positive_rate       0.612154
observed_negative_rate       0.387846
predicted_positive_rate      0.651475
predicted_negative_rate      0.348525
accuracy                     0.912422
precision                    0.902606
recall                       0.960584
f1                           0.930693
sensitivity                  0.960584
specificity                  0.836406
positive_likelihood          5.871738
negative_likelihood          0.047126
false_positive_rate          0.163594
false_negative_rate          0.039416
true_positive_rate           0.960584
true_negative_rate           0.836406
positive_predictive_value    0.902606
negative_predictive_value    0.930769
dtype: float64

Show feature imporatances for contentious vs thrombolysed#

features = pd.DataFrame()
features['feature'] = feat
features['importance'] = imp

features.sort_values(by = 'importance', ascending=False, inplace=True)

# Display top 20 features
features[:20]
feature importance
18 S2NihssArrival 0.104195
36 S1OnsetTimeType_Precise 0.085606
19 S2BrainImagingTime_min 0.077009
34 S1OnsetTimeType_Best estimate 0.066169
1 S1OnsetToArrival_min 0.051140
2 S2RankinBeforeStroke 0.049437
0 S1AgeOnArrival 0.030873
8 FacialPalsy 0.028570
7 Visual 0.020973
16 Dysarthria 0.018751
17 ExtinctionInattention 0.016399
11 MotorLegLeft 0.016062
12 MotorLegRight 0.015735
15 BestLanguage 0.015133
9 MotorArmLeft 0.014813
14 Sensory 0.014614
67 AtrialFibrillation_Yes 0.014449
78 AFAnticoagulent_Yes 0.014248
66 AtrialFibrillation_No 0.013494
10 MotorArmRight 0.013312
fig,ax = plt.subplots(figsize=(10,5))
n = 20
plt.bar(features['feature'][:n],features['importance'][:n])
ax.set_xticklabels(features['feature'][:n],  rotation=90)
ax.set_ylabel('Feature importance')
plt.savefig('./output/contentious_vs_thromb_feature_importance.jpg', dpi=300)
plt.show()
../_images/characterise_contentious_patients_57_0.png

Get improvement in ROC AUC with increasing number of features#

def auc_feat(g1,g2,i):
    
    data = pd.concat([g1,g2])
    
    # Group 1 has label '1', and Group 2 has label '0'
    yvals = np.concatenate((np.ones(len(g1)),np.zeros(len(g2))))
    
    results = []
    
    X_train, X_test, y_train, y_test = train_test_split(
    data.values, yvals, test_size=0.33, random_state=42)
    
    for run in range(100):
        
        # Boot strap samples
        idx = np.random.randint(len(X_train), size=len(X_train))
        X_sampled = np.array([X_train[i] for i in idx])
        y_sampled = [y_train[i] for i in idx]

        if i==0:

            X_sampled = X_sampled.reshape(-1, 1)
            X_test = X_test.reshape(-1, 1)

        # Fit classifier
        forest = RandomForestClassifier(n_estimators=100, n_jobs=-1,
                                        class_weight='balanced', random_state=0)

        forest.fit(X_sampled,y_sampled)

        y_pred = forest.predict(X_test)

        y_prob = forest.predict_proba(X_test)[:,1]
        
        # Get ROC
        fpr, tpr, thresholds = roc_curve(y_test, y_prob)
        
        # Get ROC AUC
        results.append(round(auc(fpr, tpr),2))    
    
    return np.median(results), np.percentile(results,10), np.percentile(results,90)
auc_vals = []
feats = []

for i,f in enumerate(features['feature'].values[:30]):
    
    # Add features
    feats.append(f)
    
    if i==0:
    
        T_reduced = T[f] 
        C_reduced = C[f]
        
    else:
        
        T_reduced = T[feats]
        C_reduced = C[feats]
        
    # Get median, 10th percentile and 90th percentile AUC (from 50 runs)
    med,lo,hi = auc_feat(T_reduced,C_reduced,i)
    
    auc_vals.append([lo,med,hi])
lo,med,hi = zip(*auc_vals)
lo_err = np.array(med)-np.array(lo)
hi_err = np.array(hi)-np.array(med)

fig = plt.figure(figsize=(7,6))

n = 25

ax2 = fig.add_subplot(1,1,1)


ax2.errorbar(np.arange(n), med[:n], yerr = [lo_err[:n],hi_err[:n]], fmt = '-r')

ax2.set_xticklabels(features['feature'][:n],  rotation=90)
#plt.fill_between(np.arange(1,100), lo, hi)

plt.ylabel('AUC', fontsize=15)

ax2.grid()

ax1 = ax2.twinx()

ax1.bar(features['feature'][:n],features['importance'][:n], alpha = 0.5)

ax1.set_xticklabels(features['feature'][:n],  rotation=90, fontsize=12)

#ax1.grid()
ax1.set_ylabel('Importance', fontsize=15) 

plt.tight_layout()

plt.savefig('./output/auc_features.jpg', dpi=300)

plt.show()
../_images/characterise_contentious_patients_61_0.png

Shap bee swarm plot#

import shap

def classify_group_shap(g1, g2):
    
    data = pd.concat([g1,g2])
    
    reduced = data.drop(['S2Thrombolysis', 'StrokeTeam'], axis=1)
    
    yvals = np.concatenate((np.ones(len(g1)),np.zeros(len(g2))))
    
    X_train, X_test, y_train, y_test = train_test_split(
    reduced.values, yvals, test_size=0.33, random_state=42)
    
    forest = RandomForestClassifier(n_estimators=100, n_jobs=-1,
                                    class_weight='balanced', random_state=0)
    
    forest.fit(X_train,y_train)
    
    explainer = shap.TreeExplainer(forest, reduced)
    shap_values = explainer(reduced)
    
    return forest, X_test, shap_values
forest, X_test, shapObj = classify_group_shap(T,C)
100%|===================| 6749/6776 [02:18<00:00]        
dta = pd.concat([T,C]).drop(['S2Thrombolysis', 'StrokeTeam'], axis=1)

#shap.plots.beeswarm(shap_values)

fig = plt.figure(figsize=(6,6))

shap.summary_plot(shap_values = np.take(shapObj.values, 0, axis=-1), features = dta.values,
                 feature_names = dta.columns.values, cmap = plt.get_cmap('nipy_spectral'), show=False)
plt.tight_layout()

plt.savefig('./output/shap_beeswarm.jpg', dpi=300)
../_images/characterise_contentious_patients_65_0.png

Overall shap contribution by feature

fig = plt.figure(figsize=(6,6))

shap.summary_plot(shap_values = np.take(shapObj.values, 0, axis=-1), features = dta.values,
                 feature_names = dta.columns.values, plot_type='bar')
plt.tight_layout()

plt.savefig('./output/shap_bar.jpg', dpi=300)

plt.show()
../_images/characterise_contentious_patients_67_0.png
<Figure size 432x288 with 0 Axes>

Observations#

Compared with patients where 70% of units would give thrombolysis, those contentious patients, where 30-70% of units would give thrombolysis:

  • Are more likely to have an estimated, rather than precise, onset time

  • Arrive later

  • Have longer arrival to scan times

  • Have a lower NIHSS score

  • Have greater disability prior to the stroke

  • Are older

  • Do not have facial palsy

  • Do not have visual field deficits

  • Have atrial fibrillation