How much of the inter-hospital variation in thrombolysis use do in-hospital processes explain#

Aims:

  • Investigate the correlation (explained variance) between hospital model process parameters and the variation in use of thrombolysis between hospitals.

Load data and pivot by scenario#

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load data
scenarios = pd.read_csv('output/key_scenario_results.csv')

# Performance data
performance = pd.read_csv(
    'hosp_performance_output/hospital_performance.csv', index_col='stroke_team')

performance['hosp_speed'] = (
    np.exp(performance['arrival_scan_arrival_mins_mu']) +
    np.exp(performance['scan_needle_mins_mu']))

# Decision data
decision =  pd.read_csv(
    '../random_forest/predictions/corhort_rates.csv', index_col='hospital')
scenarios.head()
stroke_team scenario admissions thrombolysis_rate additional_good_outcomes_per_1000_patients patients_receiving_thrombolysis add_good_outcomes
0 AGNOF1041H base 671.666667 15.11 12.72 101.488833 8.543600
1 AKCGO9726K base 1143.333333 15.06 13.43 172.186000 15.354967
2 AOBTM3098N base 500.666667 7.81 5.74 39.102067 2.873827
3 APXEE8191H base 439.333333 10.08 7.35 44.284800 3.229100
4 ATDID5461S base 275.666667 9.20 6.42 25.361333 1.769780
performance.head()
thrombolysis_rate admissions 80_plus onset_known known_arrival_within_4hrs onset_arrival_mins_mu onset_arrival_mins_sigma scan_within_4_hrs arrival_scan_arrival_mins_mu arrival_scan_arrival_mins_sigma onset_scan_4_hrs eligable scan_needle_mins_mu scan_needle_mins_sigma hosp_speed
stroke_team
AGNOF1041H 0.154839 671.666667 0.425459 0.635236 0.681250 4.576874 0.557598 0.965596 1.665700 1.497966 0.935867 0.388325 3.669602 0.664462 44.525658
AKCGO9726K 0.158892 1143.333333 0.395658 0.970845 0.428829 4.625486 0.597451 0.955882 2.834183 0.999719 0.908425 0.419355 2.904479 0.874818 35.272215
AOBTM3098N 0.085885 500.666667 0.485470 0.619174 0.629032 4.603918 0.584882 0.935043 3.471419 1.254744 0.846435 0.267819 3.694918 0.518929 72.424682
APXEE8191H 0.098634 439.333333 0.515679 0.716237 0.608051 4.590357 0.496452 0.966899 3.312930 0.714465 0.904505 0.258964 3.585094 0.751204 63.522230
ATDID5461S 0.090689 275.666667 0.533546 0.573156 0.660338 4.427826 0.591373 0.878594 4.125690 0.549301 0.865455 0.315126 3.497262 0.608126 94.935429
rx = scenarios.pivot(
    index='stroke_team', columns='scenario', values='thrombolysis_rate')
rx = rx.merge(performance[['hosp_speed', 'onset_known']], left_index=True, right_index=True)
rx = rx.merge(decision['cohort_rate'], left_index=True, right_index=True)
rx.head()
base benchmark onset onset_benchmark same_patient_characteristics speed speed_benchmark speed_onset speed_onset_benchmark hosp_speed onset_known cohort_rate
AGNOF1041H 15.11 20.38 18.17 24.09 11.04 15.21 20.14 17.89 23.90 44.525658 0.635236 27.76
AKCGO9726K 15.06 14.18 14.92 14.27 22.22 15.32 14.69 15.36 14.83 35.272215 0.970845 37.45
AOBTM3098N 7.81 11.88 9.39 14.52 8.74 9.39 13.78 11.40 17.17 72.424682 0.619174 26.00
APXEE8191H 10.08 13.06 10.76 13.35 13.24 10.15 12.80 10.90 13.54 63.522230 0.716237 29.97
ATDID5461S 9.20 9.92 11.81 13.35 7.60 11.10 11.79 14.64 15.95 94.935429 0.573156 25.92

Calculate difference between each hospital’s thrombolysis rate and the mean thrombolysis#

mean_rx = rx['base'].mean()
print (f'Mean thrombolysis: {mean_rx:0.2f}')

rx['diff_from_mean'] = rx['base'] - mean_rx
Mean thrombolysis: 11.22
rx.head()
base benchmark onset onset_benchmark same_patient_characteristics speed speed_benchmark speed_onset speed_onset_benchmark hosp_speed onset_known cohort_rate diff_from_mean
AGNOF1041H 15.11 20.38 18.17 24.09 11.04 15.21 20.14 17.89 23.90 44.525658 0.635236 27.76 3.889545
AKCGO9726K 15.06 14.18 14.92 14.27 22.22 15.32 14.69 15.36 14.83 35.272215 0.970845 37.45 3.839545
AOBTM3098N 7.81 11.88 9.39 14.52 8.74 9.39 13.78 11.40 17.17 72.424682 0.619174 26.00 -3.410455
APXEE8191H 10.08 13.06 10.76 13.35 13.24 10.15 12.80 10.90 13.54 63.522230 0.716237 29.97 -1.140455
ATDID5461S 9.20 9.92 11.81 13.35 7.60 11.10 11.79 14.64 15.95 94.935429 0.573156 25.92 -2.020455

How much variation is explained by differences in decision making?#

diff_explained_by_decison_making = (np.corrcoef(
    rx['cohort_rate'], rx['diff_from_mean'])[1,0]) ** 2

print(f'{diff_explained_by_decison_making:0.3}')
0.399

How much variation is explained by differences in speed?#

diff_explained_by_speed = (np.corrcoef(
    rx['diff_from_mean'], rx['hosp_speed'])[1,0]) ** 2

print(f'{diff_explained_by_speed:0.3}')
0.254

How much variation is explained by determination of stroke onset time?#

diff_explained_by_determination_of_onset = (np.corrcoef(
    rx['diff_from_mean'], rx['onset_known'])[1,0]) ** 2

print(f'{diff_explained_by_determination_of_onset:0.3}')
0.0474

Plot relationships#

from scipy import stats
fig = plt.figure(figsize=(15,6))

ax1 = fig.add_subplot(131)
x = rx['cohort_rate']
y = rx['diff_from_mean']
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
y_fit=intercept + (x*gradient)
plt.plot(x, y, 'o', label='original data')
plt.plot(x, y_fit, 'r', label='fitted line')
text='Slope: %.2f\nR-Squared: %.3f\nP value: %.3f' %(gradient,r_value**2,p_value)
plt.text(35, -8, text)
ax1.set_title('Relationship between thrombolysis use and\ndecision making')
ax1.set_xlabel('Predicted 10k cohort thrombolysis rate')
ax1.set_ylabel('Difference in thrombolysis from mean')

ax2 = fig.add_subplot(132)
x = rx['hosp_speed']
y = rx['diff_from_mean']
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
y_fit=intercept + (x*gradient)
plt.plot(x, y, 'o', label='original data')
plt.plot(x, y_fit, 'r', label='fitted line')
text='Slope: %.2f\nR-Squared: %.3f\nP value: %.3f' %(gradient,r_value**2,p_value)
plt.text(77, 10, text)
ax2.set_title('Relationship between thrombolysis use and\npathway speed')
ax2.set_xlabel('Arrival to scan + scan to needle (mins)')
ax2.set_ylabel('Difference in thrombolysis from mean')

ax3 = fig.add_subplot(133)
x = rx['onset_known']
y = rx['diff_from_mean']
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
y_fit=intercept + (x*gradient)
plt.plot(x, y, 'o', label='original data')
plt.plot(x, y_fit, 'r', label='fitted line')
text='Slope: %.2f\nR-Squared: %.3f\nP value: %.3f' %(gradient,r_value**2,p_value)
plt.text(0.38,10, text)
ax3.set_title('Relationship between thrombolysis use and\ndetermination of stroke onset')
ax3.set_xlabel('Proportion patients with determined stroke onset')
ax3.set_ylabel('Difference in thrombolysis from mean')


plt.tight_layout(pad=2)
plt.savefig('./output/model_correlations.jpg', dpi=300)
plt.show()
../_images/explained_variance_in_thrombolysis_18_0.png

Conclusions#

  • In-hospital process parameters partly explain the inter-hospital variation in thrombolysis use.

  • The strongest relationship is between decision-making, as described by the predicted thrombolysis use of a standard 10k cohort of patients, with R-square of 0.40.

  • Pathway speed is the next strongest predictor of thrombolysis use, with an R-square of 0.25.

  • Determination of stroke onset time is the weakest predictor of thrombolysis use (R-square of 0.05), but is still statistically significant (p=0.012)