How much of the inter-hospital variation in thrombolysis use do in-hospital processes explain#

Aims:

Investigate the correlation (explained variance) between hospital model process parameters and the variation in use of thrombolysis between hospitals.

Load data and pivot by scenario#

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load data
scenarios = pd.read_csv('output/key_scenario_results.csv')

# Performance data
performance = pd.read_csv(
    'hosp_performance_output/hospital_performance.csv', index_col='stroke_team')

performance['hosp_speed'] = (
    np.exp(performance['arrival_scan_arrival_mins_mu']) +
    np.exp(performance['scan_needle_mins_mu']))

# Decision data
decision =  pd.read_csv(
    '../random_forest/predictions/corhort_rates.csv', index_col='hospital')

scenarios.head()

	stroke_team	scenario	admissions	thrombolysis_rate	additional_good_outcomes_per_1000_patients	patients_receiving_thrombolysis	add_good_outcomes
0	AGNOF1041H	base	671.666667	15.11	12.72	101.488833	8.543600
1	AKCGO9726K	base	1143.333333	15.06	13.43	172.186000	15.354967
2	AOBTM3098N	base	500.666667	7.81	5.74	39.102067	2.873827
3	APXEE8191H	base	439.333333	10.08	7.35	44.284800	3.229100
4	ATDID5461S	base	275.666667	9.20	6.42	25.361333	1.769780

performance.head()

	thrombolysis_rate	admissions	80_plus	onset_known	known_arrival_within_4hrs	onset_arrival_mins_mu	onset_arrival_mins_sigma	scan_within_4_hrs	arrival_scan_arrival_mins_mu	arrival_scan_arrival_mins_sigma	onset_scan_4_hrs	eligable	scan_needle_mins_mu	scan_needle_mins_sigma	hosp_speed
stroke_team
AGNOF1041H	0.154839	671.666667	0.425459	0.635236	0.681250	4.576874	0.557598	0.965596	1.665700	1.497966	0.935867	0.388325	3.669602	0.664462	44.525658
AKCGO9726K	0.158892	1143.333333	0.395658	0.970845	0.428829	4.625486	0.597451	0.955882	2.834183	0.999719	0.908425	0.419355	2.904479	0.874818	35.272215
AOBTM3098N	0.085885	500.666667	0.485470	0.619174	0.629032	4.603918	0.584882	0.935043	3.471419	1.254744	0.846435	0.267819	3.694918	0.518929	72.424682
APXEE8191H	0.098634	439.333333	0.515679	0.716237	0.608051	4.590357	0.496452	0.966899	3.312930	0.714465	0.904505	0.258964	3.585094	0.751204	63.522230
ATDID5461S	0.090689	275.666667	0.533546	0.573156	0.660338	4.427826	0.591373	0.878594	4.125690	0.549301	0.865455	0.315126	3.497262	0.608126	94.935429

rx = scenarios.pivot(
    index='stroke_team', columns='scenario', values='thrombolysis_rate')

rx = rx.merge(performance[['hosp_speed', 'onset_known']], left_index=True, right_index=True)
rx = rx.merge(decision['cohort_rate'], left_index=True, right_index=True)

rx.head()

	base	benchmark	onset	onset_benchmark	same_patient_characteristics	speed	speed_benchmark	speed_onset	speed_onset_benchmark	hosp_speed	onset_known	cohort_rate
AGNOF1041H	15.11	20.38	18.17	24.09	11.04	15.21	20.14	17.89	23.90	44.525658	0.635236	27.76
AKCGO9726K	15.06	14.18	14.92	14.27	22.22	15.32	14.69	15.36	14.83	35.272215	0.970845	37.45
AOBTM3098N	7.81	11.88	9.39	14.52	8.74	9.39	13.78	11.40	17.17	72.424682	0.619174	26.00
APXEE8191H	10.08	13.06	10.76	13.35	13.24	10.15	12.80	10.90	13.54	63.522230	0.716237	29.97
ATDID5461S	9.20	9.92	11.81	13.35	7.60	11.10	11.79	14.64	15.95	94.935429	0.573156	25.92

Calculate difference between each hospital’s thrombolysis rate and the mean thrombolysis#

mean_rx = rx['base'].mean()
print (f'Mean thrombolysis: {mean_rx:0.2f}')

rx['diff_from_mean'] = rx['base'] - mean_rx

Mean thrombolysis: 11.22

rx.head()

	base	benchmark	onset	onset_benchmark	same_patient_characteristics	speed	speed_benchmark	speed_onset	speed_onset_benchmark	hosp_speed	onset_known	cohort_rate	diff_from_mean
AGNOF1041H	15.11	20.38	18.17	24.09	11.04	15.21	20.14	17.89	23.90	44.525658	0.635236	27.76	3.889545
AKCGO9726K	15.06	14.18	14.92	14.27	22.22	15.32	14.69	15.36	14.83	35.272215	0.970845	37.45	3.839545
AOBTM3098N	7.81	11.88	9.39	14.52	8.74	9.39	13.78	11.40	17.17	72.424682	0.619174	26.00	-3.410455
APXEE8191H	10.08	13.06	10.76	13.35	13.24	10.15	12.80	10.90	13.54	63.522230	0.716237	29.97	-1.140455
ATDID5461S	9.20	9.92	11.81	13.35	7.60	11.10	11.79	14.64	15.95	94.935429	0.573156	25.92	-2.020455

How much variation is explained by differences in decision making?#

diff_explained_by_decison_making = (np.corrcoef(
    rx['cohort_rate'], rx['diff_from_mean'])[1,0]) ** 2

print(f'{diff_explained_by_decison_making:0.3}')

0.399

How much variation is explained by differences in speed?#

diff_explained_by_speed = (np.corrcoef(
    rx['diff_from_mean'], rx['hosp_speed'])[1,0]) ** 2

print(f'{diff_explained_by_speed:0.3}')

0.254

How much variation is explained by determination of stroke onset time?#

diff_explained_by_determination_of_onset = (np.corrcoef(
    rx['diff_from_mean'], rx['onset_known'])[1,0]) ** 2

print(f'{diff_explained_by_determination_of_onset:0.3}')

0.0474

Plot relationships#

from scipy import stats
fig = plt.figure(figsize=(15,6))

ax1 = fig.add_subplot(131)
x = rx['cohort_rate']
y = rx['diff_from_mean']
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
y_fit=intercept + (x*gradient)
plt.plot(x, y, 'o', label='original data')
plt.plot(x, y_fit, 'r', label='fitted line')
text='Slope: %.2f\nR-Squared: %.3f\nP value: %.3f' %(gradient,r_value**2,p_value)
plt.text(35, -8, text)
ax1.set_title('Relationship between thrombolysis use and\ndecision making')
ax1.set_xlabel('Predicted 10k cohort thrombolysis rate')
ax1.set_ylabel('Difference in thrombolysis from mean')

ax2 = fig.add_subplot(132)
x = rx['hosp_speed']
y = rx['diff_from_mean']
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
y_fit=intercept + (x*gradient)
plt.plot(x, y, 'o', label='original data')
plt.plot(x, y_fit, 'r', label='fitted line')
text='Slope: %.2f\nR-Squared: %.3f\nP value: %.3f' %(gradient,r_value**2,p_value)
plt.text(77, 10, text)
ax2.set_title('Relationship between thrombolysis use and\npathway speed')
ax2.set_xlabel('Arrival to scan + scan to needle (mins)')
ax2.set_ylabel('Difference in thrombolysis from mean')

ax3 = fig.add_subplot(133)
x = rx['onset_known']
y = rx['diff_from_mean']
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
y_fit=intercept + (x*gradient)
plt.plot(x, y, 'o', label='original data')
plt.plot(x, y_fit, 'r', label='fitted line')
text='Slope: %.2f\nR-Squared: %.3f\nP value: %.3f' %(gradient,r_value**2,p_value)
plt.text(0.38,10, text)
ax3.set_title('Relationship between thrombolysis use and\ndetermination of stroke onset')
ax3.set_xlabel('Proportion patients with determined stroke onset')
ax3.set_ylabel('Difference in thrombolysis from mean')


plt.tight_layout(pad=2)
plt.savefig('./output/model_correlations.jpg', dpi=300)
plt.show()

../_images/explained_variance_in_thrombolysis_18_0.png

Conclusions#

In-hospital process parameters partly explain the inter-hospital variation in thrombolysis use.
The strongest relationship is between decision-making, as described by the predicted thrombolysis use of a standard 10k cohort of patients, with R-square of 0.40.
Pathway speed is the next strongest predictor of thrombolysis use, with an R-square of 0.25.
Determination of stroke onset time is the weakest predictor of thrombolysis use (R-square of 0.05), but is still statistically significant (p=0.012)

SAMueL Stroke Audit Machine Learning 1

How much of the inter-hospital variation in thrombolysis use do in-hospital processes explain

Contents

How much of the inter-hospital variation in thrombolysis use do in-hospital processes explain#

Load data and pivot by scenario#

Calculate difference between each hospital’s thrombolysis rate and the mean thrombolysis#

How much variation is explained by differences in decision making?#

How much variation is explained by differences in speed?#

How much variation is explained by determination of stroke onset time?#

Plot relationships#

Conclusions#