Modelling the same patient population attending all hospitals#

Aim:

  • To predict the thrombolysis rate at each hospital if the same patient population (based on the national average patient characteristics) attend each hospital.

Patient population distributions are set to have the national average of:

  • Arrival within 4 hours of stroke onset

  • Proportion aged 80+

  • Onset to arrival mean and standard deviation

The proportion of arrivals eligible for thrombolysis is set to the predicted 10k cohort rate for each hospital (adjusted to give thrombolysis use in those patients scanned within 4 hours of stroke onset).

The simulation model passes through 100 x 1k patient cohorts through each hospital model.

Load libraries#

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

Load data#

# Scenario results
results = pd.read_csv('./output/scenario_results.csv')
# Pathway performance paramters used in scenarios
performance_base = pd.read_csv('./output/performance_base.csv')
performance_same_patients = \
    pd.read_csv('./output/same_patient_characteristics.csv')
performance_base
stroke_team thrombolysis_rate admissions 80_plus onset_known known_arrival_within_4hrs onset_arrival_mins_mu onset_arrival_mins_sigma scan_within_4_hrs arrival_scan_arrival_mins_mu arrival_scan_arrival_mins_sigma onset_scan_4_hrs eligable scan_needle_mins_mu scan_needle_mins_sigma
0 AGNOF1041H 0.154839 671.666667 0.425459 0.635236 0.681250 4.576874 0.557598 0.965596 1.665700 1.497966 0.935867 0.388325 3.669602 0.664462
1 AKCGO9726K 0.158892 1143.333333 0.395658 0.970845 0.428829 4.625486 0.597451 0.955882 2.834183 0.999719 0.908425 0.419355 2.904479 0.874818
2 AOBTM3098N 0.085885 500.666667 0.485470 0.619174 0.629032 4.603918 0.584882 0.935043 3.471419 1.254744 0.846435 0.267819 3.694918 0.518929
3 APXEE8191H 0.098634 439.333333 0.515679 0.716237 0.608051 4.590357 0.496452 0.966899 3.312930 0.714465 0.904505 0.258964 3.585094 0.751204
4 ATDID5461S 0.090689 275.666667 0.533546 0.573156 0.660338 4.427826 0.591373 0.878594 4.125690 0.549301 0.865455 0.315126 3.497262 0.608126
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
127 YPKYH1768F 0.105193 250.333333 0.321767 0.585885 0.720455 4.436404 0.569248 0.952681 3.779215 0.872809 0.844371 0.305882 3.982100 0.683223
128 YQMZV4284N 0.104186 358.333333 0.508511 0.945116 0.462598 4.664536 0.494740 0.948936 3.574735 0.912298 0.798206 0.308989 3.285165 0.463749
129 ZBVSO0975W 0.081602 449.333333 0.442130 0.465134 0.688995 4.562051 0.510524 0.972222 2.860226 0.990966 0.930952 0.273657 3.606046 0.575788
130 ZHCLE1578P 0.112647 796.000000 0.484694 0.733668 0.671233 4.606557 0.546648 0.949830 3.306916 0.842940 0.892569 0.262788 3.276043 0.795401
131 ZRRCV7012C 0.063058 597.333333 0.469504 0.779576 0.504653 4.636283 0.485394 0.977305 3.743456 0.662710 0.851959 0.189097 3.261270 0.803624

132 rows × 15 columns

Collate key results#

Collate key results together in a DataFrame.

# Add admission numbers to results
admissions = performance_base[['stroke_team', 'admissions']]
results = results.merge(
    admissions, how='left', left_on='stroke_team', right_on='stroke_team')

# Calculate numbers thrombolysed
results['thrombolysed'] = \
    results['admissions'] * results['Percent_Thrombolysis_(mean)'] / 100

# Calculate additional good outcomes
results['add_good_outcomes'] = (results['admissions'] * 
    results['Additional_good_outcomes_per_1000_patients_(mean)'] / 1000)

# Get key results
key_results = pd.DataFrame()
key_results['stroke_team'] = results['stroke_team']
key_results['scenario'] = results['scenario']
key_results['admissions'] = results['admissions']
key_results['thrombolysis_rate'] = results['Percent_Thrombolysis_(mean)']
key_results['additional_good_outcomes_per_1000_patients'] = \
    results['Additional_good_outcomes_per_1000_patients_(mean)']
key_results['patients_receiving_thrombolysis'] = results['thrombolysed']
key_results['add_good_outcomes'] = results['add_good_outcomes']
key_results
stroke_team scenario admissions thrombolysis_rate additional_good_outcomes_per_1000_patients patients_receiving_thrombolysis add_good_outcomes
0 AGNOF1041H base 671.666667 15.20 12.76 102.093333 8.570467
1 AKCGO9726K base 1143.333333 14.91 13.21 170.471000 15.103433
2 AOBTM3098N base 500.666667 7.80 5.67 39.052000 2.838780
3 APXEE8191H base 439.333333 10.40 7.59 45.690667 3.334540
4 ATDID5461S base 275.666667 9.17 6.28 25.278633 1.731187
... ... ... ... ... ... ... ...
1183 YPKYH1768F same_patient_characteristics 250.333333 7.47 4.98 18.699900 1.246660
1184 YQMZV4284N same_patient_characteristics 358.333333 15.92 12.87 57.046667 4.611750
1185 ZBVSO0975W same_patient_characteristics 449.333333 7.66 6.23 34.418933 2.799347
1186 ZHCLE1578P same_patient_characteristics 796.000000 10.76 8.78 85.649600 6.988880
1187 ZRRCV7012C same_patient_characteristics 597.333333 11.19 8.67 66.841600 5.178880

1188 rows × 7 columns

Summary stats#

comparison = pd.DataFrame()
comparison['actual'] = key_results[key_results['scenario']=='base']['thrombolysis_rate'].values
comparison['cohort'] = key_results[key_results['scenario']=='same_patient_characteristics']['thrombolysis_rate'].values

comparison.describe().round(2)
actual cohort
count 132.00 132.00
mean 11.23 11.08
std 3.37 3.91
min 1.69 2.65
25% 9.24 8.29
50% 10.92 10.96
75% 13.12 12.87
max 24.23 24.93

Plot results#

fig = plt.figure(figsize=(15,6))

ax1 = fig.add_subplot(131)
y = key_results[key_results['scenario']=='base']['thrombolysis_rate']
x = key_results[key_results['scenario']=='same_patient_characteristics']['thrombolysis_rate']
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
y_fit=intercept + (x*gradient)
ax1.plot(x, y, 'o', label='Data points')
ax1.plot(x, y_fit, 'r', label='Fitted regression')
ax1.plot([x.min(),x.max()],[x.min(),x.max()], 'b--', 
         label = 'Expected line if all variance\ndue to hospital processes')
mean_thrombolysis = comparison['actual'].mean()
ax1.plot([x.min(),x.max()],[mean_thrombolysis, mean_thrombolysis], 'g:', 
         label = 'Expected line if all variance\ndue to patient populations')

text='Slope: %.2f\nR-Squared: %.3f\nP value: %.3f' %(gradient,r_value**2,p_value)
ax1.set_xlim(0,26)
ax1.set_ylim(0,26)
ax1.text(17,2,text)
ax1.legend()
ax1.set_ylabel('Thrombolysis rate of 100k patient cohort')
ax1.set_xlabel('Actual thrombolysis rate')
ax1.set_title('Regression plot of actual vs cohort thrombolysis rate')

ax2 = fig.add_subplot(132)
zipped = zip(x,y)
for xx, yy in zipped:
    colour = 'red' if xx > yy else 'blue'
    ax2.plot([xx,xx], [xx,yy], 
             color=colour, lw=1, marker='o', alpha=0.6, markersize=4)
ax2.set_ylabel('Predicted thrombolysis rate of 100k patient cohort')
ax2.set_xlabel('Actual thrombolysis rate')
ax2.set_title('Scatter plot of actual vs cohort thrombolysis rate')
ax2.grid()


ax3 = fig.add_subplot(133)
bins = np.arange(1,25)
ax3.hist(y, bins = bins, color='k', linewidth=2,
           label='Cohort thrombolysis', histtype='step')
ax3.hist(x, bins = bins, color='0.7', linewidth=2,
           label='Actual thrombolysis', histtype='stepfilled')
ax3.set_xlabel('Thrombolysis rate (%)')
ax3.set_ylabel('Hospital count')
ax3.set_title('Histogram of actual vs cohort thrombolysis rates')
ax3.set_ylim(0,25)
ax3.legend()

plt.tight_layout(pad=2)
plt.savefig('./output/pathway_cohort.jpg', dpi=300)
plt.show()
../_images/scenario_analysis_1k_patient_cohort_12_0.png

Observations#

  • In this experiment we fix the patient population, so that all hospitals in the model see the same patients, and pass those patients through each hospital model using that hospital’s pathway characteristics and decision model.

  • When a standard (national average) patient population is passed through all hospitals the cohort thrombolysis rate at each hospital correlates with the actual thrombolysis use with R-square of 0.41, suggesting in-hospital processes and decision making account for about 40% of observed variance. Similarly, the regression fit between actual and cohort thrombolysis rate has a slope of 0.55, suggesting that 55% of the inter-hospital variance is due to hospital processes - when the patient population is unchanged there is, on average, a 0.55 percentage point difference in predicted cohort population thrombolysis rate for each 1 percentage point change in actual thrombolysis rate.

  • As a general guide it seems that about half of the inter-hospital variance in thrombolysis rate is explained by in-hospital process and decisions, and the other half is explained by differences in local populations.

  • As should be expected, the mean and distribution of the cohort thrombolysis use per hospital is very similar to the actual mean and distribution of thrombolysis use.

  • There is a tendency for lower thrombolysing units to do better with a standard cohort, and those with higher thrombolysis rates to do worse. So there is a general observation that hospitals with high thrombolysis use do so partly because they have a more thrombolysable population, and vice versa.