# Demo: Continuous outcome model
This notebook contains a basic example of running the continuous outcome model within the stroke-outcome
package.
## Notebook setup
The intended versions of these packages are given in the files environment.yml and requirements.txt.
# Import required packages
import numpy as np
import pandas as pd
import copy
# Imports from the stroke_outcome package:
from stroke_outcome.continuous_outcome import Continuous_outcome
import stroke_outcome.outcome_utilities as outcome_utilities
## Create patient data
The model needs the following data as inputs, with one value per patient in each array:
| Data | Units | Data type | Name |
|---|---|---|---|
| Stroke type code | 0=other, 1=nLVO, 2=LVO | int | stroke_type_code |
| Onset to needle time | minutes | float | onset_to_needle_mins |
| Whether IVT was chosen | 1=True, 0=False | int or bool | ivt_chosen_bool |
| Onset to puncture time | minutes | float | onset_to_puncture_mins |
| Whether MT was chosen | 1=True, 0=False | int or bool | mt_chosen_bool |
These are expected in a dictionary or a pandas DataFrame; the example below builds a dictionary and then converts it to a DataFrame. For example:
# Set random seed for repeatability:
np.random.seed(42)
# All patients share these same treatment times:
time_to_ivt_mins = 90.0
time_to_mt_mins = 120.0
# Numbers of patients with each stroke type:
n_nlvo = 65
n_lvo = 35
n_total = n_lvo + n_nlvo
# Store the patient details in this dictionary:
outcome_inputs_dict = dict(
# Mix of LVO and nLVO:
stroke_type_code=np.array([2]*n_lvo + [1]*n_nlvo),
# Onset to needle time is fixed to 90mins:
onset_to_needle_mins=np.full(n_total, time_to_ivt_mins),
# Randomly pick whether IVT is chosen with around 15% yes:
ivt_chosen_bool=np.random.binomial(1, 0.15, n_total) == 1,
# Onset to puncture time is fixed to 120mins:
onset_to_puncture_mins=np.full(n_total, time_to_mt_mins),
# Randomly pick whether MT is chosen for LVOs with around 30% yes:
mt_chosen_bool=np.concatenate(
(np.random.binomial(1, 0.3, n_lvo), [0]*n_nlvo)) == 1
)
Convert this to a DataFrame for easier viewing:
# Pop them into a dataframe:
outcome_inputs_df = pd.DataFrame(
np.array(list(outcome_inputs_dict.values())).T,
columns=outcome_inputs_dict.keys(),
)
# Show the first ten patients' details:
outcome_inputs_df.head(10).T
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| stroke_type_code | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| onset_to_needle_mins | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 |
| ivt_chosen_bool | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| onset_to_puncture_mins | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 |
| mt_chosen_bool | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
Note: even though all patients here have valid treatment times, the model only uses the times of patients whose treatment-chosen flag is True.
### Alternative: Import data from file
outcome_inputs_df = pd.read_csv('patients.csv')
outcome_inputs_df.T
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stroke_type_code | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| onset_to_needle_mins | 0 | 0 | 378 | 0 | 0 | 0 | 378 | 0 | 0 | 378 | 0 | 0 | 0 | 378 |
| ivt_chosen_bool | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| onset_to_puncture_mins | 0 | 0 | 0 | 0 | 480 | 0 | 480 | 0 | 0 | 0 | 0 | 480 | 0 | 480 |
| mt_chosen_bool | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
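The layout of patients.csv matches the DataFrame built earlier, so a compatible file can be written with pandas. This is an illustrative sketch; the filename and the two example patients are just placeholders:

```python
import pandas as pd

# Illustrative example: one nLVO patient given IVT only, and one LVO
# patient given both IVT and MT:
example_df = pd.DataFrame({
    'stroke_type_code': [1, 2],
    'onset_to_needle_mins': [90.0, 90.0],
    'ivt_chosen_bool': [1, 1],
    'onset_to_puncture_mins': [0.0, 120.0],
    'mt_chosen_bool': [0, 1],
})
# Write without the index so the columns match the expected input names:
example_df.to_csv('patients.csv', index=False)
```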
## Run the continuous outcome model
Initialise the outcome model object:
continuous_outcome = Continuous_outcome()
Provide it with the data we created above:
continuous_outcome.assign_patients_to_trial(outcome_inputs_df)
The model expects to receive patient data for some of the keys in the trial dictionary:
for key in continuous_outcome.trial.keys():
print(key)
stroke_type_code
stroke_type_code_on_input
onset_to_needle_mins
ivt_chosen_bool
ivt_no_effect_bool
onset_to_puncture_mins
mt_chosen_bool
mt_no_effect_bool
We don’t need to define ivt_no_effect_bool and mt_no_effect_bool because they are created when the outcomes are calculated. These two arrays record whether each patient was treated after the time of no effect.
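These flags can be reproduced by comparing each treatment time against the relevant no-effect time: 378 minutes for IVT and 480 minutes for MT, according to the notes shipped with the package's mRS distributions. The following is our illustrative sketch, not the package's own code:

```python
import numpy as np

# Illustrative treatment times and choices for three patients:
onset_to_needle_mins = np.array([90.0, 378.0, 400.0])
ivt_chosen_bool = np.array([True, True, False])

# A patient treated at or after the no-effect time (378 min for IVT)
# gains no benefit from treatment:
ivt_no_effect_bool = ivt_chosen_bool & (onset_to_needle_mins >= 378.0)
# → [False, True, False]
```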
# Calculate outcomes:
patient_data_dict, outcomes_by_stroke_type, full_cohort_outcomes = (
continuous_outcome.calculate_outcomes())
# Make a copy of the results:
outcomes_by_stroke_type = copy.copy(outcomes_by_stroke_type)
full_cohort_outcomes = copy.copy(full_cohort_outcomes)
Note: a copy is made of the output results because, currently, the original results can be overwritten when a new instance of the Continuous_outcome class is created.
The model also produces statistics about the input patient population:
continuous_outcome.stroke_type_stats_df.T
|  | Total | IVT | MT | IVT no effect | MT no effect | No treatment |
|---|---|---|---|---|---|---|
| nLVO: Count | 3.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 |
| nLVO: Proportion of this stroke type | 1.000000 | 0.666667 | 0.000000 | 0.333333 | 0.000000 | 0.333333 |
| nLVO: Proportion of full cohort | 0.214286 | 0.142857 | 0.000000 | 0.071429 | 0.000000 | 0.071429 |
| LVO: Count | 11.000000 | 6.000000 | 8.000000 | 3.000000 | 4.000000 | 1.000000 |
| LVO: Proportion of this stroke type | 1.000000 | 0.545455 | 0.727273 | 0.272727 | 0.363636 | 0.090909 |
| LVO: Proportion of full cohort | 0.785714 | 0.428571 | 0.571429 | 0.214286 | 0.285714 | 0.071429 |
| Other: Count | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Other: Proportion of this stroke type | 1.000000 | NaN | NaN | NaN | NaN | NaN |
| Other: Proportion of full cohort | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
## Results
The main function returns three sets of results, which we’ve called patient_data_dict, outcomes_by_stroke_type and full_cohort_outcomes.
The first result is a dictionary of the input patient information. This is important for two reasons. Firstly, if the user did not provide all of the data required by the model, this dictionary records the placeholder data that was used instead (usually all zeroes). Secondly, in special cases the input patient data is overwritten. The main special case is patients who have an nLVO and are given MT: the model reassigns them as LVO.
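A minimal sketch of that reassignment rule, as our own illustration rather than the package source:

```python
import numpy as np

# Illustrative stroke types (1=nLVO, 2=LVO) and MT choices:
stroke_type_code = np.array([1, 1, 2, 1])
mt_chosen_bool = np.array([True, False, True, False])

# nLVO patients (code 1) who receive MT are re-labelled as LVO (code 2):
stroke_type_code[(stroke_type_code == 1) & mt_chosen_bool] = 2
# → [2, 1, 2, 1]
```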
The other two results are dictionaries and hold similar information. The important difference is that outcomes_by_stroke_type contains separate information for each occlusion and treatment type, while full_cohort_outcomes contains one set of information that matches the input patient types.
The output patient data dictionary contains the following:
pd.DataFrame(patient_data_dict.values(), index=patient_data_dict.keys())
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stroke_type_code | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| stroke_type_code_on_input | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| onset_to_needle_mins | 0 | 0 | 378 | 0 | 0 | 0 | 378 | 0 | 0 | 378 | 0 | 0 | 0 | 378 |
| ivt_chosen_bool | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| ivt_no_effect_bool | False | False | True | False | False | False | True | False | False | True | False | False | False | True |
| onset_to_puncture_mins | 0 | 0 | 0 | 0 | 480 | 0 | 480 | 0 | 0 | 0 | 0 | 480 | 0 | 480 |
| mt_chosen_bool | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| mt_no_effect_bool | False | False | False | False | True | False | True | False | False | False | False | True | False | True |
outcomes_by_stroke_type contains the following keys for each of lvo_ivt, lvo_mt, and nlvo_ivt:
separate_keys = list(outcomes_by_stroke_type.keys())
for key in separate_keys[:len(separate_keys)//3]:
short_key = '_'.join(key.split('_')[2:]) # Remove lvo_ivt label
print('+ ' + short_key)
+ each_patient_mrs_dist_post_stroke
+ mrs_not_treated
+ mrs_no_effect
+ each_patient_mrs_post_stroke
+ each_patient_mrs_shift
+ utility_not_treated
+ utility_no_effect
+ each_patient_utility_post_stroke
+ each_patient_utility_shift
+ valid_patients_mean_mrs_shift
+ valid_patients_mean_utility_shift
+ treated_patients_mean_mrs_shift
+ treated_patients_mean_utility_shift
full_cohort_outcomes combines the three subgroups back into one array in the same order as the input data. It contains:
full_cohort_keys = list(full_cohort_outcomes.keys())
for key in full_cohort_keys:
print('+ ' + key)
+ each_patient_mrs_dist_post_stroke
+ each_patient_mrs_post_stroke
+ each_patient_mrs_shift
+ each_patient_utility_post_stroke
+ each_patient_utility_shift
+ mean_mrs_post_stroke
+ mean_mrs_shift
+ mean_utility
+ mean_utility_shift
### Example: mean mRS shift
The mean mRS shift is stored in several ways depending on exactly what you want to measure. In the separate subgroup results, you can see the mean mRS shift for only the patients who were treated:
# Separate subgroup results:
keys_to_print = [
'nlvo_ivt_treated_patients_mean_mrs_shift',
'lvo_ivt_treated_patients_mean_mrs_shift',
'lvo_mt_treated_patients_mean_mrs_shift',
]
for key in keys_to_print:
print(f'{outcomes_by_stroke_type[key]:.3f}')
-0.425
-0.190
-0.651
In the combined full cohort results, you can see the mean mRS shift for each patient separately:
# Combined full cohort results:
full_cohort_outcomes['each_patient_mrs_shift']
array([ 0. , -0.889, 0.039, -1.396, 0.093, -1.396, 0.083, 0. ,
-0.464, 0.083, -1.396, 0.093, -1.396, 0.083])
There is also a full cohort mean mRS shift value…
print(f"{full_cohort_outcomes['mean_mrs_shift']:.3f}")
-0.462
… which is the mean of all values in the previous array including the zeroes. This might not be as useful a metric as the three separate mean mRS shifts from the nLVO+IVT, LVO+IVT, and LVO+MT subgroups.
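If you need a treated-only mean from the combined results, it can be recovered by masking the per-patient shifts with the treatment flags. A sketch using illustrative numbers rather than the notebook's outputs:

```python
import numpy as np

# Illustrative per-patient mRS shifts and treatment flags:
each_patient_mrs_shift = np.array([0.0, -0.889, 0.039, -1.396])
treated = np.array([False, True, True, True])

# The full-cohort mean includes the untreated zeroes...
mean_all = each_patient_mrs_shift.mean()
# ...whereas masking first averages only the treated patients:
mean_treated = each_patient_mrs_shift[treated].mean()
```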
## Controlling mRS distributions and utility weightings
The model contains datasets for mRS distributions and utility weights. These are automatically loaded and used in the model. If you want to use different data, you can provide it when you create the model.
Custom times of no effect for thrombolysis and thrombectomy can be passed into the model. The default values match the mRS distributions imported from file.
### Import reference data
These functions import data from .csv files that are included in the stroke-outcome package. Each function returns both the data table and the notes from the header at the top of the file.
Data set 1: Cumulative probability distributions of the modified Rankin scale (mRS):
mrs_dists, mrs_dists_notes = (
outcome_utilities.import_mrs_dists_from_file())
mrs_dists
| Stroke type | mRS<=0 | mRS<=1 | mRS<=2 | mRS<=3 | mRS<=4 | mRS<=5 | mRS<=6 |
|---|---|---|---|---|---|---|---|
| pre_stroke_nlvo | 0.583 | 0.746 | 0.850 | 0.951 | 0.993 | 1.000 | 1 |
| pre_stroke_lvo | 0.408 | 0.552 | 0.672 | 0.838 | 0.956 | 1.000 | 1 |
| no_treatment_lvo | 0.050 | 0.129 | 0.265 | 0.429 | 0.676 | 0.811 | 1 |
| no_treatment_nlvo | 0.198 | 0.460 | 0.580 | 0.708 | 0.856 | 0.918 | 1 |
| no_effect_nlvo_ivt_deaths | 0.196 | 0.455 | 0.574 | 0.701 | 0.847 | 0.908 | 1 |
| no_effect_lvo_ivt_deaths | 0.048 | 0.124 | 0.255 | 0.414 | 0.653 | 0.783 | 1 |
| no_effect_lvo_mt_deaths | 0.048 | 0.124 | 0.255 | 0.412 | 0.649 | 0.779 | 1 |
| t0_treatment_nlvo_ivt | 0.445 | 0.642 | 0.752 | 0.862 | 0.941 | 0.967 | 1 |
| t0_treatment_lvo_ivt | 0.140 | 0.233 | 0.361 | 0.522 | 0.730 | 0.838 | 1 |
| t0_treatment_lvo_mt | 0.306 | 0.429 | 0.548 | 0.707 | 0.851 | 0.915 | 1 |
print(mrs_dists_notes)
# If these change, please ensure the no-effect times are still correct.
#
# Acronyms:                          No-effect times:
# lvo: large-vessel occlusion        IVT: 378 mins (6.3hr)
# nlvo: non-large-vessel occlusion   MT: 480 mins (8hr)
# ivt: intra-veneous thrombolysis
# mt: mechanical thrombectomy
# t0: time zero, zero minutes after stroke onset.
#
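Note that the table holds cumulative probabilities, so the probability of each individual mRS score can be recovered by differencing. For example, using the no_treatment_nlvo row from the table above:

```python
import numpy as np

# Cumulative mRS distribution for untreated nLVO (row from the table):
cum_probs = np.array([0.198, 0.460, 0.580, 0.708, 0.856, 0.918, 1.0])

# Difference the cumulative values to get the probability of each mRS:
mrs_probs = np.diff(cum_probs, prepend=0.0)
# The probabilities sum to 1, and mrs_probs[0] is 0.198.
```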
Data set 2: Utility score for each mRS level.
This does not have to be provided to the model; if it is not, the default values from the Wang et al. 2020 paper are used.
utility_dists, utility_dists_notes = (
outcome_utilities.import_utility_dists_from_file())
utility_dists
| Name | mRS=0 | mRS=1 | mRS=2 | mRS=3 | mRS=4 | mRS=5 | mRS=6 |
|---|---|---|---|---|---|---|---|
| Wang2020 | 0.97 | 0.88 | 0.74 | 0.55 | 0.20 | -0.19 | 0.0 |
| Dijkland2018 | 0.95 | 0.93 | 0.83 | 0.62 | 0.42 | 0.11 | 0.0 |
print(utility_dists_notes)
#
# References:
# Wang et al. 2020: Stroke. 2020;51:2411–2417
# Dijkland et al. 2018: Stroke. 2018;49:965–971
Choose one of these distributions to use as the utility weights:
utility_weights = utility_dists.loc['Wang2020'].values
utility_weights
array([ 0.97, 0.88, 0.74, 0.55, 0.2 , -0.19, 0. ])
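These weights can be combined with an mRS probability distribution to give a population mean utility, which is essentially what the model's utility outcomes do. A sketch using the untreated nLVO distribution from the mRS table above:

```python
import numpy as np

# Utility weight for each mRS score 0-6 (Wang et al. 2020 values):
utility_weights = np.array([0.97, 0.88, 0.74, 0.55, 0.20, -0.19, 0.0])

# Non-cumulative mRS probabilities for untreated nLVO patients, found
# by differencing the cumulative row of the mRS distributions table:
mrs_probs = np.diff(
    np.array([0.198, 0.460, 0.580, 0.708, 0.856, 0.918, 1.0]),
    prepend=0.0)

# Probability-weighted mean utility across the population:
mean_utility = mrs_probs @ utility_weights
# → approximately 0.600
```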
### Custom data requirements
If you would rather use custom data for the mRS and utility distributions, you can skip the above imports. The outcome model can accept any data in these formats:
- mRS distributions: must be a pandas DataFrame with:
  - columns for each mRS from 0 to 6 inclusive, named just as in the imported table above
  - rows for each group, named just as in the imported table above
  - values between 0 and 1.
- utility weights: must be a numpy array with:
  - shape (7,), i.e. a one-dimensional array containing one value for each mRS from 0 to 6 inclusive
  - values consistent with a utility of 1 for full health and a utility of 0 for death.
If the custom mRS distributions use different times of no effect, make sure that the new times are passed into the outcome model when it is initialised.
### Running the model with custom data
The following code creates another model object with the mRS distributions and utility weights given explicitly: