Demo: discrete outcome model#

This notebook contains a basic example of running the discrete outcome model within the stroke-outcome package.

Notebook setup#

The intended versions of these packages are given in the files environment.yml and requirements.txt.

# Import required packages
import numpy as np
import pandas as pd
import copy

# Imports from the stroke_outcome package:
from stroke_outcome.discrete_outcome import Discrete_outcome
import stroke_outcome.outcome_utilities as outcome_utilities

Create patient data#

The model needs the following data as inputs, with one value per patient in each array:

| Data | Units | Data type | Name |
|------|-------|-----------|------|
| Stroke type code | 0=other, 1=nLVO, 2=LVO | int | stroke_type_code |
| Onset to needle time | minutes | float | onset_to_needle_mins |
| Whether IVT was chosen | 1=True, 0=False | int or bool | ivt_chosen_bool |
| Onset to puncture time | minutes | float | onset_to_puncture_mins |
| Whether MT was chosen | 1=True, 0=False | int or bool | mt_chosen_bool |
| * Pre-stroke mRS score | mRS score | int | mrs_pre_stroke |
| * Fixed probability | None | float | x_pre_stroke |

* Either the pre-stroke mRS score or the fixed probability score should be provided. If both are given, the fixed probability score is prioritised and the mRS score is discarded.

Here the data are first gathered in a dictionary and then converted to a pandas DataFrame before being passed to the model:

# Set random seed for repeatability:
np.random.seed(42)

# All patients share these same treatment times:
time_to_ivt_mins = 90.0
time_to_mt_mins = 120.0

# Numbers of patients with each stroke type:
n_nlvo = 65
n_lvo = 35
n_total = n_lvo + n_nlvo

# Store the patient details in this dictionary:
outcome_inputs_dict = dict(
    # Mix of LVO and nLVO:
    stroke_type_code=np.array([2]*n_lvo + [1]*n_nlvo),
    # Onset to needle time is fixed to 90mins:
    onset_to_needle_mins=np.full(n_total, time_to_ivt_mins),
    # Randomly pick whether IVT is chosen with around 15% yes: 
    ivt_chosen_bool=np.random.binomial(1, 0.15, n_total) == 1,
    # Onset to puncture time is fixed to 120mins:
    onset_to_puncture_mins=np.full(n_total, time_to_mt_mins),
    # Randomly pick whether MT is chosen for LVOs with around 30% yes:
    mt_chosen_bool=np.concatenate(
        (np.random.binomial(1, 0.3, n_lvo), [0]*n_nlvo)) == 1,
    # Randomly pick some pre-stroke mRS scores from 0 to 5:
    mrs_pre_stroke=np.random.choice(np.arange(6), size=n_total)
)

Convert this to a DataFrame for easier viewing:

# Pop them into a dataframe:
outcome_inputs_df = pd.DataFrame(
    np.array(list(outcome_inputs_dict.values())).T,
    columns=outcome_inputs_dict.keys(),
)

# Show the first ten patients' details:
outcome_inputs_df.head(10).T
0 1 2 3 4 5 6 7 8 9
stroke_type_code 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
onset_to_needle_mins 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0
ivt_chosen_bool 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
onset_to_puncture_mins 120.0 120.0 120.0 120.0 120.0 120.0 120.0 120.0 120.0 120.0
mt_chosen_bool 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0
mrs_pre_stroke 5.0 0.0 3.0 0.0 5.0 0.0 1.0 3.0 3.0 5.0

Note: even though all patients here have valid treatment times, the model only uses the times of patients whose treatment-chosen flags are True.
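
The np.array(...).T conversion above promotes every column to float, which is why the boolean and integer columns display as 0.0 and 1.0 here. If you would rather keep each column's original data type, a simple alternative (not used in the rest of this demo; the variable name is just for illustration) is to build the DataFrame directly from the dictionary:

# Alternative conversion that preserves each column's dtype:
outcome_inputs_df_typed = pd.DataFrame(outcome_inputs_dict)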

Alternative: Import data from file#

outcome_inputs_df = pd.read_csv('patients.csv')
outcome_inputs_df['mrs_pre_stroke'] = np.random.choice(np.arange(6), size=len(outcome_inputs_df))
outcome_inputs_df.T
0 1 2 3 4 5 6 7 8 9 10 11 12 13
stroke_type_code 1 1 1 1 1 1 1 2 2 2 2 2 2 2
onset_to_needle_mins 0 0 378 0 0 0 378 0 0 378 0 0 0 378
ivt_chosen_bool 0 1 1 0 0 1 1 0 1 1 0 0 1 1
onset_to_puncture_mins 0 0 0 0 480 0 480 0 0 0 0 480 0 480
mt_chosen_bool 0 0 0 1 1 1 1 0 0 0 1 1 1 1
mrs_pre_stroke 2 5 0 2 0 4 1 5 1 1 5 2 4 0
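
If you need to create an input file like this yourself, any DataFrame with the column names from the inputs table above can be saved to .csv, for example (the file name here is just an illustration):

# Save patient data to a .csv file for later use (illustrative file name):
outcome_inputs_df.to_csv('my_patients.csv', index=False)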

Run the discrete outcome model#

Initiate the outcome model object:

discrete_outcome = Discrete_outcome()

Provide it with the data we created above:

discrete_outcome.assign_patients_to_trial(outcome_inputs_df)

The model expects to receive patient data for some of the keys in its trial dictionary:

for key in discrete_outcome.trial.keys():
    print(key)
stroke_type_code
stroke_type_code_on_input
onset_to_needle_mins
ivt_chosen_bool
ivt_no_effect_bool
onset_to_puncture_mins
mt_chosen_bool
mt_no_effect_bool
mrs_pre_stroke
fixed_prob_pre_stroke

We don’t need to define ivt_no_effect_bool and mt_no_effect_bool because they are created when the outcomes are calculated. Those two arrays record whether each patient was treated after the time of no effect.

If x_pre_stroke is provided, it is used to calculate new values for mrs_pre_stroke. Otherwise, x_pre_stroke is calculated from the provided mrs_pre_stroke values.
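
As an illustration only (it is not run in this demo), the fixed probabilities could instead be supplied directly in the input data, using the x_pre_stroke name from the inputs table and one value between 0 and 1 per patient:

# Illustrative only: one fixed probability per patient, drawn uniformly from [0, 1).
# To take effect, this column would need to be in the data passed to assign_patients_to_trial.
outcome_inputs_df['x_pre_stroke'] = np.random.uniform(0.0, 1.0, size=len(outcome_inputs_df))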

# Calculate outcomes:
patient_data_dict, outcomes_by_stroke_type, full_cohort_outcomes = (
    discrete_outcome.calculate_outcomes())

# Make a copy of the results:
outcomes_by_stroke_type = copy.copy(outcomes_by_stroke_type)
full_cohort_outcomes = copy.copy(full_cohort_outcomes)
/home/anna/miniconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/anna/miniconda3/lib/python3.9/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)

Notes:

  • A copy is made of the output results because, at present, the original results can be overwritten when a new instance of the Discrete_outcome class is created.

  • The RuntimeWarnings above most likely come from averaging over an empty group: no patients in this example have stroke type 'other', so its means are NaN (see the statistics table below).

The model also produces statistics about the input patient population:

discrete_outcome.stroke_type_stats_df.T
Total IVT MT IVT no effect MT no effect No treatment
nLVO: Count 3.000000 2.000000 0.000000 1.000000 0.000000 1.000000
nLVO: Proportion of this stroke type 1.000000 0.666667 0.000000 0.333333 0.000000 0.333333
nLVO: Proportion of full cohort 0.214286 0.142857 0.000000 0.071429 0.000000 0.071429
LVO: Count 11.000000 6.000000 8.000000 3.000000 4.000000 1.000000
LVO: Proportion of this stroke type 1.000000 0.545455 0.727273 0.272727 0.363636 0.090909
LVO: Proportion of full cohort 0.785714 0.428571 0.571429 0.214286 0.285714 0.071429
Other: Count 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Other: Proportion of this stroke type 1.000000 NaN NaN NaN NaN NaN
Other: Proportion of full cohort 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

Results#

The main function returns three sets of results, which we’ve called patient_data_dict, outcomes_by_stroke_type and full_cohort_outcomes.

The first result is a dictionary of the input patient information. This is important for two reasons. Firstly, if the user did not provide all of the data required by the model, this dictionary records the placeholder data that was used instead (usually all zeroes). Secondly, in special cases the input patient data is overwritten. The main special case is a patient with an nLVO who was given MT: the model reassigns these patients as LVO.

The other two results are dictionaries holding similar information. The important difference is that outcomes_by_stroke_type contains a separate set of results for each combination of occlusion and treatment type, while full_cohort_outcomes contains one combined set of results in the same order as the input patients.

The output patient data dictionary contains the following:

pd.DataFrame(patient_data_dict.values(), patient_data_dict.keys())
0 1 2 3 4 5 6 7 8 9 10 11 12 13
stroke_type_code 1 1 1 2 2 2 2 2 2 2 2 2 2 2
stroke_type_code_on_input 1 1 1 1 1 1 1 2 2 2 2 2 2 2
onset_to_needle_mins 0 0 378 0 0 0 378 0 0 378 0 0 0 378
ivt_chosen_bool 0 1 1 0 0 1 1 0 1 1 0 0 1 1
ivt_no_effect_bool False False True False False False True False False True False False False True
onset_to_puncture_mins 0 0 0 0 480 0 480 0 0 0 0 480 0 480
mt_chosen_bool 0 0 0 1 1 1 1 0 0 0 1 1 1 1
mt_no_effect_bool False False False False True False True False False False False True False True
mrs_pre_stroke 2 5 0 2 0 4 1 5 1 1 5 2 4 0
fixed_prob_pre_stroke 0.747289 0.996485 0.102543 0.625106 0.074729 0.915893 0.52428 0.982475 0.520959 0.461934 0.999098 0.621112 0.87199 0.153145
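
One quick way to find the reassigned patients is to compare the current stroke type codes with the codes recorded on input (this check is not part of the package output):

# Indices of patients whose stroke type was changed by the model,
# e.g. nLVO patients who were given MT and reassigned as LVO:
reassigned = (np.asarray(patient_data_dict['stroke_type_code']) !=
              np.asarray(patient_data_dict['stroke_type_code_on_input']))
print(np.where(reassigned)[0])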

outcomes_by_stroke_type contains the following keys for each of lvo_ivt, lvo_mt, and nlvo_ivt:

separate_keys = list(outcomes_by_stroke_type.keys())

for key in separate_keys[:len(separate_keys)//3]:
    short_key = '_'.join(key.split('_')[2:])  # Remove lvo_ivt label
    print('+ ' + short_key) 
+ each_patient_mrs_dist_post_stroke
+ each_patient_mrs_not_treated
+ each_patient_mrs_post_stroke
+ each_patient_utility_not_treated
+ each_patient_utility_post_stroke
+ each_patient_mrs_shift
+ each_patient_utility_shift
+ valid_patients_mean_mrs_post_stroke
+ valid_patients_mean_mrs_not_treated
+ valid_patients_mean_mrs_shift
+ valid_patients_mean_utility_post_stroke
+ valid_patients_mean_utility_not_treated
+ valid_patients_mean_utility_shift
+ treated_patients_mean_mrs_post_stroke
+ treated_patients_mean_mrs_shift
+ treated_patients_mean_utility_post_stroke
+ treated_patients_mean_utility_shift
+ improved_patients_mean_mrs_post_stroke
+ improved_patients_mean_mrs_shift
+ improved_patients_mean_utility_post_stroke
+ improved_patients_mean_utility_shift
+ proportion_of_valid_patients_who_improved
+ proportion_of_treated_patients_who_improved
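
For example, prepending one of the subgroup labels to a name above gives the full key (the same pattern is used in the mean mRS shift example further below):

# Per-patient change in mRS for the LVO + MT subgroup,
# using the 'lvo_mt_' prefix plus a suffix from the list above:
print(outcomes_by_stroke_type['lvo_mt_each_patient_mrs_shift'])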

full_cohort_outcomes combines the three subgroups back into single arrays covering the whole cohort, in the same order as the input data. It contains:

full_cohort_keys = list(full_cohort_outcomes.keys())

for key in full_cohort_keys:
    print('+ ' + key)
+ each_patient_mrs_dist_post_stroke
+ each_patient_mrs_post_stroke
+ each_patient_mrs_not_treated
+ each_patient_mrs_shift
+ each_patient_utility_post_stroke
+ each_patient_utility_not_treated
+ each_patient_utility_shift
+ mean_mrs_post_stroke
+ mean_mrs_not_treated
+ mean_mrs_shift
+ mean_utility
+ mean_utility_not_treated
+ mean_utility_shift

Example: mean mRS shift#

The mean mRS shift is stored in several different ways depending on exactly what you want to measure. In the separate subgroup results, you can see the mean mRS shift for only the patients who were treated:

# Separate subgroup results:
keys_to_print = [
    'nlvo_ivt_treated_patients_mean_mrs_shift',
    'lvo_ivt_treated_patients_mean_mrs_shift',
    'lvo_mt_treated_patients_mean_mrs_shift',
]
for key in keys_to_print:
    print(f'{outcomes_by_stroke_type[key]:.3f}')
0.000
0.000
-0.375

In the combined full cohort results, you can see the mean mRS shift for each patient separately:

# Combined full cohort results:
full_cohort_outcomes['each_patient_mrs_shift']
array([ 0.,  0.,  0., -1.,  0., -1.,  0.,  0.,  0.,  0.,  0.,  0., -1.,
        0.])

There is also a full cohort mean mRS shift value…

print(f"{full_cohort_outcomes['mean_mrs_shift']:.3f}")
-0.214

… which is the mean of all values in the previous array including the zeroes. This might not be as useful a metric as the three separate mean mRS shifts from the nLVO+IVT, LVO+IVT, and LVO+MT subgroups.
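
If you would rather have a treated-only average from the combined results, one option (a sketch, not part of the package API) is to mask the per-patient shifts with the treatment-chosen flags from the input data, which are in the same patient order:

# Patients who received any treatment (IVT or MT):
treated_mask = (
    np.asarray(outcome_inputs_df['ivt_chosen_bool'], dtype=bool) |
    np.asarray(outcome_inputs_df['mt_chosen_bool'], dtype=bool)
)
# Mean mRS shift across treated patients only:
print(f"{full_cohort_outcomes['each_patient_mrs_shift'][treated_mask].mean():.3f}")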

Controlling mRS distributions and utility weightings#

The model contains datasets for mRS distributions and utility weights. These are automatically loaded and used in the model. If you want to use different data, you can provide it when you create the model.

Custom times of no effect for thrombolysis and thrombectomy can be passed into the model. The default values match the mRS distributions imported from file.

Import reference data#

These functions import data from .csv files that are included in the stroke-outcome package. Each function returns both the data table and the notes from the header at the top of the file.

Data set 1: Cumulative probability distributions of modified Rankin scale (mRS):

mrs_dists, mrs_dists_notes = (
    outcome_utilities.import_mrs_dists_from_file())
mrs_dists
mRS<=0 mRS<=1 mRS<=2 mRS<=3 mRS<=4 mRS<=5 mRS<=6
Stroke type
pre_stroke_nlvo 0.582881 0.745419 0.848859 0.951082 0.993055 1.000000 1.0
pre_stroke_nlvo_ivt_deaths 0.576469 0.737219 0.839522 0.940620 0.982131 0.989000 1.0
pre_stroke_lvo 0.417894 0.560853 0.679283 0.843494 0.957269 1.000000 1.0
pre_stroke_lvo_ivt_deaths 0.403644 0.541728 0.656119 0.814731 0.924626 0.965900 1.0
pre_stroke_lvo_mt_deaths 0.402850 0.540662 0.654829 0.813128 0.922807 0.964000 1.0
no_treatment_nlvo 0.197144 0.460000 0.580032 0.707768 0.855677 0.917702 1.0
no_effect_nlvo_ivt_deaths 0.197271 0.460000 0.577583 0.702252 0.845244 0.904454 1.0
t0_treatment_nlvo_ivt 0.429808 0.630000 0.738212 0.848427 0.929188 0.956300 1.0
no_treatment_lvo 0.050000 0.129000 0.265000 0.429000 0.676000 0.811000 1.0
no_effect_lvo_ivt_deaths 0.047898 0.123576 0.253858 0.410962 0.647576 0.776900 1.0
no_effect_lvo_mt_deaths 0.047781 0.123274 0.253237 0.409957 0.645993 0.775000 1.0
t0_treatment_lvo_ivt 0.112916 0.200000 0.327377 0.484757 0.698212 0.811443 1.0
t0_treatment_lvo_mt 0.314082 0.436315 0.554431 0.712335 0.853604 0.916750 1.0
print(mrs_dists_notes)
# If these change, please ensure the no-effect times are still correct.
#
# Acronyms:                                      No-effect times:
#   lvo:  large-vessel occlusion                 IVT: 378mins (6.3hr)
#   nlvo: non-large-vessel occlusion             MT:  480mins (8hr)
#   ivt:  intra-veneous thrombolysis
#   mt:   mechanical thrombectomy
#   t0:   time zero, zero minutes after stroke onset.
#
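
Individual distributions can be picked out of this DataFrame by row name, for example:

# Cumulative mRS distribution for untreated LVO patients:
no_treatment_lvo_dist = mrs_dists.loc['no_treatment_lvo'].values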

OPTIONAL Data set 2: Utility score for each mRS level.

This does not have to be provided to the model; if it is not, default values from the Wang et al. 2020 paper are used.

utility_dists, utility_dists_notes = (
    outcome_utilities.import_utility_dists_from_file())
utility_dists
mRS=0 mRS=1 mRS=2 mRS=3 mRS=4 mRS=5 mRS=6
Name
Wang2020 0.97 0.88 0.74 0.55 0.20 -0.19 0.0
Dijkland2018 0.95 0.93 0.83 0.62 0.42 0.11 0.0
print(utility_dists_notes)
# 
# References:
#   Wang et al. 2020:     Stroke. 2020;51:2411–2417
#   Dijkland et al. 2018: Stroke. 2018;49:965–971

Choose one of these distributions to use as the utility weights:

utility_weights = utility_dists.loc['Wang2020'].values

utility_weights
array([ 0.97,  0.88,  0.74,  0.55,  0.2 , -0.19,  0.  ])

Custom data requirements#

If you would rather use custom data for the mRS and utility distributions, you can skip the above imports. The outcome model can accept any data in these formats:

  • mRS distributions: must be a pandas DataFrame with:

    • columns for each mRS from 0 to 6 inclusive, named just as in the imported table above

    • rows for each group named just as in the imported table above

    • values between 0 and 1.

  • utility weights: must be a numpy array with:

    • shape (7,), i.e. a one-dimensional array containing one value for each mRS from 0 to 6 inclusive

    • values consistent with a utility of 1 for full health and a utility of 0 for death.

If the custom mRS distributions use different times of no effect, make sure that the new times are passed into the outcome model when it is initialised.
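
The following snippet is a minimal sanity check for custom data based on the requirements above; it is not part of the stroke-outcome package:

# Minimal checks on custom data before passing it to the model:
def check_custom_data(mrs_dists, utility_weights):
    values = mrs_dists.values
    # Each row should be a cumulative probability distribution:
    # values between 0 and 1, non-decreasing, and ending at 1.
    assert ((values >= 0.0) & (values <= 1.0)).all()
    assert (np.diff(values, axis=1) >= 0.0).all()
    assert np.allclose(values[:, -1], 1.0)
    # Utility weights: one value for each mRS score from 0 to 6.
    assert utility_weights.shape == (7,)

check_custom_data(mrs_dists, utility_weights)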

Running the model with custom data#

The following code creates another model object with the mRS distributions and utility weights given explicitly: