Investigating the output of neural net embedding subnets - with 2D subnet output#

Aims#

To investigate the output of the hospital and clinical subnets of the embedding neural network.
To examine the link between hospital subnet output and use of thrombolysis in hospitals - both the actual thrombolysis use, and the predicted thrombolysis use of a 10k set of patients passed through all hopsital moodels.
To examine the link between the patient clinical feature subnet output and the use of thrombolysis, and the link between patient features and the clinical feature subnet output

Neural Network structure#

The model contains three subnets that take portions of the data. The output of these subnets is an n-dimensional vector. In this case the output is a 2D vector, that is each subnet is reduced to a single value output. The subnets created are for:

Patient clinical data: Age, gender, ethnicity, disability before stroke, stroke scale data. Pass through one hidden layer (with 2x neurons as input features) and then to single neuron with sigmoid activation.
Pathway process data: Times of arrival and scan, time of day, day of week. Pass through one hidden layer (with 2x neurons as input features) and then to single neuron with sigmoid activation.
Hospital ID (one-hot encoded): Connect input directly to single neuron with sigmoid activation.

The outputs of the three subnet outputs are then passed to a single neuron with sigmoid activation for final output.

Fitting of model#

The model has been pretrained (see the notebook Modular TensorFlow model with 1D embedding - Train and save model for 10k patient subset)

Load libraries#

# Turn warnings off to keep notebook tidy
import warnings
warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

# sklearn for pre-processing
from sklearn.preprocessing import MinMaxScaler

# TensorFlow api model
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
from tensorflow.keras.losses import binary_crossentropy

Define function to scale data#

Scale input data 0-1 (MinMax scaling).

def scale_data(X_train, X_test):
    """Scale data 0-1 based on min and max in training set"""
    
    # Initialise a new scaling object for normalising input data
    sc = MinMaxScaler()

    # Set up the scaler just on the training set
    sc.fit(X_train)

    # Apply the scaler to the training and test sets
    train_sc = sc.transform(X_train)
    test_sc = sc.transform(X_test)
    
    return train_sc, test_sc

Get model outputs for test data#

Get prediction probabilities for the test 10k training set. Training data is used only to scale test set X values.

This prediction run is used to check model, and get accuracy.

# Load data
train = pd.read_csv(f'../data/10k_training_test/cohort_10000_train.csv')
test = pd.read_csv(f'../data/10k_training_test/cohort_10000_test.csv')

all_test = test.copy()

# Get data subgroups
subgroups = pd.read_csv('../data/subnet.csv', index_col='Item')
# Get list of clinical items
clinical_subgroup = subgroups.loc[subgroups['Subnet']=='clinical']
clinical_subgroup = list(clinical_subgroup.index)
# Get list of pathway items
pathway_subgroup = subgroups.loc[subgroups['Subnet']=='pathway']
pathway_subgroup = list(pathway_subgroup.index)
# Get list of hospital items
hospital_subgroup = subgroups.loc[subgroups['Subnet']=='hospital']
hospital_subgroup = list(hospital_subgroup.index)

# OneHot encode stroke team
coded = pd.get_dummies(train['StrokeTeam'])
train = pd.concat([train, coded], axis=1)
train.drop('StrokeTeam', inplace=True, axis=1)
coded = pd.get_dummies(test['StrokeTeam'])
test = pd.concat([test, coded], axis=1)
test.drop('StrokeTeam', inplace=True, axis=1)

# Split into X, y
X_train_df = train.drop('S2Thrombolysis',axis=1) 
y_train_df = train['S2Thrombolysis']
X_test_df = test.drop('S2Thrombolysis',axis=1) 
y_test_df = test['S2Thrombolysis'] 

# Split train and test data by subgroups
X_train_patients = X_train_df[clinical_subgroup]
X_test_patients = X_test_df[clinical_subgroup]
X_train_pathway = X_train_df[pathway_subgroup]
X_test_pathway = X_test_df[pathway_subgroup]
X_train_hospitals = X_train_df[hospital_subgroup]
X_test_hospitals = X_test_df[hospital_subgroup]

# Convert to NumPy
X_train = X_train_df.values
X_test = X_test_df.values
y_train = y_train_df.values
y_test = y_test_df.values

# Scale data
X_train_patients_sc, X_test_patients_sc = \
    scale_data(X_train_patients, X_test_patients)

X_train_pathway_sc, X_test_pathway_sc = \
    scale_data(X_train_pathway, X_test_pathway)

X_train_hospitals_sc, X_test_hospitals_sc = \
    scale_data(X_train_hospitals, X_test_hospitals)

# Load model
path = './saved_models/2d_for_10k/'
filename = f'{path}model.h5'
model = keras.models.load_model(filename)

# Test model
probability = model.predict(
    [X_test_patients_sc, X_test_pathway_sc, X_test_hospitals_sc])
y_pred_test = probability >= 0.5
y_pred_test = y_pred_test.flatten()
accuracy_test = np.mean(y_pred_test == y_test)
print(f'Accuracy test {accuracy_test:0.3f}')

all_test['model_prob'] = probability.flatten()
all_test['prediction'] = y_pred_test

Accuracy test 0.843

Get predictions for thrombolysis use of 10k set of patients at each hospital#

Here we ask the counter-factual question - “what treatment would a patient be expected to receive at each of the 132 hospitals?”.

Hospital is one-hot encoded as input to the hospital subnet. To make a prediction of treatment at different hospitals we change the one-hot encoding of the hospital when making prediction.

For each hospital we pass through the 10k test set, and record the proportion of the patients receiving thrombolysis at that hospital.

# Get number of hospitals
num_hospitals = len(X_test_hospitals_sc[0])
# Create test array for changing hospital ID
X_hospitals_alter = X_test_hospitals_sc.copy()
# List for all patient classifications at each hospital
patient_results = []

# Loop through setting hospital
hospital_results = []
for hosp in range(num_hospitals):
    # Set all hospitals to zero
    X_hospitals_alter[:,:] = 0
    # Set test hospital to 1
    X_hospitals_alter[:,hosp] = 1
    # Get probability of thrombolysis
    probability = model.predict(
        [X_test_patients_sc, X_test_pathway_sc, X_hospitals_alter])
    # Classify
    classified = probability >= 0.5
    # Get average thrombolysis (we are not stroring all individual results)
    thrombolysis = classified.mean()
    hospital_results.append(thrombolysis)
    patient_results.append(classified)

# Put results in DataFrame 
predicted_thrombolysis = pd.DataFrame()
predicted_thrombolysis['hospital'] = hospital_subgroup
predicted_thrombolysis['10k_thrombolysis'] = hospital_results

# Show DataFrame
predicted_thrombolysis

	hospital	10k_thrombolysis
0	AGNOF1041H	0.2886
1	AKCGO9726K	0.3829
2	AOBTM3098N	0.2320
3	APXEE8191H	0.2909
4	ATDID5461S	0.2728
...	...	...
127	YPKYH1768F	0.2620
128	YQMZV4284N	0.3341
129	ZBVSO0975W	0.2301
130	ZHCLE1578P	0.3126
131	ZRRCV7012C	0.1883

132 rows × 2 columns

patient_thromb_results = np.sum(patient_results, axis=0).flatten()
all_test['num_hosp_thrombolysing'] = patient_thromb_results
all_test

	StrokeTeam	S1AgeOnArrival	S1OnsetToArrival_min	S2RankinBeforeStroke	Loc	LocQuestions	LocCommands	BestGaze	Visual	FacialPalsy	...	S2StrokeType_Primary Intracerebral Haemorrhage	S2StrokeType_missing	S2TIAInLastMonth_No	S2TIAInLastMonth_No but	S2TIAInLastMonth_Yes	S2TIAInLastMonth_missing	S2Thrombolysis	model_prob	prediction	num_hosp_thrombolysing
0	LGNPK4211W	67.5	193.0	1	0	2.0	0.0	0.0	0.0	0.0	...	1	0	0	0	0	1	0	0.000038	False	0
1	LZGVG8257A	62.5	54.0	2	0	0.0	0.0	0.0	0.0	0.0	...	0	0	0	0	0	1	0	0.455021	False	36
2	DNOYM6465G	82.5	173.0	0	0	0.0	0.0	0.0	0.0	1.0	...	0	0	0	0	0	1	0	0.074149	False	0
3	ISIZF6614O	72.5	159.0	1	0	2.0	0.0	0.0	2.0	0.0	...	0	0	0	0	0	1	0	0.556605	True	113
4	NGKDE7265L	87.5	145.0	0	0	0.0	0.0	0.0	0.0	1.0	...	0	0	0	0	0	1	0	0.137400	False	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
9995	NFBUF0424E	57.5	99.0	0	1	2.0	2.0	1.0	2.0	2.0	...	1	0	0	0	0	1	0	0.002355	False	0
9996	UJETD9177J	87.5	159.0	3	0	2.0	2.0	0.0	0.0	0.0	...	0	0	0	0	0	1	1	0.214499	False	1
9997	BICAW1125K	67.5	142.0	0	0	0.0	0.0	0.0	2.0	0.0	...	0	0	0	0	0	1	0	0.730262	True	132
9998	CYVHC2532V	72.5	101.0	0	0	0.0	0.0	0.0	0.0	1.0	...	0	0	0	0	0	1	0	0.000762	False	0
9999	FCCJC8768V	87.5	106.0	2	1	1.0	1.0	0.0	0.0	1.0	...	1	0	0	0	0	1	0	0.005808	False	0

10000 rows × 104 columns

Get subnet outputs#

Get hospital subnet output#

results = []
for hosp in range(num_hospitals):
    # Set all hospitals to zero
    X_hospitals_alter[:,:] = 0
    # Set test hospital to 1
    X_hospitals_alter[:,hosp] = 1
    # Get hospital subnet output
    layer_name = 'hospital_encode'
    hospital_encode_model = keras.Model(
        inputs=model.input,outputs=model.get_layer(layer_name).output)
    hospital_encode_output = hospital_encode_model([
        X_test_patients_sc, X_test_pathway_sc, X_hospitals_alter])
    results.append(np.mean(hospital_encode_output.numpy(),axis=0))

results = np.array(results)
hospital_encoding = pd.DataFrame(index=hospital_subgroup)
hospital_encoding['hosp_encode_x'] = results[:, 0]
hospital_encoding['hosp_encode_y'] = results[:, 1]
all_test = all_test.merge(
    hospital_encoding, left_on='StrokeTeam', right_index=True, how='left')

Get patient subnet output#

# Get patient subnet output
layer_name = 'patient_encode'
patient_encode_model = keras.Model(
    inputs=model.input,outputs=model.get_layer(layer_name).output)
patient_encode_output = patient_encode_model([
    X_test_patients_sc, X_test_pathway_sc, X_test_hospitals_sc])
patient_encode_output = patient_encode_output.numpy()

patient_encode_output

array([[0.9715982 , 0.0169461 ],
       [0.28645337, 0.6916805 ],
       [0.22771867, 0.24591173],
       ...,
       [0.14738613, 0.97659886],
       [0.910986  , 0.4587859 ],
       [0.9470407 , 0.01368953]], dtype=float32)

all_test['patient_encode_x'] = patient_encode_output[:, 0]
all_test['patient_encode_y'] = patient_encode_output[:, 1]

Get pathway subnet output#

# Get patient subnet output
layer_name = 'pathway_encode'
pathway_encode_model = keras.Model(
    inputs=model.input,outputs=model.get_layer(layer_name).output)
pathway_encode_output = pathway_encode_model([
    X_test_patients_sc, X_test_pathway_sc, X_test_hospitals_sc])
pathway_encode_output = pathway_encode_output.numpy()

pathway_encode_output

array([[9.0462186e-30, 4.6191800e-02],
       [1.0000000e+00, 9.1275048e-01],
       [4.0270866e-23, 8.5490012e-01],
       ...,
       [1.0000000e+00, 9.2618495e-01],
       [1.8072895e-21, 2.0514728e-01],
       [1.0000000e+00, 9.5640117e-01]], dtype=float32)

all_test['pathway_encode_x'] = pathway_encode_output[:, 0]
all_test['pathway_encode_y'] = pathway_encode_output[:, 1]
all_test.to_csv('./output/2d_outputs.csv')
all_test

	StrokeTeam	S1AgeOnArrival	S1OnsetToArrival_min	S2RankinBeforeStroke	Loc	LocQuestions	LocCommands	BestGaze	Visual	FacialPalsy	...	S2Thrombolysis	model_prob	prediction	num_hosp_thrombolysing	hosp_encode_x	hosp_encode_y	patient_encode_x	patient_encode_y	pathway_encode_x	pathway_encode_y
0	LGNPK4211W	67.5	193.0	1	0	2.0	0.0	0.0	0.0	0.0	...	0	0.000038	False	0	0.994444	0.999999	0.971598	0.016946	9.046219e-30	0.046192
1	LZGVG8257A	62.5	54.0	2	0	0.0	0.0	0.0	0.0	0.0	...	0	0.455021	False	36	0.054258	0.999681	0.286453	0.691680	1.000000e+00	0.912750
2	DNOYM6465G	82.5	173.0	0	0	0.0	0.0	0.0	0.0	1.0	...	0	0.074149	False	0	0.000313	0.962226	0.227719	0.245912	4.027087e-23	0.854900
3	ISIZF6614O	72.5	159.0	1	0	2.0	0.0	0.0	2.0	0.0	...	0	0.556605	True	113	0.359619	0.999920	0.134090	0.811445	1.000000e+00	0.839783
4	NGKDE7265L	87.5	145.0	0	0	0.0	0.0	0.0	0.0	1.0	...	0	0.137400	False	0	0.000113	0.842160	0.935024	0.970843	1.000000e+00	0.926450
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
9995	NFBUF0424E	57.5	99.0	0	1	2.0	2.0	1.0	2.0	2.0	...	0	0.002355	False	0	0.679200	0.999900	0.973709	0.005022	1.000000e+00	0.846838
9996	UJETD9177J	87.5	159.0	3	0	2.0	2.0	0.0	0.0	0.0	...	1	0.214499	False	1	0.000235	0.951178	0.063271	0.441744	5.232738e-21	0.836204
9997	BICAW1125K	67.5	142.0	0	0	0.0	0.0	0.0	2.0	0.0	...	0	0.730262	True	132	0.277610	0.999906	0.147386	0.976599	1.000000e+00	0.926185
9998	CYVHC2532V	72.5	101.0	0	0	0.0	0.0	0.0	0.0	1.0	...	0	0.000762	False	0	0.147991	0.999695	0.910986	0.458786	1.807289e-21	0.205147
9999	FCCJC8768V	87.5	106.0	2	1	1.0	1.0	0.0	0.0	1.0	...	0	0.005808	False	0	0.323247	0.999919	0.947041	0.013690	1.000000e+00	0.956401

10000 rows × 110 columns

Comparing hospital subnet activation with thrombolysis use at each hospital#

The hospital subnet outputs a single value (in the range 0-1) for each hospital.

Here we compare the hospital subnet output for each hospital with:

The actual use of thrombolysis for 10k test-set patients at their own hospital only.
The expected use of thrombolysis in the 10k test-set for each hospital. The 10k patient data set is passed through all hospital models (by changing the one-hot hospital encoding).

# Set up figure
fig = plt.figure(figsize=(6,6))

# Plot 1: actual vs subnet
ax1 = fig.add_subplot(111)
cmap = plt.cm.viridis
min_val = predicted_thrombolysis['10k_thrombolysis'].min()
max_val = predicted_thrombolysis['10k_thrombolysis'].max()
im = ax1.scatter(hospital_encoding['hosp_encode_x'],
            hospital_encoding['hosp_encode_y'],
            c=predicted_thrombolysis['10k_thrombolysis'],
            vmin=min_val, vmax=max_val, s=35, cmap=cmap)
cbar = plt.colorbar(im, ax=ax1)
cbar.set_label('Hospital thrombolysis use', rotation=90, labelpad=10)
ax1.set_xlabel('Hospital subnet embedding output 1')
ax1.set_ylabel('Hospital subnet embedding output 2')
plt.savefig('./output/hospital_subnet_scatter.jpg', dpi=300)
plt.show()

../_images/005b_analyse_10k_2d_modular_21_0.png

In this output, all hospitals appear to sit on \(x=0\) or \(y=1\), with use of thrombolysis in the hospital increasing along the \(y\) axis and then the \(x\) axis.

Comparing clinical subnet activation with the number of hospitals that are predicted to give thrombolysis to each patient#

# Set up figure
fig = plt.figure(figsize=(6,6))

# Plot 1: actual vs subnet
cmap = plt.cm.viridis
ax1 = fig.add_subplot(111)
im = ax1.scatter(all_test['patient_encode_x'],
            all_test['patient_encode_y'],
            c=all_test['num_hosp_thrombolysing'],
            vmin=0, vmax=132, s=25, cmap=cmap, alpha=0.5)
cbar = plt.colorbar(im, ax=ax1)
cbar.set_label('Number of hospitals predicted to give thrombolysis', rotation=90, labelpad=10)
ax1.set_xlabel('Clinical subnet embedding output 1')
ax1.set_ylabel('Clinical subnet embedding output 2')
plt.savefig('./output/clinical_subnet_scatter.jpg', dpi=300)
plt.show()

../_images/005b_analyse_10k_2d_modular_24_0.png

Plots as heat map of average values

def plot_mean_values(df_input, x1_col, x2_col, y_col, 
                     x1_label, x2_label, y_label,
                     number_bins=20):
    
    df = df_input.copy(deep=True)
    mean_values = pd.DataFrame()
    
    # Get data
    x1 = df[x1_col].values
    x2 = df[x2_col].values
    y = df[y_col].values
    
    # Bin data (assume original data is 0-1)
    bins = np.linspace(0, 1, number_bins)
    x1_binned = np.digitize(x1, bins)
    x2_binned = np.digitize(x2, bins)
    
    # Allocate group based on bins
    df['group'] = list(zip(x1_binned, x2_binned))
    mean_values = df[['group',y_col]].groupby(by='group').mean()
    mean_values['group'] = mean_values.index
    mean_values['x1_bin_index'] = mean_values['group'].apply(lambda x: x[0]) - 1
    mean_values['x2_bin_index'] = mean_values['group'].apply(lambda x: x[1])
    mean_values['x1_bin'] = mean_values['x1_bin_index'].apply(lambda x: bins[x])
    mean_values['x2_bin'] = mean_values['x2_bin_index'].apply(lambda x: bins[x])
    
    # Put in array
    z = np.empty((number_bins,number_bins))
    for index, value in mean_values.iterrows():
        x1_ind = value['x1_bin_index']
        x2_ind = value['x2_bin_index']
        z[x1_ind, x2_ind] = value[y_col]
    
    fig = plt.figure(figsize=(6,6))
    ax = fig.add_subplot(111)
    cmap = plt.cm.viridis
    im = ax.imshow(z,interpolation='nearest', cmap=cmap, origin='upper')
    ticks = np.arange(0, number_bins, 2)
    ax.invert_xaxis()
    ax.set_xticks(ticks)
    ax.set_xticklabels(np.round(1 - (ticks/number_bins),1))
    ax.set_yticks(ticks)
    ax.set_yticklabels(np.round(1 - (ticks/number_bins),1))
    ax.set_xlabel(x1_label)
    ax.set_ylabel(x2_label)
    cbar = plt.colorbar(im, ax=ax, shrink = 0.8)
    cbar.set_label(y_label, rotation=90, labelpad=10)

    name = f'./output/heat_{y_col}.jpg'
    plt.savefig(name, dpi=300)
    plt.show()
    
    return

plot_mean_values(
    all_test, 'patient_encode_x', 'patient_encode_y', 'num_hosp_thrombolysing',
    'Clinical subnet embedding output 1', 'Clinical subnet embedding output 2',
    'Number of hospitals predicted to give thrombolysis')

../_images/005b_analyse_10k_2d_modular_27_0.png

The patients most likely to be given thrombolysis are those in the top left of the scatter plot (\(x\) close to 0, and \(y\) close to 1. Predicted use of thrombolysis falls as a function of distance from that corner. Use of thrombolysis will also be effected by pathway data, but this scatter plot shows a dominant effect of clinical subnet output.

Patients appear to cluster mostly towards the edges of the plot.

In the following analysis, we will look at the characteristics of patients in different parts of the scatter plot.

Analysis by stroke type#

Plot the location of stroke type by haemorrhage (red) or infarction (blue).

bleed = all_test['S2StrokeType_Primary Intracerebral Haemorrhage'] 

# Set up figure
cmap = plt.cm.bwr
fig = plt.figure(figsize=(6,6))

# Plot 1: actual vs subnet
ax1 = fig.add_subplot(111)
im = ax1.scatter(all_test['patient_encode_x'],
           all_test['patient_encode_y'],
           c=bleed,
        vmin=0, vmax=1, s=25, cmap=cmap, alpha=0.5)
#plt.colorbar(im, ax=ax1)
ax1.set_xlabel('Clinical subnet embedding output 1')
ax1.set_ylabel('Clinical subnet embedding output 2')

custom_lines = [Line2D([0], [0], color='r', marker='o', lw=0, markerfacecolor='r', markersize=8),
                Line2D([0], [0], color='b', marker='o', lw=0, markerfacecolor='b', markersize=8)]

plt.legend(custom_lines, ['Haemorrhagic stroke', 'Non-haemorrhagic stroke'],
          loc='upper center', framealpha=1)


plt.savefig('./output/clinical_subnet_scatter_stroke_type.jpg', dpi=300)
plt.show()

../_images/005b_analyse_10k_2d_modular_30_0.png

We observe that haemorrhagic stroke patients are clustered in one corner of the chart, suggesting that clinical subnet output may be used to compare similarity between patients (driven by reasons to give or not give thrombolysis).

Summarise patients in corners of chart#

Summarise key characteristics of patients in the four corners of the chart.

paient_encode_regions = pd.DataFrame()

# Low x and low y
mask = (all_test['patient_encode_x'] < 0.2) & (all_test['patient_encode_y'] < 0.2)
paient_encode_regions['low_x_low_y'] = all_test[mask].mean()

# Low x and high y
mask = (all_test['patient_encode_x'] < 0.2) & (all_test['patient_encode_y'] > 0.8)
paient_encode_regions['low_x_high_y'] = all_test[mask].mean()

# High x and low y
mask = (all_test['patient_encode_x'] > 0.8) & (all_test['patient_encode_y'] <0.2)
paient_encode_regions['high_x_low_y'] = all_test[mask].mean()

# High x and high y
mask = (all_test['patient_encode_x'] > 0.8) & (all_test['patient_encode_y'] > 0.8)
paient_encode_regions['high_x_high_y'] = all_test[mask].mean()

paient_encode_regions

	low_x_low_y	low_x_high_y	high_x_low_y	high_x_high_y
S1AgeOnArrival	83.731767	70.882673	75.556063	71.761321
S1OnsetToArrival_min	108.585089	103.129677	109.464960	122.270973
S2RankinBeforeStroke	2.871961	0.241918	1.139549	0.493690
Loc	0.769854	0.157646	0.671542	0.007424
LocQuestions	1.520259	0.779876	0.840951	0.030438
...	...	...	...	...
hosp_encode_y	0.912189	0.909141	0.917274	0.906201
patient_encode_x	0.066940	0.048527	0.966237	0.939780
patient_encode_y	0.072550	0.937993	0.029484	0.947498
pathway_encode_x	0.541331	0.716123	0.574025	0.598690
pathway_encode_y	0.744999	0.814333	0.744484	0.593244

109 rows × 4 columns

paient_encode_regions.to_csv('./output/paient_encode_regions.csv')

Show specific rows

rows_to_show = [
    'S2RankinBeforeStroke',
    'S2NihssArrival',
    'S2StrokeType_Primary Intracerebral Haemorrhage',
    'num_hosp_thrombolysing']

paient_encode_regions.loc[rows_to_show].round(2)

	low_x_low_y	low_x_high_y	high_x_low_y	high_x_high_y
S2RankinBeforeStroke	2.87	0.24	1.14	0.49
S2NihssArrival	18.30	11.81	12.55	0.94
S2StrokeType_Primary Intracerebral Haemorrhage	0.00	0.00	0.95	0.00
num_hosp_thrombolysing	3.91	102.04	0.00	0.01

We see different types of patients in different quadrants of the clinical subnet output scatter chart:

Low x and low y: Largely non-thrombolysed patients with infarction stroke, and severe stroke and/or higher disability before stroke.
Low x and high y: Largely thrombolysed patients.
High x and low y: Largely non-thrombolysed patients with haemorrhagic stroke.
High x and high y: Largely non-thrombolysed patients with infarction stroke and very mild stroke.

Observations#

2D clinical subnet analysis allows leads to potentially useful embedding/encoding of patients whereby similar patients are clustered together.

SAMueL Stroke Audit Machine Learning 1

Investigating the output of neural net embedding subnets - with 2D subnet output

Contents