{ "cells": [ { "cell_type": "markdown", "id": "abf84a78", "metadata": {}, "source": [ "# Grouping hospitals by similarities in decision making" ] }, { "cell_type": "markdown", "id": "070f3663", "metadata": {}, "source": [ "## Aims" ] }, { "cell_type": "markdown", "id": "69ce97a3", "metadata": {}, "source": [ "* To place hospitals into groups according to their decision making, such that hospitals in the same group make similar decisions. This is done based on Hamming distance: the proportion of patients for whom two hospitals make different decisions (a smaller distance means closer agreement).\n", "\n", "To identify groups of similar hospitals we:\n", "\n", "- Use pre-trained hospital models\n", "- Put the unseen cohort of 10,000 patients through all hospital models\n", "- Find the Hamming distance between predicted outcomes in the cohort for every pair of hospitals and store it in a distance matrix $D$\n", "- Seriate $D$ to visualise hospital groups\n", "\n", "We follow this by repeating the analysis on a subgroup of patients: those for whom 30-70% of hospitals would give thrombolysis (so we remove patients with high agreement between hospitals on the thrombolysis decision)."
] }, { "cell_type": "markdown", "id": "82736fb6", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "21fd17cb", "metadata": {}, "outputs": [], "source": [ "# Turn warnings off to keep notebook tidy\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "8f0ae9e7", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pickle as pkl\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "from scipy.spatial.distance import hamming\n", "from seriate import seriate" ] }, { "cell_type": "markdown", "id": "483dc86f", "metadata": {}, "source": [ "## Load pre-trained hospital models" ] }, { "cell_type": "code", "execution_count": 3, "id": "a50daf52", "metadata": {}, "outputs": [], "source": [ "# Load the pre-trained hospital models from a pickle file\n", "with open('./models/trained_hospital_models_for _cohort.pkl', 'rb') as f:\n", "    hospital_models = pkl.load(f)" ] }, { "cell_type": "markdown", "id": "02ffdade", "metadata": {}, "source": [ "## Load cohort\n", "\n", "The 10k test cohort was not used to train the models." ] }, { "cell_type": "code", "execution_count": 4, "id": "bde5328f", "metadata": {}, "outputs": [], "source": [ "data_loc = '../data/10k_training_test/'\n", "cohort = pd.read_csv(data_loc + 'cohort_10000_test.csv')" ] }, { "cell_type": "code", "execution_count": 5, "id": "81186886", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | StrokeTeam | S1AgeOnArrival | S1OnsetToArrival_min | S2RankinBeforeStroke | Loc | LocQuestions | LocCommands | BestGaze | Visual | FacialPalsy | ... | S2NewAFDiagnosis_Yes | S2NewAFDiagnosis_missing | S2StrokeType_Infarction | S2StrokeType_Primary Intracerebral Haemorrhage | S2StrokeType_missing | S2TIAInLastMonth_No | S2TIAInLastMonth_No but | S2TIAInLastMonth_Yes | S2TIAInLastMonth_missing | S2Thrombolysis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LGNPK4211W | 67.5 | 193.0 | 1 | 0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | LZGVG8257A | 62.5 | 54.0 | 2 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | DNOYM6465G | 82.5 | 173.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | ISIZF6614O | 72.5 | 159.0 | 1 | 0 | 2.0 | 0.0 | 0.0 | 2.0 | 0.0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | NGKDE7265L | 87.5 | 145.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | NFBUF0424E | 57.5 | 99.0 | 0 | 1 | 2.0 | 2.0 | 1.0 | 2.0 | 2.0 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 9996 | UJETD9177J | 87.5 | 159.0 | 3 | 0 | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 9997 | BICAW1125K | 67.5 | 142.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 9998 | CYVHC2532V | 72.5 | 101.0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 9999 | FCCJC8768V | 87.5 | 106.0 | 2 | 1 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |

10000 rows × 101 columns

|  | WJHSV5358P | TPFFP4410O | YQMZV4284N | SJVFI6669M | ISIZF6614O | OKVRY7006H | QOAPO4699N | OFKDF3720W | NTPQZ0829K | HBFCN1575G | ... | PDNWI2057P | HYNBH3271L | TFSJP6914B | AKCGO9726K | LFPMM4706C | EQZZZ5658G | UIWEN7236N | sum | percent | percent_agree |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 1.000000 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | ... | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 46.0 | 0.348485 | 0.651515 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 1.000000 |
| 3 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 61.0 | 0.462121 | 0.537879 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.0 | 0.030303 | 0.969697 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 | 0.007576 | 0.992424 |
| 9996 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6.0 | 0.045455 | 0.954545 |
| 9997 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 56.0 | 0.424242 | 0.575758 |
| 9998 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 1.000000 |
| 9999 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 1.000000 |

10000 rows × 135 columns
\n", "