{ "cells": [ { "cell_type": "markdown", "id": "eda9541c-e979-4793-a33a-8c4f1f6b441b", "metadata": {}, "source": [ "# Measuring the covariance/correlation between features\n", "\n", "In this notebook we measure the correlation between features." ] }, { "cell_type": "markdown", "id": "8381082a-ade7-44ab-a330-7baff65ba052", "metadata": {}, "source": [ "## Import libraries and data\n", "\n", "Data has been restricted to stroke teams with at least 300 admissions, with at least 10 patients receiving thrombolysis, over three years." ] }, { "cell_type": "code", "execution_count": 1, "id": "7e6b6213-d9ca-49bb-b712-4424f89ecf5a", "metadata": {}, "outputs": [], "source": [ "# import libraries\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from matplotlib import cm\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "# Import data (combine all data)\n", "train = pd.read_csv('../data/10k_training_test/cohort_10000_train.csv')\n", "test = pd.read_csv('../data/10k_training_test/cohort_10000_test.csv')\n", "data = pd.concat([train, test], axis=0)\n", "data.drop('StrokeTeam', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 2, "id": "312df9f0-45c6-4728-8dfa-e7566e4a6733", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " S1AgeOnArrival S1OnsetToArrival_min S2RankinBeforeStroke Loc \\\n", "0 72.5 49.0 1 0 \n", "1 77.5 96.0 0 0 \n", "2 77.5 77.0 0 0 \n", "3 82.5 142.0 0 0 \n", "4 87.5 170.0 0 0 \n", "... ... ... ... ... \n", "9995 57.5 99.0 0 1 \n", "9996 87.5 159.0 3 0 \n", "9997 67.5 142.0 0 0 \n", "9998 72.5 101.0 0 0 \n", "9999 87.5 106.0 2 1 \n", "\n", " LocQuestions LocCommands BestGaze Visual FacialPalsy MotorArmLeft \\\n", "0 0.0 0.0 0.0 0.0 3.0 4.0 \n", "1 1.0 0.0 0.0 0.0 0.0 0.0 \n", "2 2.0 1.0 1.0 2.0 1.0 0.0 \n", "3 0.0 0.0 0.0 0.0 1.0 0.0 \n", "4 0.0 0.0 1.0 1.0 2.0 4.0 \n", "... ... ... ... ... ... ... \n", "9995 2.0 2.0 1.0 2.0 2.0 0.0 \n", "9996 2.0 2.0 0.0 0.0 0.0 0.0 \n", "9997 0.0 0.0 0.0 2.0 0.0 0.0 \n", "9998 0.0 0.0 0.0 0.0 1.0 0.0 \n", "9999 1.0 1.0 0.0 0.0 1.0 0.0 \n", "\n", " ... S2NewAFDiagnosis_Yes S2NewAFDiagnosis_missing \\\n", "0 ... 0 1 \n", "1 ... 0 0 \n", "2 ... 0 1 \n", "3 ... 0 1 \n", "4 ... 0 0 \n", "... ... ... ... \n", "9995 ... 0 1 \n", "9996 ... 0 1 \n", "9997 ... 0 0 \n", "9998 ... 0 1 \n", "9999 ... 0 0 \n", "\n", " S2StrokeType_Infarction S2StrokeType_Primary Intracerebral Haemorrhage \\\n", "0 0 1 \n", "1 1 0 \n", "2 1 0 \n", "3 1 0 \n", "4 1 0 \n", "... ... ... \n", "9995 0 1 \n", "9996 1 0 \n", "9997 1 0 \n", "9998 1 0 \n", "9999 0 1 \n", "\n", " S2StrokeType_missing S2TIAInLastMonth_No S2TIAInLastMonth_No but \\\n", "0 0 0 0 \n", "1 0 0 0 \n", "2 0 0 0 \n", "3 0 0 0 \n", "4 0 0 0 \n", "... ... ... ... \n", "9995 0 0 0 \n", "9996 0 0 0 \n", "9997 0 0 0 \n", "9998 0 0 0 \n", "9999 0 0 0 \n", "\n", " S2TIAInLastMonth_Yes S2TIAInLastMonth_missing S2Thrombolysis \n", "0 0 1 0 \n", "1 0 1 0 \n", "2 0 1 1 \n", "3 0 1 1 \n", "4 0 1 1 \n", "... ... ... ... 
\n", "9995 0 1 0 \n", "9996 0 1 1 \n", "9997 0 1 0 \n", "9998 0 1 0 \n", "9999 0 1 0 \n", "\n", "[88928 rows x 100 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "id": "31474768-f1c8-4a52-9b42-8085c3f6a971", "metadata": {}, "source": [ "## Scale data\n", "\n", "After standardising each feature to zero mean and unit variance, the covariance between two features equals their Pearson correlation coefficient, up to a negligible factor of n/(n-1) because `StandardScaler` divides by the population standard deviation while `DataFrame.cov` uses the sample estimator." ] }, { "cell_type": "code", "execution_count": 3, "id": "47686c35-d198-4c8a-afa6-08b7b5b42e30", "metadata": {}, "outputs": [], "source": [ "# Standardise each feature to zero mean and unit variance\n", "sc = StandardScaler()\n", "sc.fit(data)\n", "data_std = sc.transform(data)\n", "data_std = pd.DataFrame(data_std, columns=list(data))" ] }, { "cell_type": "markdown", "id": "3994def8-5c5f-4fbb-8496-0a99dfba407f", "metadata": {}, "source": [ "## Get covariance of scaled data (correlation)" ] }, { "cell_type": "code", "execution_count": 4, "id": "ad566b96-86fb-4ab9-b67f-509156301d8c", "metadata": {}, "outputs": [], "source": [ "# Get covariance of the standardised data (equivalent to correlation)\n", "cov = data_std.cov()\n", "\n", "# Convert from wide to tall\n", "cov = cov.melt(ignore_index=False)\n", "\n", "# Remove self-correlation (copy so later column assignments act on an independent frame)\n", "mask = cov.index != cov['variable']\n", "cov = cov[mask].copy()\n", "\n", "# Add absolute value\n", "cov['abs_value'] = np.abs(cov['value'])\n", "\n", "# Add R-squared\n", "cov['r-squared'] = cov['value'] ** 2\n", "\n", "# Sort by absolute covariance\n", "cov.sort_values('abs_value', inplace=True, ascending=False)\n", "\n", "# Round to four decimal places\n", "cov = cov.round(4)\n", "\n", "# Flag rows where either feature of the pair is a 'missing' indicator\n", "result = []\n", "for index, values in cov.iterrows():\n", "    if index[-7:] == 'missing' or values['variable'][-7:] == 'missing':\n", "        result.append(True)\n", "    else:\n", "        result.append(False)\n", "cov['missing'] = result\n", "\n", "# Remove duplicate pairs of features (A-B and B-A share one sorted key)\n", "result = []\n", "for index, values in cov.iterrows():\n", "    combination = [index, values['variable']]\n", "    combination.sort()\n", "    string = combination[0] + '-' + combination[1]\n", "    result.append(string)\n", "cov['pair'] = result\n", "cov.sort_values('pair', inplace=True)\n", "cov.drop_duplicates(subset=['pair'], inplace=True)\n", "\n", "# Sort by r-squared\n", "cov.sort_values('r-squared', ascending=False, inplace=True)" ] }, { "cell_type": "code", "execution_count": 5, "id": "6b551674-bcca-47d7-8445-349a2e211660", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " variable value \\\n", "AFAnticoagulentHeparin_missing AFAnticoagulentDOAC_missing 1.0000 \n", "Hypertension_Yes Hypertension_No -1.0000 \n", "AFAnticoagulentHeparin_missing AFAnticoagulentVitK_missing 1.0000 \n", "AFAnticoagulentDOAC_missing AFAnticoagulentVitK_missing 1.0000 \n", "S1ArriveByAmbulance_Yes S1ArriveByAmbulance_No -1.0000 \n", "... ... ... \n", "Hypertension_No S1OnsetInHospital_Yes 0.0000 \n", "S1OnsetTimeType_Not known Hypertension_No 0.0000 \n", "S2BrainImagingTime_min Hypertension_No 0.0066 \n", "Hypertension_No S2NihssArrival -0.0009 \n", "StrokeTIA_Yes Visual -0.0056 \n", "\n", " abs_value r-squared missing \\\n", "AFAnticoagulentHeparin_missing 1.0000 1.0 True \n", "Hypertension_Yes 1.0000 1.0 False \n", "AFAnticoagulentHeparin_missing 1.0000 1.0 True \n", "AFAnticoagulentDOAC_missing 1.0000 1.0 True \n", "S1ArriveByAmbulance_Yes 1.0000 1.0 False \n", "... ... ... ... \n", "Hypertension_No 0.0000 0.0 False \n", "S1OnsetTimeType_Not known 0.0000 0.0 False \n", "S2BrainImagingTime_min 0.0066 0.0 False \n", "Hypertension_No 0.0009 0.0 False \n", "StrokeTIA_Yes 0.0056 0.0 False \n", "\n", " pair \n", "AFAnticoagulentHeparin_missing AFAnticoagulentDOAC_missing-AFAnticoagulentHep... \n", "Hypertension_Yes Hypertension_No-Hypertension_Yes \n", "AFAnticoagulentHeparin_missing AFAnticoagulentHeparin_missing-AFAnticoagulent... \n", "AFAnticoagulentDOAC_missing AFAnticoagulentDOAC_missing-AFAnticoagulentVit... \n", "S1ArriveByAmbulance_Yes S1ArriveByAmbulance_No-S1ArriveByAmbulance_Yes \n", "... ... 
\n", "Hypertension_No Hypertension_No-S1OnsetInHospital_Yes \n", "S1OnsetTimeType_Not known Hypertension_No-S1OnsetTimeType_Not known \n", "S2BrainImagingTime_min Hypertension_No-S2BrainImagingTime_min \n", "Hypertension_No Hypertension_No-S2NihssArrival \n", "StrokeTIA_Yes StrokeTIA_Yes-Visual \n", "\n", "[4950 rows x 6 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cov" ] }, { "cell_type": "code", "execution_count": 6, "id": "d66945d6-e9c2-40a1-ad69-a74f96059b54", "metadata": {}, "outputs": [], "source": [ "# Save results\n", "cov.to_csv('./output/feature_correlation.csv')" ] }, { "cell_type": "markdown", "id": "efe716ea-3df2-4bf4-9d74-f77ae991254e", "metadata": {}, "source": [ "## Show histogram and counts of correlations" ] }, { "cell_type": "code", "execution_count": 7, "id": "affcd3ef-8d65-4272-ad56-f99c4751f950", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAE9CAYAAAD6c07jAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVG0lEQVR4nO3df7RlZX3f8feHQflRQ4AyENYMOGgnKlBBGCmt2igkZdQqmoaU1ApxkUxraGKadlWwVpN2sRZdXW0stWDQKqCtBKOVCUq7kBStBsRBfv8KEyEwhSUjaYQghYDf/nGesafDnXn2wN3n3st9v9Y66+z97B/n+3CH87n7x312qgpJknZmt4UuQJK0+BkWkqQuw0KS1GVYSJK6DAtJUpdhIUnq2n2hCxjLAQccUGvWrFnoMiRpSbnhhhu+V1Urt29/wYbFmjVr2LRp00KXIUlLSpI/mavd01CSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6XrBjQz0fa8760o+m7zv3rQtYiSQtDh5ZSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoaPSySrEhyY5Ir2vz+Sa5Kck97329q3bOTbE5yd5KTptqPTXJrW3ZekoxdtyTp/5nFkcX7gDun5s8Crq6qtcDVbZ4khwOnAkcA64Hzk6xo21wAbADWttf6GdQtSWpGDYskq4G3Ap+Yaj4ZuLhNXwy8Y6r90qp6sqruBTYDxyU5GNinqq6tqgIumdpGkjQDYx9ZfAT4Z8APp9oOqqqHANr7ga19FfDA1HpbWtuqNr19uyRpRkYLiyR/G3i4qm4YuskcbbWT9rk+c0OSTUk2bd26deDHSpJ6xjyyeB3w9iT3AZcCJyT5DPDddmqJ9v5wW38LcMjU9quBB1v76jnan6WqLqyqdVW1buXKlfPZF0la1kYLi6o6u6pWV9UaJheu/
6Cq/j6wETi9rXY6cHmb3gicmmSPJIcxuZB9fTtV9ViS49tdUKdNbSNJmoHdF+AzzwUuS3IGcD9wCkBV3Z7kMuAO4GngzKp6pm3zXuAiYC/gyvaSJM3ITMKiqq4BrmnTjwAn7mC9c4Bz5mjfBBw5XoWSpJ3xL7glSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpa7SwSLJnkuuT3Jzk9iS/1dr3T3JVknva+35T25ydZHOSu5OcNNV+bJJb27LzkmSsuiVJzzbmkcWTwAlVdRRwNLA+yfHAWcDVVbUWuLrNk+Rw4FTgCGA9cH6SFW1fFwAbgLXttX7EuiVJ2xktLGriz9vsi9qrgJOBi1v7xcA72vTJwKVV9WRV3QtsBo5LcjCwT1VdW1UFXDK1jSRpBka9ZpFkRZKbgIeBq6rqm8BBVfUQQHs/sK2+CnhgavMtrW1Vm96+XZI0I6OGRVU9U1VHA6uZHCUcuZPV57oOUTtpf/YOkg1JNiXZtHXr1l2uV5I0t5ncDVVVfwZcw+Raw3fbqSXa+8NttS3AIVObrQYebO2r52if63MurKp1VbVu5cqV89kFSVrWxrwbamWSfdv0XsBPA3cBG4HT22qnA5e36Y3AqUn2SHIYkwvZ17dTVY8lOb7dBXXa1DaSpBnYfcR9Hwxc3O5o2g24rKquSHItcFmSM4D7gVMAqur2JJcBdwBPA2dW1TNtX+8FLgL2Aq5sL0nSjIwWFlV1C/CaOdofAU7cwTbnAOfM0b4J2Nn1DknSiPwLbklSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2DwqLzHApJ0gvc0COLjyW5PsmvbBt2XJK0fAwKi6p6PfAuJg8n2pTkvyT5mVErkyQtGoOvWVTVPcAHgfcDPwWcl+SuJD87VnGSpMVh6DWLVyf5beBO4ATgbVX1qjb92yPWJ0laBIY+/OijwMeBD1TVE9saq+rBJB8cpTJJ0qIxNCzeAjyx7TGnSXYD9qyqH1TVp0erTpK0KAy9ZvEVJs+/3mbv1iZJWgaGhsWeVfXn22ba9N7jlCRJWmyGhsXjSY7ZNpPkWOCJnawvSXoBGXrN4teBzyV5sM0fDPzdUSqSJC06g8Kiqr6V5JXAK4AAd1XVX4xamSRp0Rh6ZAHwWmBN2+Y1SaiqS0apSpK0qAwKiySfBl4O3AQ805oLMCwkaRkYemSxDji8qmrMYiRJi9PQu6FuA35izEIkSYvX0COLA4A7klwPPLmtsarePkpVkqRFZWhY/OaYRUiSFreht85+NclLgbVV9ZUkewMrxi1NkrRYDB2i/JeB3wN+pzWtAr44Uk2SpEVm6AXuM4HXAY/Cjx6EdOBYRUmSFpehYfFkVT21bSbJ7kz+zkKStAwMDYuvJvkAsFd79vbngN8fryxJ0mIyNCzOArYCtwL/APgyk+dxS5KWgaF3Q/2QyWNVPz5uOZKkxWjo2FD3Msc1iqp62bxXJEladHZlbKht9gROAfaf/3IkSYvRoGsWVfXI1Ot/VdVHgBPGLU2StFgMPQ11zNTsbkyONH5slIokSYvO0NNQ/3Zq+mngPuDn570aSdKiNPRuqDeNXYgkafEaehrqN3a2vKr+3fyUI0lajHblbqjXAhvb/NuArwEPjFGUJGlx2ZWHHx1TVY8BJPlN4HNV9UtjFSZJWjyGDvdxKPDU1PxTwJp5r0aStCgNP
bL4NHB9kv/K5C+53wlcMlpVkqRFZejdUOckuRJ4Q2t6T1XdOF5ZkqTFZOhpKIC9gUer6t8DW5IcNlJNkqRFZuhjVT8MvB84uzW9CPjMWEVJkhaXoUcW7wTeDjwOUFUP0hnuI8khSf5HkjuT3J7kfa19/yRXJbmnve83tc3ZSTYnuTvJSVPtxya5tS07L0l2taOSpOduaFg8VVVFG6Y8yV8asM3TwD+pqlcBxwNnJjmcyYOUrq6qtcDVbZ627FTgCGA9cH6SFW1fFwAbgLXttX5g3ZKkeTA0LC5L8jvAvkl+GfgKnQchVdVDVfXtNv0YcCewCjgZuLitdjHwjjZ9MnBpVT1ZVfcCm4HjkhwM7FNV17bAumRqG0nSDHTvhmqnfH4XeCXwKPAK4ENVddXQD0myBngN8E3goKp6CCaBkuTAttoq4Lqpzba0tr9o09u3z/U5G5gcgXDooYcOLU+S1NENi6qqJF+sqmOBwQGxTZKXAJ8Hfr2qHt3J5Ya5FtRO2ueq9ULgQoB169bNuY4kadcNPQ11XZLX7urOk7yISVD856r6Qmv+bju1RHt/uLVvAQ6Z2nw18GBrXz1HuyRpRoaGxZuYBMYfJ7ml3Zl0y842aKev/hNw53aj0m4ETm/TpwOXT7WfmmSP9jcca4Hr2ymrx5Ic3/Z52tQ2kqQZ2OlpqCSHVtX9wJufw75fB7wbuDXJTa3tA8C5TC6YnwHcz+R53lTV7UkuA+5gcifVmVX1TNvuvcBFwF7Ale0lSZqR3jWLLzIZbfZPkny+qv7O0B1X1deZ+3oDwIk72OYc4Jw52jcBRw79bEnS/Oqdhpr+sn/ZmIVIkhavXljUDqYlSctI7zTUUUkeZXKEsVebps1XVe0zanWSpEVhp2FRVSt2tlyStDzsyhDlkqRlyrCQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6RguLJJ9M8nCS26ba9k9yVZJ72vt+U8vOTrI5yd1JTppqPzbJrW3ZeUkyVs2SpLmNeWRxEbB+u7azgKurai1wdZsnyeHAqcARbZvzk6xo21wAbADWttf2+5QkjWy0sKiqrwF/ul3zycDFbfpi4B1T7ZdW1ZNVdS+wGTguycHAPlV1bVUVcMnUNpKkGZn1NYuDquohgPZ+YGtfBTwwtd6W1raqTW/fLkmaocVygXuu6xC1k/a5d5JsSLIpyaatW7fOW3GStNzNOiy+204t0d4fbu1bgEOm1lsNPNjaV8/RPqequrCq1lXVupUrV85r4ZK0nM06LDYCp7fp04HLp9pPTbJHksOYXMi+vp2qeizJ8e0uqNOmtpEkzcjuY+04yWeBNwIHJNkCfBg4F7gsyRnA/cApAFV1e5LLgDuAp4Ezq+qZtqv3Mrmzai/gyvaSJM3QaGFRVb+wg0Un7mD9c4Bz5mjfBBw5j6VJknbRYrnALUlaxAwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHXtvtAFLHZrzvrSj6bvO/etC1iJJC0cjywkSV2GhSSpy9NQu
8BTUpKWqyVzZJFkfZK7k2xOctZC1yNJy8mSOLJIsgL4j8DPAFuAbyXZWFV3LFRNHmVIWk6WRFgAxwGbq+o7AEkuBU4GFiwsphkckl7olkpYrAIemJrfAvy1Baplp6aDYwzPN4zGCLYXclgulb4tlTo1jln8/FNVo+x4PiU5BTipqn6pzb8bOK6qfnW79TYAG9rsK4C7n+NHHgB87zluu1TZ5+VhufV5ufUXnn+fX1pVK7dvXCpHFluAQ6bmVwMPbr9SVV0IXPh8PyzJpqpa93z3s5TY5+VhufV5ufUXxuvzUrkb6lvA2iSHJXkxcCqwcYFrkqRlY0kcWVTV00n+EfDfgRXAJ6vq9gUuS5KWjSURFgBV9WXgyzP6uOd9KmsJss/Lw3Lr83LrL4zU5yVxgVuStLCWyjULSdICWtZh0RtCJBPnteW3JDlmIeqcLwP6+67Wz1uS/GGSoxaizvk0dJiYJK9N8kySn5tlfWMY0uckb0xyU5Lbk3x11jXOtwH/tn88ye8nubn1+T0LUed8SfLJJA8nuW0Hy+f/u6uqluWLyYXyPwZeBrwYuBk4fLt13gJcCQQ4HvjmQtc9cn//BrBfm37zUu7v0D5PrfcHTK6J/dxC1z2Dn/O+TEY/OLTNH7jQdc+gzx8A/nWbXgn8KfDiha79efT5bwLHALftYPm8f3ct5yOLHw0hUlVPAduGEJl2MnBJTVwH7Jvk4FkXOk+6/a2qP6yq/91mr2Py9yxL2ZCfMcCvAp8HHp5lcSMZ0ue/B3yhqu4HqKql3u8hfS7gx5IEeAmTsHh6tmXOn6r6GpM+7Mi8f3ct57CYawiRVc9hnaViV/tyBpPfTJaybp+TrALeCXxshnWNacjP+SeB/ZJck+SGJKfNrLpxDOnzR4FXMflj3luB91XVD2dT3oKY9++uJXPr7AgyR9v2t4YNWWepGNyXJG9iEhavH7Wi8Q3p80eA91fVM5NfOpe8IX3eHTgWOBHYC7g2yXVV9UdjFzeSIX0+CbgJOAF4OXBVkv9ZVY+OXNtCmffvruUcFkOGEBk0zMgSMagvSV4NfAJ4c1U9MqPaxjKkz+uAS1tQHAC8JcnTVfXFmVQ4/4b+u/5eVT0OPJ7ka8BRwFINiyF9fg9wbk1O6G9Oci/wSuD62ZQ4c/P+3bWcT0MNGUJkI3Bau7PgeOD7VfXQrAudJ93+JjkU+ALw7iX8W+a0bp+r6rCqWlNVa4DfA35lCQcFDPt3fTnwhiS7J9mbyQjOd864zvk0pM/3MzmSIslBTAYa/c5Mq5ytef/uWrZHFrWDIUSS/MO2/GNM7o55C7AZ+AGT306WpIH9/RDwl4Hz22/aT9cSHoRtYJ9fUIb0uaruTPLfgFuAHwKfqKo5b8FcCgb+nP8VcFGSW5mconl/VS3Z0WiTfBZ4I3BAki3Ah4EXwXjfXf4FtySpazmfhpIkDWRYSJK6DAtJUpdhIUnqMiwkSV2GhbSdNvrsTUluayOV7rvQNQ3VRpO9YqHr0AuPYSE92xNVdXRVHclksLYzF7qgJCsWugYtb4aFtHPXMscAbElWJLmoHX3cmuQft/Zj2zMTrk3yb7Y9byDJLyb56NT2VyR5Y5u+IMmm9pyF35pa574kH0rydeCUJH+r7ffbST6X5CVtvfVJ7mrr/eyI/y20jBkW0g603+ZP5NlDRwAcDayqqiOr6q8Cn2rtnwJ+rar++i581D9vfyn/auCn2vhc2/yfqno98BXgg8BPV9UxwCbgN5LsCXwceBvwBuAnduFzpcEMC+nZ9kpyE/AIsD9w1RzrfAd4WZL/kGQ98GiSHwf2raptT5779MDP+/kk3wZuBI4ADp9a9rvt/fjW/o1W2+nAS5kMhndvVd3TBsn7zMDPlHaJYSE92xNVdTSTL+MXA2e20043tde/bA+JOgq4hsk1jU8wGXNoR+PnPM3////bngBJDgP+KXBiVb0a+NK2Zc3j7T3AVe1aytFVdXhVndGWOWaPRmdYSDtQV
d8Hfo3Jl/luU1/UH0pyQGv7PPAvgGOq6s+A7yfZ9hyQd03t7j7g6CS7JTmEydPdAPZhEgjfb6OhvnkH5VwHvC7JXwFIsneSnwTuAg5L8vK23i88/55Lz7ZsR52VhqiqG5PczGTY6+nTSquATyXZ9gvX2e39PcAnk/yAySio23wDuJfJU9puA77d9n9zkhuB25mc2vrGDurYmuQXgc8m2aM1f7Cq/ijJBuBLSb4HfB048vn0WZqLo85KI0myBrii3YIrLWmehpIkdXlkIUnq8shCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqev/AhcU7j3x8GYXAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Histogram of covariance/correlation\n", "fig = plt.figure(figsize=(6,5))\n", "ax = fig.add_subplot(111)\n", "bins = np.arange(0, 1.01, 0.01)\n", "ax.hist(cov['r-squared'], bins=bins, rwidth=1)\n", "ax.set_xlabel('R-squared') \n", "ax.set_ylabel('Frequency')\n", "plt.savefig('output/covariance.jpg', dpi=300)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "9ac9ed29-ef14-4bd6-b152-4ea65cecda33", "metadata": {}, "source": [ "Show proportion of feature correlations (R-sqaured)in key bins" ] }, { "cell_type": "code", "execution_count": 8, "id": "49361a0b-7ad8-4b21-8760-be72d09d8939", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Proportion Cumulative Proportion\n", "R-squared \n", "<0.10 0.960808 0.960808\n", "0.1 to 0.25 0.015556 0.976364\n", "0.25 to 0.50 0.010303 0.986667\n", "0.50 to 0.75 0.006667 0.993333\n", "0.75 to 0.999 0.003232 0.996566\n", "1 0.003434 1.000000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bins = [0, 0.10, 0.25, 0.5, 0.75, 0.99, 1.1]\n", "counts = np.histogram(cov['r-squared'], bins=bins)[0]\n", "counts = counts / counts.sum()\n", "\n", "labels = ['<0.10', '0.1 to 0.25', '0.25 to 0.50', '0.50 to 0.75', '0.75 to 0.999', '1']\n", "counts_df = pd.DataFrame(index=labels)\n", "counts_df['Proportion'] = counts\n", "counts_df['Cumulative Proportion'] = counts.cumsum()\n", "counts_df.index.name = 'R-squared'\n", "counts_df" ] }, { "cell_type": "markdown", "id": "39acc7a6-9aad-4ab8-ae06-b1cf23b7199b", "metadata": {}, "source": [ "## Show highly correlated features\n", "\n", "### Perfectly correlated features" ] }, { "cell_type": "code", "execution_count": 9, "id": "2bf253e8-923c-4539-bd58-b432847b0bae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " variable value \\\n", "AFAnticoagulentHeparin_missing AFAnticoagulentDOAC_missing 1.0 \n", "Hypertension_Yes Hypertension_No -1.0 \n", "AFAnticoagulentHeparin_missing AFAnticoagulentVitK_missing 1.0 \n", "AFAnticoagulentDOAC_missing AFAnticoagulentVitK_missing 1.0 \n", "S1ArriveByAmbulance_Yes S1ArriveByAmbulance_No -1.0 \n", "MoreEqual80y_No MoreEqual80y_Yes -1.0 \n", "Diabetes_Yes Diabetes_No -1.0 \n", "S1Gender_Male S1Gender_Female -1.0 \n", "CongestiveHeartFailure_Yes CongestiveHeartFailure_No -1.0 \n", "AtrialFibrillation_Yes AtrialFibrillation_No -1.0 \n", "StrokeTIA_No StrokeTIA_Yes -1.0 \n", "AtrialFibrillation_Yes AFAntiplatelet_missing -1.0 \n", "AFAntiplatelet_missing AtrialFibrillation_No 1.0 \n", "S1OnsetTimeType_Precise S1OnsetTimeType_Best estimate -1.0 \n", "\n", " abs_value r-squared missing \\\n", "AFAnticoagulentHeparin_missing 1.0 1.0 True \n", "Hypertension_Yes 1.0 1.0 False \n", "AFAnticoagulentHeparin_missing 1.0 1.0 True \n", "AFAnticoagulentDOAC_missing 1.0 1.0 True \n", "S1ArriveByAmbulance_Yes 1.0 1.0 False \n", "MoreEqual80y_No 1.0 1.0 False \n", "Diabetes_Yes 1.0 1.0 False \n", "S1Gender_Male 1.0 1.0 False \n", "CongestiveHeartFailure_Yes 1.0 1.0 False \n", "AtrialFibrillation_Yes 1.0 1.0 False \n", "StrokeTIA_No 1.0 1.0 False \n", "AtrialFibrillation_Yes 1.0 1.0 True \n", "AFAntiplatelet_missing 1.0 1.0 True \n", "S1OnsetTimeType_Precise 1.0 1.0 False \n", "\n", " pair \n", "AFAnticoagulentHeparin_missing AFAnticoagulentDOAC_missing-AFAnticoagulentHep... \n", "Hypertension_Yes Hypertension_No-Hypertension_Yes \n", "AFAnticoagulentHeparin_missing AFAnticoagulentHeparin_missing-AFAnticoagulent... \n", "AFAnticoagulentDOAC_missing AFAnticoagulentDOAC_missing-AFAnticoagulentVit... 
\n", "S1ArriveByAmbulance_Yes S1ArriveByAmbulance_No-S1ArriveByAmbulance_Yes \n", "MoreEqual80y_No MoreEqual80y_No-MoreEqual80y_Yes \n", "Diabetes_Yes Diabetes_No-Diabetes_Yes \n", "S1Gender_Male S1Gender_Female-S1Gender_Male \n", "CongestiveHeartFailure_Yes CongestiveHeartFailure_No-CongestiveHeartFailu... \n", "AtrialFibrillation_Yes AtrialFibrillation_No-AtrialFibrillation_Yes \n", "StrokeTIA_No StrokeTIA_No-StrokeTIA_Yes \n", "AtrialFibrillation_Yes AFAntiplatelet_missing-AtrialFibrillation_Yes \n", "AFAntiplatelet_missing AFAntiplatelet_missing-AtrialFibrillation_No \n", "S1OnsetTimeType_Precise S1OnsetTimeType_Best estimate-S1OnsetTimeType_... " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get perfectly correlated feature pairs (R-squared > 0.999)\n", "mask = cov['r-squared'] > 0.999\n", "cov[mask]" ] }, { "cell_type": "markdown", "id": "e0257a7a-5f69-4d08-9efd-a7f432245003", "metadata": {}, "source": [ "### Highly correlated features\n", "\n", "Feature pairs with R-squared between 0.5 and 0.999." ] }, { "cell_type": "code", "execution_count": 10, "id": "2343d071-7c29-4172-88ee-07faff474a75", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | variable | value | abs_value | r-squared | missing | pair |
|---|---|---|---|---|---|---|
| AFAnticoagulentHeparin_No | AFAnticoagulentHeparin_missing | -0.9984 | 0.9984 | 0.9969 | True | AFAnticoagulentHeparin_No-AFAnticoagulentHepar... |
| AFAnticoagulentDOAC_missing | AFAnticoagulentHeparin_No | -0.9984 | 0.9984 | 0.9969 | True | AFAnticoagulentDOAC_missing-AFAnticoagulentHep... |
| AFAnticoagulentVitK_missing | AFAnticoagulentHeparin_No | -0.9984 | 0.9984 | 0.9969 | True | AFAnticoagulentHeparin_No-AFAnticoagulentVitK_... |
| S2StrokeType_Infarction | S2StrokeType_Primary Intracerebral Haemorrhage | -0.9940 | 0.9940 | 0.9881 | False | S2StrokeType_Infarction-S2StrokeType_Primary I... |
| AFAnticoagulentVitK_missing | AFAnticoagulentVitK_No | -0.9590 | 0.9590 | 0.9198 | True | AFAnticoagulentVitK_No-AFAnticoagulentVitK_mis... |
| AFAnticoagulentVitK_No | AFAnticoagulentHeparin_missing | -0.9590 | 0.9590 | 0.9198 | True | AFAnticoagulentHeparin_missing-AFAnticoagulent... |
| AFAnticoagulentVitK_No | AFAnticoagulentDOAC_missing | -0.9590 | 0.9590 | 0.9198 | True | AFAnticoagulentDOAC_missing-AFAnticoagulentVit... |
| AFAnticoagulentHeparin_No | AFAnticoagulentVitK_No | 0.9575 | 0.9575 | 0.9168 | False | AFAnticoagulentHeparin_No-AFAnticoagulentVitK_No |
| S2NewAFDiagnosis_No | S2NewAFDiagnosis_missing | -0.9423 | 0.9423 | 0.8880 | True | S2NewAFDiagnosis_No-S2NewAFDiagnosis_missing |
| AFAnticoagulentDOAC_No | AFAnticoagulentDOAC_missing | -0.9339 | 0.9339 | 0.8721 | True | AFAnticoagulentDOAC_No-AFAnticoagulentDOAC_mis... |
| AFAnticoagulentDOAC_No | AFAnticoagulentVitK_missing | -0.9339 | 0.9339 | 0.8721 | True | AFAnticoagulentDOAC_No-AFAnticoagulentVitK_mis... |
| AFAnticoagulentDOAC_No | AFAnticoagulentHeparin_missing | -0.9339 | 0.9339 | 0.8721 | True | AFAnticoagulentDOAC_No-AFAnticoagulentHeparin_... |
| AFAnticoagulentDOAC_No | AFAnticoagulentHeparin_No | 0.9322 | 0.9322 | 0.8690 | False | AFAnticoagulentDOAC_No-AFAnticoagulentHeparin_No |
| S2TIAInLastMonth_No | S2TIAInLastMonth_missing | -0.9032 | 0.9032 | 0.8158 | True | S2TIAInLastMonth_No-S2TIAInLastMonth_missing |
| AFAnticoagulentDOAC_No | AFAnticoagulentVitK_No | 0.8889 | 0.8889 | 0.7902 | False | AFAnticoagulentDOAC_No-AFAnticoagulentVitK_No |
| AFAnticoagulentDOAC_missing | S1AdmissionYear_2018 | -0.8775 | 0.8775 | 0.7700 | True | AFAnticoagulentDOAC_missing-S1AdmissionYear_2018 |
| S1AdmissionYear_2018 | AFAnticoagulentHeparin_missing | -0.8775 | 0.8775 | 0.7700 | True | AFAnticoagulentHeparin_missing-S1AdmissionYear... |
| S1AdmissionYear_2018 | AFAnticoagulentVitK_missing | -0.8775 | 0.8775 | 0.7700 | True | AFAnticoagulentVitK_missing-S1AdmissionYear_2018 |
| S1AdmissionYear_2018 | AFAnticoagulentHeparin_No | 0.8758 | 0.8758 | 0.7671 | False | AFAnticoagulentHeparin_No-S1AdmissionYear_2018 |
| MotorLegRight | MotorArmRight | 0.8435 | 0.8435 | 0.7115 | False | MotorArmRight-MotorLegRight |
| AFAnticoagulentVitK_No | S1AdmissionYear_2018 | 0.8370 | 0.8370 | 0.7006 | False | AFAnticoagulentVitK_No-S1AdmissionYear_2018 |
| AFAnticoagulentDOAC_No | S2NewAFDiagnosis_missing | -0.8342 | 0.8342 | 0.6959 | True | AFAnticoagulentDOAC_No-S2NewAFDiagnosis_missing |
| MotorArmLeft | MotorLegLeft | 0.8326 | 0.8326 | 0.6933 | False | MotorArmLeft-MotorLegLeft |
| AFAnticoagulentVitK_No | S2NewAFDiagnosis_missing | -0.8224 | 0.8224 | 0.6764 | True | AFAnticoagulentVitK_No-S2NewAFDiagnosis_missing |
| S1AdmissionYear_2018 | AFAnticoagulentDOAC_No | 0.8108 | 0.8108 | 0.6574 | False | AFAnticoagulentDOAC_No-S1AdmissionYear_2018 |
| AFAnticoagulentVitK_missing | S2NewAFDiagnosis_missing | 0.8046 | 0.8046 | 0.6474 | True | AFAnticoagulentVitK_missing-S2NewAFDiagnosis_m... |
| AFAnticoagulentHeparin_missing | S2NewAFDiagnosis_missing | 0.8046 | 0.8046 | 0.6474 | True | AFAnticoagulentHeparin_missing-S2NewAFDiagnosi... |
| S2NewAFDiagnosis_missing | AFAnticoagulentDOAC_missing | 0.8046 | 0.8046 | 0.6474 | True | AFAnticoagulentDOAC_missing-S2NewAFDiagnosis_m... |
| S2NewAFDiagnosis_missing | AFAnticoagulentHeparin_No | -0.8035 | 0.8035 | 0.6456 | True | AFAnticoagulentHeparin_No-S2NewAFDiagnosis_mis... |
| S2NewAFDiagnosis_No | AFAnticoagulentDOAC_No | 0.7859 | 0.7859 | 0.6176 | False | AFAnticoagulentDOAC_No-S2NewAFDiagnosis_No |
| S1AdmissionYear_2018 | S2NewAFDiagnosis_missing | -0.7805 | 0.7805 | 0.6092 | True | S1AdmissionYear_2018-S2NewAFDiagnosis_missing |
| S2NewAFDiagnosis_No | AFAnticoagulentVitK_No | 0.7749 | 0.7749 | 0.6004 | False | AFAnticoagulentVitK_No-S2NewAFDiagnosis_No |
| AFAntiplatelet_No | AFAntiplatelet_missing | -0.7736 | 0.7736 | 0.5984 | True | AFAntiplatelet_No-AFAntiplatelet_missing |
| AtrialFibrillation_No | AFAntiplatelet_No | -0.7736 | 0.7736 | 0.5984 | False | AFAntiplatelet_No-AtrialFibrillation_No |
| AFAntiplatelet_No | AtrialFibrillation_Yes | 0.7736 | 0.7736 | 0.5984 | False | AFAntiplatelet_No-AtrialFibrillation_Yes |
| S1Ethnicity_White | S1Ethnicity_Other | -0.7607 | 0.7607 | 0.5787 | False | S1Ethnicity_Other-S1Ethnicity_White |
| S2NewAFDiagnosis_No | AFAnticoagulentDOAC_missing | -0.7582 | 0.7582 | 0.5749 | True | AFAnticoagulentDOAC_missing-S2NewAFDiagnosis_No |
| S2NewAFDiagnosis_No | AFAnticoagulentHeparin_missing | -0.7582 | 0.7582 | 0.5749 | True | AFAnticoagulentHeparin_missing-S2NewAFDiagnosi... |
| AFAnticoagulentVitK_missing | S2NewAFDiagnosis_No | -0.7582 | 0.7582 | 0.5749 | True | AFAnticoagulentVitK_missing-S2NewAFDiagnosis_No |
| S2NewAFDiagnosis_No | AFAnticoagulentHeparin_No | 0.7572 | 0.7572 | 0.5734 | False | AFAnticoagulentHeparin_No-S2NewAFDiagnosis_No |
| MoreEqual80y_Yes | S1AgeOnArrival | 0.7556 | 0.7556 | 0.5709 | False | MoreEqual80y_Yes-S1AgeOnArrival |
| MoreEqual80y_No | S1AgeOnArrival | -0.7556 | 0.7556 | 0.5709 | False | MoreEqual80y_No-S1AgeOnArrival |
| S2NewAFDiagnosis_missing | AFAnticoagulent_No | -0.7502 | 0.7502 | 0.5628 | True | AFAnticoagulent_No-S2NewAFDiagnosis_missing |
| S1AdmissionYear_2018 | S2NewAFDiagnosis_No | 0.7357 | 0.7357 | 0.5413 | False | S1AdmissionYear_2018-S2NewAFDiagnosis_No |
| S1OnsetDateType_Precise | S1OnsetDateType_Best estimate | -0.7267 | 0.7267 | 0.5281 | False | S1OnsetDateType_Best estimate-S1OnsetDateType_... |
| AFAnticoagulent_No | AFAnticoagulentDOAC_No | 0.7212 | 0.7212 | 0.5202 | False | AFAnticoagulentDOAC_No-AFAnticoagulent_No |
| S1AdmissionYear_2018 | AFAnticoagulent_missing | -0.7149 | 0.7149 | 0.5111 | True | AFAnticoagulent_missing-S1AdmissionYear_2018 |
| AFAnticoagulentHeparin_missing | AFAnticoagulent_missing | 0.7093 | 0.7093 | 0.5030 | True | AFAnticoagulentHeparin_missing-AFAnticoagulent... |
| AFAnticoagulentDOAC_missing | AFAnticoagulent_missing | 0.7093 | 0.7093 | 0.5030 | True | AFAnticoagulentDOAC_missing-AFAnticoagulent_mi... |
| AFAnticoagulentVitK_missing | AFAnticoagulent_missing | 0.7093 | 0.7093 | 0.5030 | True | AFAnticoagulentVitK_missing-AFAnticoagulent_mi... |
| S2NewAFDiagnosis_No | AFAnticoagulent_No | 0.7092 | 0.7092 | 0.5030 | False | AFAnticoagulent_No-S2NewAFDiagnosis_No |
| AFAnticoagulentHeparin_No | AFAnticoagulent_missing | -0.7080 | 0.7080 | 0.5012 | True | AFAnticoagulentHeparin_No-AFAnticoagulent_missing |
| AFAnticoagulent_missing | AFAnticoagulent_No | -0.7035 | 0.7035 | 0.4949 | True | AFAnticoagulent_No-AFAnticoagulent_missing |
| AFAnticoagulent_No | AFAnticoagulentVitK_No | 0.6979 | 0.6979 | 0.4871 | False | AFAnticoagulentVitK_No-AFAnticoagulent_No |
| BestLanguage | LocQuestions | 0.6979 | 0.6979 | 0.4870 | False | BestLanguage-LocQuestions |
| S2NihssArrival | ExtinctionInattention | 0.6795 | 0.6795 | 0.4617 | False | ExtinctionInattention-S2NihssArrival |
| AFAnticoagulentVitK_No | AFAnticoagulent_missing | -0.6763 | 0.6763 | 0.4574 | True | AFAnticoagulentVitK_No-AFAnticoagulent_missing |
| BestGaze | S2NihssArrival | 0.6756 | 0.6756 | 0.4564 | False | BestGaze-S2NihssArrival |
| BestLanguage | S2NihssArrival | 0.6743 | 0.6743 | 0.4547 | False | BestLanguage-S2NihssArrival |
| AFAnticoagulent_No | S1AdmissionYear_2018 | 0.6694 | 0.6694 | 0.4480 | False | AFAnticoagulent_No-S1AdmissionYear_2018 |
| AFAnticoagulentHeparin_No | AFAnticoagulent_No | 0.6636 | 0.6636 | 0.4403 | False | AFAnticoagulentHeparin_No-AFAnticoagulent_No |
| AFAnticoagulent_No | AFAnticoagulentDOAC_missing | -0.6622 | 0.6622 | 0.4386 | True | AFAnticoagulentDOAC_missing-AFAnticoagulent_No |
| AFAnticoagulentVitK_missing | AFAnticoagulent_No | -0.6622 | 0.6622 | 0.4386 | True | AFAnticoagulentVitK_missing-AFAnticoagulent_No |
| AFAnticoagulentHeparin_missing | AFAnticoagulent_No | -0.6622 | 0.6622 | 0.4386 | True | AFAnticoagulentHeparin_missing-AFAnticoagulent_No |
| LocCommands | LocQuestions | 0.6576 | 0.6576 | 0.4325 | False | LocCommands-LocQuestions |
| S2NihssArrival | LocCommands | 0.6560 | 0.6560 | 0.4303 | False | LocCommands-S2NihssArrival |
| AFAnticoagulent_missing | AFAnticoagulentDOAC_No | -0.6560 | 0.6560 | 0.4303 | True | AFAnticoagulentDOAC_No-AFAnticoagulent_missing |
| AtrialFibrillation_Yes | AFAnticoagulent_Yes | 0.6516 | 0.6516 | 0.4245 | False | AFAnticoagulent_Yes-AtrialFibrillation_Yes |
| AFAnticoagulent_Yes | AtrialFibrillation_No | -0.6516 | 0.6516 | 0.4245 | False | AFAnticoagulent_Yes-AtrialFibrillation_No |
| AFAnticoagulent_Yes | AFAntiplatelet_missing | -0.6516 | 0.6516 | 0.4245 | True | AFAnticoagulent_Yes-AFAntiplatelet_missing |
| S2NihssArrival | MotorLegRight | 0.6493 | 0.6493 | 0.4216 | False | MotorLegRight-S2NihssArrival |
| S1OnsetDateType_Stroke during sleep | S1OnsetDateType_Precise | -0.6469 | 0.6469 | 0.4185 | False | S1OnsetDateType_Precise-S1OnsetDateType_Stroke... |
| LocQuestions | S2NihssArrival | 0.6357 | 0.6357 | 0.4041 | False | LocQuestions-S2NihssArrival |
| LocCommands | BestLanguage | 0.6345 | 0.6345 | 0.4026 | False | BestLanguage-LocCommands |
| AFAnticoagulent_missing | S2NewAFDiagnosis_missing | 0.6305 | 0.6305 | 0.3976 | True | AFAnticoagulent_missing-S2NewAFDiagnosis_missing |
| S2NihssArrival | MotorArmRight | 0.6212 | 0.6212 | 0.3858 | False | MotorArmRight-S2NihssArrival |
| AFAnticoagulent_Yes | AFAntiplatelet_No | 0.6013 | 0.6013 | 0.3615 | False | AFAnticoagulent_Yes-AFAntiplatelet_No |
| Visual | S2NihssArrival | 0.5986 | 0.5986 | 0.3583 | False | S2NihssArrival-Visual |
| AFAnticoagulent_missing | S2NewAFDiagnosis_No | -0.5942 | 0.5942 | 0.3530 | True | AFAnticoagulent_missing-S2NewAFDiagnosis_No |
| Dysarthria | S2NihssArrival | 0.5840 | 0.5840 | 0.3410 | False | Dysarthria-S2NihssArrival |
| BestGaze | ExtinctionInattention | 0.5732 | 0.5732 | 0.3286 | False | BestGaze-ExtinctionInattention |
| Visual | ExtinctionInattention | 0.5626 | 0.5626 | 0.3165 | False | ExtinctionInattention-Visual |
| S2NihssArrival | Loc | 0.5568 | 0.5568 | 0.3100 | False | Loc-S2NihssArrival |
| S2NihssArrival | FacialPalsy | 0.5548 | 0.5548 | 0.3078 | False | FacialPalsy-S2NihssArrival |
| S2NihssArrival | MotorLegLeft | 0.5507 | 0.5507 | 0.3033 | False | MotorLegLeft-S2NihssArrival |
| S2NihssArrival | Sensory | 0.5503 | 0.5503 | 0.3028 | False | S2NihssArrival-Sensory |
| Visual | BestGaze | 0.5442 | 0.5442 | 0.2961 | False | BestGaze-Visual |
| S1AdmissionYear_2016 | AFAnticoagulentDOAC_missing | 0.5441 | 0.5441 | 0.2960 | True | AFAnticoagulentDOAC_missing-S1AdmissionYear_2016 |
| AFAnticoagulentVitK_missing | S1AdmissionYear_2016 | 0.5441 | 0.5441 | 0.2960 | True | AFAnticoagulentVitK_missing-S1AdmissionYear_2016 |
| S1AdmissionYear_2016 | AFAnticoagulentHeparin_missing | 0.5441 | 0.5441 | 0.2960 | True | AFAnticoagulentHeparin_missing-S1AdmissionYear... |
| S1AdmissionYear_2016 | AFAnticoagulentHeparin_No | -0.5432 | 0.5432 | 0.2951 | False | AFAnticoagulentHeparin_No-S1AdmissionYear_2016 |
| MotorArmRight | BestLanguage | 0.5239 | 0.5239 | 0.2745 | False | BestLanguage-MotorArmRight |
| S1AdmissionYear_2016 | AFAnticoagulentVitK_No | -0.5217 | 0.5217 | 0.2722 | False | AFAnticoagulentVitK_No-S1AdmissionYear_2016 |
| AtrialFibrillation_Yes | AFAnticoagulent_missing | -0.5171 | 0.5171 | 0.2674 | True | AFAnticoagulent_missing-AtrialFibrillation_Yes |
| AFAnticoagulent_missing | AtrialFibrillation_No | 0.5171 | 0.5171 | 0.2674 | True | AFAnticoagulent_missing-AtrialFibrillation_No |
| AFAnticoagulent_missing | AFAntiplatelet_missing | 0.5171 | 0.5171 | 0.2674 | True | AFAnticoagulent_missing-AFAntiplatelet_missing |
| S1AdmissionYear_2016 | AFAnticoagulentDOAC_No | -0.5080 | 0.5080 | 0.2581 | False | AFAnticoagulentDOAC_No-S1AdmissionYear_2016 |
| ExtinctionInattention | Sensory | 0.5072 | 0.5072 | 0.2573 | False | ExtinctionInattention-Sensory |
| StrokeTIA_No | S2TIAInLastMonth_missing | 0.5056 | 0.5056 | 0.2556 | True | S2TIAInLastMonth_missing-StrokeTIA_No |
| S2TIAInLastMonth_missing | StrokeTIA_Yes | -0.5056 | 0.5056 | 0.2556 | True | S2TIAInLastMonth_missing-StrokeTIA_Yes |
| MotorLegRight | BestLanguage | 0.5052 | 0.5052 | 0.2553 | False | BestLanguage-MotorLegRight |
| S1AdmissionYear_2017 | S1AdmissionYear_2018 | -0.5026 | 0.5026 | 0.2526 | False | S1AdmissionYear_2017-S1AdmissionYear_2018 |
| S1AdmissionYear_2016 | S1AdmissionYear_2017 | -0.5002 | 0.5002 | 0.2502 | False | S1AdmissionYear_2016-S1AdmissionYear_2017 |
\n", "
" ], "text/plain": [ " variable \\\n", "AFAnticoagulentHeparin_No AFAnticoagulentHeparin_missing \n", "AFAnticoagulentDOAC_missing AFAnticoagulentHeparin_No \n", "AFAnticoagulentVitK_missing AFAnticoagulentHeparin_No \n", "S2StrokeType_Infarction S2StrokeType_Primary Intracerebral Haemorrhage \n", "AFAnticoagulentVitK_missing AFAnticoagulentVitK_No \n", "AFAnticoagulentVitK_No AFAnticoagulentHeparin_missing \n", "AFAnticoagulentVitK_No AFAnticoagulentDOAC_missing \n", "AFAnticoagulentHeparin_No AFAnticoagulentVitK_No \n", "S2NewAFDiagnosis_No S2NewAFDiagnosis_missing \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_missing \n", "AFAnticoagulentDOAC_No AFAnticoagulentVitK_missing \n", "AFAnticoagulentDOAC_No AFAnticoagulentHeparin_missing \n", "AFAnticoagulentDOAC_No AFAnticoagulentHeparin_No \n", "S2TIAInLastMonth_No S2TIAInLastMonth_missing \n", "AFAnticoagulentDOAC_No AFAnticoagulentVitK_No \n", "AFAnticoagulentDOAC_missing S1AdmissionYear_2018 \n", "S1AdmissionYear_2018 AFAnticoagulentHeparin_missing \n", "S1AdmissionYear_2018 AFAnticoagulentVitK_missing \n", "S1AdmissionYear_2018 AFAnticoagulentHeparin_No \n", "MotorLegRight MotorArmRight \n", "AFAnticoagulentVitK_No S1AdmissionYear_2018 \n", "AFAnticoagulentDOAC_No S2NewAFDiagnosis_missing \n", "MotorArmLeft MotorLegLeft \n", "AFAnticoagulentVitK_No S2NewAFDiagnosis_missing \n", "S1AdmissionYear_2018 AFAnticoagulentDOAC_No \n", "AFAnticoagulentVitK_missing S2NewAFDiagnosis_missing \n", "AFAnticoagulentHeparin_missing S2NewAFDiagnosis_missing \n", "S2NewAFDiagnosis_missing AFAnticoagulentDOAC_missing \n", "S2NewAFDiagnosis_missing AFAnticoagulentHeparin_No \n", "S2NewAFDiagnosis_No AFAnticoagulentDOAC_No \n", "S1AdmissionYear_2018 S2NewAFDiagnosis_missing \n", "S2NewAFDiagnosis_No AFAnticoagulentVitK_No \n", "AFAntiplatelet_No AFAntiplatelet_missing \n", "AtrialFibrillation_No AFAntiplatelet_No \n", "AFAntiplatelet_No AtrialFibrillation_Yes \n", "S1Ethnicity_White S1Ethnicity_Other \n", "S2NewAFDiagnosis_No 
AFAnticoagulentDOAC_missing \n", "S2NewAFDiagnosis_No AFAnticoagulentHeparin_missing \n", "AFAnticoagulentVitK_missing S2NewAFDiagnosis_No \n", "S2NewAFDiagnosis_No AFAnticoagulentHeparin_No \n", "MoreEqual80y_Yes S1AgeOnArrival \n", "MoreEqual80y_No S1AgeOnArrival \n", "S2NewAFDiagnosis_missing AFAnticoagulent_No \n", "S1AdmissionYear_2018 S2NewAFDiagnosis_No \n", "S1OnsetDateType_Precise S1OnsetDateType_Best estimate \n", "AFAnticoagulent_No AFAnticoagulentDOAC_No \n", "S1AdmissionYear_2018 AFAnticoagulent_missing \n", "AFAnticoagulentHeparin_missing AFAnticoagulent_missing \n", "AFAnticoagulentDOAC_missing AFAnticoagulent_missing \n", "AFAnticoagulentVitK_missing AFAnticoagulent_missing \n", "S2NewAFDiagnosis_No AFAnticoagulent_No \n", "AFAnticoagulentHeparin_No AFAnticoagulent_missing \n", "AFAnticoagulent_missing AFAnticoagulent_No \n", "AFAnticoagulent_No AFAnticoagulentVitK_No \n", "BestLanguage LocQuestions \n", "S2NihssArrival ExtinctionInattention \n", "AFAnticoagulentVitK_No AFAnticoagulent_missing \n", "BestGaze S2NihssArrival \n", "BestLanguage S2NihssArrival \n", "AFAnticoagulent_No S1AdmissionYear_2018 \n", "AFAnticoagulentHeparin_No AFAnticoagulent_No \n", "AFAnticoagulent_No AFAnticoagulentDOAC_missing \n", "AFAnticoagulentVitK_missing AFAnticoagulent_No \n", "AFAnticoagulentHeparin_missing AFAnticoagulent_No \n", "LocCommands LocQuestions \n", "S2NihssArrival LocCommands \n", "AFAnticoagulent_missing AFAnticoagulentDOAC_No \n", "AtrialFibrillation_Yes AFAnticoagulent_Yes \n", "AFAnticoagulent_Yes AtrialFibrillation_No \n", "AFAnticoagulent_Yes AFAntiplatelet_missing \n", "S2NihssArrival MotorLegRight \n", "S1OnsetDateType_Stroke during sleep S1OnsetDateType_Precise \n", "LocQuestions S2NihssArrival \n", "LocCommands BestLanguage \n", "AFAnticoagulent_missing S2NewAFDiagnosis_missing \n", "S2NihssArrival MotorArmRight \n", "AFAnticoagulent_Yes AFAntiplatelet_No \n", "Visual S2NihssArrival \n", "AFAnticoagulent_missing S2NewAFDiagnosis_No \n", 
"Dysarthria S2NihssArrival \n", "BestGaze ExtinctionInattention \n", "Visual ExtinctionInattention \n", "S2NihssArrival Loc \n", "S2NihssArrival FacialPalsy \n", "S2NihssArrival MotorLegLeft \n", "S2NihssArrival Sensory \n", "Visual BestGaze \n", "S1AdmissionYear_2016 AFAnticoagulentDOAC_missing \n", "AFAnticoagulentVitK_missing S1AdmissionYear_2016 \n", "S1AdmissionYear_2016 AFAnticoagulentHeparin_missing \n", "S1AdmissionYear_2016 AFAnticoagulentHeparin_No \n", "MotorArmRight BestLanguage \n", "S1AdmissionYear_2016 AFAnticoagulentVitK_No \n", "AtrialFibrillation_Yes AFAnticoagulent_missing \n", "AFAnticoagulent_missing AtrialFibrillation_No \n", "AFAnticoagulent_missing AFAntiplatelet_missing \n", "S1AdmissionYear_2016 AFAnticoagulentDOAC_No \n", "ExtinctionInattention Sensory \n", "StrokeTIA_No S2TIAInLastMonth_missing \n", "S2TIAInLastMonth_missing StrokeTIA_Yes \n", "MotorLegRight BestLanguage \n", "S1AdmissionYear_2017 S1AdmissionYear_2018 \n", "S1AdmissionYear_2016 S1AdmissionYear_2017 \n", "\n", " value abs_value r-squared missing \\\n", "AFAnticoagulentHeparin_No -0.9984 0.9984 0.9969 True \n", "AFAnticoagulentDOAC_missing -0.9984 0.9984 0.9969 True \n", "AFAnticoagulentVitK_missing -0.9984 0.9984 0.9969 True \n", "S2StrokeType_Infarction -0.9940 0.9940 0.9881 False \n", "AFAnticoagulentVitK_missing -0.9590 0.9590 0.9198 True \n", "AFAnticoagulentVitK_No -0.9590 0.9590 0.9198 True \n", "AFAnticoagulentVitK_No -0.9590 0.9590 0.9198 True \n", "AFAnticoagulentHeparin_No 0.9575 0.9575 0.9168 False \n", "S2NewAFDiagnosis_No -0.9423 0.9423 0.8880 True \n", "AFAnticoagulentDOAC_No -0.9339 0.9339 0.8721 True \n", "AFAnticoagulentDOAC_No -0.9339 0.9339 0.8721 True \n", "AFAnticoagulentDOAC_No -0.9339 0.9339 0.8721 True \n", "AFAnticoagulentDOAC_No 0.9322 0.9322 0.8690 False \n", "S2TIAInLastMonth_No -0.9032 0.9032 0.8158 True \n", "AFAnticoagulentDOAC_No 0.8889 0.8889 0.7902 False \n", "AFAnticoagulentDOAC_missing -0.8775 0.8775 0.7700 True \n", 
"S1AdmissionYear_2018 -0.8775 0.8775 0.7700 True \n", "S1AdmissionYear_2018 -0.8775 0.8775 0.7700 True \n", "S1AdmissionYear_2018 0.8758 0.8758 0.7671 False \n", "MotorLegRight 0.8435 0.8435 0.7115 False \n", "AFAnticoagulentVitK_No 0.8370 0.8370 0.7006 False \n", "AFAnticoagulentDOAC_No -0.8342 0.8342 0.6959 True \n", "MotorArmLeft 0.8326 0.8326 0.6933 False \n", "AFAnticoagulentVitK_No -0.8224 0.8224 0.6764 True \n", "S1AdmissionYear_2018 0.8108 0.8108 0.6574 False \n", "AFAnticoagulentVitK_missing 0.8046 0.8046 0.6474 True \n", "AFAnticoagulentHeparin_missing 0.8046 0.8046 0.6474 True \n", "S2NewAFDiagnosis_missing 0.8046 0.8046 0.6474 True \n", "S2NewAFDiagnosis_missing -0.8035 0.8035 0.6456 True \n", "S2NewAFDiagnosis_No 0.7859 0.7859 0.6176 False \n", "S1AdmissionYear_2018 -0.7805 0.7805 0.6092 True \n", "S2NewAFDiagnosis_No 0.7749 0.7749 0.6004 False \n", "AFAntiplatelet_No -0.7736 0.7736 0.5984 True \n", "AtrialFibrillation_No -0.7736 0.7736 0.5984 False \n", "AFAntiplatelet_No 0.7736 0.7736 0.5984 False \n", "S1Ethnicity_White -0.7607 0.7607 0.5787 False \n", "S2NewAFDiagnosis_No -0.7582 0.7582 0.5749 True \n", "S2NewAFDiagnosis_No -0.7582 0.7582 0.5749 True \n", "AFAnticoagulentVitK_missing -0.7582 0.7582 0.5749 True \n", "S2NewAFDiagnosis_No 0.7572 0.7572 0.5734 False \n", "MoreEqual80y_Yes 0.7556 0.7556 0.5709 False \n", "MoreEqual80y_No -0.7556 0.7556 0.5709 False \n", "S2NewAFDiagnosis_missing -0.7502 0.7502 0.5628 True \n", "S1AdmissionYear_2018 0.7357 0.7357 0.5413 False \n", "S1OnsetDateType_Precise -0.7267 0.7267 0.5281 False \n", "AFAnticoagulent_No 0.7212 0.7212 0.5202 False \n", "S1AdmissionYear_2018 -0.7149 0.7149 0.5111 True \n", "AFAnticoagulentHeparin_missing 0.7093 0.7093 0.5030 True \n", "AFAnticoagulentDOAC_missing 0.7093 0.7093 0.5030 True \n", "AFAnticoagulentVitK_missing 0.7093 0.7093 0.5030 True \n", "S2NewAFDiagnosis_No 0.7092 0.7092 0.5030 False \n", "AFAnticoagulentHeparin_No -0.7080 0.7080 0.5012 True \n", 
"AFAnticoagulent_missing -0.7035 0.7035 0.4949 True \n", "AFAnticoagulent_No 0.6979 0.6979 0.4871 False \n", "BestLanguage 0.6979 0.6979 0.4870 False \n", "S2NihssArrival 0.6795 0.6795 0.4617 False \n", "AFAnticoagulentVitK_No -0.6763 0.6763 0.4574 True \n", "BestGaze 0.6756 0.6756 0.4564 False \n", "BestLanguage 0.6743 0.6743 0.4547 False \n", "AFAnticoagulent_No 0.6694 0.6694 0.4480 False \n", "AFAnticoagulentHeparin_No 0.6636 0.6636 0.4403 False \n", "AFAnticoagulent_No -0.6622 0.6622 0.4386 True \n", "AFAnticoagulentVitK_missing -0.6622 0.6622 0.4386 True \n", "AFAnticoagulentHeparin_missing -0.6622 0.6622 0.4386 True \n", "LocCommands 0.6576 0.6576 0.4325 False \n", "S2NihssArrival 0.6560 0.6560 0.4303 False \n", "AFAnticoagulent_missing -0.6560 0.6560 0.4303 True \n", "AtrialFibrillation_Yes 0.6516 0.6516 0.4245 False \n", "AFAnticoagulent_Yes -0.6516 0.6516 0.4245 False \n", "AFAnticoagulent_Yes -0.6516 0.6516 0.4245 True \n", "S2NihssArrival 0.6493 0.6493 0.4216 False \n", "S1OnsetDateType_Stroke during sleep -0.6469 0.6469 0.4185 False \n", "LocQuestions 0.6357 0.6357 0.4041 False \n", "LocCommands 0.6345 0.6345 0.4026 False \n", "AFAnticoagulent_missing 0.6305 0.6305 0.3976 True \n", "S2NihssArrival 0.6212 0.6212 0.3858 False \n", "AFAnticoagulent_Yes 0.6013 0.6013 0.3615 False \n", "Visual 0.5986 0.5986 0.3583 False \n", "AFAnticoagulent_missing -0.5942 0.5942 0.3530 True \n", "Dysarthria 0.5840 0.5840 0.3410 False \n", "BestGaze 0.5732 0.5732 0.3286 False \n", "Visual 0.5626 0.5626 0.3165 False \n", "S2NihssArrival 0.5568 0.5568 0.3100 False \n", "S2NihssArrival 0.5548 0.5548 0.3078 False \n", "S2NihssArrival 0.5507 0.5507 0.3033 False \n", "S2NihssArrival 0.5503 0.5503 0.3028 False \n", "Visual 0.5442 0.5442 0.2961 False \n", "S1AdmissionYear_2016 0.5441 0.5441 0.2960 True \n", "AFAnticoagulentVitK_missing 0.5441 0.5441 0.2960 True \n", "S1AdmissionYear_2016 0.5441 0.5441 0.2960 True \n", "S1AdmissionYear_2016 -0.5432 0.5432 0.2951 False \n", 
"MotorArmRight 0.5239 0.5239 0.2745 False \n", "S1AdmissionYear_2016 -0.5217 0.5217 0.2722 False \n", "AtrialFibrillation_Yes -0.5171 0.5171 0.2674 True \n", "AFAnticoagulent_missing 0.5171 0.5171 0.2674 True \n", "AFAnticoagulent_missing 0.5171 0.5171 0.2674 True \n", "S1AdmissionYear_2016 -0.5080 0.5080 0.2581 False \n", "ExtinctionInattention 0.5072 0.5072 0.2573 False \n", "StrokeTIA_No 0.5056 0.5056 0.2556 True \n", "S2TIAInLastMonth_missing -0.5056 0.5056 0.2556 True \n", "MotorLegRight 0.5052 0.5052 0.2553 False \n", "S1AdmissionYear_2017 -0.5026 0.5026 0.2526 False \n", "S1AdmissionYear_2016 -0.5002 0.5002 0.2502 False \n", "\n", " pair \n", "AFAnticoagulentHeparin_No AFAnticoagulentHeparin_No-AFAnticoagulentHepar... \n", "AFAnticoagulentDOAC_missing AFAnticoagulentDOAC_missing-AFAnticoagulentHep... \n", "AFAnticoagulentVitK_missing AFAnticoagulentHeparin_No-AFAnticoagulentVitK_... \n", "S2StrokeType_Infarction S2StrokeType_Infarction-S2StrokeType_Primary I... \n", "AFAnticoagulentVitK_missing AFAnticoagulentVitK_No-AFAnticoagulentVitK_mis... \n", "AFAnticoagulentVitK_No AFAnticoagulentHeparin_missing-AFAnticoagulent... \n", "AFAnticoagulentVitK_No AFAnticoagulentDOAC_missing-AFAnticoagulentVit... \n", "AFAnticoagulentHeparin_No AFAnticoagulentHeparin_No-AFAnticoagulentVitK_No \n", "S2NewAFDiagnosis_No S2NewAFDiagnosis_No-S2NewAFDiagnosis_missing \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-AFAnticoagulentDOAC_mis... \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-AFAnticoagulentVitK_mis... \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-AFAnticoagulentHeparin_... \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-AFAnticoagulentHeparin_No \n", "S2TIAInLastMonth_No S2TIAInLastMonth_No-S2TIAInLastMonth_missing \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-AFAnticoagulentVitK_No \n", "AFAnticoagulentDOAC_missing AFAnticoagulentDOAC_missing-S1AdmissionYear_2018 \n", "S1AdmissionYear_2018 AFAnticoagulentHeparin_missing-S1AdmissionYear... 
\n", "S1AdmissionYear_2018 AFAnticoagulentVitK_missing-S1AdmissionYear_2018 \n", "S1AdmissionYear_2018 AFAnticoagulentHeparin_No-S1AdmissionYear_2018 \n", "MotorLegRight MotorArmRight-MotorLegRight \n", "AFAnticoagulentVitK_No AFAnticoagulentVitK_No-S1AdmissionYear_2018 \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-S2NewAFDiagnosis_missing \n", "MotorArmLeft MotorArmLeft-MotorLegLeft \n", "AFAnticoagulentVitK_No AFAnticoagulentVitK_No-S2NewAFDiagnosis_missing \n", "S1AdmissionYear_2018 AFAnticoagulentDOAC_No-S1AdmissionYear_2018 \n", "AFAnticoagulentVitK_missing AFAnticoagulentVitK_missing-S2NewAFDiagnosis_m... \n", "AFAnticoagulentHeparin_missing AFAnticoagulentHeparin_missing-S2NewAFDiagnosi... \n", "S2NewAFDiagnosis_missing AFAnticoagulentDOAC_missing-S2NewAFDiagnosis_m... \n", "S2NewAFDiagnosis_missing AFAnticoagulentHeparin_No-S2NewAFDiagnosis_mis... \n", "S2NewAFDiagnosis_No AFAnticoagulentDOAC_No-S2NewAFDiagnosis_No \n", "S1AdmissionYear_2018 S1AdmissionYear_2018-S2NewAFDiagnosis_missing \n", "S2NewAFDiagnosis_No AFAnticoagulentVitK_No-S2NewAFDiagnosis_No \n", "AFAntiplatelet_No AFAntiplatelet_No-AFAntiplatelet_missing \n", "AtrialFibrillation_No AFAntiplatelet_No-AtrialFibrillation_No \n", "AFAntiplatelet_No AFAntiplatelet_No-AtrialFibrillation_Yes \n", "S1Ethnicity_White S1Ethnicity_Other-S1Ethnicity_White \n", "S2NewAFDiagnosis_No AFAnticoagulentDOAC_missing-S2NewAFDiagnosis_No \n", "S2NewAFDiagnosis_No AFAnticoagulentHeparin_missing-S2NewAFDiagnosi... \n", "AFAnticoagulentVitK_missing AFAnticoagulentVitK_missing-S2NewAFDiagnosis_No \n", "S2NewAFDiagnosis_No AFAnticoagulentHeparin_No-S2NewAFDiagnosis_No \n", "MoreEqual80y_Yes MoreEqual80y_Yes-S1AgeOnArrival \n", "MoreEqual80y_No MoreEqual80y_No-S1AgeOnArrival \n", "S2NewAFDiagnosis_missing AFAnticoagulent_No-S2NewAFDiagnosis_missing \n", "S1AdmissionYear_2018 S1AdmissionYear_2018-S2NewAFDiagnosis_No \n", "S1OnsetDateType_Precise S1OnsetDateType_Best estimate-S1OnsetDateType_... 
\n", "AFAnticoagulent_No AFAnticoagulentDOAC_No-AFAnticoagulent_No \n", "S1AdmissionYear_2018 AFAnticoagulent_missing-S1AdmissionYear_2018 \n", "AFAnticoagulentHeparin_missing AFAnticoagulentHeparin_missing-AFAnticoagulent... \n", "AFAnticoagulentDOAC_missing AFAnticoagulentDOAC_missing-AFAnticoagulent_mi... \n", "AFAnticoagulentVitK_missing AFAnticoagulentVitK_missing-AFAnticoagulent_mi... \n", "S2NewAFDiagnosis_No AFAnticoagulent_No-S2NewAFDiagnosis_No \n", "AFAnticoagulentHeparin_No AFAnticoagulentHeparin_No-AFAnticoagulent_missing \n", "AFAnticoagulent_missing AFAnticoagulent_No-AFAnticoagulent_missing \n", "AFAnticoagulent_No AFAnticoagulentVitK_No-AFAnticoagulent_No \n", "BestLanguage BestLanguage-LocQuestions \n", "S2NihssArrival ExtinctionInattention-S2NihssArrival \n", "AFAnticoagulentVitK_No AFAnticoagulentVitK_No-AFAnticoagulent_missing \n", "BestGaze BestGaze-S2NihssArrival \n", "BestLanguage BestLanguage-S2NihssArrival \n", "AFAnticoagulent_No AFAnticoagulent_No-S1AdmissionYear_2018 \n", "AFAnticoagulentHeparin_No AFAnticoagulentHeparin_No-AFAnticoagulent_No \n", "AFAnticoagulent_No AFAnticoagulentDOAC_missing-AFAnticoagulent_No \n", "AFAnticoagulentVitK_missing AFAnticoagulentVitK_missing-AFAnticoagulent_No \n", "AFAnticoagulentHeparin_missing AFAnticoagulentHeparin_missing-AFAnticoagulent_No \n", "LocCommands LocCommands-LocQuestions \n", "S2NihssArrival LocCommands-S2NihssArrival \n", "AFAnticoagulent_missing AFAnticoagulentDOAC_No-AFAnticoagulent_missing \n", "AtrialFibrillation_Yes AFAnticoagulent_Yes-AtrialFibrillation_Yes \n", "AFAnticoagulent_Yes AFAnticoagulent_Yes-AtrialFibrillation_No \n", "AFAnticoagulent_Yes AFAnticoagulent_Yes-AFAntiplatelet_missing \n", "S2NihssArrival MotorLegRight-S2NihssArrival \n", "S1OnsetDateType_Stroke during sleep S1OnsetDateType_Precise-S1OnsetDateType_Stroke... 
\n", "LocQuestions LocQuestions-S2NihssArrival \n", "LocCommands BestLanguage-LocCommands \n", "AFAnticoagulent_missing AFAnticoagulent_missing-S2NewAFDiagnosis_missing \n", "S2NihssArrival MotorArmRight-S2NihssArrival \n", "AFAnticoagulent_Yes AFAnticoagulent_Yes-AFAntiplatelet_No \n", "Visual S2NihssArrival-Visual \n", "AFAnticoagulent_missing AFAnticoagulent_missing-S2NewAFDiagnosis_No \n", "Dysarthria Dysarthria-S2NihssArrival \n", "BestGaze BestGaze-ExtinctionInattention \n", "Visual ExtinctionInattention-Visual \n", "S2NihssArrival Loc-S2NihssArrival \n", "S2NihssArrival FacialPalsy-S2NihssArrival \n", "S2NihssArrival MotorLegLeft-S2NihssArrival \n", "S2NihssArrival S2NihssArrival-Sensory \n", "Visual BestGaze-Visual \n", "S1AdmissionYear_2016 AFAnticoagulentDOAC_missing-S1AdmissionYear_2016 \n", "AFAnticoagulentVitK_missing AFAnticoagulentVitK_missing-S1AdmissionYear_2016 \n", "S1AdmissionYear_2016 AFAnticoagulentHeparin_missing-S1AdmissionYear... \n", "S1AdmissionYear_2016 AFAnticoagulentHeparin_No-S1AdmissionYear_2016 \n", "MotorArmRight BestLanguage-MotorArmRight \n", "S1AdmissionYear_2016 AFAnticoagulentVitK_No-S1AdmissionYear_2016 \n", "AtrialFibrillation_Yes AFAnticoagulent_missing-AtrialFibrillation_Yes \n", "AFAnticoagulent_missing AFAnticoagulent_missing-AtrialFibrillation_No \n", "AFAnticoagulent_missing AFAnticoagulent_missing-AFAntiplatelet_missing \n", "S1AdmissionYear_2016 AFAnticoagulentDOAC_No-S1AdmissionYear_2016 \n", "ExtinctionInattention ExtinctionInattention-Sensory \n", "StrokeTIA_No S2TIAInLastMonth_missing-StrokeTIA_No \n", "S2TIAInLastMonth_missing S2TIAInLastMonth_missing-StrokeTIA_Yes \n", "MotorLegRight BestLanguage-MotorLegRight \n", "S1AdmissionYear_2017 S1AdmissionYear_2017-S1AdmissionYear_2018 \n", "S1AdmissionYear_2016 S1AdmissionYear_2016-S1AdmissionYear_2017 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get highly correlated data (covariance between 0.50 and 0.999)\n", 
"pd.set_option('display.max_rows', None)\n", "mask = (cov['abs_value'] <= 0.999) & (cov['abs_value'] >= 0.50)\n", "cov[mask]" ] }, { "cell_type": "markdown", "id": "391e025f-e085-494e-a8b5-6ed0143b5299", "metadata": {}, "source": [ "Repeat the filter (absolute covariance between 0.50 and 0.999), but exclude any pair in which either feature is tagged as 'missing' data." ] }, { "cell_type": "code", "execution_count": 11, "id": "375ff285-a8e6-43b2-90df-3977566c74e5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
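<table>
<thead>
<tr><th></th><th>variable</th><th>value</th><th>abs_value</th><th>r-squared</th><th>missing</th><th>pair</th></tr>
</thead>
<tbody>
<tr><th>S2StrokeType_Infarction</th><td>S2StrokeType_Primary Intracerebral Haemorrhage</td><td>-0.9940</td><td>0.9940</td><td>0.9881</td><td>False</td><td>S2StrokeType_Infarction-S2StrokeType_Primary I...</td></tr>
<tr><th>AFAnticoagulentHeparin_No</th><td>AFAnticoagulentVitK_No</td><td>0.9575</td><td>0.9575</td><td>0.9168</td><td>False</td><td>AFAnticoagulentHeparin_No-AFAnticoagulentVitK_No</td></tr>
<tr><th>AFAnticoagulentDOAC_No</th><td>AFAnticoagulentHeparin_No</td><td>0.9322</td><td>0.9322</td><td>0.8690</td><td>False</td><td>AFAnticoagulentDOAC_No-AFAnticoagulentHeparin_No</td></tr>
<tr><th>AFAnticoagulentDOAC_No</th><td>AFAnticoagulentVitK_No</td><td>0.8889</td><td>0.8889</td><td>0.7902</td><td>False</td><td>AFAnticoagulentDOAC_No-AFAnticoagulentVitK_No</td></tr>
<tr><th>S1AdmissionYear_2018</th><td>AFAnticoagulentHeparin_No</td><td>0.8758</td><td>0.8758</td><td>0.7671</td><td>False</td><td>AFAnticoagulentHeparin_No-S1AdmissionYear_2018</td></tr>
<tr><th>MotorLegRight</th><td>MotorArmRight</td><td>0.8435</td><td>0.8435</td><td>0.7115</td><td>False</td><td>MotorArmRight-MotorLegRight</td></tr>
<tr><th>AFAnticoagulentVitK_No</th><td>S1AdmissionYear_2018</td><td>0.8370</td><td>0.8370</td><td>0.7006</td><td>False</td><td>AFAnticoagulentVitK_No-S1AdmissionYear_2018</td></tr>
<tr><th>MotorArmLeft</th><td>MotorLegLeft</td><td>0.8326</td><td>0.8326</td><td>0.6933</td><td>False</td><td>MotorArmLeft-MotorLegLeft</td></tr>
<tr><th>S1AdmissionYear_2018</th><td>AFAnticoagulentDOAC_No</td><td>0.8108</td><td>0.8108</td><td>0.6574</td><td>False</td><td>AFAnticoagulentDOAC_No-S1AdmissionYear_2018</td></tr>
<tr><th>S2NewAFDiagnosis_No</th><td>AFAnticoagulentDOAC_No</td><td>0.7859</td><td>0.7859</td><td>0.6176</td><td>False</td><td>AFAnticoagulentDOAC_No-S2NewAFDiagnosis_No</td></tr>
<tr><th>S2NewAFDiagnosis_No</th><td>AFAnticoagulentVitK_No</td><td>0.7749</td><td>0.7749</td><td>0.6004</td><td>False</td><td>AFAnticoagulentVitK_No-S2NewAFDiagnosis_No</td></tr>
<tr><th>AtrialFibrillation_No</th><td>AFAntiplatelet_No</td><td>-0.7736</td><td>0.7736</td><td>0.5984</td><td>False</td><td>AFAntiplatelet_No-AtrialFibrillation_No</td></tr>
<tr><th>AFAntiplatelet_No</th><td>AtrialFibrillation_Yes</td><td>0.7736</td><td>0.7736</td><td>0.5984</td><td>False</td><td>AFAntiplatelet_No-AtrialFibrillation_Yes</td></tr>
<tr><th>S1Ethnicity_White</th><td>S1Ethnicity_Other</td><td>-0.7607</td><td>0.7607</td><td>0.5787</td><td>False</td><td>S1Ethnicity_Other-S1Ethnicity_White</td></tr>
<tr><th>S2NewAFDiagnosis_No</th><td>AFAnticoagulentHeparin_No</td><td>0.7572</td><td>0.7572</td><td>0.5734</td><td>False</td><td>AFAnticoagulentHeparin_No-S2NewAFDiagnosis_No</td></tr>
<tr><th>MoreEqual80y_Yes</th><td>S1AgeOnArrival</td><td>0.7556</td><td>0.7556</td><td>0.5709</td><td>False</td><td>MoreEqual80y_Yes-S1AgeOnArrival</td></tr>
<tr><th>MoreEqual80y_No</th><td>S1AgeOnArrival</td><td>-0.7556</td><td>0.7556</td><td>0.5709</td><td>False</td><td>MoreEqual80y_No-S1AgeOnArrival</td></tr>
<tr><th>S1AdmissionYear_2018</th><td>S2NewAFDiagnosis_No</td><td>0.7357</td><td>0.7357</td><td>0.5413</td><td>False</td><td>S1AdmissionYear_2018-S2NewAFDiagnosis_No</td></tr>
<tr><th>S1OnsetDateType_Precise</th><td>S1OnsetDateType_Best estimate</td><td>-0.7267</td><td>0.7267</td><td>0.5281</td><td>False</td><td>S1OnsetDateType_Best estimate-S1OnsetDateType_...</td></tr>
<tr><th>AFAnticoagulent_No</th><td>AFAnticoagulentDOAC_No</td><td>0.7212</td><td>0.7212</td><td>0.5202</td><td>False</td><td>AFAnticoagulentDOAC_No-AFAnticoagulent_No</td></tr>
<tr><th>S2NewAFDiagnosis_No</th><td>AFAnticoagulent_No</td><td>0.7092</td><td>0.7092</td><td>0.5030</td><td>False</td><td>AFAnticoagulent_No-S2NewAFDiagnosis_No</td></tr>
<tr><th>AFAnticoagulent_No</th><td>AFAnticoagulentVitK_No</td><td>0.6979</td><td>0.6979</td><td>0.4871</td><td>False</td><td>AFAnticoagulentVitK_No-AFAnticoagulent_No</td></tr>
<tr><th>BestLanguage</th><td>LocQuestions</td><td>0.6979</td><td>0.6979</td><td>0.4870</td><td>False</td><td>BestLanguage-LocQuestions</td></tr>
<tr><th>S2NihssArrival</th><td>ExtinctionInattention</td><td>0.6795</td><td>0.6795</td><td>0.4617</td><td>False</td><td>ExtinctionInattention-S2NihssArrival</td></tr>
<tr><th>BestGaze</th><td>S2NihssArrival</td><td>0.6756</td><td>0.6756</td><td>0.4564</td><td>False</td><td>BestGaze-S2NihssArrival</td></tr>
<tr><th>BestLanguage</th><td>S2NihssArrival</td><td>0.6743</td><td>0.6743</td><td>0.4547</td><td>False</td><td>BestLanguage-S2NihssArrival</td></tr>
<tr><th>AFAnticoagulent_No</th><td>S1AdmissionYear_2018</td><td>0.6694</td><td>0.6694</td><td>0.4480</td><td>False</td><td>AFAnticoagulent_No-S1AdmissionYear_2018</td></tr>
<tr><th>AFAnticoagulentHeparin_No</th><td>AFAnticoagulent_No</td><td>0.6636</td><td>0.6636</td><td>0.4403</td><td>False</td><td>AFAnticoagulentHeparin_No-AFAnticoagulent_No</td></tr>
<tr><th>LocCommands</th><td>LocQuestions</td><td>0.6576</td><td>0.6576</td><td>0.4325</td><td>False</td><td>LocCommands-LocQuestions</td></tr>
<tr><th>S2NihssArrival</th><td>LocCommands</td><td>0.6560</td><td>0.6560</td><td>0.4303</td><td>False</td><td>LocCommands-S2NihssArrival</td></tr>
<tr><th>AtrialFibrillation_Yes</th><td>AFAnticoagulent_Yes</td><td>0.6516</td><td>0.6516</td><td>0.4245</td><td>False</td><td>AFAnticoagulent_Yes-AtrialFibrillation_Yes</td></tr>
<tr><th>AFAnticoagulent_Yes</th><td>AtrialFibrillation_No</td><td>-0.6516</td><td>0.6516</td><td>0.4245</td><td>False</td><td>AFAnticoagulent_Yes-AtrialFibrillation_No</td></tr>
<tr><th>S2NihssArrival</th><td>MotorLegRight</td><td>0.6493</td><td>0.6493</td><td>0.4216</td><td>False</td><td>MotorLegRight-S2NihssArrival</td></tr>
<tr><th>S1OnsetDateType_Stroke during sleep</th><td>S1OnsetDateType_Precise</td><td>-0.6469</td><td>0.6469</td><td>0.4185</td><td>False</td><td>S1OnsetDateType_Precise-S1OnsetDateType_Stroke...</td></tr>
<tr><th>LocQuestions</th><td>S2NihssArrival</td><td>0.6357</td><td>0.6357</td><td>0.4041</td><td>False</td><td>LocQuestions-S2NihssArrival</td></tr>
<tr><th>LocCommands</th><td>BestLanguage</td><td>0.6345</td><td>0.6345</td><td>0.4026</td><td>False</td><td>BestLanguage-LocCommands</td></tr>
<tr><th>S2NihssArrival</th><td>MotorArmRight</td><td>0.6212</td><td>0.6212</td><td>0.3858</td><td>False</td><td>MotorArmRight-S2NihssArrival</td></tr>
<tr><th>AFAnticoagulent_Yes</th><td>AFAntiplatelet_No</td><td>0.6013</td><td>0.6013</td><td>0.3615</td><td>False</td><td>AFAnticoagulent_Yes-AFAntiplatelet_No</td></tr>
<tr><th>Visual</th><td>S2NihssArrival</td><td>0.5986</td><td>0.5986</td><td>0.3583</td><td>False</td><td>S2NihssArrival-Visual</td></tr>
<tr><th>Dysarthria</th><td>S2NihssArrival</td><td>0.5840</td><td>0.5840</td><td>0.3410</td><td>False</td><td>Dysarthria-S2NihssArrival</td></tr>
<tr><th>BestGaze</th><td>ExtinctionInattention</td><td>0.5732</td><td>0.5732</td><td>0.3286</td><td>False</td><td>BestGaze-ExtinctionInattention</td></tr>
<tr><th>Visual</th><td>ExtinctionInattention</td><td>0.5626</td><td>0.5626</td><td>0.3165</td><td>False</td><td>ExtinctionInattention-Visual</td></tr>
<tr><th>S2NihssArrival</th><td>Loc</td><td>0.5568</td><td>0.5568</td><td>0.3100</td><td>False</td><td>Loc-S2NihssArrival</td></tr>
<tr><th>S2NihssArrival</th><td>FacialPalsy</td><td>0.5548</td><td>0.5548</td><td>0.3078</td><td>False</td><td>FacialPalsy-S2NihssArrival</td></tr>
<tr><th>S2NihssArrival</th><td>MotorLegLeft</td><td>0.5507</td><td>0.5507</td><td>0.3033</td><td>False</td><td>MotorLegLeft-S2NihssArrival</td></tr>
<tr><th>S2NihssArrival</th><td>Sensory</td><td>0.5503</td><td>0.5503</td><td>0.3028</td><td>False</td><td>S2NihssArrival-Sensory</td></tr>
<tr><th>Visual</th><td>BestGaze</td><td>0.5442</td><td>0.5442</td><td>0.2961</td><td>False</td><td>BestGaze-Visual</td></tr>
<tr><th>S1AdmissionYear_2016</th><td>AFAnticoagulentHeparin_No</td><td>-0.5432</td><td>0.5432</td><td>0.2951</td><td>False</td><td>AFAnticoagulentHeparin_No-S1AdmissionYear_2016</td></tr>
<tr><th>MotorArmRight</th><td>BestLanguage</td><td>0.5239</td><td>0.5239</td><td>0.2745</td><td>False</td><td>BestLanguage-MotorArmRight</td></tr>
<tr><th>S1AdmissionYear_2016</th><td>AFAnticoagulentVitK_No</td><td>-0.5217</td><td>0.5217</td><td>0.2722</td><td>False</td><td>AFAnticoagulentVitK_No-S1AdmissionYear_2016</td></tr>
<tr><th>S1AdmissionYear_2016</th><td>AFAnticoagulentDOAC_No</td><td>-0.5080</td><td>0.5080</td><td>0.2581</td><td>False</td><td>AFAnticoagulentDOAC_No-S1AdmissionYear_2016</td></tr>
<tr><th>ExtinctionInattention</th><td>Sensory</td><td>0.5072</td><td>0.5072</td><td>0.2573</td><td>False</td><td>ExtinctionInattention-Sensory</td></tr>
<tr><th>MotorLegRight</th><td>BestLanguage</td><td>0.5052</td><td>0.5052</td><td>0.2553</td><td>False</td><td>BestLanguage-MotorLegRight</td></tr>
<tr><th>S1AdmissionYear_2017</th><td>S1AdmissionYear_2018</td><td>-0.5026</td><td>0.5026</td><td>0.2526</td><td>False</td><td>S1AdmissionYear_2017-S1AdmissionYear_2018</td></tr>
<tr><th>S1AdmissionYear_2016</th><td>S1AdmissionYear_2017</td><td>-0.5002</td><td>0.5002</td><td>0.2502</td><td>False</td><td>S1AdmissionYear_2016-S1AdmissionYear_2017</td></tr>
</tbody>
</table>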
\n", "
" ], "text/plain": [ " variable \\\n", "S2StrokeType_Infarction S2StrokeType_Primary Intracerebral Haemorrhage \n", "AFAnticoagulentHeparin_No AFAnticoagulentVitK_No \n", "AFAnticoagulentDOAC_No AFAnticoagulentHeparin_No \n", "AFAnticoagulentDOAC_No AFAnticoagulentVitK_No \n", "S1AdmissionYear_2018 AFAnticoagulentHeparin_No \n", "MotorLegRight MotorArmRight \n", "AFAnticoagulentVitK_No S1AdmissionYear_2018 \n", "MotorArmLeft MotorLegLeft \n", "S1AdmissionYear_2018 AFAnticoagulentDOAC_No \n", "S2NewAFDiagnosis_No AFAnticoagulentDOAC_No \n", "S2NewAFDiagnosis_No AFAnticoagulentVitK_No \n", "AtrialFibrillation_No AFAntiplatelet_No \n", "AFAntiplatelet_No AtrialFibrillation_Yes \n", "S1Ethnicity_White S1Ethnicity_Other \n", "S2NewAFDiagnosis_No AFAnticoagulentHeparin_No \n", "MoreEqual80y_Yes S1AgeOnArrival \n", "MoreEqual80y_No S1AgeOnArrival \n", "S1AdmissionYear_2018 S2NewAFDiagnosis_No \n", "S1OnsetDateType_Precise S1OnsetDateType_Best estimate \n", "AFAnticoagulent_No AFAnticoagulentDOAC_No \n", "S2NewAFDiagnosis_No AFAnticoagulent_No \n", "AFAnticoagulent_No AFAnticoagulentVitK_No \n", "BestLanguage LocQuestions \n", "S2NihssArrival ExtinctionInattention \n", "BestGaze S2NihssArrival \n", "BestLanguage S2NihssArrival \n", "AFAnticoagulent_No S1AdmissionYear_2018 \n", "AFAnticoagulentHeparin_No AFAnticoagulent_No \n", "LocCommands LocQuestions \n", "S2NihssArrival LocCommands \n", "AtrialFibrillation_Yes AFAnticoagulent_Yes \n", "AFAnticoagulent_Yes AtrialFibrillation_No \n", "S2NihssArrival MotorLegRight \n", "S1OnsetDateType_Stroke during sleep S1OnsetDateType_Precise \n", "LocQuestions S2NihssArrival \n", "LocCommands BestLanguage \n", "S2NihssArrival MotorArmRight \n", "AFAnticoagulent_Yes AFAntiplatelet_No \n", "Visual S2NihssArrival \n", "Dysarthria S2NihssArrival \n", "BestGaze ExtinctionInattention \n", "Visual ExtinctionInattention \n", "S2NihssArrival Loc \n", "S2NihssArrival FacialPalsy \n", "S2NihssArrival MotorLegLeft \n", "S2NihssArrival Sensory \n", 
"Visual BestGaze \n", "S1AdmissionYear_2016 AFAnticoagulentHeparin_No \n", "MotorArmRight BestLanguage \n", "S1AdmissionYear_2016 AFAnticoagulentVitK_No \n", "S1AdmissionYear_2016 AFAnticoagulentDOAC_No \n", "ExtinctionInattention Sensory \n", "MotorLegRight BestLanguage \n", "S1AdmissionYear_2017 S1AdmissionYear_2018 \n", "S1AdmissionYear_2016 S1AdmissionYear_2017 \n", "\n", " value abs_value r-squared missing \\\n", "S2StrokeType_Infarction -0.9940 0.9940 0.9881 False \n", "AFAnticoagulentHeparin_No 0.9575 0.9575 0.9168 False \n", "AFAnticoagulentDOAC_No 0.9322 0.9322 0.8690 False \n", "AFAnticoagulentDOAC_No 0.8889 0.8889 0.7902 False \n", "S1AdmissionYear_2018 0.8758 0.8758 0.7671 False \n", "MotorLegRight 0.8435 0.8435 0.7115 False \n", "AFAnticoagulentVitK_No 0.8370 0.8370 0.7006 False \n", "MotorArmLeft 0.8326 0.8326 0.6933 False \n", "S1AdmissionYear_2018 0.8108 0.8108 0.6574 False \n", "S2NewAFDiagnosis_No 0.7859 0.7859 0.6176 False \n", "S2NewAFDiagnosis_No 0.7749 0.7749 0.6004 False \n", "AtrialFibrillation_No -0.7736 0.7736 0.5984 False \n", "AFAntiplatelet_No 0.7736 0.7736 0.5984 False \n", "S1Ethnicity_White -0.7607 0.7607 0.5787 False \n", "S2NewAFDiagnosis_No 0.7572 0.7572 0.5734 False \n", "MoreEqual80y_Yes 0.7556 0.7556 0.5709 False \n", "MoreEqual80y_No -0.7556 0.7556 0.5709 False \n", "S1AdmissionYear_2018 0.7357 0.7357 0.5413 False \n", "S1OnsetDateType_Precise -0.7267 0.7267 0.5281 False \n", "AFAnticoagulent_No 0.7212 0.7212 0.5202 False \n", "S2NewAFDiagnosis_No 0.7092 0.7092 0.5030 False \n", "AFAnticoagulent_No 0.6979 0.6979 0.4871 False \n", "BestLanguage 0.6979 0.6979 0.4870 False \n", "S2NihssArrival 0.6795 0.6795 0.4617 False \n", "BestGaze 0.6756 0.6756 0.4564 False \n", "BestLanguage 0.6743 0.6743 0.4547 False \n", "AFAnticoagulent_No 0.6694 0.6694 0.4480 False \n", "AFAnticoagulentHeparin_No 0.6636 0.6636 0.4403 False \n", "LocCommands 0.6576 0.6576 0.4325 False \n", "S2NihssArrival 0.6560 0.6560 0.4303 False \n", 
"AtrialFibrillation_Yes 0.6516 0.6516 0.4245 False \n", "AFAnticoagulent_Yes -0.6516 0.6516 0.4245 False \n", "S2NihssArrival 0.6493 0.6493 0.4216 False \n", "S1OnsetDateType_Stroke during sleep -0.6469 0.6469 0.4185 False \n", "LocQuestions 0.6357 0.6357 0.4041 False \n", "LocCommands 0.6345 0.6345 0.4026 False \n", "S2NihssArrival 0.6212 0.6212 0.3858 False \n", "AFAnticoagulent_Yes 0.6013 0.6013 0.3615 False \n", "Visual 0.5986 0.5986 0.3583 False \n", "Dysarthria 0.5840 0.5840 0.3410 False \n", "BestGaze 0.5732 0.5732 0.3286 False \n", "Visual 0.5626 0.5626 0.3165 False \n", "S2NihssArrival 0.5568 0.5568 0.3100 False \n", "S2NihssArrival 0.5548 0.5548 0.3078 False \n", "S2NihssArrival 0.5507 0.5507 0.3033 False \n", "S2NihssArrival 0.5503 0.5503 0.3028 False \n", "Visual 0.5442 0.5442 0.2961 False \n", "S1AdmissionYear_2016 -0.5432 0.5432 0.2951 False \n", "MotorArmRight 0.5239 0.5239 0.2745 False \n", "S1AdmissionYear_2016 -0.5217 0.5217 0.2722 False \n", "S1AdmissionYear_2016 -0.5080 0.5080 0.2581 False \n", "ExtinctionInattention 0.5072 0.5072 0.2573 False \n", "MotorLegRight 0.5052 0.5052 0.2553 False \n", "S1AdmissionYear_2017 -0.5026 0.5026 0.2526 False \n", "S1AdmissionYear_2016 -0.5002 0.5002 0.2502 False \n", "\n", " pair \n", "S2StrokeType_Infarction S2StrokeType_Infarction-S2StrokeType_Primary I... 
\n", "AFAnticoagulentHeparin_No AFAnticoagulentHeparin_No-AFAnticoagulentVitK_No \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-AFAnticoagulentHeparin_No \n", "AFAnticoagulentDOAC_No AFAnticoagulentDOAC_No-AFAnticoagulentVitK_No \n", "S1AdmissionYear_2018 AFAnticoagulentHeparin_No-S1AdmissionYear_2018 \n", "MotorLegRight MotorArmRight-MotorLegRight \n", "AFAnticoagulentVitK_No AFAnticoagulentVitK_No-S1AdmissionYear_2018 \n", "MotorArmLeft MotorArmLeft-MotorLegLeft \n", "S1AdmissionYear_2018 AFAnticoagulentDOAC_No-S1AdmissionYear_2018 \n", "S2NewAFDiagnosis_No AFAnticoagulentDOAC_No-S2NewAFDiagnosis_No \n", "S2NewAFDiagnosis_No AFAnticoagulentVitK_No-S2NewAFDiagnosis_No \n", "AtrialFibrillation_No AFAntiplatelet_No-AtrialFibrillation_No \n", "AFAntiplatelet_No AFAntiplatelet_No-AtrialFibrillation_Yes \n", "S1Ethnicity_White S1Ethnicity_Other-S1Ethnicity_White \n", "S2NewAFDiagnosis_No AFAnticoagulentHeparin_No-S2NewAFDiagnosis_No \n", "MoreEqual80y_Yes MoreEqual80y_Yes-S1AgeOnArrival \n", "MoreEqual80y_No MoreEqual80y_No-S1AgeOnArrival \n", "S1AdmissionYear_2018 S1AdmissionYear_2018-S2NewAFDiagnosis_No \n", "S1OnsetDateType_Precise S1OnsetDateType_Best estimate-S1OnsetDateType_... 
\n", "AFAnticoagulent_No AFAnticoagulentDOAC_No-AFAnticoagulent_No \n", "S2NewAFDiagnosis_No AFAnticoagulent_No-S2NewAFDiagnosis_No \n", "AFAnticoagulent_No AFAnticoagulentVitK_No-AFAnticoagulent_No \n", "BestLanguage BestLanguage-LocQuestions \n", "S2NihssArrival ExtinctionInattention-S2NihssArrival \n", "BestGaze BestGaze-S2NihssArrival \n", "BestLanguage BestLanguage-S2NihssArrival \n", "AFAnticoagulent_No AFAnticoagulent_No-S1AdmissionYear_2018 \n", "AFAnticoagulentHeparin_No AFAnticoagulentHeparin_No-AFAnticoagulent_No \n", "LocCommands LocCommands-LocQuestions \n", "S2NihssArrival LocCommands-S2NihssArrival \n", "AtrialFibrillation_Yes AFAnticoagulent_Yes-AtrialFibrillation_Yes \n", "AFAnticoagulent_Yes AFAnticoagulent_Yes-AtrialFibrillation_No \n", "S2NihssArrival MotorLegRight-S2NihssArrival \n", "S1OnsetDateType_Stroke during sleep S1OnsetDateType_Precise-S1OnsetDateType_Stroke... \n", "LocQuestions LocQuestions-S2NihssArrival \n", "LocCommands BestLanguage-LocCommands \n", "S2NihssArrival MotorArmRight-S2NihssArrival \n", "AFAnticoagulent_Yes AFAnticoagulent_Yes-AFAntiplatelet_No \n", "Visual S2NihssArrival-Visual \n", "Dysarthria Dysarthria-S2NihssArrival \n", "BestGaze BestGaze-ExtinctionInattention \n", "Visual ExtinctionInattention-Visual \n", "S2NihssArrival Loc-S2NihssArrival \n", "S2NihssArrival FacialPalsy-S2NihssArrival \n", "S2NihssArrival MotorLegLeft-S2NihssArrival \n", "S2NihssArrival S2NihssArrival-Sensory \n", "Visual BestGaze-Visual \n", "S1AdmissionYear_2016 AFAnticoagulentHeparin_No-S1AdmissionYear_2016 \n", "MotorArmRight BestLanguage-MotorArmRight \n", "S1AdmissionYear_2016 AFAnticoagulentVitK_No-S1AdmissionYear_2016 \n", "S1AdmissionYear_2016 AFAnticoagulentDOAC_No-S1AdmissionYear_2016 \n", "ExtinctionInattention ExtinctionInattention-Sensory \n", "MotorLegRight BestLanguage-MotorLegRight \n", "S1AdmissionYear_2017 S1AdmissionYear_2017-S1AdmissionYear_2018 \n", "S1AdmissionYear_2016 S1AdmissionYear_2016-S1AdmissionYear_2017 " ] }, 
"execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get highly correlated feature pairs (absolute correlation between 0.50 and 0.999)\n", "pd.set_option('display.max_rows', None)\n", "mask = (cov['abs_value'] <= 0.999) & (cov['abs_value'] >= 0.50) & (cov['missing'] == False)\n", "cov[mask]" ] }, { "cell_type": "markdown", "id": "78b7df71-5244-4914-98d5-7f2100c9656c", "metadata": {}, "source": [ "## Observations\n", "\n", "* Most of the features show weak correlation (96% of feature pairs have an R-squared of less than 0.1).\n", "\n", "* Perfectly correlated feature pairs are present due to dichotomised coding of some features.\n", "\n", "* Many highly correlated features are due to correlations between missing data and the value if data is present. There are some 'more interesting' highly correlated data such as:\n", " \n", " * Right leg and arm weakness are highly correlated, as are left leg and arm weakness.\n", " * Right leg weakness is highly correlated with problems in balance and language." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }
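The filtering step and the observations can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the `demo` frame and the names `arm`, `leg`, `noise`, `flag_yes`, `flag_no`, `feature_1` and `feature_2` are hypothetical and not taken from the stroke dataset, and the pair table is built with an upper-triangle mask, which may differ from how the notebook's `cov` table was constructed. The sketch relies on the fact that the covariance of standardised features equals their Pearson correlation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; not the stroke dataset)
rng = np.random.default_rng(42)
n = 1000
arm = rng.normal(size=n)
demo = pd.DataFrame({
    'arm': arm,
    'leg': arm + 0.5 * rng.normal(size=n),  # strongly related to 'arm'
    'noise': rng.normal(size=n),            # unrelated feature
    'flag_yes': (arm > 0).astype(float),    # dichotomised feature
})
demo['flag_no'] = 1.0 - demo['flag_yes']    # complementary coding, r = -1

# Standardise, then take the population covariance (ddof=0 matches the
# scaling used by StandardScaler), which gives the correlation matrix
scaled = StandardScaler().fit_transform(demo)
corr = pd.DataFrame(
    np.cov(scaled, rowvar=False, ddof=0),
    index=demo.columns, columns=demo.columns)

# Keep the upper triangle only, so each feature pair appears once
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().rename('value').reset_index()
pairs.columns = ['feature_1', 'feature_2', 'value']
pairs['abs_value'] = pairs['value'].abs()
pairs['r-squared'] = pairs['value'] ** 2

# Same thresholds as the notebook: drop (near-)perfect pairs, keep |r| >= 0.50
mask = (pairs['abs_value'] <= 0.999) & (pairs['abs_value'] >= 0.50)
high = pairs[mask].sort_values('abs_value', ascending=False)

# Proportion of feature pairs with an R-squared below 0.1
weak_share = (pairs['r-squared'] < 0.1).mean()

print(high)
print(f'Proportion of pairs with R-squared < 0.1: {weak_share:.2f}')
```

In this toy data the `flag_yes`/`flag_no` pair is excluded by the 0.999 upper bound, mirroring how perfectly correlated dichotomised pairs are filtered out, while the `arm`/`leg` pair is retained as a highly correlated pair.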