Outline

This section demonstrates generation of synthetic data based on SAMueL-1 data.

  • Creating synthetic Titanic passenger data with SMOTE: Use of SMOTE (Synthetic Minority Oversampling Technique) to generate synthetic data. The method is demonstrated on Titanic passenger survival data (this example may be reproduced).

  • Creating synthetic SAMueL data with SMOTE: Generation of synthetic SAMueL data using SMOTE.

  • Testing of SAMueL-1 synthetic data: descriptive statistics: Descriptive statistics of synthetic vs. real data.

  • Testing of SAMueL-1 synthetic data: comparing nearest-neighbour distances in synthetic and real data: Comparison of nearest-neighbour distances within the real data with those between synthetic and real data.

  • Testing of SAMueL-1 synthetic data: random forest model: Training a random forest model with synthetic data and testing on real data (not used to create synthetic data).

  • Testing of SAMueL-1 synthetic data: logistic regression model: Training a logistic regression model with synthetic data and testing on real data (not used to create synthetic data).
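
As a rough illustration of the interpolation idea behind SMOTE, which the pages listed above build on, the following is a minimal pure-Python sketch. It is not the code used in those pages, which may rely on a library implementation. The function name `smote_sample`, the parameter `k`, and the toy data are all hypothetical. In SMOTE as usually described, the interpolation is applied to the points of one class at a time, which the sketch assumes.

```python
import math
import random


def smote_sample(points, n_new, k=3, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation.

    Hypothetical helper for illustration only: each synthetic point lies on
    the line segment between a randomly chosen real point and one of its
    k nearest real neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # Rank the other real points by their distance to the base point
        others = [p for p in points if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Move a random fraction of the way from the base towards the neighbour
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy "real" data for a single class (made-up values, two features)
real = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 1.6)]
synthetic = smote_sample(real, n_new=10)
```

Because every synthetic point is an interpolation between two real points, it always falls within the range of the real data. This is one reason the later pages check how close synthetic points lie to their nearest real neighbours.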