Tabular Playground Series — August 2022

The Tabular Playground Series is a monthly competition hosted by Kaggle.

The data represents the results of a large product testing study. For each product_code, you are given a set of product attributes (fixed for the code) as well as several measurement values for each product, representing various lab testing methods. Each product is used in a simulated real-world environment experiment and absorbs a certain amount of fluid (loading) to see whether or not it fails.

The ultimate goal is to use the data to predict individual product failures for new product codes from their lab test results.

Getting familiar with available columns

Next, we inspect the missing values in the data:

Missing values from both Train and Test data.
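A minimal pandas sketch of this check follows. It assumes the data has been loaded into DataFrames named train and test. The toy frames are hypothetical stand-ins for the competition CSVs, using column names such as product_code, loading and measurement_3.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames standing in for the competition's train and test files.
train = pd.DataFrame({
    "product_code": ["A", "A", "B", "B"],
    "loading": [80.1, np.nan, 95.3, 110.2],
    "measurement_3": [18.0, 17.5, np.nan, 18.9],
    "failure": [0, 1, 0, 0],
})
test = pd.DataFrame({
    "product_code": ["F", "F", "G"],
    "loading": [101.0, 99.4, np.nan],
    "measurement_3": [np.nan, np.nan, 17.2],
})


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most missing first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
    return report.sort_values("missing", ascending=False)


train_missing = missing_report(train)
test_missing = missing_report(test)
print(train_missing)
print(test_missing)
```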

In the next step, we look at the distribution of the data.

Data distribution

After understanding the distribution of the data, we look at the product codes in the train and test data.

Product codes for Train and Test data

We notice that the train and test data contain different product codes, so the model has to predict failures for codes it never sees during training.
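This check can be sketched with plain set operations on the product_code column. The toy codes below are hypothetical. An empty intersection confirms that no test code appears in the training data.

```python
import pandas as pd

# Hypothetical toy frames; only the product_code column matters here.
train = pd.DataFrame({"product_code": ["A", "A", "B", "C", "C"]})
test = pd.DataFrame({"product_code": ["F", "G", "G"]})

train_codes = set(train["product_code"].unique())
test_codes = set(test["product_code"].unique())
shared = train_codes & test_codes  # codes present in both splits

print("train codes:", sorted(train_codes))
print("test codes:", sorted(test_codes))
print("shared codes:", sorted(shared))  # empty -> every test code is unseen
print(train["product_code"].value_counts().sort_index())  # rows per code
```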

Original and transformed plots for Train and Test data.

In the following step, we combine the train and test data.
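Combining the frames lets later steps, such as imputation, use rows from both sets at once. The following is a minimal sketch with pd.concat. It assumes a hypothetical is_train flag column, which is not part of the dataset, to split the rows back afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames; the test set has no "failure" label.
train = pd.DataFrame({"id": [0, 1, 2], "product_code": ["A", "A", "B"],
                      "loading": [80.1, 95.3, np.nan], "failure": [0, 1, 0]})
test = pd.DataFrame({"id": [3, 4], "product_code": ["F", "G"],
                     "loading": [101.0, 99.4]})

# Tag each row with its origin, then stack the two frames.
# Test rows get NaN in "failure" because that column is absent in test.
data = pd.concat([train.assign(is_train=True), test.assign(is_train=False)],
                 ignore_index=True)

# ... imputation and feature engineering would operate on `data` here ...

# Split back using the flag and drop the helper column (and the empty label in test).
train_out = data[data["is_train"]].drop(columns="is_train")
test_out = data[~data["is_train"]].drop(columns=["is_train", "failure"])
print(data)
```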

We then count the missing values in some of the attributes:
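One way to sketch this count in pandas is to group the missing-value mask by product_code on the combined frame. The toy frame and the choice of measurement_3 and measurement_5 as the attributes to inspect are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy slice of the combined train+test frame.
data = pd.DataFrame({
    "product_code": ["A", "A", "B", "B", "F"],
    "measurement_3": [17.9, np.nan, 18.2, np.nan, np.nan],
    "measurement_5": [np.nan, 11.4, 11.9, 12.1, 11.7],
})

cols = ["measurement_3", "measurement_5"]  # hypothetical subset of attributes

# True/False mask of missing cells, summed per product code.
missing_by_code = data[cols].isna().groupby(data["product_code"]).sum()
print(missing_by_code)
```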

Next, we rank the measurement columns by how strongly they correlate with the other measurement columns and look at the top ten.

Then we select columns according to the sum of the first three rows of each column's sorted correlation values:

Selected columns
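This ranking can be sketched as follows. The approach reads "the first three rows" as the three largest absolute correlations of a column with the other measurement columns, excluding its correlation of 1.0 with itself. That reading, the correlation_total name, and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n)

# Hypothetical toy measurements: 3..6 share a common factor, 7..9 are pure noise.
data = pd.DataFrame({f"measurement_{i}": base + rng.normal(scale=0.5, size=n)
                     for i in range(3, 7)})
for i in range(7, 10):
    data[f"measurement_{i}"] = rng.normal(size=n)

corr = data.corr().abs()
rows = []
for col in corr.columns:
    # Drop the self-correlation, sort descending, and sum the three largest values.
    top3 = corr[col].drop(col).sort_values(ascending=False).iloc[:3]
    rows.append({"column": col, "correlation_total": top3.sum()})

ranking = (pd.DataFrame(rows)
           .sort_values("correlation_total", ascending=False)
           .reset_index(drop=True))
print(ranking.head(10))  # the top-ten table (only 7 columns exist in this toy data)
```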

Next, we fill in the NA values using HuberRegressor and KNNImputer. HuberRegressor is used to fill a missing target feature when all of its other correlated feature columns have no null values in that sample, and it is fitted on the samples where none of these columns are null. KNNImputer is used for all remaining missing values.
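The following is a simplified sketch of this two-stage imputation with scikit-learn. The helper names fill_with_huber and impute, the toy columns, and n_neighbors=3 are assumptions for illustration. HuberRegressor is left at its default settings.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.linear_model import HuberRegressor


def fill_with_huber(df, target, predictors):
    """Fill NaNs in `target` with a HuberRegressor fitted on rows where the
    target and all predictors are present. Rows with a missing predictor are
    left untouched for the KNN stage."""
    fit_rows = df[predictors + [target]].dropna()
    fill_mask = df[target].isna() & df[predictors].notna().all(axis=1)
    if fill_mask.any() and not fit_rows.empty:
        model = HuberRegressor()
        model.fit(fit_rows[predictors], fit_rows[target])
        df.loc[fill_mask, target] = model.predict(df.loc[fill_mask, predictors])
    return df


def impute(df, target, predictors, n_neighbors=3):
    """Huber regression where possible, then KNNImputer for every remaining NaN."""
    df = fill_with_huber(df.copy(), target, predictors)
    num_cols = df.select_dtypes("number").columns
    df[num_cols] = KNNImputer(n_neighbors=n_neighbors).fit_transform(df[num_cols])
    return df


# Hypothetical toy data: measurement_17 depends linearly on measurement_5 and measurement_6.
rng = np.random.default_rng(42)
n = 200
x5 = rng.normal(size=n)
x6 = rng.normal(size=n)
y = 2 * x5 - x6 + rng.normal(scale=0.1, size=n)
df = pd.DataFrame({"measurement_5": x5, "measurement_6": x6, "measurement_17": y})

y_true = y[:10].copy()
df.loc[0:9, "measurement_17"] = np.nan                        # predictors present -> Huber stage
df.loc[10:14, ["measurement_5", "measurement_17"]] = np.nan   # predictor missing -> KNN stage

filled = impute(df, "measurement_17", ["measurement_5", "measurement_6"])
print(filled.isna().sum().sum())  # 0
```

Splitting the work this way keeps the regression for rows where the correlated columns are complete. The neighbour-based fallback handles rows where a regression input is itself missing.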

Once the missing values are filled, we apply StandardScaler to the data and then train the model using StratifiedKFold cross-validation.
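The following is a minimal sketch of this training loop. The article does not name the model, so LogisticRegression is an assumption, as is scoring each fold with ROC AUC. The synthetic data from make_classification stands in for the imputed features and the binary failure label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy imbalanced data standing in for the imputed features and the "failure" label.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2],
                           random_state=0)

# Stratification keeps the failure rate roughly equal across folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof = np.zeros(len(y))  # out-of-fold predicted probabilities
fold_scores = []

for fold, (tr_idx, va_idx) in enumerate(skf.split(X, y)):
    # The scaler is fitted on the training fold only, inside the pipeline.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X[tr_idx], y[tr_idx])
    oof[va_idx] = model.predict_proba(X[va_idx])[:, 1]
    fold_scores.append(roc_auc_score(y[va_idx], oof[va_idx]))
    print(f"fold {fold}: AUC = {fold_scores[-1]:.4f}")

print(f"overall out-of-fold AUC = {roc_auc_score(y, oof):.4f}")
```

Wrapping StandardScaler in a pipeline is a design choice. It fits the scaling statistics inside each fold, so the validation rows never influence the scaler used to train on them.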



Irvi Aini

Machine Learning, Natural Language Processing, and Open Source.