Quickly & Easily Replace Missing Values with RapidMiner

This blog post shows how to quickly and easily replace missing values using RapidMiner's Turbo Prep feature.

Loading Data

RapidMiner's Turbo Prep feature makes it easy to load data into memory for data preparation.

You can choose to import data from your computer or from a database.

In this example, we use the Titanic data set included with RapidMiner's sample data.

*Note: Before loading, Turbo Prep shows some basic information about the data, such as the number of rows and columns.

Exploring the data to find missing values

Once the data is loaded in Turbo Prep, you can view the details of the selected data.

At the top of each column is a histogram showing the distribution of that column's values.

Above each column there is also a bar that shows quality measures for that column; the red line represents the amount of missing values.

At the bottom, Turbo Prep displays the number of rows and columns and the column types.

If you right-click a column and select 'Show details', you get statistics (e.g. max, min, std. dev.), a histogram that visualizes the distribution of the data, and a summary that includes the percentage of missing values.

In RapidMiner missing values are denoted as a "?" in the data set.
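To make the 'Show details' numbers concrete, here is a minimal Python sketch of the same kind of per-column check. It is not RapidMiner's implementation: it assumes a small hypothetical table (made-up rows, not the real Titanic data) where missing entries are the string "?", and it uses the sample standard deviation, which may differ from the formula Turbo Prep uses.

```python
import statistics

MISSING = "?"  # marker RapidMiner uses for a missing value

# Hypothetical sample rows (illustrative only, not the real Titanic data).
rows = [
    {"Name": "A", "Age": "22", "Cabin": "?"},
    {"Name": "B", "Age": "?", "Cabin": "C1"},
    {"Name": "C", "Age": "26", "Cabin": "?"},
    {"Name": "D", "Age": "35", "Cabin": "C2"},
]

def missing_percentage(rows, column):
    """Percentage of rows whose value in `column` is the missing marker."""
    missing = sum(1 for row in rows if row[column] == MISSING)
    return 100.0 * missing / len(rows)

def numeric_summary(rows, column):
    """Min, max and sample std. dev. of the non-missing values, plus missing %."""
    values = [float(row[column]) for row in rows if row[column] != MISSING]
    return {
        "min": min(values),
        "max": max(values),
        "std_dev": statistics.stdev(values),  # assumption: sample std. dev.
        "missing_pct": missing_percentage(rows, column),
    }

print(numeric_summary(rows, "Age"))
print(missing_percentage(rows, "Cabin"))
```

Note that the statistics are computed only over the values that are present; the "?" entries are counted separately as missing.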

Replacing Missing Values

Replacing missing numerical values

After you select a column with missing values, Turbo Prep lets you quickly replace those values by choosing the 'Cleanse' action. Alternatively, right-click the column, hover over 'Cleanse', and select 'Replace missing'.

For numerical columns, Turbo Prep lets you replace missing values with:

  • a specific value

  • zero

  • the maximum or minimum value

  • or the average value in that column.

In this example, we use the average to replace the missing values in the Age column.
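The numerical options listed above can be sketched in plain Python as follows. This is an illustration of the general technique, not Turbo Prep's code: the function name `replace_missing_numeric`, its `strategy` names and the sample Age values are all hypothetical, and the statistics are computed from the non-missing values only.

```python
MISSING = "?"  # RapidMiner's missing-value marker

def replace_missing_numeric(values, strategy="average", specific=None):
    """Return a new list with "?" entries replaced (hypothetical helper).

    strategy: "value" (uses `specific`), "zero", "max", "min" or "average".
    Max, min and average are computed from the non-missing values only.
    """
    present = [float(v) for v in values if v != MISSING]
    if strategy == "value":
        if specific is None:
            raise ValueError("strategy 'value' needs a specific replacement")
        fill = float(specific)
    elif strategy == "zero":
        fill = 0.0
    elif strategy == "max":
        fill = max(present)
    elif strategy == "min":
        fill = min(present)
    elif strategy == "average":
        fill = sum(present) / len(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v == MISSING else float(v) for v in values]

ages = ["22", "?", "26", "35", "?"]  # hypothetical Age column
print(replace_missing_numeric(ages, "average"))
```

Every missing entry receives the same fill value, which is why the average is a common default: it leaves the column mean unchanged, though it does shrink the spread of the data.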

Replacing missing nominal values

Columns containing text, including values that mix letters and numbers, are treated as nominal and are labeled 'category', as shown above.

To replace missing values, right-click a column, hover over 'Cleanse', and select 'Replace missing'.

For nominal columns, Turbo Prep lets you replace missing values with:

  • the most frequent value in the column

  • or a specific value

In the example above, we replace all missing values with the specific value 'none'.
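The two nominal options can be sketched the same way. Again this is a hypothetical helper, not RapidMiner's API. It uses `collections.Counter` to find the most frequent value; when several values tie, `most_common` returns the one seen first, and Turbo Prep's tie-breaking may differ.

```python
from collections import Counter

MISSING = "?"  # RapidMiner's missing-value marker

def replace_missing_nominal(values, strategy="most_frequent", specific=None):
    """Return a new list with "?" entries replaced (hypothetical helper).

    strategy: "most_frequent" (mode of the non-missing values) or
    "value" (uses `specific`, e.g. 'none').
    """
    if strategy == "most_frequent":
        present = [v for v in values if v != MISSING]
        fill = Counter(present).most_common(1)[0][0]
    elif strategy == "value":
        if specific is None:
            raise ValueError("strategy 'value' needs a specific replacement")
        fill = specific
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v == MISSING else v for v in values]

cabins = ["?", "C1", "?", "C2", "C1"]  # hypothetical Cabin column
print(replace_missing_nominal(cabins, "value", specific="none"))
print(replace_missing_nominal(cabins, "most_frequent"))
```

Replacing with an explicit label such as 'none' keeps the fact that a value was missing visible in the data, whereas the most frequent value silently merges those rows into the largest category.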

Exporting the data

Turbo Prep lets you easily export the prepared data for storage to a repository or to an Excel, CSV, or Qlik file.

In this example, we export the data to an Excel spreadsheet.
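For comparison, here is a minimal sketch of the CSV option in plain Python using the standard `csv` module. The `export_csv` helper and the cleaned rows are hypothetical, and the sketch writes to an in-memory buffer so it runs anywhere. Writing a real Excel file would need a third-party library, so the sketch sticks to CSV.

```python
import csv
import io

def export_csv(rows, fileobj):
    """Write a list of row dicts to `fileobj` as CSV with a header row."""
    if not rows:
        return
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical rows after missing values were replaced.
cleaned = [
    {"Name": "A", "Age": 22.0, "Cabin": "none"},
    {"Name": "B", "Age": 24.0, "Cabin": "C1"},
]

buffer = io.StringIO()
export_csv(cleaned, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("cleaned.csv", "w", newline="")` (the file name is only an example); the `newline=""` argument is what the `csv` module documentation recommends for files it writes.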

And there you go: that is how RapidMiner's Turbo Prep feature can quickly and easily replace missing values in a data set.
