Outliers, and how to deal with them

An introductory Colab notebook showing how to deal with outliers in simple Machine Learning tasks.

Your company has collected a dataset of advertising spend across three media: TV, Radio and Newspaper. The task is to predict sales from the amount of money invested in each medium.

In this notebook we are going to meet outliers 😦 for the first time!

What are we going to do? 🤔

Data import, analysis & preprocessing ⚙️
Train an ML model
Treat outliers
Check performance degradation due to outliers
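
To give a feel for these steps before opening the notebook, here is a minimal sketch. It makes several assumptions: the data is synthetic (made-up numbers, not the real advertising dataset), the model is a plain least-squares linear regression, and outliers are treated by dropping rows whose target falls outside the 1.5 × IQR fences. The notebook itself may use a different model or treatment; names such as `fit_linear` and `rmse` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the advertising data (hypothetical values).
n = 200
X = rng.uniform(0, 100, size=(n, 3))  # columns: TV, Radio, Newspaper spend
true_coef = np.array([0.05, 0.2, 0.01])
y = 3.0 + X @ true_coef + rng.normal(0, 1.0, size=n)

# Keep a clean test split so every model is scored on the same data.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]


def fit_linear(X, y):
    """Fit intercept + coefficients by ordinary least squares."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def rmse(coef, X, y):
    """Root mean squared error of the fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.sqrt(np.mean((A @ coef - y) ** 2)))


# Inject a few artificial outliers into the training targets.
y_dirty = y_train.copy()
y_dirty[:10] += 100.0

rmse_clean = rmse(fit_linear(X_train, y_train), X_test, y_test)
rmse_dirty = rmse(fit_linear(X_train, y_dirty), X_test, y_test)

# Treat outliers: drop training rows whose target is outside the IQR fences.
q1, q3 = np.percentile(y_dirty, [25, 75])
iqr = q3 - q1
keep = (y_dirty >= q1 - 1.5 * iqr) & (y_dirty <= q3 + 1.5 * iqr)
rmse_treated = rmse(fit_linear(X_train[keep], y_dirty[keep]), X_test, y_test)

print(f"test RMSE, clean training data:    {rmse_clean:.2f}")
print(f"test RMSE, with outliers:          {rmse_dirty:.2f}")
print(f"test RMSE, after dropping them:    {rmse_treated:.2f}")
```

Comparing the three errors on the same held-out split shows how much the outliers degrade the model and how much of that the treatment recovers.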

But wait… what actually is an outlier?

From Wikipedia: in statistics, an outlier is a data point that differs significantly from other observations.
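
"Differs significantly" needs a concrete rule before it can be applied to data. One common convention, used here as an assumption rather than as the notebook's own method, is Tukey's rule: flag any value more than 1.5 × IQR below the first quartile or above the third quartile. The function name `iqr_outlier_mask` and the sample numbers are made up for illustration.

```python
import numpy as np


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside the Tukey fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Made-up numbers for illustration only (not from the advertising dataset).
sample = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])
mask = iqr_outlier_mask(sample)
print(sample[mask])  # only the value 95.0 is flagged
```

The multiplier `k` is a convention, not a law: a larger value flags fewer points, and the right choice depends on how heavy-tailed the data is.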

For a better experience, open in Colab: