Outliers

  • As we learned earlier, the mean helps us quickly understand the data as a whole.
  • However, the mean is highly sensitive to certain values in our dataset.
  • These values differ significantly from the rest and do not fit within the majority of the dataset, which makes them outliers.
  • Identifying these outliers is an essential step, especially from a data science point of view. If they go unnoticed, they can cause errors in our analysis.
  • Once we're able to spot outliers, we can decide whether they indicate a large but real deviation from the mean or whether they were caused by a sampling error.
  • Suppose we are handling a dataset of some kids' heights in inches, but one kid's height was accidentally recorded in centimeters, so the dataset looks like this:
    [50, 50, 51, 49, 48, 145]

    In this case, the outlier is 145.

  • What if one kid's height is significantly greater than the others'?
    [50, 50, 51, 49, 48, 64.5]

    In this case, the outlier is 64.5.
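
To see how strongly a single outlier pulls the mean, here is a minimal Python sketch using the two height lists above. The variable names and the `summarize` helper are illustrative assumptions, and dropping the largest value is only a quick way to compare the two means, not a general method for detecting outliers.

```python
from statistics import mean, median

# The two example datasets from the text (heights in inches).
heights_unit_error = [50, 50, 51, 49, 48, 145]  # one value recorded in centimeters
heights_tall_kid = [50, 50, 51, 49, 48, 64.5]   # one genuinely taller kid


def summarize(values):
    """Return (mean, median, mean without the largest value).

    Hypothetical helper for illustration: the largest value is treated
    as the suspected outlier, which happens to be true for both lists here.
    """
    without_max = sorted(values)[:-1]  # drop the largest value
    return mean(values), median(values), mean(without_max)


for name, data in [("unit error", heights_unit_error),
                   ("tall kid", heights_tall_kid)]:
    full_mean, mid, trimmed_mean = summarize(data)
    print(f"{name}: mean={full_mean:.2f}, median={mid}, "
          f"mean without largest value={trimmed_mean:.2f}")
```

With the 145 included, the mean is 65.5 inches, which is higher than every correctly recorded height. Without it, the mean is 49.6. The median stays at 50 in both datasets, which shows that it is the mean in particular that reacts to the outlier.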