
Calculating Percentiles in NumPy

Percentiles

  • Suppose we want to find the point in a dataset that 30% of the samples fall below and 70% fall above. A point like this is called a percentile; in this case it is the 30th percentile.

  • The Nth percentile is the point that N% of the samples fall below.

  • Percentiles are useful measurements because they show us where a specific value falls within the larger dataset.

Example:

import numpy as np

arr = np.array([2, 6, 14, 4, 3, 9, 1, 11, 4, 2, 8])

thirtieth_percentile = np.percentile(arr, 30)
seventieth_percentile = np.percentile(arr, 70)
fortieth_percentile = np.percentile(arr, 40)

print(thirtieth_percentile)
print(seventieth_percentile)
print(fortieth_percentile)

Output:

3.0
8.0
4.0
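
As a rough sanity check on the definition, the sketch below counts how many samples fall below the 30th percentile of the same array. With only 11 samples the fraction cannot be exactly 30%, and NumPy's default method interpolates linearly between neighbouring sorted values, so the match is only approximate. The variable names are chosen for illustration.

```python
import numpy as np

arr = np.array([2, 6, 14, 4, 3, 9, 1, 11, 4, 2, 8])
p30 = np.percentile(arr, 30)

# Comparing the array with a scalar gives booleans; their mean is a fraction.
frac_below = np.mean(arr < p30)         # samples strictly below the percentile
frac_at_or_below = np.mean(arr <= p30)  # samples at or below the percentile

print(p30)
print(frac_below)
print(frac_at_or_below)
```

On a small array the two fractions usually bracket the requested 30% rather than hit it; on larger datasets they move closer to it.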

Some percentiles have specific names: 

  1. The 25th percentile is called the first quartile.

  2. The 50th percentile is called the median.

  3. The 75th percentile is called the third quartile.
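
The three named percentiles above can be requested in a single call, since np.percentile also accepts a list of percentiles. A minimal sketch, reusing the array from the first example; np.median is shown only to confirm that it matches the 50th percentile.

```python
import numpy as np

arr = np.array([2, 6, 14, 4, 3, 9, 1, 11, 4, 2, 8])

# Passing a list of percentiles returns one value per entry, in the same order.
q1, median, q3 = np.percentile(arr, [25, 50, 75])

print(q1, median, q3)
print(np.median(arr))  # same value as the 50th percentile
```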

The difference between the third quartile and the first quartile is called the interquartile range (IQR).

Example:

d = [1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8]

Now we can calculate the 25th and 75th percentiles:

np.percentile(d, 25)  # returns 3.5

np.percentile(d, 75)  # returns 6.5

To find the interquartile range, we subtract the value of the 25th percentile from the value of the 75th:

6.5 - 3.5 = 3

Roughly half of the data falls within the interquartile range, that is, between the first and third quartiles. The interquartile range indicates how spread out our data is: the lower the value, the less variation there is in the dataset; the higher the value, the greater the spread.
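
To make the comparison concrete, here is a small sketch that wraps the subtraction in a helper and applies it to two datasets. The function name iqr and the second dataset (wide) are made up for illustration; only d comes from the example above.

```python
import numpy as np

def iqr(data):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, q3 = np.percentile(data, [25, 75])
    return q3 - q1

d = [1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8]           # dataset from the example above
wide = [1, 1, 2, 2, 4, 10, 15, 20, 25, 30, 40]  # hypothetical, more spread-out data

print(iqr(d))
print(iqr(wide))
```

The second dataset should report a noticeably larger interquartile range, reflecting that its middle half covers a wider interval of values.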