Cleaning Empty Cells

When analyzing data, empty cells (missing values, which pandas shows as None or NaN) can produce incorrect results.
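Before cleaning, it can help to see how many cells are actually empty. The snippet below is a minimal sketch using a small made-up DataFrame; the variable names `missing_per_column` and `non_empty_emails` are illustrative only. It also shows one way empty cells can skew a result: `count()` skips them, so it can report fewer entries than the DataFrame has rows.

```python
import pandas as pd

# Small illustrative DataFrame with one empty cell in each column
df = pd.DataFrame(
    [
        ['Jane Doe', 'jdoe@yahoo.com'],
        [None, 'jonathanbyers888@gmail.com'],
        ['Jim Hopper', None],
    ],
    columns=['Name', 'Email'],
)

# isna() marks every empty cell as True; sum() counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# count() ignores empty cells, so it differs from the number of rows
non_empty_emails = df['Email'].count()
print(non_empty_emails, 'of', len(df), 'rows have an email')
```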

Removing Rows

  • In a large dataset, removing a few rows with empty cells might not affect your analysis. For instance, here we create a DataFrame in which some rows contain None values:
    import pandas as pd
    
    df = pd.DataFrame([
      ['JOHN SMITH', 'john.smith@gmail.com'],
      ['Jane Doe', 'jdoe@yahoo.com'],
      [None, 'jonathanbyers888@gmail.com'],
      ['joe schmo', 'joeschmo@hotmail.com'],
      ['Jim Hopper', None],
      ['Mike Wheeler', None]
    ],
    columns=['Name', 'Email'])
    
    print(df)
    
    new_df = df.dropna()
    
    print('-----------------------------------------------')
    print('After dropping rows with empty cells, the new DataFrame is:')  # drops the rows at index 2, 4 and 5
    print('-----------------------------------------------')
    print(new_df)

  • dropna() does not change the original DataFrame; it returns a new one. To change the original DataFrame instead, pass inplace=True to dropna().
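The difference between the two modes can be sketched as follows, reusing the sample data from above. The names `cleaned`, `has_email`, and `rows_before` are illustrative. The `subset` argument is an extra option not covered in the text: it limits the check to the listed columns.

```python
import pandas as pd

df = pd.DataFrame([
  ['JOHN SMITH', 'john.smith@gmail.com'],
  ['Jane Doe', 'jdoe@yahoo.com'],
  [None, 'jonathanbyers888@gmail.com'],
  ['joe schmo', 'joeschmo@hotmail.com'],
  ['Jim Hopper', None],
  ['Mike Wheeler', None]
],
columns=['Name', 'Email'])

# Default behaviour: a new DataFrame is returned, df keeps all its rows
cleaned = df.dropna()
rows_before = len(df)
print(rows_before, 'rows in df,', len(cleaned), 'rows in cleaned')

# Only drop rows whose Email is empty; an empty Name is tolerated
has_email = df.dropna(subset=['Email'])
print(len(has_email), 'rows have an email')

# inplace=True modifies df itself (and the call returns None)
df.dropna(inplace=True)
print(len(df), 'rows left in df')
```

Assigning the result back, as in `df = df.dropna()`, has the same effect as `inplace=True` and is often easier to read.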


Replacing The Empty Values

  • Instead of deleting rows, we can replace the empty values in our DataFrame with a new value.
  • This way, we don't lose entire rows just because of a few empty cells.
  • The fillna() method replaces the empty values, for instance:
    import pandas as pd
    
    df = pd.DataFrame([
      ['JOHN SMITH', 'john.smith@gmail.com'],
      ['Jane Doe', 'jdoe@yahoo.com'],
      [None, 'jonathanbyers888@gmail.com'],
      ['joe schmo', 'joeschmo@hotmail.com'],
      ['Jim Hopper', None],
      ['Mike Wheeler', None]
    ],
    columns=['Name', 'Email'])
    
    print(df)
    
    df.fillna('Not Given', inplace=True)
    
    print('-----------------------------------------------')
    print('After replacing empty cells, our DataFrame is:')
    print('-----------------------------------------------')
    print(df)

  • We can also replace the values in only some specified columns. For example, here only the empty values in the Name column are replaced with 'Will update shortly':

    import pandas as pd
    
    df = pd.DataFrame([
      ['JOHN SMITH', 'john.smith@gmail.com'],
      ['Jane Doe', 'jdoe@yahoo.com'],
      [None, 'jonathanbyers888@gmail.com'],
      ['joe schmo', 'joeschmo@hotmail.com'],
      ['Jim Hopper', None],
      ['Mike Wheeler', None]
    ],
    columns=['Name', 'Email'])
    
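    # Assign the result back to the column: calling fillna(inplace=True) on a
    # selected column may not update df itself in newer pandas versions
    df['Name'] = df['Name'].fillna('Will update shortly')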
    
    print(df)
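When several columns each need their own replacement value, fillna() also accepts a dictionary that maps column names to fill values. The following is a minimal sketch combining the two placeholder strings used above; the variable name `filled` is illustrative.

```python
import pandas as pd

df = pd.DataFrame([
  ['JOHN SMITH', 'john.smith@gmail.com'],
  ['Jane Doe', 'jdoe@yahoo.com'],
  [None, 'jonathanbyers888@gmail.com'],
  ['joe schmo', 'joeschmo@hotmail.com'],
  ['Jim Hopper', None],
  ['Mike Wheeler', None]
],
columns=['Name', 'Email'])

# Each key is a column name, each value is what replaces empty cells there
filled = df.fillna({'Name': 'Will update shortly', 'Email': 'Not Given'})

print(filled)
```

Because inplace is not set, `df` still contains its empty cells and `filled` is a separate DataFrame.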