Cleaning Empty Cells
When analyzing data, empty cells may produce incorrect results.
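Before cleaning, it can help to see where the empty cells are. The following is a minimal sketch, assuming a small made-up DataFrame (the `scores` data and variable names are hypothetical and not part of the examples in this article). It uses `isna()` to count empty cells per column and to list the rows that contain them.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
scores = pd.DataFrame({
    'Name': ['Ann', 'Bob', None, 'Dee'],
    'Score': [90.0, None, 70.0, 80.0],
})

# isna() marks each empty cell as True; sum() counts them per column.
missing_per_column = scores.isna().sum()
print(missing_per_column)

# Select the rows that contain at least one empty cell.
rows_with_gaps = scores[scores.isna().any(axis=1)]
print(rows_with_gaps)
```

Knowing how many cells are empty, and in which columns, makes it easier to decide between removing rows and replacing values.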
Removing Rows
- In a large dataset, removing a few rows with empty cells is unlikely to affect your analysis. For instance, here we create a DataFrame with None values in some rows and then drop those rows:

    import pandas as pd

    # Sample data with None marking the empty cells
    df = pd.DataFrame([
        ['JOHN SMITH', 'john.smith@gmail.com'],
        ['Jane Doe', 'jdoe@yahoo.com'],
        [None, 'jonathanbyers888@gmail.com'],
        ['joe schmo', 'joeschmo@hotmail.com'],
        ['Jim Hopper', None],
        ['Mike Wheeler', None]
    ], columns=['Name', 'Email'])
    print(df)

    # Drop every row that contains at least one empty cell
    # (the 3rd, 5th, and 6th rows, i.e. index labels 2, 4, and 5)
    new_df = df.dropna()
    print('-----------------------------------------------')
    print('After dropping rows with empty cells, the new DataFrame is:')
    print('-----------------------------------------------')
    print(new_df)
- dropna() does not change the original DataFrame; it returns a new DataFrame. To change the original DataFrame instead, pass the inplace=True argument to dropna().
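The difference between the two calls can be sketched as follows. This is a minimal illustration on a shortened version of the sample data; the variable name `cleaned` is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame([
    ['Jane Doe', 'jdoe@yahoo.com'],
    [None, 'jonathanbyers888@gmail.com'],
    ['Jim Hopper', None],
], columns=['Name', 'Email'])

# Without inplace, dropna() returns a new DataFrame and leaves df untouched.
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 1

# With inplace=True, df itself is modified.
df.dropna(inplace=True)
print(len(df))  # 1
```

Assigning the result to a new variable keeps the original data available for comparison, which is often the safer choice while exploring.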
Replacing The Empty Values
- We can also replace the empty values inside our DataFrame and insert a new value instead.
- With this, we won't be deleting those entire rows just because of some empty cells.
- Here, the fillna() method is used to replace the empty values. For instance:

    import pandas as pd

    df = pd.DataFrame([
        ['JOHN SMITH', 'john.smith@gmail.com'],
        ['Jane Doe', 'jdoe@yahoo.com'],
        [None, 'jonathanbyers888@gmail.com'],
        ['joe schmo', 'joeschmo@hotmail.com'],
        ['Jim Hopper', None],
        ['Mike Wheeler', None]
    ], columns=['Name', 'Email'])
    print(df)

    # Replace every empty cell in the DataFrame with 'Not Given'
    df.fillna('Not Given', inplace=True)
    print('-----------------------------------------------')
    print('After replacing the empty cells, our DataFrame is:')
    print('-----------------------------------------------')
    print(df)
- We can also replace the empty values in specified columns only. For example, here only the empty values in the Name column are replaced with 'Will update shortly'.
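    import pandas as pd

    df = pd.DataFrame([
        ['JOHN SMITH', 'john.smith@gmail.com'],
        ['Jane Doe', 'jdoe@yahoo.com'],
        [None, 'jonathanbyers888@gmail.com'],
        ['joe schmo', 'joeschmo@hotmail.com'],
        ['Jim Hopper', None],
        ['Mike Wheeler', None]
    ], columns=['Name', 'Email'])

    # Assign the result back to the column. Calling fillna(inplace=True) on
    # df['Name'] may act on a temporary copy in recent pandas versions and
    # leave df unchanged.
    df['Name'] = df['Name'].fillna('Will update shortly')
    print(df)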
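When several columns each need their own replacement value, one option is to pass a dictionary to fillna(), mapping column names to replacement values. The sketch below uses a shortened version of the sample data; the variable name `filled` is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    ['Jane Doe', 'jdoe@yahoo.com'],
    [None, 'jonathanbyers888@gmail.com'],
    ['Jim Hopper', None],
], columns=['Name', 'Email'])

# Keys are column names; values are the replacements for that column.
# Columns not listed in the dictionary are left unchanged.
filled = df.fillna({'Name': 'Will update shortly', 'Email': 'Not Given'})
print(filled)
```

Because the call is not in place, df still holds the original empty cells and filled holds the cleaned copy.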