Pandas Tutorials | Projects | Interview Questions Tutorial

Groupby

Pandas groupby groups the rows of a DataFrame into categories and then applies a function to each category, which makes aggregating data efficient.
The groupby() function splits the data into groups based on one or more keys, such as a column name, a function, or a mapping.
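The split, apply, and combine steps described above can be sketched by hand and compared with a single groupby call. This is only an illustrative sketch: the `team` and `points` columns are made-up sample data, not part of the tutorial's examples.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "team": ["A", "B", "A", "B"],
    "points": [3, 5, 4, 1],
})

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
manual_totals = {}
for key, group in df.groupby("team"):
    # Apply: compute one value for each group.
    manual_totals[key] = int(group["points"].sum())

# Combine: groupby performs the same split, apply, and combine in one call.
grouped_totals = df.groupby("team")["points"].sum()

print(manual_totals)
print(grouped_totals.to_dict())
```

Both approaches should give the same totals per team; the one-line form is usually preferable because pandas runs the aggregation in optimized code.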
- Syntax:
```
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)
```
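  Parameters:
- by: mapping, function, label, or list of labels. Used to determine the groups.
- axis: int, default 0. Split along rows (0) or columns (1).
- level: If the axis is a MultiIndex, group by a particular level or levels.
- as_index: For aggregated output, return an object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
- sort: Sort group keys. Get better performance by turning this off. Note that this does not influence the order of observations within each group; groupby preserves the order of rows within each group.
- group_keys: When calling apply, add group keys to the index to identify pieces.
- squeeze: Reduce the dimensionality of the return type if possible, otherwise return a consistent type. This parameter was deprecated and is no longer available in pandas 2.0 and later.
- observed: Only applies when a grouper is categorical. If True, show only the observed category values; if False, show all category values.
- dropna: default True. If True, rows whose group key is NaN are dropped from the result; if False, NaN is treated as its own group.
- Returns: a GroupBy object that holds the information about the groups.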
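The effect of as_index and sort can be seen in a small sketch. The `city` and `sales` columns below are hypothetical sample data chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "sales": [10, 20, 30, 40],
})

# Default: group labels become the index and the keys are sorted.
by_index = df.groupby("city")["sales"].sum()

# as_index=False: group labels stay as a regular column ("SQL-style").
flat = df.groupby("city", as_index=False)["sales"].sum()

# sort=False: groups appear in order of first appearance.
unsorted = df.groupby("city", sort=False)["sales"].sum()

print(by_index)
print(flat)
print(unsorted)
```

With as_index=False the result is a flat table that is often easier to merge or export, while sort=False can save time on large data when the order of the group keys does not matter.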

Example:

import pandas as pd

arr = [[11, 12, 13], [41, 76, 34], [23, None, 37], [91, 12, 20]]

df = pd.DataFrame(arr, columns=['p','q','r'])

print(df)

sk = df.groupby('q')  # split the data on column 'q'; the row where q is NaN is dropped by default

print(sk.first())  # first row of each group
print('----------------')
print(df.groupby('q').sum())  # sum of the other columns for each value of q
print('----------------')

Output:

    p     q   r
0  11  12.0  13
1  41  76.0  34
2  23   NaN  37
3  91  12.0  20
       p   r
q           
12.0  11  13
76.0  41  34
----------------
        p   r
q            
12.0  102  33
76.0   41  34
----------------
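In the output above, the row where q is NaN does not appear in any group, because dropna defaults to True. The following sketch reuses the same data and passes dropna=False to keep that row as its own group. It assumes pandas 1.1 or later, where the dropna parameter is available.

```python
import pandas as pd

arr = [[11, 12, 13], [41, 76, 34], [23, None, 37], [91, 12, 20]]
df = pd.DataFrame(arr, columns=['p', 'q', 'r'])

# Default behaviour: the row whose key is NaN is left out.
default_sum = df.groupby('q').sum()

# dropna=False: NaN is kept as a separate group key.
with_nan = df.groupby('q', dropna=False).sum()

print(default_sum)
print('----------------')
print(with_nan)
```

Comparing the two results shows one extra group in the second, holding the values from the row with the missing key.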

Example:

import pandas as pd

df = pd.DataFrame({'Avengers': ['Falcon', 'Falcon',
                              'Iron Man', 'Iron Man'],
                   'Max Speed': [380., 370., 424., 226.]})

print(df)
print('---------------------------')
print(df.groupby('Avengers').mean())

Output:

   Avengers  Max Speed
0    Falcon      380.0
1    Falcon      370.0
2  Iron Man      424.0
3  Iron Man      226.0
---------------------------
          Max Speed
Avengers           
Falcon        375.0
Iron Man      325.0
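Several aggregations can also be computed in one pass. As a hedged sketch on the same data, the agg method accepts a list of function names and returns one column per function:

```python
import pandas as pd

df = pd.DataFrame({'Avengers': ['Falcon', 'Falcon',
                              'Iron Man', 'Iron Man'],
                   'Max Speed': [380., 370., 424., 226.]})

# Compute the mean, minimum, and maximum speed for each group.
stats = df.groupby('Avengers')['Max Speed'].agg(['mean', 'min', 'max'])

print(stats)
```

Selecting the 'Max Speed' column before calling agg keeps the result columns flat (mean, min, max) rather than producing a two-level column index.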
