Pandas DataFrame
DataFrame
- A DataFrame is an object that stores data in rows and columns.
- You can think of it as a spreadsheet or an SQL table.
- You can create a DataFrame manually or fill it with data from a CSV file, an Excel spreadsheet, or an SQL query.
- Each column is identified by a name, usually a string. Each row has an index label; by default this is a unique integer starting at 0. A DataFrame can hold strings, ints, floats, tuples, and other data types.
- You can also pass a dictionary to pd.DataFrame, where each key is a column name and each value is a list holding that column's values. For instance, take a look at the code below:
import pandas as pd
df1 = pd.DataFrame({
'name': ['Smith', 'Jane', 'Joe'],
'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
'age': [34, 28, 51]
})
This will create a DataFrame like:
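name | address | age |
---|---|---|
Smith | 123 Main St. | 34 |
Jane | 456 Maple Ave. | 28 |
Joe | 789 Broadway | 51 |
Columns appear in the order of the dictionary keys (pandas 0.23 and later on Python 3.6+). Older versions sorted them alphabetically, as address, age, name.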
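The bullets above also mention filling a DataFrame from a CSV file. Here is a minimal sketch of that route, assuming a small in-memory CSV string (hypothetical data, invented for illustration) in place of a file on disk:

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path
# such as pd.read_csv('people.csv').
csv_text = "name,age\nSmith,34\nJane,28\nJoe,51\n"

# pd.read_csv accepts a path or any file-like object, so StringIO works here.
people = pd.read_csv(io.StringIO(csv_text))

print(list(people.columns))  # column names come from the CSV header row
print(len(people))           # number of data rows
```

The header row of the CSV becomes the column names, and each following line becomes one row.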
Example:
We're going to create a DataFrame of clothing data, with a product ID, a product name, and a color for each item.
import pandas as pd
df1 = pd.DataFrame({
'Product ID': [1, 2, 3, 4],
'Product Name': ['t-shirt', 't-shirt', 'skirt', 'skirt'],
'Color': ['blue', 'green', 'red', 'black']
})
print(df1)
Output:
Product ID | Product Name | Color |
---|---|---|
1 | t-shirt | blue |
2 | t-shirt | green |
3 | skirt | red |
4 | skirt | black |
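To tie this example back to the earlier points about column names and row indexes, the sketch below rebuilds the same DataFrame and inspects its structure. The variable names `column_names` and `row_labels` are illustrative only, and the column order shown assumes pandas 0.23 or later.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Product ID': [1, 2, 3, 4],
    'Product Name': ['t-shirt', 't-shirt', 'skirt', 'skirt'],
    'Color': ['blue', 'green', 'red', 'black'],
})

# Column names, in the order the dictionary keys were given.
column_names = list(df1.columns)

# Row labels: pandas assigns 0, 1, 2, ... when no index is supplied.
row_labels = list(df1.index)

print(column_names)
print(row_labels)
print(df1.dtypes)  # the data type pandas inferred for each column
```

Note that the default row labels start at 0, so they do not match the 'Product ID' values, which are ordinary column data.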
Adding Data using lists
- We can also use lists to supply the data for a DataFrame.
- You can pass a list of lists, where each inner list represents one row of data. To supply the column names, pass a list to the columns keyword argument.
Example:
We're going to create a DataFrame of company offices from lists, with the ID, location, and number of employees as columns.
import pandas as pd
df2 = pd.DataFrame([
[1, 'Indore', 100],
[2, 'Bangalore', 120],
[3, 'Hyderabad', 90],
[4, 'Mumbai', 115]
],
columns=[
'ID', 'Location', 'Number of Employees'
])
print(df2)
Output:
ID | Location | Number of Employees |
---|---|---|
1 | Indore | 100 |
2 | Bangalore | 120 |
3 | Hyderabad | 90 |
4 | Mumbai | 115 |
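The two construction styles covered here, a dictionary of columns and a list of rows, should describe the same table. As a rough check, the sketch below builds the company data both ways and compares the results. The names `from_rows`, `from_dict`, and `same` are illustrative only.

```python
import pandas as pd

# Row-oriented: each inner list is one row; names come from `columns`.
from_rows = pd.DataFrame(
    [
        [1, 'Indore', 100],
        [2, 'Bangalore', 120],
        [3, 'Hyderabad', 90],
        [4, 'Mumbai', 115],
    ],
    columns=['ID', 'Location', 'Number of Employees'],
)

# Column-oriented: each key is a column name, each value is that column's data.
from_dict = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Location': ['Indore', 'Bangalore', 'Hyderabad', 'Mumbai'],
    'Number of Employees': [100, 120, 90, 115],
})

# DataFrame.equals compares shape, labels, values, and column dtypes.
same = from_rows.equals(from_dict)
print(same)
```

Which style to use is mostly a matter of how the data arrives: records collected one at a time fit the list-of-rows form, while data already grouped by field fits the dictionary form.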