Interview Questions in Pandas
Q-1) What is Pandas in Python?
- Pandas is an open-source Python library for high-performance data manipulation and analysis. The name derives from "panel data", an econometrics term for multidimensional, structured data sets.
Q-2) What are the different types of Data Structures in Pandas?
- Pandas provides two main data structures, Series and DataFrame, both built on top of NumPy. A Series is a one-dimensional labeled array, whereas a DataFrame is a two-dimensional labeled table.
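The difference in dimensionality can be sketched as follows. This is a minimal illustration; the column names and values are made up for the example.

```python
import pandas as pd

# A Series is one-dimensional: a single column of values plus an index.
s = pd.Series([10, 20, 30], name="score")

# A DataFrame is two-dimensional: rows plus named columns.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 20, 30]})

print(s.ndim, df.ndim)  # number of dimensions of each structure
print(df.shape)         # (rows, columns)
```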
Q-3) Define DataFrame.
- The DataFrame, one of the most widely used pandas data structures, is a two-dimensional array with labeled axes (rows and columns). It has two separate indexes: a row index and a column index.
Q-4) What are the features of pandas that are significant?
- The key features are:
- Memory Efficient
- Time Series
- Reshaping
- Merge and join
- Data Alignment
Q-5) Define Series in Pandas?
- A Series is a one-dimensional labeled array that can hold any data type. Its index holds the row labels. A list, tuple, or dictionary can be converted into a Series by passing it to the pd.Series() constructor. A Series cannot have more than one column.
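As a small sketch of those conversions (the variable names and values are illustrative only):

```python
import pandas as pd

from_list = pd.Series([1, 2, 3])         # default integer index 0..2
from_tuple = pd.Series((1.5, 2.5))       # tuples work the same way
from_dict = pd.Series({"a": 1, "b": 2})  # dict keys become the index labels

print(from_dict.index.tolist())
print(from_dict["b"])
```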
Q-6) What is categorical data in Pandas?
- Categorical data is a pandas data type that corresponds to a categorical variable in statistics. Such a variable takes only a limited, usually fixed, set of possible values.
Q-7) What is the name of Pandas library tools used to create a scatter plot matrix?
- scatter_matrix, available as pandas.plotting.scatter_matrix()
Q-8) What are the different ways a DataFrame can be created in pandas?
- Two common ways are:
- List
- Dict of arrays
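Both approaches can be sketched like this. The data and column names below are invented for the example.

```python
import numpy as np
import pandas as pd

# From a list of rows; column names are supplied separately.
rows = [["x", 1], ["y", 2]]
df_from_list = pd.DataFrame(rows, columns=["label", "value"])

# From a dict of arrays; each key becomes a column name.
df_from_dict = pd.DataFrame({
    "label": np.array(["x", "y"]),
    "value": np.array([1, 2]),
})

print(df_from_list)
print(df_from_dict)
```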
Q-9) How to convert a numpy array to a dataframe?
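- A NumPy array can be passed directly to the DataFrame constructor. Here, 35 random integers are reshaped into 7 rows and 5 columns:
import numpy as np
import pandas as pd
p = pd.Series(np.random.randint(1, 7, 35))
info = pd.DataFrame(p.values.reshape(7, 5))
print(info)
Output (the values are random, so they differ between runs):
   0  1  2  3  4
0  3  2  5  5  1
1  3  2  5  5  5
2  1  3  1  2  6
3  1  1  1  2  2
4  3  5  3  3  3
5  2  5  3  6  4
6  3  6  6  6  5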
Q-10) How to convert DataFrame into Numpy array?
- We can convert a Pandas DataFrame to a NumPy array to perform high-level mathematical operations on it, using the DataFrame.to_numpy() method.
- DataFrame.to_numpy() returns a NumPy ndarray.
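A minimal sketch of the conversion, using a made-up two-column frame. Note that mixing integer and float columns yields a single common dtype in the resulting array.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

arr = df.to_numpy()  # plain numpy.ndarray, labels are dropped

print(type(arr).__name__, arr.shape, arr.dtype)
```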
Q-11) What is a Pandas Index?
- A Pandas Index is the sequence of labels attached to an axis of a Series or DataFrame. It is the crucial tool that label-based selection relies on to pick specific rows and columns of data.
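The following sketch shows where the row and column labels live and how they are used for selection. The frame and its labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]}).set_index("key")

print(df.index)    # row labels
print(df.columns)  # column labels

# Labels from both indexes select a single cell.
cell = df.loc["b", "value"]
print(cell)
```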
Q-12) What are the Data Operations in Pandas?
- The following data operations are useful in pandas:
- Row and column selection
- Filter Data
- Null Values
Q-13) What is GroupBy in Pandas?
- The Pandas groupby() method lets us reorganize real-world data sets by splitting the rows into groups based on one or more keys. Its main job is to divide the data into categories, so that a function can then be applied to each group.
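A minimal sketch of splitting rows into groups and applying a function to each group; the column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "points": [1, 2, 3, 4],
})

# Split by team, then sum the points within each group.
totals = df.groupby("team")["points"].sum()
print(totals)
```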
Q-14) Why is Pandas used in Data analysis?
- Pandas is a library for the Python programming language used for tasks such as data processing and analysis. It includes data structures and operations for manipulating numerical tables and time series.
Q-15) Define Time Series in Pandas.
- A time series is an ordered sequence of data points that shows how a quantity evolves over time. Pandas has a wide range of capabilities and features for working with time series data across multiple domains.
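As a small sketch of two of those capabilities, date-based slicing and resampling, assuming a made-up daily series:

```python
import pandas as pd

# Six consecutive daily observations.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Slice by date strings (both endpoints are included).
window = ts["2024-01-02":"2024-01-03"]

# Downsample to two-day bins and sum the values in each bin.
two_day = ts.resample("2D").sum()

print(window)
print(two_day)
```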
Q-16) What is Reindexing in Pandas?
- Reindexing means conforming a DataFrame to a new index, with optional filling logic. Labels that were not present in the previous index get NA/NaN values. It can change both the row and the column labels of a DataFrame.
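A minimal sketch on a Series (the same idea applies to DataFrames); the labels and values are illustrative.

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=["a", "b"])

# "c" was not in the old index, so it is filled with NaN.
reindexed = s.reindex(["a", "b", "c"])

# Optional filling logic: supply a fill value instead of NaN.
filled = s.reindex(["a", "b", "c"], fill_value=0.0)

print(reindexed)
print(filled)
```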
Q-17) Define Categorical Data in Pandas.
- Categoricals are a data type in Pandas that corresponds to categorical variables in statistics. A categorical variable has a limited, and usually fixed, set of values (categories; levels in R). Gender, social class, blood type, country affiliation, observation time, or ratings on Likert scales are some examples. All values of categorical data are either in the categories or np.nan.
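A short sketch of a categorical Series, using blood type as in the examples above. The specific values are made up.

```python
import pandas as pd

blood = pd.Series(["A", "O", "B", "O"], dtype="category")

# The fixed set of categories, inferred and sorted from the data.
categories = blood.cat.categories.tolist()

# Each value is stored as an integer code pointing into the categories.
codes = blood.cat.codes.tolist()

print(categories)
print(codes)
```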
Q-18) How to create a copy of the series in Pandas?
- A copy of a Series can be created with the Series.copy() method:
Series.copy(deep=True)
With deep=True (the default), this creates a deep copy of the Series, i.e. the copy has its own data and index. With deep=False, a new Series object is created, but neither the data nor the index is copied; they only reference the original's.
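The sketch below illustrates only the deep-copy case, since what a shallow copy shares with the original depends on the pandas version and its copy-on-write settings. The values are illustrative.

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

# A deep copy owns its data, so editing it leaves the original untouched.
deep = original.copy(deep=True)
deep["a"] = 99

print(original["a"])
print(deep["a"])
```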
Q-19) How to add Index, column or row in Pandas DataFrame?
- For Index: When you build a DataFrame, you can pass the desired labels to the index argument of the constructor. If no index is specified, the DataFrame gets a default integer index that begins at 0 and ends at the DataFrame's last row.
- For Columns & Rows we can use .loc and .iloc:
- .loc works on the labels of the index. It is a label-based selection method, which means we pass the name of the row or column we want to select. A slice passed to .loc includes its end label.
- .iloc works on positions in the index. It is an integer-position-based selection method, which means we pass integer positions to select a particular row or column. Unlike .loc, a slice passed to .iloc does not include its last element. Also unlike .loc, .iloc does not accept a boolean Series as a mask.
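The points above can be sketched as follows: setting an index at construction, adding a column and a row, and comparing label-based and position-based slicing. All names and values are hypothetical.

```python
import pandas as pd

# Explicit row labels via the index argument.
df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

# Add a column by assigning to a new column name.
df["y"] = df["x"] * 10

# Add a row by assigning to a new label with .loc.
df.loc["d"] = [4, 40]

# .loc slices by label and includes the end label ("c").
by_label = df.loc["a":"c", "x"]

# .iloc slices by position and excludes the end position (2).
by_pos = df.iloc[0:2, 0]

print(df)
print(len(by_label), len(by_pos))
```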
Q-20) Which function is used to iterate over Pandas DataFrame?
- By combining a for loop with an iterrows() function on the DataFrame, you may iterate over the rows of the DataFrame.
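A minimal sketch of that loop, assuming a small made-up frame. Each iteration yields the row's index label and the row itself as a Series.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]})

seen = []
for idx, row in df.iterrows():
    # row is a Series; columns are accessed by name.
    seen.append((idx, row["name"], row["qty"]))

print(seen)
```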
Q-21) Convert DataFrame into an Excel file.
- Using the to_excel() function, we can export the DataFrame to an Excel file.
Q-22) List down the Data Operations in Pandas.
- Data operations in Pandas are:
- By Filtering the data: By using boolean operations/expressions on a DataFrame, we can easily filter the data in Pandas.
- By Rows and Column Selection: In Pandas, we can select any row or column of a DataFrame by passing its name. A single selected row or column is one-dimensional and is returned as a Series.
- Null Values: These occur when no data is provided for some items/rows/columns in a DataFrame. They are usually represented as NaN.
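The three operations above can be sketched on one small frame. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [50.0, np.nan, 80.0],
})

# Filtering with a boolean expression.
high = df[df["score"] > 60]

# Selecting a single column returns a Series.
col = df["score"]

# Detecting and replacing null values.
missing = int(df["score"].isna().sum())
filled = df["score"].fillna(0.0)

print(high)
print(missing)
print(filled.tolist())
```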
Q-23) Name the function used to get the number of rows and columns in a Dataframe in Pandas.
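- For this, we can use the .shape attribute, which returns the number of rows and columns of a DataFrame as a (rows, columns) tuple. It is an attribute rather than a function, so it is written without parentheses.
- syntax:
df.shape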
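A quick sketch with an illustrative frame, unpacking the tuple into row and column counts:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is an attribute, so there are no parentheses.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```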
Q-24) Name the function used to know if a DataFrame is empty or not in Pandas.
- The .empty attribute is used here. It checks whether the DataFrame is empty and returns True if it is, otherwise False. It is an attribute rather than a function, so it is written without parentheses.
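A small sketch of the check, using hypothetical frames. Note that a frame containing only NaN values still has rows, so it does not count as empty.

```python
import pandas as pd

empty_df = pd.DataFrame()
nonempty_df = pd.DataFrame({"a": [1]})
nan_df = pd.DataFrame({"a": [float("nan")]})

print(empty_df.empty)     # no rows or columns at all
print(nonempty_df.empty)  # one row of data
print(nan_df.empty)       # one row, even though the value is missing
```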
Q-25) Name the function used to get the sum of values of a column in Pandas DataFrame.
- DataFrame.sum() returns the sum of the values along the specified axis. With the index axis (axis=0), it adds all the values in a column and repeats this for every column, returning a Series with the sum of the values in each column. It can also skip missing values in the DataFrame while calculating the sum.
- syntax:
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
Parameters:
axis : {index (0), columns (1)}
skipna : Exclude NA/null values when computing the result.
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_count : The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
Returns:
sum : Series or DataFrame (if level specified)
Note: this signature comes from older pandas releases; the level parameter was removed in pandas 2.0.
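A minimal sketch of column sums, row sums, and the effect of skipna, using invented values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [10.0, 20.0, 30.0],
})

col_total = df["a"].sum()              # one column; NaN is skipped by default
per_column = df.sum()                  # one sum per column
per_row = df.sum(axis=1)               # one sum per row
strict = df["a"].sum(skipna=False)     # NaN propagates when not skipped

print(col_total)
print(per_column)
print(per_row.tolist())
print(strict)
```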
Q-26) Name the function to get the average of values of a column in pandas dataframe.
- The DataFrame.mean() function in Pandas returns the mean of the values along the specified axis. When applied to a pandas Series, the method produces a scalar value representing the mean of all the observations in the Series. When applied to a pandas DataFrame, the method returns a pandas Series with the mean of the values along the selected axis.
- Syntax:
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters:
axis : {index (0), columns (1)}
skipna : Exclude NA/null values when computing the result.
level : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Returns:
mean : Series or DataFrame (if level specified)
Note: this signature comes from older pandas releases; the level parameter was removed in pandas 2.0.
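A short sketch contrasting the scalar result for a Series with the Series result for a DataFrame; the numbers are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 60.0],
})

col_mean = df["a"].mean()   # Series -> scalar
per_column = df.mean()      # DataFrame -> Series, one mean per column
per_row = df.mean(axis=1)   # one mean per row

print(col_mean)
print(per_column)
print(per_row.tolist())
```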
Q-27) Create a DataFrame using a list of tuples.
- We can create a DataFrame by simply passing a list of tuples to the DataFrame constructor:
import pandas as pd
# data in the form of a list of tuples
data = [('hid', 18, 1), ('wihd', 17, 2), ('weee', 16, 3), ('qwqw', 15, 4), ('pkap', 14, 5)]
# create DataFrame using data
df = pd.DataFrame(data, columns=['Name', 'Age', 'Score'])
print(df)
Output:
   Name  Age  Score
0   hid   18      1
1  wihd   17      2
2  weee   16      3
3  qwqw   15      4
4  pkap   14      5
Q-28) How to Rename the Index or column of a DataFrame in Pandas?
- The column or index labels of a DataFrame can be given new names using the .rename() method.
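A minimal sketch of renaming both axes with mapping dictionaries. The old and new labels are hypothetical, and the original frame is left unchanged because rename returns a new object by default.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]}, index=[0, 1])

renamed = df.rename(
    columns={"A": "alpha"},            # old column name -> new name
    index={0: "first", 1: "second"},   # old row label -> new label
)

print(renamed)
```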
Q-29) Define Data Aggregation.
- In data aggregation, we apply an aggregation function to one or more columns, for example with the following methods:
- sum: It returns the sum of the values along the requested axis.
- min: It returns the minimum value along the requested axis.
- max: It returns the maximum value along the requested axis.
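One way to apply several of these aggregations at once is DataFrame.agg(), sketched here on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

# One row per aggregation, one column per original column.
summary = df.agg(["sum", "min", "max"])
print(summary)
```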
Q-30) How to write DataFrame into a file in pandas?
- After using Pandas for data munging and transformation, you might want to convert the DataFrame to another format. Two common ways of exporting a DataFrame are to a CSV file and to an Excel file.
Creating a CSV file from a DataFrame:
Use to_csv() to create a CSV file from a Pandas DataFrame.
Writing a DataFrame to Excel:
To write your table to Excel, use the to_excel() function, in a manner very similar to how you wrote the DataFrame's CSV output.
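A minimal sketch of the CSV export. It writes to an in-memory buffer instead of a file so that it runs anywhere; passing a file path such as a hypothetical "out.csv" works the same way.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "qty": [1, 2]})

# Write CSV text to a buffer; index=False omits the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back to confirm the data survived the round trip.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

Writing to Excel follows the same pattern with to_excel() and a file path, but it additionally needs an Excel engine such as openpyxl to be installed.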