• 6 months, 1 week ago

Wipro Data Science Interview Questions and Answers

Aadesh Shrivastav

Navigating a Wipro data science interview requires a combination of technical proficiency, problem-solving skills, and the ability to communicate your thought process effectively. This blog serves as a comprehensive resource to aid in your preparation, offering insights into the types of questions you may encounter and providing expertly crafted answers. By thoroughly understanding these interview questions and practicing your responses, you can confidently approach your Wipro data science interview and increase your chances of landing that coveted role in this dynamic field.

Q1. Python or R - Which one would you prefer for text analytics?

Ans: Both Python and R are widely used and have strong capabilities. The choice between them often depends on factors such as your team's expertise, the specific libraries or packages you prefer, and any existing infrastructure or tools in your organization. Here's a brief summary:

  • Python:
    • Versatility: Python is a general-purpose programming language with a vast ecosystem of libraries and tools.
    • NLP Libraries: NLTK and SpaCy are powerful NLP libraries in Python that provide extensive functionality for text processing and analysis.
    • Machine Learning: Python has robust machine learning libraries (e.g., scikit-learn, TensorFlow, PyTorch), making it suitable for integrating text analytics with machine learning models.
    • Community Support: Python has a large and active community, resulting in abundant resources and documentation.
  • R:
    • Statistical Tradition: R has a strong tradition in statistical analysis, which can be advantageous if your text analytics work involves a significant statistical component.
    • Tidyverse and Text Mining: The Tidyverse provides a consistent and efficient way to manipulate and visualize data, and the tidytext package extends it to text. Dedicated text mining packages such as tm and quanteda are also available outside the Tidyverse.
    • Data Visualization: If your text analytics work involves substantial data visualization, R's ggplot2 is a powerful tool for creating visualizations.
    • Statistical Packages: R has a wide range of statistical packages that may be beneficial for certain aspects of text analytics.

In summary, both Python and R are capable of handling text analytics tasks, and the choice depends on your specific project requirements and the preferences and expertise of your team. Many data scientists use a combination of both languages based on the task at hand.


Q2. What is Spatial, Context Vector and Attention Vector? 

Ans: The terms "Spatial," "Context Vector," and "Attention Vector" are often associated with concepts in natural language processing (NLP) and deep learning, particularly in the context of attention mechanisms used in neural networks.

1. Spatial:

  • In the context of computer vision, "spatial" generally refers to the spatial dimensions or the arrangement of elements in an image. For example, spatial features in an image might refer to the arrangement of pixels or regions within the image. In the context of attention mechanisms in NLP, "spatial attention" can refer to focusing on specific positions or elements in a sequence (e.g., words in a sentence).

2. Context Vector:

  • A context vector is a fixed-length representation that summarizes the relevant information in an input sequence. In sequence-to-sequence models built on recurrent neural networks (RNNs), the encoder produces the context vector and the decoder uses it to make predictions or generate output. With attention, a fresh context vector is computed at each output step as a weighted sum of the encoder states.

3. Attention Vector:

  • The attention mechanism is a concept used in neural networks to selectively focus on certain parts of the input sequence when generating an output. The attention vector represents the weights assigned to different elements in the input sequence, indicating their importance or relevance to the current step in the output generation. Attention mechanisms are particularly useful in handling long sequences and capturing dependencies between different parts of the input.

In the context of attention mechanisms, there are different types of attention, including:

  • Soft Attention: Assigns weights to all elements in the input sequence, with the weights summing up to 1. The output is a weighted sum of the input elements.
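  • Hard Attention: Selects one element (or a small subset) of the input sequence at each step instead of blending all of them, typically by sampling. Because this selection is not differentiable, it is usually trained with techniques such as reinforcement learning.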
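To make the relationship between the attention vector and the context vector concrete, here is a minimal sketch of soft (dot-product) attention in plain Python. The encoder states and the query are made-up numbers, and the function names are illustrative rather than taken from any particular library.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query, encoder_states):
    """Return (attention weights, context vector) for one output step."""
    # Score each input position by its dot product with the query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)  # this is the attention vector
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of the input states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical encoder states (one per input token) and a decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = soft_attention(query, encoder_states)
print(weights)
print(context)
```

The first and third states align with the query equally well, so they receive equal weight, while the second state receives much less.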

These concepts are often used in advanced NLP models, such as Transformer models, which have proven to be highly effective for various natural language processing tasks, including machine translation, text summarization, and question answering.


Q3. Explain multiclass problem in neural networks.

Ans: In the context of neural networks and machine learning, a multiclass problem refers to a classification task where there are more than two classes or categories that the model needs to predict. Unlike binary classification, where the goal is to classify instances into one of two classes (e.g., spam or not spam), multiclass classification involves predicting the correct class from a set of three or more possible classes.

Here are some key points related to multiclass classification problems in neural networks:

1. Output Layer:

  • In a neural network designed for multiclass classification, the number of nodes in the output layer is equal to the number of classes. Each node in the output layer corresponds to a specific class, and the network's task is to assign a probability distribution over these classes.

2. Activation Function:

  • The activation function used in the output layer depends on the nature of the problem. For multiclass classification, the softmax activation function is commonly used. Softmax converts the raw output scores into probabilities, ensuring that the sum of probabilities across all classes is equal to 1.

3. Loss Function:

  • Cross-entropy loss (or categorical cross-entropy) is a common choice for the loss function in multiclass classification problems. It measures the difference between the predicted probabilities and the true distribution of class labels.

4. Training:

  • During training, the neural network adjusts its weights and biases using an optimization algorithm (e.g., stochastic gradient descent) to minimize the chosen loss function.

5. One-Hot Encoding:

  • In the dataset, class labels are often represented using one-hot encoding. Each class label is represented as a binary vector where only the index corresponding to the class is marked as 1, and the others are 0. This helps in representing categorical information in a format suitable for neural network training.

6. Evaluation:

  • Evaluation metrics for multiclass classification include accuracy, precision, recall, and F1-score, among others. These metrics provide insights into the performance of the model across all classes.

Example: In a handwritten digit recognition task, where the goal is to classify digits (0-9), it's a multiclass classification problem with 10 classes.
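The output layer, softmax, one-hot encoding, and cross-entropy pieces described above can be sketched in a few lines of plain Python. The logits are invented values for a three-class problem, and the helper names are illustrative.

```python
import math

def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label, num_classes):
    """Encode an integer class label as a binary vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def categorical_cross_entropy(target, probs):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Hypothetical raw scores from an output layer with three nodes.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
target = one_hot(0, 3)  # the true class is class 0
loss = categorical_cross_entropy(target, probs)
predicted_class = probs.index(max(probs))
print(probs, loss, predicted_class)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, which is what training tries to achieve.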

In summary, a multiclass problem in neural networks involves predicting one of several classes for each input instance. This type of classification task is common in various applications, such as image recognition, natural language processing, and speech recognition.


Q4. Difference between cross-entropy and sparse cross-entropy.

Ans: Cross-entropy and sparse cross-entropy are both loss functions commonly used in machine learning for classification problems. They measure the difference between predicted probabilities and actual class labels, encouraging the model to produce accurate probability distributions. The main difference between them lies in how they handle label representations.


| | Cross-Entropy Loss | Sparse Cross-Entropy Loss |
|---|---|---|
| Use Case | Multiclass classification with one-hot encoding | Multiclass classification with integer encoding |
| Label Representation | One-hot encoded vectors | Integer-encoded labels |
| Class Assumption | Exclusive classes where each instance belongs to only one class | Instances belong to only one class, and class labels are represented as integers |
| Computational Efficiency | Uses more memory because every label is a vector with one entry per class, which matters with many classes or large datasets | Generally more memory-efficient because each label is a single integer; suitable for large datasets |
| TensorFlow/Keras Usage | loss="categorical_crossentropy" | loss="sparse_categorical_crossentropy" |
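The two losses compute the same quantity and differ only in how the label is passed in. A minimal sketch in plain Python, using made-up predicted probabilities and illustrative function names, shows this:

```python
import math

def cross_entropy(one_hot_target, probs):
    """Cross-entropy with a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)

def sparse_cross_entropy(class_index, probs):
    """Cross-entropy with an integer label: only the true class is looked up."""
    return -math.log(probs[class_index])

# Hypothetical predicted probabilities for three classes; the true class is 1.
probs = [0.7, 0.2, 0.1]
dense_loss = cross_entropy([0.0, 1.0, 0.0], probs)
sparse_loss = sparse_cross_entropy(1, probs)
print(dense_loss, sparse_loss)
```

Both calls return the negative log of the probability assigned to the true class, so the choice between them comes down to how the labels are stored.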




Q5. Difference and correlation between Prediction and Estimation. 

Ans: Difference:

1. Prediction:

  • Goal: The main goal of prediction is to forecast or anticipate future outcomes based on available data.
  • Focus: It focuses on making accurate and precise predictions about unknown or future values.
  • Example: In machine learning, building a predictive model to forecast stock prices based on historical data is an example of prediction.
  • Evaluation: Prediction models are often evaluated based on metrics such as accuracy, mean squared error, or other measures of prediction error.

2. Estimation:

  • Goal: The primary goal of estimation is to infer or determine the likely value of a parameter or a characteristic of a population based on observed data.
  • Focus: It focuses on determining the best possible estimate of a parameter, such as the mean or variance of a population.
  • Example: Estimating the average income of a population based on a sample of survey data is an example of estimation.
  • Evaluation: Estimation methods are often evaluated based on properties like unbiasedness, efficiency, or consistency.

In summary, prediction is concerned with making future forecasts or guesses based on available data, while estimation is focused on determining the likely value of a parameter or characteristic of a population based on observed data.
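As a rough illustration of the distinction, the sketch below uses a small invented sample. It first estimates a population parameter (the mean salary), then fits a least-squares line and predicts the salary for a value outside the sample. All numbers and variable names are hypothetical.

```python
# Hypothetical sample: years of experience and salary (in thousands).
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [35.0, 40.0, 45.0, 50.0, 55.0]
n = len(salary)

# Estimation: infer a population parameter (mean salary) from the sample.
estimated_mean_salary = sum(salary) / n

# Prediction: fit salary = intercept + slope * experience by least squares,
# then forecast the outcome for an unseen input.
mean_x = sum(experience) / n
sxx = sum((x - mean_x) ** 2 for x in experience)
sxy = sum(
    (x - mean_x) * (y - estimated_mean_salary)
    for x, y in zip(experience, salary)
)
slope = sxy / sxx
intercept = estimated_mean_salary - slope * mean_x
predicted_salary = intercept + slope * 6.0  # someone with 6 years of experience

print(estimated_mean_salary, slope, intercept, predicted_salary)
```

Note that the fitted slope and intercept are themselves estimates, which is why the two ideas often appear together in practice.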


Correlation is a statistical measure that describes the degree of association between two variables. It quantifies the strength and direction of a linear relationship between two variables. The correlation coefficient, often denoted by r, ranges from -1 to 1:

  • r=1: Perfect positive correlation
  • r=−1: Perfect negative correlation
  • r=0: No linear correlation

Key Points:

  • Correlation does not imply causation. Even if two variables are correlated, it does not necessarily mean that one causes the other.
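  • The Pearson correlation coefficient does not change when a variable is rescaled linearly (for example, converting hours to minutes), but it is sensitive to outliers and captures only linear relationships.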
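The coefficient r can be computed directly from its definition. The following sketch uses invented study-hours data and an illustrative function name; it also rescales one variable and flips the sign of the other to show how r responds.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam scores.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0]

r = pearson_r(hours, scores)
# Converting hours to minutes is a linear rescaling and leaves r unchanged.
r_rescaled = pearson_r([h * 60 for h in hours], scores)
# Negating one variable flips the sign of r.
r_negative = pearson_r(hours, [-s for s in scores])
print(r, r_rescaled, r_negative)
```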

In summary, correlation provides a measure of the strength and direction of a linear relationship between two variables, while prediction and estimation refer to different goals in statistical modeling: predicting future outcomes and estimating population parameters, respectively.


Q6. What types of regularization are used in ANNs?

Ans: Regularization techniques are commonly employed in artificial neural networks (ANNs) to prevent overfitting, improve generalization, and enhance the model's performance on unseen data. Several regularization methods are used in ANNs, and some of the key ones include:

1. L1 Regularization (Lasso):

  • Objective: Adds a penalty term proportional to the absolute values of the weights to the loss function.
  • Effect: Encourages sparsity in the weight matrix, leading some weights to become exactly zero. This can be useful for feature selection.

2. L2 Regularization (Ridge):

  • Objective: Adds a penalty term proportional to the squared values of the weights to the loss function.
  • Effect: Discourages overly large weights and helps prevent the dominance of a small number of features. Often applied as weight decay, it is one of the most commonly used regularization techniques in neural networks.

3. Dropout:

  • Technique: Randomly drops (sets to zero) a fraction of input units (neurons) during training.
  • Effect: Prevents neurons from co-adapting (relying on specific other neurons), so the network learns redundant, more robust representations. This helps prevent overfitting and improves generalization.

4. Early Stopping:

  • Technique: Monitors the model's performance on a validation set during training and stops training when the performance starts to degrade.
  • Effect: Prevents the model from learning the noise in the training data, as continued training might lead to overfitting.

5. Batch Normalization:

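  • Technique: Normalizes the inputs of each layer over each mini-batch to have zero mean and unit variance, then applies a learned scale and shift.
  • Effect: Mitigates the internal covariate shift problem, making training more stable and reducing the dependence on initialization. Its regularizing effect is a side benefit that comes from the noise in mini-batch statistics.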

6. Weight Constraints:

  • Technique: Constrains the magnitude of weights during training.
  • Effect: Helps prevent large weight values that might lead to overfitting.

7. Data Augmentation:

  • Technique: Introduces variations in the training data by applying transformations such as rotation, scaling, or cropping.
  • Effect: Increases the diversity of the training set, making the model more robust and less prone to overfitting.

8. DropConnect:

  • Technique: Similar to dropout but extends to connections (weights) rather than nodes (neurons).
  • Effect: Randomly sets a fraction of weights to zero during training, promoting redundancy and preventing overfitting.
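A few of these techniques are simple enough to sketch in plain Python. The example below computes L1 and L2 penalties for a made-up weight vector and applies inverted dropout to a list of activations. The function names, the regularization strength, and the data loss value are all assumptions chosen for illustration.

```python
import random

def l1_penalty(weights, lam):
    """L1 (Lasso) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 (Ridge) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`,
    and scale the survivors so the expected activation is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

weights = [0.5, -1.5, 2.0]
data_loss = 0.30  # hypothetical loss on the training batch
total_l1 = data_loss + l1_penalty(weights, lam=0.01)
total_l2 = data_loss + l2_penalty(weights, lam=0.01)

activations = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)  # seeded so the run is repeatable
dropped = dropout(activations, rate=0.5, rng=rng)
print(total_l1, total_l2, dropped)
```

Dropout is applied only during training; at inference time all units are kept, which is why the surviving activations are rescaled here.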

The choice of regularization technique depends on the specific characteristics of the data and the neural network architecture. Often, a combination of these techniques is used to achieve better generalization performance.
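Early stopping in particular is mostly bookkeeping. Here is a minimal sketch, assuming the validation loss is recorded once per epoch and that training stops after a fixed number of epochs without improvement (commonly called patience). The loss values and the function name are invented for illustration.

```python
def run_with_early_stopping(val_losses, patience):
    """Return (best_epoch, stopped_at) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    stopped_at = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_at = epoch
        if loss < best_loss:
            # New best model: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation performance has stopped improving
    return best_epoch, stopped_at

# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.55]
best_epoch, stopped_at = run_with_early_stopping(val_losses, patience=2)
print(best_epoch, stopped_at)
```

With a patience of 2, training halts before the final epoch is ever reached, which illustrates the trade-off: a small patience saves time but can stop before a later improvement.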


Q7. What is Partitioning? What are the Partitioning methods and partitioning criteria?

Ans: In the context of data analysis and machine learning, partitioning refers to the division of a dataset into subsets or partitions based on certain criteria. The purpose of partitioning is to organize and structure the data in a way that facilitates analysis, model training, and evaluation. Partitioning is commonly used in various tasks such as training/testing split, cross-validation, and dataset preprocessing.

Partitioning Methods:

1. Training/Testing Split:

  • Description: The dataset is divided into two parts: a training set used to train the model, and a testing set used to evaluate its performance on unseen data.
  • Use Case: Commonly used in supervised learning to assess how well a trained model generalizes to new, unseen examples.

2. K-Fold Cross-Validation:

  • Description: The dataset is divided into k equally sized folds. The model is trained k times, each time using k-1 folds for training and the remaining fold for validation.
  • Use Case: Provides a more robust estimate of a model's performance by using different subsets of the data for training and validation in each iteration.

3. Stratified Sampling:

  • Description: The dataset is divided into partitions while ensuring that each partition maintains the same distribution of target classes as the original dataset.
  • Use Case: Important for imbalanced datasets where certain classes may be underrepresented. Ensures that each subset used for training/testing reflects the overall class distribution.

4. Temporal Split:

  • Description: Data is partitioned based on time, with earlier time periods used for training and later time periods used for testing.
  • Use Case: Common in time-series analysis to assess a model's ability to make predictions on future data based on past observations.

5. Random Sampling:

  • Description: Data is randomly divided into partitions without considering any specific order or structure.
  • Use Case: Useful when the data doesn't exhibit specific patterns or when randomization is desired to create diverse subsets.
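Two of these methods, the random training/testing split and k-fold cross-validation, can be sketched with the standard library alone. The function names and the toy dataset below are illustrative; in practice a library such as scikit-learn is normally used.

```python
import random

def train_test_split(items, test_fraction, seed):
    """Shuffle a copy of the data and cut it into train and test parts."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

data = list(range(10))  # stand-in for ten samples
train, test = train_test_split(data, test_fraction=0.2, seed=42)
folds = list(k_fold_indices(n_samples=10, k=5))
print(train, test)
print(folds[0])
```

Each sample appears in exactly one validation fold, so every data point is used for both training and validation across the k runs.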

Partitioning Criteria:

The choice of a partitioning method depends on the specific goals of the analysis, the characteristics of the dataset, and the nature of the problem being addressed. Key criteria for selecting a partitioning method include:

1. Representativity:

  • Ensures that each partition is representative of the overall dataset in terms of its distribution and characteristics.

2. Generalization:

  • A good partitioning method should allow the model to generalize well to new, unseen data, providing a reliable estimate of its performance.

3. Avoiding Data Leakage:

  • Prevents information from the testing set influencing the training process, ensuring the model's evaluation reflects its ability to handle truly unseen examples.

4. Statistical Significance:

  • Ensures that the performance estimates obtained through partitioning are statistically significant and reliable.

The appropriate partitioning method and criteria depend on the specific context and objectives of the analysis or machine learning task.
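The representativity and leakage criteria can be checked mechanically. Below is a minimal sketch of a stratified split on an invented, imbalanced label list; it keeps the class proportions the same in both partitions and keeps the partitions disjoint. The function name and data are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction, seed):
    """Return (train_indices, test_indices) that preserve class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical imbalanced dataset: 8 negative and 2 positive examples.
labels = ["neg"] * 8 + ["pos"] * 2
train_idx, test_idx = stratified_split(labels, test_fraction=0.5, seed=0)
test_counts = Counter(labels[i] for i in test_idx)
print(train_idx, test_idx, test_counts)
```

A purely random split of such a small dataset could easily leave the test set with no positive examples at all, which is the situation stratification avoids.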


Q8. What is a trigger? Provide an example of usage.

Ans: In various contexts, a "trigger" refers to an event or condition that initiates a particular action or process. Triggers are commonly used in programming, databases, and automation to respond to specific events or conditions.

Example in Database Management:

In the context of databases, a trigger is a set of instructions that are automatically executed ("triggered") in response to a specific event or condition occurring in the database. These events could include data changes, such as inserts, updates, or deletes.

Example of a database trigger:

-- Create a trigger (MySQL syntax) that updates a timestamp when a row is updated in the 'employees' table
CREATE TRIGGER update_timestamp
BEFORE UPDATE ON employees
FOR EACH ROW
SET NEW.last_updated = NOW();

In this example:

  • The trigger is named update_timestamp.
  • It is set to execute before an update operation (BEFORE UPDATE) on the employees table.
  • For each affected row (FOR EACH ROW), it updates the last_updated column with the current timestamp (NOW()).

So, every time a row in the employees table is updated, the last_updated column for that row is automatically set to the current timestamp.
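To try the same idea without a database server, the sketch below uses Python's built-in sqlite3 module with an in-memory database. SQLite's trigger syntax differs from the example above: it has no SET NEW.column form, so this version uses an AFTER UPDATE trigger that issues its own UPDATE. The table layout and the sample row are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    last_updated TEXT
);

-- Fires only when the name column changes, so the trigger's own
-- UPDATE of last_updated does not fire it again.
CREATE TRIGGER update_timestamp
AFTER UPDATE OF name ON employees
FOR EACH ROW
BEGIN
    UPDATE employees SET last_updated = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")

# Hypothetical row; last_updated starts out empty.
conn.execute("INSERT INTO employees (id, name) VALUES (1, 'Asha')")
before = conn.execute("SELECT last_updated FROM employees WHERE id = 1").fetchone()[0]

# Updating the row fires the trigger, which fills in the timestamp.
conn.execute("UPDATE employees SET name = 'Asha K' WHERE id = 1")
after = conn.execute("SELECT last_updated FROM employees WHERE id = 1").fetchone()[0]
print(before, after)
```

The application code never touches last_updated directly; the database keeps it current on its own, which is the main appeal of triggers.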

Example in Programming:

In programming, triggers can be used in event-driven architectures to execute specific actions in response to events.

Example in Python using the tkinter library (a GUI library):

import tkinter as tk

def on_button_click():
    print("Button clicked!")

# Create a simple GUI with a button
root = tk.Tk()

button = tk.Button(root, text="Click me", command=on_button_click)
button.pack()

# Start the event loop so the window appears and responds to clicks
root.mainloop()

In this example:

  • A trigger-like behavior is achieved by associating the on_button_click function with the button's click event.
  • When the button is clicked, the on_button_click function is automatically executed, printing "Button clicked!" to the console.
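The same trigger-like pattern works without a GUI. Here is a minimal sketch of an event registry in which handlers are registered for a named event and run automatically when that event is emitted. The EventBus class, the event name, and the handlers are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

class EventBus:
    """Tiny publish/subscribe registry: handlers run when an event fires."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name, handler):
        """Register a handler to be triggered by the named event."""
        self._handlers[event_name].append(handler)

    def emit(self, event_name, payload):
        """Fire the event, calling each handler in registration order."""
        return [handler(payload) for handler in self._handlers[event_name]]

bus = EventBus()
log = []
bus.on("order_placed", lambda order: log.append(f"email sent for order {order['id']}"))
bus.on("order_placed", lambda order: log.append(f"stock reserved for order {order['id']}"))

# Emitting the event triggers both handlers automatically.
bus.emit("order_placed", {"id": 7})
print(log)
```

The code that emits the event does not need to know which handlers exist, which keeps the event source and the reactions to it loosely coupled.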

These are just a couple of examples, and the concept of triggers is widely used in various fields to automate responses to specific events or conditions.
