Selecting Training Set: CodeChef Solution in Python
Problem
You are given a dataset consisting of N items. Each item is a pair of a word and a boolean denoting whether the given word is a spam word or not.
We want to use this dataset for training our latest machine learning model. Thus, we want to choose some subset of this dataset as a training dataset. We want to make sure that there are no contradictions in our training set, i.e., there shouldn't be a word included in the training set that's marked both as spam and not-spam. For example, the items {"fck", 1} and {"fck", 0} can't both be present in the training set, because the first item says the word "fck" is spam, whereas the second item says it is not, which is a contradiction.
Your task is to select the maximum number of items in the training set.
Note that the same pair {word, bool} can appear multiple times in the input. The training set can also contain the same pair multiple times.
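To make the notion of a contradiction concrete, here is a minimal sketch of a validity check for a candidate training set. The function name has_contradiction and the representation of items as (word, label) tuples are assumptions made for illustration; they are not part of the problem statement.

```python
def has_contradiction(items):
    """Return True if some word appears with both labels in `items`."""
    seen = {}  # word -> first label seen for that word
    for word, is_spam in items:
        # setdefault stores the label on first sight and returns the stored one
        if seen.setdefault(word, is_spam) != is_spam:
            return True
    return False


print(has_contradiction([("fck", 1), ("fck", 0)]))              # True
print(has_contradiction([("fck", 1), ("fck", 1), ("ram", 0)]))  # False
```

Repeating the same pair is harmless, which is why duplicates are allowed in the training set.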
Input
- The first line contains T, the number of test cases. Then the test cases follow.
- The first line of each test case contains a single integer N.
- N lines follow. For each valid i, the i-th of these lines contains a string wi, followed by a space and an integer (boolean) si, denoting the i-th item.
Output
For each test case, output an integer corresponding to the maximum number of items that can be included in the training set in a single line.
Constraints
- 1 <= T <= 10
- 1 <= N <= 25,000
- 1 <= |wi| <= 5 for each valid i
- 0 <= si <= 1 for each valid i
- w1, w2, ..., wN contain only lowercase English letters.
Sample Input:
3
3
abc 0
abc 1
efg 1
7
fck 1
fck 0
fck 1
body 0
body 0
body 0
ram 0
5
vv 1
vv 0
vv 0
vv 1
vv 1
Sample Output:
2
6
3
Explanation
Example case 1: You can include either of the first and second items, but not both. The third item can also be taken. This way, the training set can contain at most 2 items.
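Example case 2: You can include all the items except the second item in the training set.
Example case 3: The word "vv" is marked as spam three times and as not-spam twice, so you can include the three spam items.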
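The reasoning in these examples generalises: each word can be handled independently, and for each word we keep whichever label occurs more often. The following is a small sketch of that idea using collections.Counter; the function name max_training_items and the tuple representation of items are assumptions made for illustration.

```python
from collections import Counter


def max_training_items(items):
    """Sum, over all words, the count of the more frequent label."""
    counts = Counter(items)  # (word, label) -> number of occurrences
    words = {word for word, _ in items}
    # A missing (word, label) key counts as 0 in a Counter.
    return sum(max(counts[(w, 0)], counts[(w, 1)]) for w in words)


case_2 = [("fck", 1), ("fck", 0), ("fck", 1),
          ("body", 0), ("body", 0), ("body", 0),
          ("ram", 0)]
print(max_training_items(case_2))  # 6
```

For the second sample case the per-word contributions are 2 for "fck", 3 for "body", and 1 for "ram", which gives 6.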
Solution:
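try:
    # No prompt text is passed to input(): the judge expects only the answers on stdout.
    t = int(input())  # number of test cases
    for i in range(t):
        n = int(input())  # number of items in this test case
        d = {}  # word -> [count with label 0, count with label 1]
        for j in range(n):
            a, b = map(str, input().split())
            b = int(b)
            if a not in d:
                d[a] = [0, 0]
            d[a][b] += 1
        sums = 0
        for j in d:
            # keep whichever label occurs more often for this word
            sums = sums + max(d[j])
        print(sums)
except EOFError:
    # input ended early; nothing more to process
    pass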
Steps to solve this problem:
- In the try block, read the number of test cases and store it in t.
- In the loop over the test cases, read the number of items and store it in n.
- Create an empty dictionary d.
- In a nested loop, read each line containing a word and a bool value, split it, and use the map() function to store the two parts in variables a and b.
- Convert b into an int type using the int() function.
- If a is not yet present in the dictionary d, add it with the value [0,0]. Then increment d[a][b], so that d[a] holds how many times a was marked 0 and how many times it was marked 1.
- Initialise a variable sums with 0.
- In a loop over the dictionary d, add the larger of the two counts, max(d[j]), to sums.
- Print the value of sums.
- The except block is left empty apart from the pass statement.
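The steps above can also be packaged as a function that takes the whole input as one string, which makes the logic easy to test against the sample. This is only a sketch of an alternative layout, not the solution given above; the name solve and the token-based parsing are assumptions made for illustration.

```python
def solve(text):
    """Return the answers for all test cases in `text`, one per line."""
    tokens = text.split()  # whitespace-separated: T, then N and N (word, label) pairs per case
    pos = 0
    t = int(tokens[pos])
    pos += 1
    answers = []
    for _ in range(t):
        n = int(tokens[pos])
        pos += 1
        d = {}  # word -> [count with label 0, count with label 1]
        for _ in range(n):
            word, label = tokens[pos], int(tokens[pos + 1])
            pos += 2
            if word not in d:
                d[word] = [0, 0]
            d[word][label] += 1
        answers.append(sum(max(pair) for pair in d.values()))
    return "\n".join(map(str, answers))


sample = """3
3
abc 0
abc 1
efg 1
7
fck 1
fck 0
fck 1
body 0
body 0
body 0
ram 0
5
vv 1
vv 0
vv 0
vv 1
vv 1
"""
print(solve(sample))  # prints 2, 6 and 3 on separate lines
# On a judge, one would instead import sys and call print(solve(sys.stdin.read())).
```

Each item is touched once, so the work grows linearly with N, which is comfortable for N up to 25,000.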