I have a dataset with "Speed" as one of its columns (features). The column contains both zero and non-zero values. I want to randomly set 10% of the non-zero values to zero, and whenever a value is set to zero, its corresponding class label should also become zero. I tried the code below, but it fails with the assertion error shown after it.
import pandas as pd
file_path = 'Processed_data/data1.csv'
df = pd.read_csv(file_path)
per_change = 0.1
attr = 'Speed'
target = 'Class'
df_spd = df[df['Speed'] > 0.]
num_rows_to_change = int(df.shape[0] * per_change)
num_with_zero_initial = df[df[attr] == 0].shape[0]
assert df_spd.shape[0] > num_rows_to_change, \
    'Number of rows with non-zero speed is less than 10% of the original dataset.'
df_update = df_spd.sample(num_rows_to_change)
df_update[attr] = 0.
df_update[target] = 0.
df.update(df_update)
update_list = df_update.index.tolist()
num_with_zero_final = df[df['Speed'] == 0].shape[0]
assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, \
    'Number of rows needed to change not equal to number of rows changed.'
df.to_csv('changed.csv')
AssertionError                            Traceback (most recent call last)
<ipython-input-11-f93535705bac> in <module>
      1 assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, \
----> 2 'Number of rows needed to change not equal to number of rows changed.'
AssertionError: Number of rows needed to change not equal to number of rows changed.
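
For comparison, here is a minimal sketch of the operation described above, not a diagnosis of the failing assertion. It differs from the code in the question in two ways: the number of rows to change is taken as 10% of the non-zero rows (the question's code uses 10% of all rows), and the values are written directly with `.loc` instead of going through a sampled copy and `df.update`. The function name `zero_fraction_of_nonzero`, the `seed` parameter, and the toy data are hypothetical. The sketch also assumes the index labels are unique, as they are with the default index from `pd.read_csv`.

```python
import pandas as pd


def zero_fraction_of_nonzero(df, attr="Speed", target="Class", frac=0.1, seed=None):
    """Return a copy of df in which `frac` of the rows with a non-zero,
    non-missing `attr` have both `attr` and `target` set to zero,
    together with the index labels of the changed rows."""
    out = df.copy()
    # Rows eligible for zeroing: attr is present and not already zero.
    mask = out[attr].notna() & (out[attr] != 0)
    # 10% of the non-zero rows, not of the whole dataset.
    n_change = int(mask.sum() * frac)
    # Sample index labels only (assumes unique labels).
    chosen = out.loc[mask].sample(n=n_change, random_state=seed).index
    # Write zeros into both columns in place on the copy.
    out.loc[chosen, [attr, target]] = 0
    return out, chosen


# Toy data standing in for data1.csv (hypothetical values).
demo = pd.DataFrame({
    "Speed": [0.0, 5.0, 3.2, 0.0, 7.1, 2.4, 9.9, 1.5, 4.4, 6.0, 8.3, 2.2],
    "Class": [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1],
})
changed, changed_idx = zero_fraction_of_nonzero(demo, frac=0.1, seed=0)
zeros_before = int((demo["Speed"] == 0).sum())
zeros_after = int((changed["Speed"] == 0).sum())
```

With 10 non-zero rows in the toy data, one row is changed, so `zeros_after` equals `zeros_before + len(changed_idx)`. Because the function works on a copy, the zero counts before and after come from two separate frames and cannot be affected by re-running a notebook cell on an already modified `df`.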