Binary Missing Value Imputation
A few datasets that I’ve seen have come with several different columns representing binary responses to questions. Naturally, there are missing values scattered throughout, so some amount of imputation had to occur. I decided to try coding up a way to do this by picking the mode of rows that were as similar as possible to the row with missing values.
The data being considered is a set of binary response variables to something like yes-no survey questions, not for something like one-hot columns. This method is also just generally very slow, so it’s not recommended for much – it just seemed like an interesting experiment.