Posts with the tag Missing data:
The weather in Grantue wasn’t the most pleasant that day. It was mostly misty, with rain here and there. No one could be seen on the streets, though that had more to do with the recent outbreak of a disease and the general reluctance of Grantue’s citizens to go out unless absolutely necessary. But I, I actually liked the high humidity, the lack of wind and the vague temperature.
And there I was, with soaked glasses in my pocket, returning home from a meeting that could have taken place over videoconference but didn’t. I was angry, but every droplet of rain reduced my anger a bit, and by the time I finally got home I had already calmed down, leaving a trail of angry droplets behind me.
Imputing missing data for a classification problem
Authors: Karol Saputa, Małgorzata Wachulec, Aleksandra Wichrowska (Warsaw University of Technology)
As students of the same university course, we were asked to sum up the findings of our colleagues, the authors of the Default imputation efficiency comparison article. In their work, they applied many missing data imputation techniques to 11 datasets and then ran different classification algorithms on the imputed data. By measuring the classification results obtained with each imputation technique, they could judge how well the techniques performed. But first:
What is data imputation?
Some datasets have missing values that many classification algorithms cannot handle. One way to make the algorithm work is to delete the observations that include missing data or, if the missing values come from just a few columns, to delete those columns instead.
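The two deletion options described above can be sketched in a few lines. This is only an illustration, assuming the data lives in a pandas DataFrame; the toy table and its column names (`age`, `income`, `label`) are made up here and do not come from the article.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration: NaN marks a missing value.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "income": [50.0, np.nan, np.nan, 72.0],
    "label": [0, 1, 0, 1],
})

# Option 1: delete every observation (row) with at least one missing value.
rows_dropped = df.dropna(axis=0)

# Option 2: delete every column with at least one missing value.
cols_dropped = df.dropna(axis=1)

print(rows_dropped.shape)  # rows and columns left after deleting observations
print(cols_dropped.shape)  # rows and columns left after deleting columns
```

In this toy table both options discard a lot: deleting rows loses half of the observations, and deleting columns loses both features. That loss of information is the usual reason to consider imputation instead of deletion.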
TL;DR
A lot of people would like to find a single best imputation method that covers most cases, but this article shows that imputing missing data is not so trivial. It demands looking at the bigger picture, for example the model type or the percentage of missing data. Reading this article, we will learn which algorithms to use in which cases and get a sense of how vast the problem of imputation is.
Introduction
We have read an article about imputation techniques and their interaction with ML algorithms. It was written by Martyna Majchrzak, Agata Makarewicz and Jacek Wiśniewski. Before reading it, we expected to find out which imputation techniques are the best and how to use them.