Posts with the tag OpenML:

Hajada trials

Our story begins: The story of Hajada is perhaps not an Arabian epic, but just like one of Scheherazade’s stories, it will keep you, dear Reader, on your toes. Sometimes the data is incomplete and we need to cope with that. There are many methods of so-called “imputation”, i.e. filling gaps in the data. Hanna Zdulska, Jakub Kosterna, and Dawid Przybyliński set out to compare those methods. Here’s what they found out. Challengers’ approach: What kinds of imputation did they compare? The popular ones, which you may or may not know, namely: “Bogo replace”, replacing with the mode or median, the “MICE” imputation method, “missForest” imputation, and “VIM’s k-NN”.
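To make the simplest of these concrete, here is a minimal sketch of mode/median replacement in plain Python. It is only an illustration, not the code used by the authors: the function name `impute` and the convention of marking missing entries with `None` are assumptions made for this example.

```python
from statistics import median, mode


def impute(values, numeric=True):
    """Fill None entries with the median (numeric) or the mode (categorical).

    Hypothetical helper for illustration; missing values are assumed
    to be encoded as None.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing observed, so there is no statistic to fill with.
        return list(values)
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]


ages = [23, None, 31, 27, None]
colours = ["red", None, "blue", "red"]

print(impute(ages))                    # gaps replaced by the median of 23, 31, 27
print(impute(colours, numeric=False))  # gaps replaced by the most frequent colour
```

The appeal of this approach is that it is fast and needs no model; the price is that every gap in a column receives the same value, which shrinks the column's variance.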

Interaction between imputation and ML algorithms

TL;DR: Many people would like to find a single best method of imputing data that covers most cases, but this article shows that imputing missing data is not so trivial. It demands looking at the bigger picture, for example the model type or the percentage of missing data. Reading this article, we will learn which algorithms to use in which cases and understand how broad the problem of imputation is. Introduction: We have read an article about imputation techniques and their interaction with ML algorithms. It was written by Martyna Majchrzak, Agata Makarewicz, and Jacek Wiśniewski. Before reading, we expected to find out which imputation techniques are the best and how to use them.

Not so famous (yet!) Hajada and his results

Meet Hajada! Have you heard of the Indian mathematician Hajada? We started to wonder after reading the title of the article “The Hajada Imputation Test”; it sounded somehow familiar… But you probably haven’t come across him, because not so long ago there was no such man. He was invented by the authors of the test and the article, and his name comes from the first letters of their names. So what is his test? Hajada decided to study the effectiveness and time efficiency of various methods of dealing with missing data. He juxtaposed three simple (or even naive) methods, such as deleting rows or inserting random values, with three more sophisticated methods, including the mice and missForest algorithms.
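The two naive methods named above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the test's actual implementation: the excerpt does not say how the random values are drawn, so here they are sampled from the observed values of the same column. The function names and the use of `None` for missing entries are likewise hypothetical.

```python
import random


def drop_incomplete_rows(rows):
    """Keep only the rows that have no missing (None) entries."""
    return [row for row in rows if all(v is not None for v in row)]


def fill_with_random_observed(rows, seed=0):
    """Replace each None with a random observed value from the same column.

    Assumption: "random values" means sampling from the column's
    observed values; columns with no observed values are left as-is.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0]) if rows else 0
    observed = [
        [row[j] for row in rows if row[j] is not None] for j in range(n_cols)
    ]
    return [
        [
            rng.choice(observed[j]) if v is None and observed[j] else v
            for j, v in enumerate(row)
        ]
        for row in rows
    ]


data = [
    [1.0, "a"],
    [None, "b"],
    [3.0, None],
    [4.0, "a"],
]

print(drop_incomplete_rows(data))       # only the two complete rows remain
print(fill_with_random_observed(data))  # same shape, gaps filled at random
```

Deleting rows loses information quickly as the share of missing data grows, while random filling keeps every row but injects noise; that trade-off is the kind of thing such a comparison is meant to measure.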