Posts with the tag dataset:

Hajada trials

Our story begins

The story of Hajada is perhaps not an Arabian epic, but just like one of Scheherazade’s stories, it’ll keep you, dear Reader, on your toes. Sometimes the data is incomplete and we need to cope with that. There are many methods of so-called “imputation”, i.e. filling gaps in the data. Hanna Zdulska, Jakub Kosterna and Dawid Przybyliński set out to compare those methods. Here’s what they found out.

Challengers’ approach

What kinds of imputation did they compare? The popular ones, which you may or may not know: “Bogo replace”, replacing with the mode or median, the “MICE” imputation method, “missForest” imputation and “VIM’s k-NN”.
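To make the simplest of these approaches concrete, here is a minimal sketch of replacing gaps with the median (for numeric columns) or the mode (for everything else). It is not the authors' code: the function name `impute_column` and the toy data are made up for illustration, and missing values are assumed to be represented as `None`.

```python
from statistics import median, mode


def impute_column(values):
    """Fill None entries with the median (numeric data) or the mode (other data)."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to learn a fill value from, so return the column unchanged.
        return list(values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical toy columns, not taken from the article.
ages = [23, None, 31, 40, None]
cities = ["Warsaw", "Krakow", None, "Warsaw"]

print(impute_column(ages))    # [23, 31, 31, 40, 31]
print(impute_column(cities))  # ['Warsaw', 'Krakow', 'Warsaw', 'Warsaw']
```

A single fill value per column is cheap to compute, but it ignores relationships between columns, which is exactly what methods such as MICE, missForest or k-NN try to exploit.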

Not so famous (yet!) Hajada and his results

Meet Hajada! Have you heard of the Indian mathematician Hajada? We started to wonder after reading the title of the article “The Hajada Imputation Test”: it sounded somehow familiar… But you probably haven’t had any contact with him, because not so long ago there was no such man. He was invented by the authors of the test and the article, and his name comes from the first letters of their names. So what is his test? Hajada decided to study the effectiveness and time efficiency of various methods of dealing with missing data. He juxtaposed three simple (or even naive) methods, such as deleting rows or inserting random values, with three more sophisticated methods, including the mice and missForest algorithms.
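For a feel of what the two naive methods mentioned above look like, and how one might time them, here is a rough sketch. It is only an illustration under our own assumptions, not the test from the article: the helpers `drop_incomplete`, `fill_random` and `timed` are hypothetical names, rows are plain tuples, and gaps are marked with `None`.

```python
import random
import time


def drop_incomplete(rows):
    """Naive method 1: delete every row that contains a missing value."""
    return [r for r in rows if None not in r]


def fill_random(rows, rng):
    """Naive method 2: replace each gap with a random observed value from its column."""
    n_cols = len(rows[0])
    pools = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    filled = []
    for r in rows:
        new_row = []
        for j, v in enumerate(r):
            if v is None and pools[j]:
                new_row.append(rng.choice(pools[j]))
            else:
                new_row.append(v)  # keep observed values (and gaps in empty columns)
        filled.append(tuple(new_row))
    return filled


def timed(func, *args):
    """Run func(*args) and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical toy data set with two gaps.
rows = [(1.0, 2.0), (None, 3.0), (4.0, None), (5.0, 6.0)]
rng = random.Random(0)  # fixed seed so repeated runs fill the same values

dropped, t_drop = timed(drop_incomplete, rows)
filled, t_fill = timed(fill_random, rows, rng)

print(len(dropped), "rows left after deletion")
print(len(filled), "rows left after random filling")
```

Deleting rows throws away half of this tiny data set, while random filling keeps every row at the price of injecting noise; that trade-off between effectiveness and cost is the kind of thing such a comparison is meant to measure.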