Posts with the tag Machine Learning:

How should I impute? – imputation techniques comparison.

Why impute? When people start their journey with machine learning and data analysis, they show a lot of enthusiasm and desire to learn and create. As they progress, they encounter many obstacles that may strip them of their positive attitude. One such obstacle is missing data in the dataset they’re working on. The authors of the article titled “Imputation techniques’ comparison in R programming language” identified three main problems that come with missing values: substantial bias in the trained model, reduced efficiency of data analysis, and the inability to use many machine learning models that were not designed to handle missing data.

Data for Takeaway: A Quest for the Missing Data

The weather in Grantue wasn’t the most pleasant that day. Mostly misty, but here and there it was raining. No one could be seen on the streets, though that had more to do with the recent outbreak of a disease and the general reluctance of the Grantue citizens to go out unless absolutely necessary. But I, I actually liked that high humidity, lack of wind and vague temperature. And there I was, with soaked glasses in my pocket, returning home from a meeting that could have taken place as a videoconference but didn’t. I was angry, but every droplet of rain reduced my anger a bit, and by the time I finally got home I had already calmed down, leaving a path of angry droplets behind me.

Black-box VS White-box Duel

Prepare to fight. The interpretability of machine learning models is gaining more and more interest in the scientific world, because artificial intelligence is used in many business solutions that affect our everyday life. Knowing how a model works can, among other things, assure us of the safety of the implemented solution. We came across an article on this topic by Warsaw University of Technology students Wojciech Bogucki, Tomasz Makowski and Dominik Rafacz, titled “Predicting code defects using interpretable static measures”. Black box vs white box. Using interpretable models, such as linear regression, decision trees and k-nearest neighbors, is one way to make your solution explainable.

Interaction between imputation and ML algorithms

TL;DR: Many people would like to find a single best method for imputing data that covers most cases, but this article shows that imputing missing data is not so trivial. It demands looking at the bigger picture, for example the model type or the percentage of missing data. Reading this article, we will learn which algorithms to use in which cases and come to understand the broad problem of imputation. Introduction. We have read an article about imputation techniques and their interaction with ML algorithms, written by Martyna Majchrzak, Agata Makarewicz and Jacek Wiśniewski. Before reading, we expected to find out which imputation techniques are the best and how to use them.