6.5 Rashomon sets of in-hospital mortality prediction random forest models

Authors: Jeugeniusz Winiczenko, Mikolaj Malec, Patryk Wrona (Warsaw University of Technology)

6.5.1 Abstract

The concept of the Rashomon set is gaining popularity in the machine learning community. However, the most efficient ways of building and analyzing such sets are yet to be discovered. The main aim of this study was to develop several approaches to creating Rashomon sets, to examine their characteristics, and to use them for prediction. This article presents Rashomon sets built from random forest classifiers trained for the in-hospital mortality prediction task on physiological time series and medical histories from the Medical Information Mart for Intensive Care (MIMIC-III) database.

6.5.2 Introduction

The main goals of this study were to check whether:
  • Rashomon sets can make better predictions than the single best models,
  • the way of obtaining predictions from a Rashomon set has any impact on its performance,
  • Rashomon sets consisting of different models perform better than those consisting of high-performing but similar models.

In this study, Rashomon sets were built either from the top-performing random forest classifiers or from the most mutually different ones. These classifiers were trained for the in-hospital mortality prediction task on two datasets: the first containing only physiological time series, and the second containing both physiological time series and medical histories. Both datasets were created from preprocessed data from the MIMIC-III database.

To obtain predictions from the Rashomon sets, several techniques were applied to the votes of the classifiers in each set, such as the mean, the median and the weighted mean. The sets' predictions were assessed with the area under the receiver operating characteristic curve (AUC). Another main goal of this study was to check whether Rashomon sets formed from different models perform better than those formed from similar models. Different models were defined as those with substantially different sets of important variables. Such models, each paying attention to different features, were expected to predict better, just as a team of experts who each know a different field would. To verify this, we analyzed the feature importance plots of each model in each Rashomon set and grouped models with different important features into sets.

To help the reader navigate the article, its structure is outlined below:

  1. Abstract

  2. Introduction

  3. Related Work

  4. MIMIC-III dataset - contains a description of both training datasets and their origin

  5. Rashomon sets - contains a description of performance results of best models, voting Rashomon sets and methods of voting

  6. Results - contains a summary of the most interesting results of this study

  7. Conclusion.

6.5.4 MIMIC-III Dataset

The MIMIC-III Clinical Database is a large database of de-identified health-related data associated with tens of thousands of patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. It consists of 26 tables covering different kinds of information recorded during the patients' hospital stays. The tables were preprocessed in the same way as in (Tang et al. 2018).

6.5.4.1 X48 Variable Set

This set consists of 27616 observations with 76 predictive variables in total. It was created from the icu_stay.csv file and is, in effect, a raw MIMIC table preprocessed according to the reproduced article. It contains summary statistics of patients' measurements (heart rate, etc.) over a 48-hour window: for each measurement, the maximum, mean, minimum and standard deviation are stored as separate variables.
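Below is a minimal sketch of how such 48-hour summary features could be computed for a single ICU stay. It assumes the raw values are already grouped by measurement name. The input layout and the max_/mean_/min_/std_ naming are assumptions; the prefix style only mirrors the variable name mean_inr used later in this chapter. The missing-value handling of the original preprocessing is not reproduced.

```python
import statistics
from typing import Dict, List


def summarize_48h(measurements: Dict[str, List[float]]) -> Dict[str, float]:
    """Turn each measurement series from a 48-hour window into four features.

    `measurements` maps a measurement name (e.g. "heart_rate") to the values
    recorded for one ICU stay; every list is assumed to be non-empty.
    """
    features: Dict[str, float] = {}
    for name, values in sorted(measurements.items()):
        features[f"max_{name}"] = max(values)
        features[f"mean_{name}"] = statistics.fmean(values)
        features[f"min_{name}"] = min(values)
        # Population standard deviation; whether the original pipeline used
        # the population or the sample version is not stated.
        features[f"std_{name}"] = statistics.pstdev(values)
    return features


row = summarize_48h({"heart_rate": [80.0, 90.0, 100.0], "inr": [1.1, 1.3]})
print(row["mean_heart_rate"], row["max_inr"])  # 90.0 1.3
```

Each measurement contributes four columns, so the width of the resulting table is four times the number of measurements.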

6.5.4.2 W48 Variable Set

This set also consists of 27616 observations, but with 276 predictive variables in total. It was created from the icu_stay.csv and d_icd_diagnoses.csv files, again as raw MIMIC tables preprocessed according to the reproduced article. Like X48, it contains summary statistics of 48-hour patient measurements (heart rate, etc.), but it also includes diagnosis histories: it combines the X48 variable set with a word2vec (w2v) embedding of medical events over all ICD-9 group codes.
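W48 is described as X48 plus a w2v embedding of ICD-9 group codes. Since 276 - 76 = 200, the embedding would add 200 columns. The chapter does not say how the vectors of a patient's individual codes were combined, so the sketch below simply averages them. The `code_vectors` dictionary is a hypothetical stand-in for a trained word2vec model. This illustrates the concatenation only and is not the authors' pipeline.

```python
from typing import Dict, List, Sequence


def average_code_embedding(codes: Sequence[str],
                           code_vectors: Dict[str, List[float]],
                           dim: int) -> List[float]:
    """Average the embedding vectors of a patient's ICD-9 group codes.

    Codes without a vector are skipped; a patient with no known codes gets a
    zero vector. Averaging is an assumption made for this sketch.
    """
    known = [code_vectors[c] for c in codes if c in code_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def build_w48_row(x48_row: Sequence[float], codes: Sequence[str],
                  code_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """W48 row = the X48 features followed by the diagnosis-history embedding."""
    return list(x48_row) + average_code_embedding(codes, code_vectors, dim)


toy_vectors = {"401": [1.0, 0.0], "428": [0.0, 1.0]}  # hypothetical 2-d vectors
print(build_w48_row([5.0], ["401", "428", "unknown"], toy_vectors, dim=2))
```

With 76 X48 features and a 200-dimensional embedding, each row would have the 276 columns reported above.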

6.5.5 Rashomon Sets

A Rashomon set is a set of machine learning models that all perform almost equally well on a given task, here the prediction of in-hospital mortality. Its members can be chosen using a given criterion or metric. In this work, Rashomon sets were created in two ways: using the area under the ROC curve (AUC), and by selecting models that use the predictors as differently as possible. The first kind of Rashomon set was named ‘best AUC models’ and the second ‘experts’, because each member has expertise in different predictors.
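For reference, the literature commonly defines the Rashomon set as all models from a model class F whose performance lies within a tolerance ε of the best model. One such formulation, stated with AUC (higher is better), is given below. This chapter instead fixes the number of models in the set.

```latex
\[
f^{*} = \arg\max_{f \in \mathcal{F}} \mathrm{AUC}(f), \qquad
\mathcal{R}(\varepsilon) = \{\, f \in \mathcal{F} : \mathrm{AUC}(f) \ge \mathrm{AUC}(f^{*}) - \varepsilon \,\}
\]
```

Choosing the n best of the trained models, as done here, corresponds (ignoring ties) to taking F as the set of trained models and ε = AUC(f*) - AUC(f_(n)), where f_(n) is the n-th best model.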

To find such models, 100 models were trained for each dataset using 3-fold cross-validation. In each case, the validation set consisted of 20% of the full data and the test set of 10% of the full data. We examined how the AUC changes with the amount of training data and with the number of models in a Rashomon set, in order to find the optimal number of models in such a set. Training was performed on 1%, 5%, 10%, 30% and 70% of the observations in the dataset. From these 100 models, the 6 models with the highest AUC formed the ‘best AUC models’ set, and the 6 models with the most different variable importance formed the ‘experts’ set. For these 6-model sets on the 2 datasets (4 model sets in total), we checked the variable importance plots to compare how variable importance changes across variable sets (X48 and W48) and across approaches to choosing the Rashomon set (best AUC or most different variable importance). The hyperparameters of these models are given in the following subsections.
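The sketch below shows the data partitioning described above. It assumes that the 1% to 70% training fractions refer to the full dataset and are drawn from the 70% left after removing the validation (20%) and test (10%) parts. The text does not specify how the 3-fold cross-validation was combined with these fixed splits, so that step is left out. The function names, the seed and the nesting of smaller subsets inside larger ones are assumptions.

```python
import random
from typing import Dict, List


def split_indices(n_rows: int, seed: int = 0) -> Dict[str, List[int]]:
    """Shuffle row indices into a test part (10% of the full data), a
    validation part (20%) and a training pool (the remaining 70%)."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_test = round(0.10 * n_rows)
    n_val = round(0.20 * n_rows)
    return {
        "test": idx[:n_test],
        "validation": idx[n_test:n_test + n_val],
        "train_pool": idx[n_test + n_val:],
    }


def training_subset(pool: List[int], n_rows: int, fraction: float) -> List[int]:
    """Take `fraction` of the FULL dataset from the training pool.

    Smaller subsets are prefixes of larger ones (nested), which is an
    assumption of this sketch.
    """
    k = round(fraction * n_rows)
    if k > len(pool):
        raise ValueError("requested fraction exceeds the training pool")
    return pool[:k]


parts = split_indices(27616)
for fraction in (0.01, 0.05, 0.10, 0.30, 0.70):
    print(fraction, len(training_subset(parts["train_pool"], 27616, fraction)))
```

Using nested subsets means that a model trained on 30% of the data has seen everything a 10% model saw. This keeps the comparison across training sizes less noisy.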

6.5.5.1 Sets of best AUC models

The 6 models in these Rashomon sets were selected according to their AUC. Their hyperparameters, found among the 20 best models from 3-fold cross-validation, are given below.

For X48 Variable Set:

Table 6.1: Hyperparameters for best AUC models of X48 dataset
param_n_estimators  param_min_samples_split  param_min_samples_leaf  param_max_features  param_max_depth  param_bootstrap
1400                2                        4                       sqrt                80               FALSE
2000                5                        4                       sqrt                NaN              FALSE
1200                2                        4                       sqrt                70               FALSE
600                 2                        4                       sqrt                30               FALSE
1200                10                       4                       sqrt                20               FALSE
800                 5                        4                       sqrt                30               FALSE

For W48 Variable Set:

Table 6.2: Hyperparameters for best AUC models of W48 dataset
param_n_estimators  param_min_samples_split  param_min_samples_leaf  param_max_features  param_max_depth  param_bootstrap
1800                2                        2                       sqrt                60               FALSE
1600                2                        4                       sqrt                110              FALSE
1600                10                       4                       sqrt                40               FALSE
600                 10                       4                       sqrt                90               FALSE
1200                10                       2                       sqrt                70               FALSE
800                 2                        1                       sqrt                20               FALSE
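The param_ prefix in the column names matches how scikit-learn's search objects (e.g. RandomizedSearchCV) label the columns of cv_results_. This suggests, without confirming it, that the models are scikit-learn RandomForestClassifier instances. The sketch below draws random configurations from a search space inferred only from the values that appear in Tables 6.1-6.4, since the authors' actual grid is not given. It also converts a table row back into keyword arguments, reading NaN in param_max_depth as max_depth=None (no depth limit); this is an interpretation. The sketch uses only the standard library, so the resulting dictionaries would still have to be passed to a random forest implementation.

```python
import math
import random
from typing import Any, Dict, List

# Search space inferred from the values in Tables 6.1-6.4 (an assumption).
SEARCH_SPACE: Dict[str, List[Any]] = {
    "n_estimators": list(range(200, 2001, 200)),       # 200, 400, ..., 2000
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt"],
    "max_depth": list(range(10, 111, 10)) + [None],   # None: no depth limit
    "bootstrap": [True, False],
}


def sample_configs(n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Random search: draw n independent configurations from SEARCH_SPACE.

    Configurations may repeat in this simple version.
    """
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
            for _ in range(n)]


def row_to_kwargs(row: Dict[str, Any]) -> Dict[str, Any]:
    """Strip the 'param_' prefix and map a NaN max_depth to None."""
    kwargs = {(k[len("param_"):] if k.startswith("param_") else k): v
              for k, v in row.items()}
    depth = kwargs.get("max_depth")
    if isinstance(depth, float) and math.isnan(depth):
        kwargs["max_depth"] = None
    return kwargs


configs = sample_configs(100)  # 100 candidate models per dataset, as in the text
print(row_to_kwargs({"param_n_estimators": 2000, "param_max_depth": float("nan"),
                     "param_bootstrap": False}))
```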

6.5.5.2 Sets of experts

These Rashomon sets consist of 6 models selected so that their most important variables differ as much as possible, i.e. each model values a different set of predictors. Their hyperparameters, also found among the 20 best models from 3-fold cross-validation, are given below.

For X48 Variable Set:

Table 6.3: Hyperparameters for expert models of X48 dataset
param_n_estimators  param_min_samples_split  param_min_samples_leaf  param_max_features  param_max_depth  param_bootstrap
1400                2                        4                       sqrt                80               FALSE
1400                5                        2                       sqrt                NaN              FALSE
600                 2                        2                       sqrt                110              FALSE
800                 2                        2                       sqrt                50               FALSE
800                 10                       2                       sqrt                30               FALSE
1000                5                        2                       sqrt                100              FALSE

For W48 Variable Set:

Table 6.4: Hyperparameters for expert models of W48 dataset
param_n_estimators  param_min_samples_split  param_min_samples_leaf  param_max_features  param_max_depth  param_bootstrap
1800                2                        2                       sqrt                60               FALSE
1600                2                        4                       sqrt                110              FALSE
400                 10                       1                       sqrt                60               FALSE
200                 10                       2                       sqrt                50               TRUE
2000                5                        2                       sqrt                10               TRUE
200                 5                        2                       sqrt                10               TRUE
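The text does not define how ‘most different variable importance’ was measured. One plausible reading is a greedy max-min selection. It starts from the best-AUC model and repeatedly adds the candidate whose variable-importance vector is farthest from its nearest already chosen model. The Euclidean distance, the greedy rule and the starting point are all assumptions; the candidates would be the 20 best cross-validation models mentioned above. Starting from the best model is at least consistent with the first rows of Tables 6.3 and 6.4 matching the first rows of Tables 6.1 and 6.2, respectively.

```python
import math
from typing import List, Sequence


def top_n_by_auc(aucs: Sequence[float], n: int) -> List[int]:
    """Indices of the n models with the highest AUC (the 'best AUC models' set)."""
    return sorted(range(len(aucs)), key=lambda i: aucs[i], reverse=True)[:n]


def most_different(importances: Sequence[Sequence[float]],
                   aucs: Sequence[float], n: int) -> List[int]:
    """Greedy max-min selection of n models with dissimilar importance vectors."""
    if n > len(aucs):
        raise ValueError("n exceeds the number of candidate models")
    chosen = top_n_by_auc(aucs, 1)  # start from the best model
    while len(chosen) < n:
        candidates = [i for i in range(len(aucs)) if i not in chosen]
        # Distance of a candidate to the chosen set = distance to its nearest member.
        farthest = max(candidates, key=lambda i: min(
            math.dist(importances[i], importances[j]) for j in chosen))
        chosen.append(farthest)
    return chosen


# Toy example: 4 candidate models, importance over 3 features.
imps = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.5]]
aucs = [0.90, 0.89, 0.80, 0.70]
print(top_n_by_auc(aucs, 3), most_different(imps, aucs, 3))
```

The toy output shows the contrast between the two selections. The top-AUC set keeps model 1, which is almost a duplicate of model 0. The diverse selection skips model 1 in favor of models that rely on other features.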

6.5.5.3 Methods of voting

As an experiment, we applied 7 voting methods to each Rashomon set to find out whether there is any significant difference between them. The methods were as follows:

  1. mean predictions of models

  2. median predictions of models

  3. mean predictions of models with weights equal to the score of the model

  4. mean predictions of models with weights equal to the score of the model transformed into the interval [0,1]

  5. mean predictions of models with weights equal to the score of the model transformed into the interval [1,2]

  6. mean predictions of models with weights equal to the rank of the model transformed into the interval [0,1], with 1 being the weight of the best model

  7. mean predictions of models with weights equal to the rank of the model transformed into the interval [1,2], with 2 being the weight of the best model

Our results showed that all voting mechanisms behaved similarly, with the simple mean and the median being the best methods for creating a group prediction. It is also worth noting that transforming the score or rank to the interval [0,1] gives the worst model a weight of 0, effectively ignoring it, so these variants are not the best choice for efficient voting.

6.5.6 Results

6.5.6.1 Number of models in Rashomon set - influence on AUC

The results of the top-performance model sets for different set cardinalities are presented below.

As one may notice, there are no significant differences in AUC values across the different cardinalities of the Rashomon sets or across the datasets on which the models were trained. The figure suggests that the best number of models in a Rashomon set lies between 5 and 10. As mentioned earlier, all voting strategies performed roughly the same.
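The AUC used throughout can be computed directly from its probabilistic meaning. The sketch below does so and then evaluates the mean vote of the top-n models for several set sizes, mirroring the experiment above on toy data. It is a quadratic-time illustration. For the full 27616-row dataset, a rank-based implementation or a library routine such as scikit-learn's roc_auc_score would be more practical. The function names are hypothetical.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive is scored above a random
    negative, counting ties as 1/2 (the Mann-Whitney U statistic / (P * N))."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_by_set_size(y_true: Sequence[int], model_preds: Sequence[Sequence[float]],
                    model_aucs: Sequence[float], sizes: Sequence[int]) -> Dict[int, float]:
    """AUC of the mean vote of the top-n models (by AUC) for each n in `sizes`."""
    order = sorted(range(len(model_aucs)), key=lambda k: model_aucs[k], reverse=True)
    results: Dict[int, float] = {}
    for n in sizes:
        members = order[:n]
        mean_vote = [sum(model_preds[k][i] for k in members) / len(members)
                     for i in range(len(y_true))]
        results[n] = auc(y_true, mean_vote)
    return results


y = [0, 0, 1, 1]
preds = [[0.1, 0.4, 0.35, 0.8], [0.2, 0.1, 0.9, 0.7]]  # two toy models
print(auc(y, preds[0]), auc_by_set_size(y, preds, [auc(y, p) for p in preds], [1, 2]))
```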

6.5.6.2 Variable Importance Plots

In this section, the feature importance plots are analyzed.

6.5.6.2.1 Best AUC Rashomon Sets

Below, feature importance plots for the 6 models with the highest AUC are presented.

For X48 dataset:

For the X48 dataset, there is no noticeable difference in the important features between the models.

For W48 dataset:

As expected, the most important variables differ noticeably between models trained on different datasets, even among the variables that come from the X48 dataset (those numbered up to 76). Interestingly, variable 46 (mean_inr) has lost its dominance in the W48 dataset.

6.5.6.2.2 Experts Rashomon Sets

Below, feature importance plots are presented for the models whose important features differ as much as possible.

The top 2 or 3 important features usually remain the same across models trained on the same dataset. All expert models emphasize the influence of X48 variables. Moreover, in 3 of the 6 models, variables that had been less important than variable 46 (mean_inr) gained importance: variable 27 (model 2) and variable 56 (models 4 and 5).
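Observations such as ‘the top 2 or 3 features usually remain the same’ can be made quantitative. One option, sketched below, is the Jaccard overlap of two models' top-k feature sets. It equals 1.0 when the models share the same top-k variables and 0.0 when they share none. This measure is not used in the chapter and is offered only as an illustration. Importance vectors are indexed by list position here; the chapter does not state whether its variable numbers (e.g. 46 for mean_inr) start at 0 or 1.

```python
from typing import List, Sequence


def top_k_features(importance: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important features, most important first."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]


def top_k_overlap(imp_a: Sequence[float], imp_b: Sequence[float], k: int) -> float:
    """Jaccard overlap of two models' top-k feature sets (1.0 = same set)."""
    a, b = set(top_k_features(imp_a, k)), set(top_k_features(imp_b, k))
    return len(a & b) / len(a | b)


model_1 = [0.10, 0.50, 0.30, 0.05]
model_2 = [0.40, 0.50, 0.00, 0.05]
print(top_k_features(model_1, 2), top_k_features(model_2, 2),
      top_k_overlap(model_1, model_2, 2))
```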

6.5.6.3 Voting in mortality prediction

Below, the results of the expert Rashomon sets for the different voting strategies are presented.

Sets created from models trained on the W48 dataset achieve significantly better AUC values than those trained on the X48 dataset. Unfortunately, the expert sets did not outperform the sets of top-AUC models (marked by a vertical line in these plots).

6.5.7 Conclusion

In this article, two different ways of creating Rashomon sets were discussed:

  • choosing the n top-performing models,
  • choosing the n most different models.

In addition, several voting strategies for obtaining predictions from Rashomon sets were tested. Along the way, we identified the most important MIMIC-III features for the mortality prediction task, which may be useful for further research or could lead to new medical insights.

Summing up the results of all experiments, one can conclude that expert Rashomon sets are worth researchers' attention, even though in this study they slightly underperformed the top-performance model sets. We also observed no large difference in performance between the voting strategies presented here, which leaves room for future researchers to invent and test new strategies. Finally, adding new variables to a model, as when the X48 variable set was extended to W48, may reduce the importance of the original variables for the output of the models in a Rashomon set.

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., et al., et al. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. USENIX conference on Operating Systems Design and Implementation. https://dl.acm.org/doi/10.5555/3026877.3026899
Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE access, 6, 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052
Ahmad, M. W., Mourshed, M., & Rezgui, Y. (2017). Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy and Buildings, 147, 77--89. https://doi.org/10.1016/j.enbuild.2017.04.038
Aivodji, U., Arai, H., Fortineau, O., Gambs, S., Hara, S., & Tapp, A. (2019). Fairwashing: The risk of rationalization. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th international conference on machine learning (Vol. 97, pp. 161–170). PMLR. http://proceedings.mlr.press/v97/aivodji19a.html
Alkahtani, A. S., & Jilani, M. (2019). Predicting return donor and analyzing blood donation time series using data mining techniques. International Journal of Advanced Computer Science and Applications, 10(8). https://doi.org/10.14569/IJACSA.2019.0100816
Al-Shawwa, M., Abu-Naser, S., & Nasser, I. (2019). Developing artificial neural network for predicting mobile phone price range (Vol. 3, pp. 1–6).
Andriawan, Z. A., Purnama, S. R., Darmawan, A. S., Ricko, Wibowo, A., Sugiharto, A., & Wijayanto, F. (2020). Prediction of hotel booking cancellation using CRISP-DM. In 2020 4th international conference on informatics and computational sciences (ICICoS) (pp. 1–6). https://doi.org/10.1109/ICICoS51170.2020.9299011
Anthimopoulos, M., Christodoulidis, S., Ebner, L., Geiser, T., Christe, A., & Mougiakakou, S. (2019). Semantic segmentation of pathological lung tissue with dilated fully convolutional networks. IEEE Journal of Biomedical and Health Informatics, 23(2), 714–722. https://doi.org/10.1109/JBHI.2018.2818620
Antonio, N., de Almeida, A., & Nunes, L. (2017). Predicting hotel booking cancellations to decrease uncertainty and increase revenue. Tourism & Management Studies, 13(2), 25–39. https://doi.org/10.18089/tms.2017.13203
Antonio, N., de Almeida, A., & Nunes, L. (2019a). An automated machine learning based decision support system to predict hotel booking cancellations. Data Science Journal, 18(1), 1–20. https://doi.org/10.5334/dsj-2019-032
Antonio, N., de Almeida, A., & Nunes, L. (2019b). Hotel booking demand datasets. Data in Brief, 22, 41–49. https://doi.org/10.1016/j.dib.2018.11.126
Apley, D. (2018). ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots. https://CRAN.R-project.org/package=ALEPlot
Apley, D. W., & Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society Series B, 82(4), 1059–1086. https://doi.org/10.1111/rssb.12377
Arik, S. O., & Pfister, T. (2021). TabNet: Attentive Interpretable Tabular Learning. AAAI Conference on Artificial Intelligence (AAAI). https://arxiv.org/abs/1908.07442
Arrieta, A. B., Dı́az-Rodrı́guez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al., et al. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012
Arsad, P. M., Buniyamin, N., & Manan, J. A. (2013). Prediction of engineering students’ academic performance using Artificial Neural Network and Linear Regression: A comparison. ICEED. https://doi.org/10.1109/iceed.2013.6908300
Asadi-Aghbolaghi, M., Azad, R., Fathy, M., & Escalera, S. (2020). Multi-level context gating of embedded collective knowledge for medical image segmentation. https://arxiv.org/abs/2003.05056
Associacion, G. (2020). The Mobile Economy. GSM Associacion. https://www.gsma.com/mobileeconomy/wp-content/uploads/2020/03/GSMA_MobileEconomy2020_Global.pdf
Azad, R., Asadi-Aghbolaghi, M., Fathy, M., & Escalera, S. (2019). Bi-directional ConvLSTM u-net with densley connected convolutions. In 2019 IEEE/CVF international conference on computer vision workshop (ICCVW) (pp. 406–415). https://doi.org/10.1109/ICCVW.2019.00052
Bahel, D., Ghosh, P., Sarkar, A., Lanham, M. A., & Lafayette, W. (2017). Predicting blood donations using machine learning techniques. http://matthewalanham.com/Students/2017_MWDSI_Final_Bahel.pdf
Baker, M. (2016). Reproducibility crisis. Nature, 533(26), 353–66.
Baniecki, H., Kretowicz, W., Piatyszek, P., Wisniewski, J., & Biecek, P. (2020a). dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python. arXiv:2012.14406. https://arxiv.org/abs/2012.14406
Baniecki, H., Kretowicz, W., Piatyszek, P., Wisniewski, J., & Biecek, P. (2020b). dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python. arXiv:2012.14406. https://arxiv.org/abs/2012.14406
Barda, N., Riesel, D., Akriv, A., Levy, J., Finkel, U., Yona, G., et al. (2020). Developing a COVID-19 mortality risk prediction model when individual-level data are not available. Nature Communications, 11. https://doi.org/10.1038/s41467-020-18297-9
Barish, M., Bolourani, S., Lau, L. F., Shah, S., & Zanos, T. P. (2020). External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19. Nature Machine Intelligence, 3, 25–27. https://doi.org/10.1038/s42256-020-00254-2
Barish, M., Bolourani, S., Lau, L. F., Shah, S., & Zanos, T. P. (2021). External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19. Nature Machine Intelligence, 3(1), 25–27. https://doi.org/10.1038/s42256-020-00254-2
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020c). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020b). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. http://www.sciencedirect.com/science/article/pii/S1566253519308103
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020a). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. http://www.sciencedirect.com/science/article/pii/S1566253519308103
Barsoum, E., Zhang, C., Ferrer, C. C., & Zhang, Z. (2016). Training deep networks for facial expression recognition with crowd-sourced label distribution. In Proceedings of the 18th ACM international conference on multimodal interaction (pp. 279–283). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2993148.2993165
Belke, A., & Keil, J. (2017). Fundamental determinants of real estate prices: A panel study of german regions, (731). Ruhr Economic Papers. https://doi.org/10.4419/86788851
Bello-Chavolla, O. Y., Bahena-López, J. P., Antonio-Villa, N. E., Vargas-Vázquez, A., González-Díaz, A., Márquez-Salinas, A., et al. (2020). Predicting Mortality Due to SARS-CoV-2: A Mechanistic Score Relating Obesity and Diabetes to COVID-19 Outcomes in Mexico. The Journal of Clinical Endocrinology & Metabolism, 105, 2752--2761. https://doi.org/10.1210/clinem/dgaa346
Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2017). Fairness in Criminal Justice Risk Assessments: The State of the Art. Sociological Methods & Research. https://doi.org/10.1177/0049124118782533
Biecek, Przemyslaw. (2018b). DALEX: Explainers for Complex Predictive Models in R. Journal of Machine Learning Research, 19(84), 1–5. http://jmlr.org/papers/v19/18-416.html
Biecek, Przemyslaw. (2018c). DALEX: Explainers for Complex Predictive Models in R. Journal of Machine Learning Research, 19(84), 1–5. https://jmlr.org/papers/v19/18-416.html
Biecek, Przemysław. (2018). DALEX: Explainers for complex predictive models in r. The Journal of Machine Learning Research, 19(1), 3245–3249.
Biecek, Przemyslaw. (2018a). DALEX: Explainers for Complex Predictive Models in R. Journal of Machine Learning Research, 19(84), 1–5. http://jmlr.org/papers/v19/18-416.html
Biecek, Przemyslaw, & Burzykowski, T. (n.d.). https://ema.drwhy.ai/introduction.html
Biecek, Przemyslaw, & Burzykowski, T. (2021a). Explanatory Model Analysis. Chapman; Hall/CRC, New York. https://pbiecek.github.io/ema/
Biecek, Przemyslaw, & Burzykowski, T. (2021b). Explanatory model analysis: Explore, explain, and examine predictive models. CRC Press.
Biecek, Przemyslaw, Maksymiuk, S., & Baniecki, H. (2021). DALEX: moDel Agnostic Language for Exploration and eXplanation . https://CRAN.R-project.org/package=DALEX
Bird, S., Dudík, M., Edgar, R., Horn, B., Lutz, R., Milan, V., et al. (2020). Fairlearn: A toolkit for assessing and improving fairness in AI (No. MSR-TR-2020-32). Microsoft. https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/
Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., et al. (2016b). mlr: Machine learning in r. Journal of Machine Learning Research, 17(170), 1–5. https://jmlr.org/papers/v17/15-066.html
Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., et al. (2016a). mlr: Machine Learning in R. Journal of Machine Learning Research, 17(170), 1–5. http://jmlr.org/papers/v17/15-066.html
Bisong, E. (2019). Google colaboratory. In Building machine learning and deep learning models on google cloud platform (pp. 59–64). Springer.
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn., 30(7), 1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2
Breiman, L. (1999). Random forests. UC Berkeley TR567.
Breiman, L. et al. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 16(3), 199–231.
Browne, M. W. (2000). Cross-validation methods. Journal of mathematical psychology, 44(1), 108–132. https://doi.org/10.1006/jmps.1999.1279
Calvert, J., Mao, Q., Rogers, A. J., Barton, C., Jay, M., Desautels, T., et al. (2016). A computational approach to mortality prediction of alcohol use disorder inpatients. Computers in Biology and Medicine, 75, 74–79. https://doi.org/https://doi.org/10.1016/j.compbiomed.2016.05.015
Campbell-Kelly, M., Aspray, W., Ensmenger, N., & Yost, J. R. (2018). Computer: A history of the information machine. Routledge. https://doi.org/10.4324/9780429495373
Can, A. (1990). The measurement of neighborhood dynamics in urban house prices. Economic Geography, 66(3), 254–272. https://doi.org/10.2307/143400
Cao, Y., Liu, X., Xiong, L., & Cai, K. (2020). Imaging and clinical features of patients with 2019 novel coronavirus SARS-CoV-2: A systematic review and meta-analysis. Journal of Medical Virology, 92(9), 1449–1459. https://doi.org/10.1002/jmv.25822
Casadevall, A., & Fang, F. C. (2010). Reproducible science. Infection and Immunity, 78(12), 4972–4975. https://doi.org/10.1128/IAI.00908-10
Chen, T., & Guestrin, C. (2016a). XGBoost: A Scalable Tree Boosting System. International Conference on Knowledge Discovery and Data Mining (KDD). https://doi.org/10.1145/2939672.2939785
Chen, T., & Guestrin, C. (2016b). XGBoost: A Scalable Tree Boosting System. KDD. https://doi.org/https://doi.org/10.1145/2939672.2939785
Chollet, F. (2017). Deep learning with python. Manning.
Chouldechova, A. (2016). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5. https://doi.org/10.1089/big.2016.0047
Chow, J. C. K. (2017, June). Analysis of Financial Credit Risk Using Machine Learning (Master’s thesis). Aston University. Retrieved from https://www.researchgate.net/publication/318959365_Analysis_of_Financial_Credit_Risk_Using_Machine_Learning
Chowdhury, M. E. H., Rahman, T., Khandakar, A., Mazhar, R., Kadir, M. A., Mahbub, Z. B., et al. (2020). Can AI help in screening viral and COVID-19 pneumonia? IEEE Access, 8, 132665–132676. https://doi.org/10.1109/ACCESS.2020.3010287
Christe, A. A. M. D., Andreas MD∗; Peters. (2019). Computer-aided diagnosis of pulmonary fibrosis using deep learning and CT images. Investigative Radiology, 54, 627–632. https://doi.org/10.1097/RLI.0000000000000574
Chu, X., Ilyas, I. F., Krishnan, S., & Wang, J. (2016). Data cleaning: Overview and emerging challenges. In Proceedings of the 2016 international conference on management of data (pp. 2201–2206).
Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., et al. (2013). The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7
Code of Federal Regulations. (1978). SECTION 4D, UNIFORM GUIDELINES ON EMPLOYEE SELECTION PROCEDURES (1978). https://www.govinfo.gov/content/pkg/CFR-2014-title29-vol4/xml/CFR-2014-title29-vol4-part1607.xml
Cohen, J. P., Morrison, P., Dao, L., Roth, K., Duong, T. Q., & Ghassemi, M. (2020a). COVID-19 image data collection: Prospective predictions are the future. arXiv 2006.11988. https://github.com/ieee8023/covid-chestxray-dataset
Cohen, J. P., Morrison, P., Dao, L., Roth, K., Duong, T. Q., & Ghassemi, M. (2020b). COVID-19 image data collection: Prospective predictions are the future. arXiv 2006.11988. https://github.com/ieee8023/covid-chestxray-dataset
Computing Machinery, A. for. (2018). Artifact review and badging. https://www.acm.org/publications/policies/artifact-review badging
Conway, J. (2018, January). Artificial Intelligence and Machine Learning : Current Applications in Real Estate (PhD thesis). Retrieved from https://dspace.mit.edu/bitstream/handle/1721.1/120609/1088413444-MIT.pdf
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., & Huq, A. (2017). Algorithmic Decision Making and the Cost of Fairness. https://doi.org/10.1145/3097983.3098095
Culp, M., Johnson, K., & Michailidis, G. (2016). Ada: The r package ada for stochastic boosting. https://CRAN.R-project.org/package=ada
Darwiche, M., Feuilloy, M., Bousaleh, G., & Schang, D. (2010). Prediction of blood transfusion donation, 51–56. https://doi.org/10.1109/RCIS.2010.5507363
Das, S., Cashman, D., Chang, R., & Endert, A. (2019). BEAMES: Interactive multimodel steering, selection, and inspection for regression tasks. IEEE Computer Graphics and Applications, 39(5), 20–32. https://doi.org/10.1109/MCG.2019.2922592
Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on machine learning (pp. 233–240). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/1143844.1143874
Deo, R. C. (2015). Machine learning in medicine. Circulation, 132(20), 1920–1930. https://doi.org/10.1161/CIRCULATIONAHA.115.001593
Desai, B., S. (2020). Data from chest imaging with clinical and genomic correlates representing a rural COVID-19 positive population [data set]. The Cancer Imaging Archive. https://doi.org/10.7937/tcia.2020.py71-5978
Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.-J., Sandhu, S., et al. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. The American Journal of Cardiology, 64(5), 304–310. https://doi.org/10.1016/0002-9149(89)90524-9
Detrano, R., Yiannikas, J., Salcedo, E. E., Rincon, G., Go, R. T., Williams, G., & Leatherman, J. (1984). Bayesian probability analysis: A prospective demonstration of its clinical utility in diagnosing coronary disease. Circulation, 69(3), 541—547. https://doi.org/10.1161/01.CIR.69.3.541
Dong, J., & Rudin, C. (2020). Exploring the cloud of variable importance for the set of all good models. Nature Machine Intelligence, 2(12), 810–824.
Du, X., Cai, Y., Wang, S., & Zhang, L. (2016). Overview of deep learning. In 2016 31st youth academic annual conference of chinese association of automation (YAC) (pp. 159–164). IEEE.
Dua, D., & Graff, C. (2017). UCI machine learning repository. http://archive.ics.uci.edu/ml
Dubin, R. A. (1998). Predicting house prices using multiple listings data. The Journal of Real Estate Finance and Economics. https://doi.org/10.1023/A:1007751112669
Dupuis, C., De Montmollin, E., Neuville, M., Mourvillier, B., Ruckly, S., & Timsit, J. F. (2021). Limited applicability of a COVID-19 specific mortality prediction rule to the intensive care setting. Nature Machine Intelligence, 3(1), 20–22. https://doi.org/10.1038/s42256-020-00252-4
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. ITCS. https://doi.org/10.1145/2090236.2090255
England, R. (2019). Wine’s alcohol levels explained. https://www.wineinvestment.com/wine-blog/2019/05/wines-alcohol-levels-explained?fbclid=IwAR3xpQITEQZrQUPPaEt7-DbFHmvHE559-iVuLsgS6dDinOeWrl04MZiglbM.
Falk, M., & Vieru, M. (2018). Modelling the cancellation behaviour of hotel guests. International Journal of Contemporary Hospitality Management, 30(10), 3100–3116. https://doi.org/10.1108/ijchm-08-2017-0509
Fan, C., Cui, Z., & Zhong, X. (2018). House prices prediction with machine learning algorithms. In Proceedings of the 2018 10th international conference on machine learning and computing (pp. 6–10). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3195106.3195133
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. https://doi.org/https://doi.org/10.1016/j.patrec.2005.10.010
Finlayson, S. G., Chung, H. W., Kohane, I. S., & Beam, A. L. (2018). Adversarial attacks against medical deep learning systems. arXiv preprint arXiv:1804.05296.
Fisher, A., Rudin, C., & Dominici, F. (2018b). All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. arXiv. https://arxiv.org/abs/1801.01489
Fisher, A., Rudin, C., & Dominici, F. (2018a). All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. arXiv. https://arxiv.org/abs/1801.01489
Fisher, A., Rudin, C., & Dominici, F. (2019a). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177), 1–81. http://jmlr.org/papers/v20/18-760.html
Fisher, A., Rudin, C., & Dominici, F. (2019d). All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. Journal of Machine Learning Research, 20(177), 1–81. http://jmlr.org/papers/v20/18-760.html
Fisher, A., Rudin, C., & Dominici, F. (2019e). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177), 1–81.
Fisher, A., Rudin, C., & Dominici, F. (2019b). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177), 1–81. http://jmlr.org/papers/v20/18-760.html
Fisher, A., Rudin, C., & Dominici, F. (2019c). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177), 1–81.
Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of p. Journal of the Royal Statistical Society, 85(1), 87–94. http://www.jstor.org/stable/2340521
Friedman, J. H. (2000a). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189–1232.
Friedman, J. H. (2000b). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29, 1189–1232. https://doi.org/10.1214/aos/1013203451
Friedman, J. H. (2000c). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29, 1189–1232. https://doi.org/10.1214/aos/1013203451
Ge, X., Runeson, G., & Lam, K. C. (2021). Forecasting Hong Kong housing prices: An artificial neural network approach.
Genders, T. S. S., Steyerberg, E. W., Alkadhi, H., Leschka, S., Desbiolles, L., Nieman, K., et al. (2011). A clinical prediction rule for the diagnosis of coronary artery disease: Validation, updating, and extension. European Heart Journal, 32(11), 1316–1330. https://doi.org/10.1093/eurheartj/ehr014
Géron, A. (2017). Hands-on machine learning with scikit-learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media.
Ghysels, E., Plazzi, A., Valkanov, R., & Torous, W. (2013). Chapter 9 - Forecasting real estate prices, 2, 509–580. https://doi.org/10.1016/B978-0-444-53683-9.00009-8
Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th international conference on data science and advanced analytics (DSAA) (pp. 80–89). IEEE. https://doi.org/10.1109/DSAA.2018.00018
Glauner, P. (2021). An assessment of the AI regulation proposed by the European Commission. https://arxiv.org/abs/2105.15133
Goldner, M. C., Zamora, M. C., Di Leo Lira, P., Gianninoto, H., & Bandoni, A. (2009). Effect of ethanol level in the perception of aroma attributes and the detection of volatile compounds in red wine. Journal of Sensory Studies, 24(2), 243–257.
Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2014). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation.
Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1), 44–65. https://doi.org/10.1080/10618600.2014.907095
Goodman, B., & Flaxman, S. (2017). European Union regulations on algorithmic decision-making and a “right to explanation.” AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
Gosiewska, A., & Biecek, P. (2019b). Do Not Trust Additive Explanations. arXiv. https://arxiv.org/abs/1903.11420v3
Gosiewska, A., & Biecek, P. (2019a). Do not trust additive explanations. arXiv preprint arXiv:1903.11420. https://arxiv.org/abs/1903.11420
Gosiewska, A., & Biecek, P. (2020). Do not trust additive explanations. https://arxiv.org/abs/1903.11420
Goyal, S. (2020, November). Credit card customers. Kaggle. https://www.kaggle.com/sakshigoyal7/credit-card-customers
Greenwell, Brandon M. (2017c). pdp: An R Package for Constructing Partial Dependence Plots. The R Journal, 9(1), 421–436. https://doi.org/10.32614/RJ-2017-016
Greenwell, Brandon M. (2017a). pdp: An R package for constructing partial dependence plots. The R Journal, 9(1), 421–436. https://doi.org/10.32614/RJ-2017-016
Greenwell, Brandon M. (2017b). pdp: An R Package for Constructing Partial Dependence Plots. The R Journal, 9(1), 421–436. https://doi.org/10.32614/RJ-2017-016
Greenwell, B., Boehmke, B., Cunningham, J., & GBM Developers. (2020). gbm: Generalized boosted regression models. https://CRAN.R-project.org/package=gbm
Gregutt, P. (2003). Does a higher alcohol content mean it’s a better drinking wine? The Seattle Times. https://archive.seattletimes.com/archive/?date=20031008&slug=wineqanda08
Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., & Yang, G.-Z. (2019). XAI—explainable artificial intelligence. Science Robotics, 4(37). https://doi.org/10.1126/scirobotics.aay7120
Hanley, J. A. (2014). Receiver Operating Characteristic (ROC) Curves. Wiley StatsRef: Statistics Reference Online. https://doi.org/10.1002/9781118445112.stat05255
Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. NeurIPS. https://papers.nips.cc/paper/2016/hash/9d2682367c3935defcb1f9e247a97c0d-Abstract.html
Heyman, A., & Sommervoll, D. (2019). House prices and relative location. Cities, 95, 102373. https://doi.org/10.1016/j.cities.2019.06.004
Holzinger, A. (2016). Interactive machine learning for health informatics: When do we need the human-in-the-loop? Brain Informatics, 3, 119–131. https://doi.org/10.1007/s40708-016-0042-6
Holzinger, A. (2021). Explainable AI and multi-modal causability in medicine. i-com, 19(3), 171–179. https://doi.org/10.1515/icom-2020-0024
Holzinger, A., Biemann, C., Pattichis, C., & Kell, D. (2017). What do we need to build explainable AI systems for the medical domain?
Holzinger, A., Langs, G., Denk, H., Zatloukal, K., & Müller, H. (2019). Causability and explainability of artificial intelligence in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9. https://doi.org/10.1002/widm.131
Jefford, A. (2010). Alcohol levels: The balancing act. https://www.decanter.com/features/alcohol-levels-the-balancing-act-246426/
Johnson, A. E. W., Pollard, T. J., & Mark, R. G. (2017). Reproducibility in critical care: A mortality prediction case study. In F. Doshi-Velez, J. Fackler, D. Kale, R. Ranganath, B. Wallace, & J. Wiens (Eds.), Proceedings of the 2nd machine learning for healthcare conference (Vol. 68, pp. 361–376). Boston, Massachusetts: PMLR. http://proceedings.mlr.press/v68/johnson17a.html
Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L.-W. H., Feng, M., Ghassemi, M., et al. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1), 1–9.
Jordão, A. M., Vilela, A., & Cosme, F. (2015). From sugar of grape to alcohol of wine: Sensorial impact of alcohol in wine. Beverages, 1(4), 292–310. https://doi.org/10.3390/beverages1040292
Kaladharan, S., Vishvanathan, S., Gopalakrishnan, E. A., & Kp, S. (2020). Explainable artificial intelligence for heart rate variability in ECG signal. Healthcare Technology Letters, 7, 146–154. https://doi.org/10.1049/htl.2020.0033
Karim, M. R., Döhmen, T., Rebholz-Schuhmann, D., Decker, S., Cochez, M., & Beyan, O. (2020). DeepCOVIDExplainer: Explainable COVID-19 diagnosis from chest x-ray images. IEEE. https://doi.org/10.1109/BIBM49941.2020.9313304
Kather, J. N., Zöllner, F. G., Bianconi, F., Melchers, S. M., Schad, L. R., Gaiser, T., et al. (2016, May). Collection of textures in colorectal cancer histology. Zenodo. https://doi.org/10.5281/zenodo.53169
Kaushal, A., Altman, R., & Langlotz, C. (2020). Health Care AI Systems Are Biased. Scientific American. https://www.scientificamerican.com/article/health-care-ai-systems-are-biased
Kennedy, K. (2013). Credit scoring using machine learning (PhD thesis). Technological University Dublin. Retrieved from https://arrow.tudublin.ie/sciendoc/137/
Khalifa, N. E. M., Taha, M. H. N., Hassanien, A. E., & Elghamrawy, S. (2020). Detection of coronavirus (COVID-19) associated pneumonia based on generative adversarial networks and a fine-tuned deep transfer learning model using chest x-ray dataset. https://arxiv.org/abs/2004.01184
Khedkar, S., Subramanian, V., Shinde, G., & Gandhi, P. (2019). Explainable AI in healthcare. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3367686
Kieseberg, P., Schantl, J., Fruehwirt, P., Weippl, E., & Holzinger, A. (2015). Witnesses for the doctor in the loop, 9250, 369–378. https://doi.org/10.1007/978-3-319-23344-4_36
Klambauer, G., Unterthiner, T., Mayr, A., & Hochreiter, S. (2017). Self-Normalizing Neural Networks. arXiv:1706.02515. https://arxiv.org/abs/1706.02515
Komisarczyk, K., Maksymiuk, S., Koźmiński, P., & Biecek, P. (2020). treeshap: Fast SHAP values computation for ensemble models. R package. https://github.com/ModelOriented/treeshap
Kowsari, K., Brown, D. E., Heidarysafa, M., Jafari Meimandi, K., Gerber, M. S., & Barnes, L. E. (2017). HDLTex: Hierarchical deep learning for text classification. In 2017 16th IEEE international conference on machine learning and applications (ICMLA). IEEE.
Kowsari, K., Heidarysafa, M., Brown, D. E., Meimandi, K. J., & Barnes, L. E. (2018b). RMDL: Random multimodel deep learning for classification. In Proceedings of the 2nd international conference on information system and data mining (pp. 19–28).
Kowsari, K., Heidarysafa, M., Brown, D. E., Meimandi, K. J., & Barnes, L. E. (2018a). RMDL: Random multimodel deep learning for classification. In Proceedings of the 2nd international conference on information system and data mining (pp. 19–28).
Krittanawong, C., Zhang, H., Wang, Z., Aydar, M., & Kitai, T. (2017). Artificial intelligence in precision cardiovascular medicine. Journal of the American College of Cardiology, 69(21), 2657–2664. https://doi.org/10.1016/j.jacc.2017.03.571
Law, S. (2017). Defining street-based local area and measuring its effect on house price using a hedonic price approach: The case study of metropolitan london. Cities, 60, 166–179. https://doi.org/10.1016/j.cities.2016.08.008
Lerman, R. I., & Yitzhaki, S. (1984). A note on the calculation and interpretation of the gini index. Economics Letters, 15(3-4), 363–368. https://doi.org/10.1016/0165-1765(84)90126-5
Li, X., Ge, P., Zhu, J., Li, H., Graham, J., Singer, A., et al. (2020). Deep learning prediction of likelihood of ICU admission and mortality in COVID-19 patients using clinical variables. PeerJ, 8. https://peerj.com/articles/10337/
Liaw, A., & Wiener, M. (2002a). Classification and regression by randomForest. R News, 2(3), 18–22. https://CRAN.R-project.org/doc/Rnews/
Liaw, A., & Wiener, M. (2002b). Classification and regression by randomForest. R News, 2(3), 18–22. https://CRAN.R-project.org/doc/Rnews/
Linardatos, P., Papastefanopoulos, V., & Kotsiantis, S. (2021). Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy. https://www.mdpi.com/1099-4300/23/1/18/pdf
Liu, C., Gao, C., Xia, X., Lo, D., Grundy, J., & Yang, X. (2020). On the replicability and reproducibility of deep learning in software engineering. https://arxiv.org/abs/2006.14244
Loyola-González, O. (2019). Black-box vs. White-box: Understanding their advantages and weaknesses from a practical point of view. IEEE Access, 7, 154096–154113. https://doi.org/10.1109/ACCESS.2019.2949286
Rączkowski, Ł., Możejko, M., Zambonelli, J., & Szczurek, E. (2019). ARA: Accurate, reliable and active histopathological image classification framework with Bayesian deep learning. Scientific Reports, 9. https://doi.org/10.1038/s41598-019-50587-1
Lundberg, Scott M., Erion, G. G., & Lee, S.-I. (2019). Consistent Individualized Feature Attribution for Tree Ensembles. ICML Workshop. https://arxiv.org/abs/1802.03888
Lundberg, Scott M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems 30 (pp. 4765–4774). Montreal: Curran Associates. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
Lundberg, S., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874. https://arxiv.org/abs/1705.07874
Lundegård, Z. (2019). Current AI technologies for medical imaging and ethical dilemmas created by them (Master’s thesis). Åbo Akademi. Retrieved from https://core.ac.uk/download/pdf/186507687.pdf
Barhoom, A. M., Abu-Naser, S., Abu-Nasser, B., Alajrami, E., Musleh, M., & Khalil, A. (2019). Blood donation prediction using artificial neural network, 1–7. https://philarchive.org/archive/BARBDP-14
Ma, X., Ng, M., Xu, S., Xu, Z., Qiu, H., Liu, Y., et al. (2020). Development and validation of prognosis model of mortality risk in patients with COVID-19. Epidemiology and Infection, 148. https://doi.org/10.1017/S0950268820001727
Machine learning in medicine: A practical introduction. (n.d.). https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-019-0681-4
Maksymiuk, S., & Biecek, P. (2020b). DALEXtra: Extension for ’DALEX’ Package. https://CRAN.R-project.org/package=DALEXtra
Maksymiuk, S., & Biecek, P. (2020a). DALEXtra: Extension for ’DALEX’ Package. https://CRAN.R-project.org/package=DALEXtra
Maksymiuk, S., Gosiewska, A., & Biecek, P. (2020b). Landscape of R packages for eXplainable artificial intelligence. arXiv. https://arxiv.org/abs/2009.13248
Maksymiuk, S., Gosiewska, A., & Biecek, P. (2020a). Landscape of R packages for eXplainable artificial intelligence. arXiv preprint arXiv:2009.13248. https://arxiv.org/abs/2009.13248
Maksymiuk, S., Gosiewska, A., & Biecek, P. (2021). Landscape of R packages for eXplainable artificial intelligence. https://arxiv.org/abs/2009.13248
Mattern, F., Staake, T., & Weiss, M. (2010). ICT for green: How computers can help us to conserve energy. In Proceedings of the 1st international conference on energy-efficient computing and networking (pp. 1–10). https://doi.org/10.1145/1791314.1791316
Mendez, D., Graziotin, D., Wagner, S., & Seibold, H. (2020). Open science in software engineering. Contemporary Empirical Methods in Software Engineering, 477–501. https://doi.org/10.1007/978-3-030-32489-6_17
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., & Leisch, F. (2020). e1071: Misc functions of the department of statistics, probability theory group (formerly: E1071), TU Wien. https://CRAN.R-project.org/package=e1071
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., Chang, C.-C., & Lin, C.-C. (2021). e1071: Misc Functions of the Department of Statistics, Probability Theory Group. R package. https://CRAN.R-project.org/package=e1071
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546. http://arxiv.org/abs/1310.4546
Molnar, C. (2019). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book
Neff, T., Payer, C., Stern, D., & Urschler, M. (2017). Generative adversarial network based synthesis for supervised medical image segmentation. In Proc. OAGM and ARW joint workshop.
Ngoc Anh, H. (2016). Smartphone industry: The new era of competition and strategy (pp. 1–46).
O’Dea, S. (2021a). Global smartphone sales to end users since 2007. https://www.statista.com/statistics/263437/global-smartphone-sales-to-end-users-since-2007/
O’Dea, S. (2021b). Global smartphone market share from 4th quarter 2009 to 4th quarter 2020. https://www.statista.com/statistics/271496/global-market-share-held-by-smartphone-vendors-since-4th-quarter-2009/
Oktay, O., Schlemper, J., Folgoc, L. L., Lee, M., Heinrich, M., Misawa, K., et al. (2018). Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999.
Pace, R. K., & Barry, R. (1997). Sparse spatial autoregressions. Statistics & Probability Letters, 33(3), 291–297.
Pal, M. (2005). Random forest classifier for remote sensing classification. International Journal of Remote Sensing, 26(1), 217–222. https://doi.org/10.1080/01431160412331269698
Pandala, S. R. (2019). Lazy Predict. Python package. https://github.com/shankarpandala/lazypredict
Park, B., & Bae, J. (2015). Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data. Expert Systems with Applications, 42. https://doi.org/10.1016/j.eswa.2014.11.040
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011a). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pedregosa, Fabian, Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, Édouard. (2011b). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830. http://jmlr.org/papers/v12/pedregosa11a.html
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011b). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pedregosa, Fabian, Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, Édouard. (2011a). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pekala, K., Woznica, K., & Biecek, P. (2021a). Triplot: Model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure. CoRR, abs/2104.03403. https://arxiv.org/abs/2104.03403
Pekala, K., Woznica, K., & Biecek, P. (2021b). Triplot: Model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure. arXiv preprint arXiv:2104.03403. https://arxiv.org/abs/2104.03403
Peng, Z., Huang, Q., & Han, Y. (2019). Model research on forecast of second-hand house price in Chengdu based on XGBoost algorithm, 168–172. https://doi.org/10.1109/ICAIT.2019.8935894
Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d’Alché-Buc, F., et al. (2020). Improving reproducibility in machine learning research (A report from the NeurIPS 2019 reproducibility program). CoRR, abs/2003.12206. https://arxiv.org/abs/2003.12206
Plumb, G., Molitor, D., & Talwalkar, A. (2019). Model agnostic supervised local explanations. https://arxiv.org/abs/1807.02910
Probst, P., Boulesteix, A.-L., & Bischl, B. (2019). Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research, 20(53), 1–32. http://jmlr.org/papers/v20/18-444.html
Quandt, R. E. (1983). Computational problems and methods. Handbook of econometrics, 1, 699–764. https://doi.org/10.1016/S1573-4412(83)01016-8
Quanjel, M. J. R., Holten, T. C. van, Gunst-van der Vliet, P. C., Wielaard, J., Karakaya, B., Söhne, M., et al. (2021). Replication of a mortality prediction model in dutch patients with COVID-19. Nature Machine Intelligence, 3(1), 23–24. https://doi.org/10.1038/s42256-020-00253-3
Quanjel, M. J. R., Holten, T. C. van, Gunst-van der Vliet, P. C., Wielaard, J., Karakaya, B., Söhne, M., et al. (2021). Replication of a mortality prediction model in Dutch patients with COVID-19. Nature Machine Intelligence, 3, 23–24. https://doi.org/10.1038/s42256-020-00253-3
R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
R Core Team. (2021). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
Raghavan, V., Bollmann, P., & Jung, G. S. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Inf. Syst., 7(3), 205–229. https://doi.org/10.1145/65943.65945
Rahimzadeh, M., Attar, A., & Sakhaei, S. M. (2021). A fully automated deep learning-based network for detecting COVID-19 from a new and large lung CT scan dataset. Biomedical Signal Processing and Control, 102588. https://doi.org/10.1016/j.bspc.2021.102588
Rahman, T., Khandakar, A., Qiblawey, Y., Tahir, A., Kiranyaz, S., Abul Kashem, S. B., et al. (2021). Exploring the effect of image enhancement techniques on COVID-19 detection using chest x-ray images. Computers in Biology and Medicine, 132, 104319. https://doi.org/10.1016/j.compbiomed.2021.104319
Rai, A. (2020). Explainable AI: from black box to glass box. Journal of the Academy of Marketing Science, 48, 137–141. https://link.springer.com/article/10.1007/s11747-019-00710-5
Religia, Y., Pranoto, G. T., & Santosa, E. D. (2020). South german credit data classification using random forest algorithm to predict bank credit receipts. JISA (Jurnal Informatika dan Sains), 3(2), 62–66. https://doi.org/10.31326/jisa.v3i2.837
Riasi, A., Schwartz, Z., & Chen, C.-C. (2019). A paradigm shift in revenue management? The new landscape of hotel cancellation policies. Journal of Revenue and Pricing Management, 18(6), 434–440. https://doi.org/10.1057/s41272-019-00189-3
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016a). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD, San Francisco, CA (pp. 1135–1144). New York, NY: Association for Computing Machinery.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016b). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 13-17, 2016 (pp. 1135–1144). https://doi.org/10.1145/2939672.2939778
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016c). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144). https://doi.org/10.1145/2939672.2939778
Roberto Castro Sundin, A. S. G. &. S. W., Tony Rönnqvist. (2020). Siamesifying the COVID-net. https://people.kth.se/~rosun/deep-learning/
Ruder, S. (2017). An overview of multi-task learning in deep neural networks. https://arxiv.org/abs/1706.05098
Rudin, C. (2019b). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x
Rudin, C. (2019a). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1, 206–215. https://doi.org/10.1038/s42256-019-0048
Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., & Zhong, C. (2021). Interpretable machine learning: Fundamental principles and 10 grand challenges. arXiv preprint arXiv:2103.11251.
Saarela, M., & Jauhiainen, S. (2021). Comparison of feature importance measures as explanations for classification models. SN Applied Sciences, 3(2). https://doi.org/10.1007/s42452-021-04148-9
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3). https://doi.org/10.1371/journal.pone.0118432
Sánchez-Medina, A. J., & C-Sánchez, E. (2020). Using machine learning and big data for efficient forecasting of hotel booking cancellations. International Journal of Hospitality Management, 89, 102546. https://doi.org/10.1016/j.ijhm.2020.102546
Sandfort, V., Yan, K., Pickhardt, P. J., & Summers, R. M. (2019). Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Scientific Reports, 9(1), 1–9.
Semenova, L., Rudin, C., & Parr, R. (2019). A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning. arXiv preprint arXiv:1908.01755.
Shea, A. M., Hammill, B. G., Curtis, L. H., Szczech, L. A., & Schulman, K. A. (2008). Medical costs of abnormal serum sodium levels. Journal of the American Society of Nephrology, 19(4), 764–770. https://doi.org/10.1681/ASN.2007070752
Siler, W. (2013). Computers in life science research (Vol. 2). Springer Science & Business Media.
Singh, R. (2021). Exploratory data analysis and customer segmentation for smartphones.
Slack, D., Hilgard, S., Jia, E., Singh, S., & Lakkaraju, H. (2020). Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM conference on AI, ethics, and society (pp. 180–186). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3375627.3375830
Smith, S. J., Parsa, H. G., Bujisic, M., & van der Rest, J.-P. (2015). Hotel cancelation policies, distributive and procedural fairness, and consumer patronage: A study of the lodging industry. Journal of Travel & Tourism Marketing, 32, 886–906. https://doi.org/10.1080/10548408.2015.1063864
Sofaer, H. R., Hoeting, J. A., & Jarnevich, C. S. (2019). The area under the precision-recall curve as a performance metric for rare binary events. Methods in Ecology and Evolution, 10(4), 565–577. https://doi.org/10.1111/2041-210X.13140
Staniak, M., & Biecek, P. (2018a). Explanations of model predictions with live and breakDown packages.
Staniak, M., & Biecek, P. (2018b). Explanations of model predictions with live and breakDown packages. arXiv preprint arXiv:1804.01955. https://doi.org/10.32614/RJ-2018-072
Staniak, M., Kuzba, M., & Biecek, P. (2018). Local explanations of complex machine learning models. https://doi.org/10.13140/RG.2.2.23637.58084
Suzuki, K. (2017). Overview of deep learning in medical imaging. Radiological Physics and Technology, 10(3), 257–273.
Tang, F., Xiao, C., Wang, F., & Zhou, J. (2018). Predictive modeling in urgent care: A comparative study of machine learning approaches. JAMIA Open, 1(1), 87–98.
Tatman, R., VanderPlas, J., & Dane, S. (2018). A practical taxonomy of reproducibility for machine learning research.
Thompson, N. C., Greenewald, K., Lee, K., & Manso, G. F. (2020). The computational limits of deep learning. arXiv preprint arXiv:2007.05558.
Tonekaboni, S., Joshi, S., McCradden, M. D., & Goldenberg, A. (2019). What Clinicians Want: Contextualizing Explainable Machine Learning for Clinical End Use. Machine Learning for Healthcare. http://proceedings.mlr.press/v106/tonekaboni19a.html
Tsai, Simpson, E. (2020). Data from the medical imaging data resource center - RSNA international COVID radiology database release 1a - chest CT covid+ (MIDRC-RICORD-1a). The Cancer Imaging Archive. https://doi.org/10.7937/VTW4-X588
Tsai, Simpson, E. B. (2021). Medical imaging data resource center (MIDRC) - RSNA international COVID open research database (RICORD) release 1b - chest CT covid- [data set]. The Cancer Imaging Archive.
Turney, P. D. (1995). Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. https://www.jair.org/index.php/jair/article/view/10129/23991
Ucar, F., & Korkmaz, D. (2020). COVIDiagnosis-net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from x-ray images. Medical Hypotheses, 140, 109761.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in r. Journal of Statistical Software, 45(3), 1–67. https://www.jstatsoft.org/v45/i03/
Vandewalle, P., Kovacevic, J., & Vetterli, M. (2009). Reproducible research in signal processing. IEEE Signal Processing Magazine, 26(3), 37–47.
Vanschoren, J., Rijn, J. N. van, Bischl, B., & Torgo, L. (2013). OpenML: Networked science in machine learning. SIGKDD Explorations, 15(2), 49–60. https://doi.org/10.1145/2641190.2641198
Varma, A., Sarma, A., Doshi, S., & Nair, R. (2018). House price prediction using machine learning and neural networks, 1936–1939. https://doi.org/10.1109/ICICCT.2018.8473231
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
Vayena, E., Blasimme, A., & Cohen, I. G. (2018). Machine learning in medicine: Addressing ethical challenges. PLOS Medicine, 15(11), 1–4. https://doi.org/10.1371/journal.pmed.1002689
Wang, J., Li, M., Hu, Y., & Zhu, Y. (2009). Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models. BMC Health Services Research, 9(1). https://doi.org/10.1186/1472-6963-9-161
Wang, L., Lin, Z. Q., & Wong, A. (2020a). COVID-net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images. Scientific Reports, 10(1), 19549. https://doi.org/10.1038/s41598-020-76550-z
Wang, L., Lin, Z. Q., & Wong, A. (2020b). COVID-net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images. Scientific Reports, 10(1), 19549. https://doi.org/10.1038/s41598-020-76550-z
Wang, L., Lin, Z. Q., & Wong, A. (2020c). COVID-net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images. Scientific Reports, 10(1), 19549. https://doi.org/10.1038/s41598-020-76550-z
Wang, R., Wang, X., & Inouye, D. I. (2021). Shapley Explanation Networks. ICLR. https://openreview.net/forum?id=vsU0efpivw
Wang, S., Zha, Y., Li, W., Wu, Q., Li, X., Niu, M., et al. (2020a). A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis. European Respiratory Journal, 56(2). https://doi.org/10.1183/13993003.00775-2020
Wang, S., Zha, Y., Li, W., Wu, Q., Li, X., Niu, M., et al. (2020b). A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis. medRxiv. https://doi.org/10.1101/2020.03.24.20042317
Wiens, J., Guttag, J., & Horvitz, E. (2014). A study in transfer learning: leveraging data from multiple hospitals to enhance hospital-specific predictions. Journal of the American Medical Informatics Association, 21(4), 699–706. https://doi.org/10.1136/amiajnl-2013-002162
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2939672.2939785
Wright, M. N., & Ziegler, A. (2017c). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
Wright, M. N., & Ziegler, A. (2017d). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
Wright, M. N., & Ziegler, A. (2017a). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
Wright, M. N., & Ziegler, A. (2017b). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
WUoT. (2020). ML case studies: Reproducibility of scientific papers. https://mini-pw.github.io/2020L-WB-Book/reproducibility.html
XGBoost: A scalable tree boosting system. (2016). CoRR, abs/1603.02754. http://arxiv.org/abs/1603.02754
Yan, L., Zhang, H.-T., Goncalves, J., Xiao, Y., Wang, M., Guo, Y., et al. (2020a). An interpretable mortality prediction model for COVID-19 patients. Nature Machine Intelligence, 2(5), 283–288. https://doi.org/10.1038/s42256-020-0180-7
Yan, L., Zhang, H.-T., Goncalves, J., Xiao, Y., Wang, M., Guo, Y., et al. (2020b). An interpretable mortality prediction model for COVID-19 patients. Nature Machine Intelligence, 2(5), 283–288. https://doi.org/10.1038/s42256-020-0180-7
Yan, L., Zhang, H.-T., Goncalves, J., Xiao, Y., Wang, M., Guo, Y., et al. (2020c). An interpretable mortality prediction model for COVID-19 patients. Nature Machine Intelligence, 2(5), 283–288. https://www.nature.com/articles/s42256-020-0180-7
Yan, L., Zhang, H.-T., Goncalves, J., Xiao, Y., Wang, M., Guo, Y., et al. (2020d). An interpretable mortality prediction model for COVID-19 patients. Nature Machine Intelligence, 2(5), 283–288. https://www.nature.com/articles/s42256-020-0180-7
Yildiz, B., Hung, H., Krijthe, J. H., Liem, C. C. S., Loog, M., Migut, G., et al. (2021). ReproducedPapers.org: Openly teaching and structuring machine learning reproducibility. In B. Kerautret, M. Colom, A. Krähenbühl, D. Lopresti, P. Monasse, & H. Talbot (Eds.), Reproducible research in pattern recognition (pp. 3–11). Cham: Springer International Publishing.
Yoo, Y. (2010). Computing in everyday life: A call for research on experiential computing. MIS Quarterly, 213–231. https://doi.org/10.2307/20721425
Yu, H., Huang, X., Hu, X., & Cai, H. (2010). A comparative study on data mining algorithms for individual credit risk evaluation. In 2010 international conference on management of e-commerce and e-government (pp. 35–38). IEEE. https://doi.org/10.1109/ICMeCG.2010.16
Zaimi, A., Wabartha, M., Herman, V., Antonsanti, P.-L., Perone, C. S., & Cohen-Adad, J. (2018). AxonDeepSeg: Automatic axon and myelin segmentation from microscopy data using convolutional neural networks. Scientific Reports, 8(1), 3816. https://doi.org/10.1038/s41598-018-22181-4
Zhao, Q., Meng, M., Kumar, R., Wu, Y., Huang, J., Deng, Y., et al. (2020). Lymphopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: A systemic review and meta-analysis. International Journal of Infectious Diseases, 96, 131–135. https://doi.org/10.1016/j.ijid.2020.04.086
Zhao, Y., Chetty, G., & Tran, D. (2019). Deep learning with XGBoost for real estate appraisal. In 2019 IEEE symposium series on computational intelligence (SSCI) (pp. 1396–1401). IEEE. https://doi.org/10.1109/SSCI44817.2019.9002790
Zheng, Y., Zhu, Y., Ji, M., Wang, R., Liu, X., Zhang, M., et al. (2020a). A Learning-Based Model to Evaluate Hospitalization Priority in COVID-19 Pandemics. Patterns, 1(6), 100092. https://doi.org/10.1016/j.patter.2020.100092
Zheng, Y., Zhu, Y., Ji, M., Wang, R., Liu, X., Zhang, M., et al. (2020b). A Learning-Based Model to Evaluate Hospitalization Priority in COVID-19 Pandemics. Patterns, 1(9), 100173. https://doi.org/10.1016/j.patter.2020.100173
Zhou, Z.-H., & Feng, J. (2017). Deep forest: Towards an alternative to deep neural networks. CoRR, abs/1702.08835. http://arxiv.org/abs/1702.08835
Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N., & Liang, J. (2019). UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging.

References

Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231.
Fisher, A., Rudin, C., & Dominici, F. (2019e). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177), 1–81.
Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., & Zhong, C. (2021). Interpretable machine learning: Fundamental principles and 10 grand challenges. arXiv preprint arXiv:2103.11251.
Semenova, L., Rudin, C., & Parr, R. (2019). A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning. arXiv preprint arXiv:1908.01755.
Tang, F., Xiao, C., Wang, F., & Zhou, J. (2018). Predictive modeling in urgent care: A comparative study of machine learning approaches. JAMIA Open, 1(1), 87–98.