Results of machine learning experiments for "Multi-classifier prediction of knee osteoarthritis progression from incomplete imbalanced longitudinal data"
Dataset posted on 30.10.2019 by Paweł Widera
The archive file contains the results of machine learning experiments performed for the article "Multi-classifier prediction of knee osteoarthritis progression from incomplete imbalanced longitudinal data". The hypothesis of the article is that prediction models trained on historical data will be more effective at identifying fast-progressing knee osteoarthritis (OA) patients than conventional inclusion criteria.
For all experiments, the first-level folders indicate the method used. Where parameter tuning is performed, the second-level folders indicate the algorithm parameters. The output of each experiment is stored in an xz-compressed text file in JSON format.
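The layout described above can be read with the Python standard library alone. The following is a minimal sketch, not part of the archive: the function names `load_result` and `iter_results` are hypothetical, and it assumes only that the results files end in `.xz` and sit somewhere below the method folders.

```python
import json
import lzma
from pathlib import Path


def load_result(path):
    """Read one xz-compressed JSON results file into Python objects."""
    # lzma.open in text mode decompresses the file while reading it
    with lzma.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def iter_results(root):
    """Yield (method, relative path, content) for every *.xz file below root."""
    root = Path(root)
    for path in sorted(root.rglob("*.xz")):
        relative = path.relative_to(root)
        # Assumption: the first path component is the method folder
        yield relative.parts[0], relative, load_result(path)
```

Calling `iter_results` on the directory where the archive was unpacked would then give one parsed JSON document per experiment, grouped by the name of its first-level folder.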
In experiments measuring the learning curves (training-*), each results file describes:
* experiment setup (algorithm, number of subsets, down-sampled class size)
* list of training set sizes
* performance measure statistics for all subsets at each training size (flat list) including min, median and max score, and median deviation from median (mad), given for both test and training set instances
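As an illustration of the summary statistics named in the list above, the sketch below computes them for a plain list of scores. It is not the code used to produce the archive: the function name `summarise` is hypothetical, and it assumes that "mad" means the median of the absolute deviations from the median.

```python
from statistics import median


def summarise(scores):
    """Return min, median, max and mad for a list of scores."""
    centre = median(scores)
    # Assumption: mad = median of absolute deviations from the median
    deviations = [abs(score - centre) for score in scores]
    return {
        "min": min(scores),
        "median": centre,
        "max": max(scores),
        "mad": median(deviations),
    }


stats = summarise([0.5, 0.75, 1.0, 0.25, 0.5])
print(stats)
```

Unlike the standard deviation, this measure of spread is not dominated by a single unusually good or bad subset, which may be why a median-based statistic is reported.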
In parameter tuning experiments (prediction-multi-*), each results file contains:
* experiment setup (method / algorithm, number of CV repeats, number of model runs)
* imputer parameters (not important, kept constant in all experiments)
* classifier parameters (for random forest)
* true class for each instance
* class predictions by the median model from each CV-repeat
* class probabilities estimated by the median model from each CV-repeat
* performance measure statistics for each CV-repeat including min, median and max score, and median deviation from median (mad)
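Because the true classes and the per-repeat predictions are stored side by side, they can be compared directly. The sketch below is only an illustration under stated assumptions: the function name and the input layout (one list of true classes, one list of predictions per CV-repeat) are hypothetical, and plain accuracy stands in for whichever performance measure the article reports.

```python
def accuracy_per_repeat(true_classes, predictions_by_repeat):
    """Return the fraction of correctly predicted instances for each CV-repeat."""
    scores = []
    for predictions in predictions_by_repeat:
        if len(predictions) != len(true_classes):
            raise ValueError("each repeat must predict every instance exactly once")
        # Count positions where the predicted class equals the true class
        hits = sum(t == p for t, p in zip(true_classes, predictions))
        scores.append(hits / len(true_classes))
    return scores


# Hypothetical example: four instances, two CV-repeats
print(accuracy_per_repeat([0, 1, 1, 0], [[0, 1, 1, 0], [1, 1, 1, 0]]))
```

On imbalanced data, accuracy alone can be misleading, so a class-aware measure would normally replace it in a real analysis.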
In recursive feature elimination (RFE) experiments (prediction-multi-rfe-*), the results additionally include:
* scores for all RFE steps for each CV-repeat
* number of times each feature was selected (across all folds and CV-repeats)
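A selection count of the kind listed above can be obtained by tallying feature names over all folds and CV-repeats. The following is a minimal sketch, assuming one collection of selected feature names per fold; the function name `count_selected` and the feature names in the example are hypothetical, not taken from the dataset.

```python
from collections import Counter


def count_selected(selections):
    """Count in how many folds each feature was selected."""
    counts = Counter()
    for selected in selections:
        # set() ensures a feature is counted at most once per fold
        counts.update(set(selected))
    return counts


# Hypothetical example: two CV-repeats with two folds each, flattened
folds = [["age", "bmi"], ["age"], ["bmi", "pain"], ["age"]]
print(count_selected(folds).most_common())
```

Features with counts close to the total number of folds are selected consistently, while low counts suggest that a feature matters only for particular data splits.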