Assessing the calibration in toxicological in vitro models with conformal prediction

Table 2 Overview of the experiments discussed in this work. All splits were random and stratified.

No.	Name	Explanation
1	internal_CV	A fivefold CV, training one ACP per fold, is performed on the Tox21Train dataset and evaluated internally on the respective hold-out data.
2	pred_score	The Tox21Score data are predicted with the CV models trained in experiment 1.
3	pred_score_SCP	The same CV splits are applied as described above. The training set is then split into a fixed calibration set and four proportionate sub-proper training sets. For each of the four corresponding sub-proper training sets, an ML model is trained. Predictions are made for Tox21Score (and the calibration set compounds) with every model; the four nonconformity scores (ncs) are averaged before calculating the p-values.
4	train_update	The training set from the CV is combined with the Tox21Test set. This updated training set is then split into a proper training set and a calibration set to train new ACP models for the CV set-up. Tox21Score data are predicted with the new models.
5	cal_update	The CV-models from experiment 1 are used, but the calibration is updated with the Tox21Test data to predict Tox21Score.
6	cal_update_2	The CV-models from experiment 1 are used, but the calibration is updated with 50% of the Tox21Score data, and the other 50% of Tox21Score are predicted. In every fold of the CV, Tox21Score is split into two equal subsets.
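
The averaging step in experiment 3 and the calibration update in experiments 5 and 6 can be illustrated with a minimal sketch. It is not the implementation used in this work: the nonconformity score (one minus the predicted class probability), the class-wise calibration, and the function names averaged_scores and p_values are assumptions made for illustration only.

```python
import numpy as np


def averaged_scores(proba_list, labels=None):
    """Average nonconformity scores over several models (cf. pred_score_SCP).

    proba_list: list of (n_samples, n_classes) class-probability arrays,
                one per model (hypothetical input format).
    labels:     if given, return the averaged score of each sample's true
                class (calibration set); otherwise return the averaged score
                for every candidate class (compounds to be predicted).
    Assumption: nonconformity score = 1 - predicted class probability.
    """
    mean_scores = np.mean([1.0 - p for p in proba_list], axis=0)
    if labels is None:
        return mean_scores
    labels = np.asarray(labels)
    return mean_scores[np.arange(len(labels)), labels]


def p_values(cal_scores, cal_labels, test_scores):
    """Class-wise conformal p-values (assumed class-conditional calibration).

    For each candidate class c, the p-value is the fraction of calibration
    scores of class c that are at least as large as the test score, with
    the test compound itself counted in numerator and denominator.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_scores = np.asarray(test_scores, dtype=float)
    n_test, n_classes = test_scores.shape
    p = np.empty((n_test, n_classes))
    for c in range(n_classes):
        cal_c = cal_scores[cal_labels == c]
        n_greater_equal = (cal_c[None, :] >= test_scores[:, [c]]).sum(axis=1)
        p[:, c] = (n_greater_equal + 1) / (len(cal_c) + 1)
    return p


# Toy example: two models, one compound to predict, four calibration compounds.
test_proba = [np.array([[0.8, 0.2]]), np.array([[0.6, 0.4]])]
test_scores = averaged_scores(test_proba)  # averaged before the p-values

cal_scores = np.array([0.1, 0.3, 0.2, 0.4])
cal_labels = np.array([0, 0, 1, 1])
p_original = p_values(cal_scores, cal_labels, test_scores)

# Updating the calibration set (cf. cal_update, cal_update_2) only swaps the
# calibration scores and labels; the trained models stay untouched.
new_cal_scores = np.array([0.5, 0.6, 0.7, 0.9])
new_cal_labels = np.array([0, 0, 1, 1])
p_updated = p_values(new_cal_scores, new_cal_labels, test_scores)
```

In this sketch, exchanging the calibration set changes the p-values without any retraining, which is the difference between experiments 5 and 6 on the one hand and experiment 4 on the other.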

ISSN: 1758-2946