Table 3 Average percentage improvement between Random Forest (RF) and Probabilistic Random Forest (PRF) probabilities in relation to ideal y-label values across different emulated train-test standard deviations (SDs) when the pChEMBL threshold equals 5

From: Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty

| Standard deviation in train and test set | y-ideal range (N) | Better-performing algorithm | % improvement | Average percentage of SE data |
| --- | --- | --- | --- | --- |
| SD-train: 0.0–0.4 & SD-test: 0.0–0.4 | 0.0–0.2 (104,345) | PRF | 6.63 | 38.66 |
| | 0.2–0.4 (42,075) | PRF | 5.19 | 34.03 |
| | 0.4–0.6 (63,520) | PRF | 6.42 | 36.74 |
| | 0.6–0.8 (86,27) | PRF | 3.19 | 36.01 |
| | 0.8–1.0 (530,080) | RF | 6.96 | 31.57 |
| SD-train: 0.4–0.6 & SD-test: 0.4–0.6 | 0.0–0.2 (92,720) | PRF | 0.23 | 42.68 |
| | 0.2–0.4 (106,755) | PRF | 11.65 | 35.82 |
| | 0.4–0.6 (173,070) | PRF | 16.76 | 36.08 |
| | 0.6–0.8 (314,270) | PRF | 11.60 | 33.52 |
| | 0.8–1.0 (3,022,800) | RF | 9.48 | 29.99 |