Table 3 Average percentage improvement between RF and PRF probabilities in relation to ideal y-label values across different emulated train-test standard deviations (SDs) when pChEMBL threshold equals 5

From: Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty

| Standard deviation in train and test set | y-ideal range (N) | Better-performing algorithm | % improvement | Average percentage of SE data |
| --- | --- | --- | --- | --- |
| SD-train: 0.0–0.4 & SD-test: 0.0–0.4 | 0.0–0.2 (104,345) | PRF | 6.63 | 38.66 |
| | 0.2–0.4 (42,075) | PRF | 5.19 | 34.03 |
| | 0.4–0.6 (63,520) | PRF | 6.42 | 36.74 |
| | 0.6–0.8 (86,27) | PRF | 3.19 | 36.01 |
| | 0.8–1.0 (530,080) | RF | 6.96 | 31.57 |
| SD-train: 0.4–0.6 & SD-test: 0.4–0.6 | 0.0–0.2 (92,720) | PRF | 0.23 | 42.68 |
| | 0.2–0.4 (106,755) | PRF | 11.65 | 35.82 |
| | 0.4–0.6 (173,070) | PRF | 16.76 | 36.08 |
| | 0.6–0.8 (314,270) | PRF | 11.60 | 33.52 |
| | 0.8–1.0 (3,022,800) | RF | 9.48 | 29.99 |
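
As a rough illustration of how a "% improvement" figure of this kind could be computed, the sketch below compares two sets of predicted probabilities against ideal y-label values. The error definition (mean absolute difference from the ideal label) and the normalisation (relative to the worse algorithm's error) are assumptions made for illustration only; the table itself does not state the formula used, and the function name `percent_improvement` and the input values are hypothetical.

```python
import numpy as np

def percent_improvement(y_ideal, p_rf, p_prf):
    """Return (better_algorithm, % improvement).

    ASSUMPTION: error is the mean absolute difference between predicted
    probability and the ideal y-label, and improvement is expressed
    relative to the worse algorithm's error. This may not match the
    definition used to produce the table.
    """
    y_ideal = np.asarray(y_ideal, dtype=float)
    err_rf = np.mean(np.abs(np.asarray(p_rf, dtype=float) - y_ideal))
    err_prf = np.mean(np.abs(np.asarray(p_prf, dtype=float) - y_ideal))

    if err_rf == err_prf:
        return "tie", 0.0
    if err_prf < err_rf:
        name, better, worse = "PRF", err_prf, err_rf
    else:
        name, better, worse = "RF", err_rf, err_prf
    # worse > better >= 0 here, so the division is safe
    return name, 100.0 * (worse - better) / worse

# Made-up values, not data from the study
name, pct = percent_improvement(
    y_ideal=[0.1, 0.1], p_rf=[0.3, 0.3], p_prf=[0.2, 0.2]
)
print(name, round(pct, 2))
```

In practice such a calculation would be repeated for each y-ideal bin and each train/test SD setting, then averaged, to obtain one row per bin as in the table.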