Table 6 Active learning results

From: Evaluating uncertainty-based active learning for accelerating the generalization of molecular property prediction

| Target | Model | Sampling | Standard: Whole | Standard: OOD data bin | Generalization: Whole | Generalization: ID bins |
|---|---|---|---|---|---|---|
| Solubility | MDM | Data density (EB) | 0.14% | 0.25% | 0.08% | 0.03% |
| | | | \(\boldsymbol{p=0.015}\) | \(\boldsymbol{p=0.013}\) | \(p=0.146\) | \(p=0.326\) |
| | | Diversity (EB) | 0.00% | 0.13% | −0.03% | −0.07% |
| | | | \(p=0.483\) | \(p=0.098\) | \(p=0.674\) | \(p=0.793\) |
| | | MCDO | 0.13% | 0.08% | 0.10% | 0.11% |
| | | | \(\boldsymbol{p=0.032}\) | \(p=0.204\) | \(\boldsymbol{p=0.036}\) | \(\boldsymbol{p=0.024}\) |
| | | OOD only | | 1.93% | 0.18% | −0.28% |
| | | | | \(\boldsymbol{p<0.001}\) | \(\boldsymbol{p=0.007}\) | \(p=0.992\) |
| Solubility | GBM | Data density (EB) | −0.01% | 0.24% | 0.04% | −0.03% |
| | | | \(p=0.714\) | \(\boldsymbol{p<0.001}\) | \(p=0.084\) | \(p=0.868\) |
| | | GBM | 0.06% | 0.11% | 0.05% | 0.03% |
| | | | \(\boldsymbol{p=0.002}\) | \(\boldsymbol{p=0.012}\) | \(\boldsymbol{p=0.045}\) | \(p=0.203\) |
| | | OOD only | | 1.82% | 0.14% | −0.31% |
| | | | | \(\boldsymbol{p<0.001}\) | \(\boldsymbol{p=0.009}\) | \(p=0.996\) |
| Redox | MDM | Data density (EB) | −0.09% | 0.42% | 0.06% | −0.05% |
| | | | \(p=0.986\) | \(\boldsymbol{p<0.001}\) | \(p=0.055\) | \(p=0.900\) |
| | | MCDO | 0.01% | 0.18% | 0.05% | 0.01% |
| | | | \(p=0.377\) | \(\boldsymbol{p=0.001}\) | \(p=0.102\) | \(p=0.487\) |
| | | OOD only | | 2.09% | 0.41% | −0.10% |
| | | | | \(\boldsymbol{p<0.001}\) | \(\boldsymbol{p<0.001}\) | \(p=0.977\) |
| Redox | GBM | Data density (EB) | 0.02% | 0.21% | 0.06% | 0.01% |
| | | | \(p=0.168\) | \(\boldsymbol{p<0.001}\) | \(\boldsymbol{p=0.003}\) | \(p=0.265\) |
| | | GBM | 0.02% | 0.12% | 0.05% | 0.02% |
| | | | \(p=0.123\) | \(\boldsymbol{p=0.007}\) | \(\boldsymbol{p=0.013}\) | \(p=0.076\) |
| | | OOD only | - | 1.60% | 0.11% | −0.29% |
| | | | - | \(\boldsymbol{p<0.001}\) | \(\boldsymbol{p=0.025}\) | \(p=0.994\) |

  1. For the density method, EB refers to embedding-based similarity. Values are the percentage decrease in RMSE relative to random sampling, so a positive value means AL achieved a lower RMSE than random sampling. Below each value is the p-value of the corresponding paired t-test, with the alternative hypothesis that AL performs better than random sampling. Significant test results are shown in bold.