From: Improving chemical reaction yield prediction using pre-trained graph neural networks
Dataset | Split | Previous studies | | | | Existing GNN pre-training methods | | | | | Proposed method | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 |  | MFF [4] | YieldBERT [6] | YieldBERT-DA [7] | YieldMPNN [8] | From-Scratch | MolCLR [13] | DGI [18] | ContextPred [21] | AttrMasking [21] | MolDescPred-MPNN | MolDescPred |
Buchwald-Hartwig (Random Split) | 70/30 | 0.932±0.008 | 0.951±0.005 | 0.969±0.004 | 0.974±0.001 | 0.971±0.002 | 0.974±0.001 | 0.974±0.001 | 0.974±0.001 | 0.974±0.002 | 0.974±0.002 | 0.974±0.001 |
 | 50/50 | 0.913±0.007 | 0.928±0.004 | 0.953±0.006 | 0.961±0.003 | 0.949±0.019 | 0.962±0.003 | 0.961±0.003 | 0.962±0.003 | 0.962±0.003 | 0.962±0.003 | 0.963±0.003 |
 | 30/70 | 0.878±0.010 | 0.882±0.011 | 0.917±0.010 | 0.934±0.008 | 0.923±0.010 | 0.937±0.007 | 0.934±0.008 | 0.935±0.008 | 0.935±0.008 | 0.936±0.008 | 0.937±0.008 |
 | 20/80 | 0.852±0.007 | 0.857±0.008 | 0.886±0.017 | 0.909±0.008 | 0.883±0.018 | 0.913±0.009 | 0.908±0.011 | 0.910±0.007 | 0.908±0.009 | 0.912±0.008 | 0.913±0.009 |
 | 10/90 | 0.791±0.011 | 0.793±0.016 | 0.818±0.009 | 0.841±0.013 | 0.763±0.032 | 0.842±0.016 | 0.839±0.017 | 0.837±0.014 | 0.839±0.020 | 0.838±0.014 | 0.847±0.016 |
 | 5/95 | 0.697±0.024 | 0.622±0.042 | 0.733±0.027 | 0.734±0.019 | 0.546±0.146 | 0.741±0.018 | 0.733±0.028 | 0.739±0.023 | 0.726±0.020 | 0.753±0.025 | 0.768±0.029 |
 | 2.5/97.5 | 0.576±0.047 | 0.436±0.034 | 0.604±0.031 | 0.628±0.062 | 0.391±0.194 | 0.636±0.051 | 0.616±0.061 | 0.583±0.082 | 0.623±0.042 | 0.619±0.042 | 0.662±0.053 |
 | avg. rank | 10.29±0.88 | 9.86±0.35 | 7.86±0.99 | 4.14±1.73 | 9.57±1.29 | 1.71±0.70 | 5.00±1.77 | 4.29±2.31 | 4.14±2.17 | 3.14±1.64 | 1.00±0.00 |
Suzuki-Miyaura (Random Split) | 70/30 | 0.834±0.010 | 0.815±0.013 | 0.859±0.012 | 0.886±0.010 | 0.879±0.011 | 0.890±0.011 | 0.887±0.011 | 0.890±0.010 | 0.892±0.010 | 0.891±0.009 | 0.889±0.010 |
 | 50/50 | 0.810±0.006 | 0.780±0.009 | 0.823±0.007 | 0.867±0.003 | 0.855±0.004 | 0.869±0.004 | 0.867±0.005 | 0.870±0.004 | 0.869±0.004 | 0.870±0.004 | 0.869±0.004 |
 | 30/70 | 0.774±0.006 | 0.729±0.014 | 0.774±0.012 | 0.829±0.004 | 0.803±0.014 | 0.831±0.005 | 0.824±0.005 | 0.830±0.005 | 0.827±0.004 | 0.832±0.005 | 0.831±0.006 |
 | 20/80 | 0.738±0.013 | 0.676±0.015 | 0.719±0.022 | 0.794±0.011 | 0.735±0.035 | 0.794±0.010 | 0.783±0.012 | 0.790±0.012 | 0.788±0.011 | 0.797±0.010 | 0.794±0.007 |
 | 10/90 | 0.672±0.018 | 0.554±0.025 | 0.627±0.030 | 0.708±0.013 | 0.595±0.058 | 0.705±0.015 | 0.694±0.017 | 0.700±0.018 | 0.685±0.021 | 0.715±0.015 | 0.712±0.009 |
 | 5/95 | 0.592±0.022 | 0.430±0.040 | 0.491±0.034 | 0.565±0.018 | 0.454±0.103 | 0.542±0.048 | 0.573±0.020 | 0.566±0.021 | 0.520±0.038 | 0.601±0.021 | 0.594±0.016 |
 | 2.5/97.5 | 0.481±0.057 | 0.330±0.047 | 0.282±0.047 | 0.331±0.051 | 0.265±0.204 | 0.342±0.120 | 0.357±0.055 | 0.356±0.044 | 0.323±0.048 | 0.395±0.042 | 0.421±0.049 |
 | avg. rank | 7.00±3.30 | 10.57±1.05 | 9.29±0.45 | 5.14±1.81 | 9.14±1.12 | 3.86±1.81 | 5.71±1.16 | 4.00±1.41 | 5.71±2.60 | 1.43±0.73 | 2.57±1.05 |
Buchwald-Hartwig (Out-Of-Sample Split) | Test 1 | 0.882±0.004 | 0.824±0.010 | 0.811±0.047 | 0.744±0.042 | 0.609±0.086 | 0.876±0.023 | 0.755±0.023 | 0.756±0.051 | 0.859±0.018 | 0.827±0.011 | 0.883±0.009 |
 | Test 2 | 0.727±0.006 | 0.829±0.037 | 0.866±0.020 | 0.876±0.026 | 0.877±0.021 | 0.882±0.026 | 0.816±0.056 | 0.877±0.030 | 0.892±0.017 | 0.866±0.038 | 0.913±0.010 |
 | Test 3 | 0.650±0.006 | 0.741±0.030 | 0.585±0.067 | 0.717±0.024 | 0.610±0.081 | 0.603±0.034 | 0.631±0.019 | 0.658±0.049 | 0.650±0.013 | 0.648±0.026 | 0.761±0.028 |
 | Test 4 | 0.388±0.008 | 0.444±0.077 | 0.157±0.034 | 0.496±0.031 | 0.420±0.186 | 0.481±0.020 | 0.224±0.016 | 0.252±0.071 | 0.471±0.032 | 0.455±0.042 | 0.382±0.045 |
 | avg. rank | 6.25±3.27 | 5.50±2.50 | 9.00±2.00 | 5.00±3.39 | 7.50±2.69 | 4.50±3.20 | 9.25±0.83 | 6.25±2.28 | 3.50±1.12 | 5.75±1.30 | 2.75±3.03 |
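The table does not say how the avg. rank rows handle ties or what the ± after each average rank measures. The sketch below works under three assumptions, none of which the table confirms. First, within each split, methods are ranked by their mean score (the value before "±"), with higher being better. Second, tied values share the best rank ("competition" ranking). Third, the ± is the population standard deviation of a method's ranks across splits. Under these assumptions, the Buchwald-Hartwig random-split means reproduce every entry of that block's avg. rank row, for example 1.71±0.70 for MolCLR. That makes the assumptions plausible but does not establish the authors' actual procedure. The names `min_ranks`, `average_ranks`, `METHODS`, and `BH_RANDOM` are hypothetical and used only for illustration.

```python
from statistics import mean, pstdev

# Column order of the table above (method names copied from the header row).
METHODS = [
    "MFF", "YieldBERT", "YieldBERT-DA", "YieldMPNN", "From-Scratch", "MolCLR",
    "DGI", "ContextPred", "AttrMasking", "MolDescPred-MPNN", "MolDescPred",
]

# Mean scores (the value before "±") for the Buchwald-Hartwig random splits,
# one row per split: 70/30, 50/50, 30/70, 20/80, 10/90, 5/95, 2.5/97.5.
BH_RANDOM = [
    [0.932, 0.951, 0.969, 0.974, 0.971, 0.974, 0.974, 0.974, 0.974, 0.974, 0.974],
    [0.913, 0.928, 0.953, 0.961, 0.949, 0.962, 0.961, 0.962, 0.962, 0.962, 0.963],
    [0.878, 0.882, 0.917, 0.934, 0.923, 0.937, 0.934, 0.935, 0.935, 0.936, 0.937],
    [0.852, 0.857, 0.886, 0.909, 0.883, 0.913, 0.908, 0.910, 0.908, 0.912, 0.913],
    [0.791, 0.793, 0.818, 0.841, 0.763, 0.842, 0.839, 0.837, 0.839, 0.838, 0.847],
    [0.697, 0.622, 0.733, 0.734, 0.546, 0.741, 0.733, 0.739, 0.726, 0.753, 0.768],
    [0.576, 0.436, 0.604, 0.628, 0.391, 0.636, 0.616, 0.583, 0.623, 0.619, 0.662],
]


def min_ranks(scores):
    """Rank one split, higher score = better (assumption). Tied scores share
    the best rank, e.g. [0.9, 0.9, 0.8] -> [1, 1, 3]."""
    return [1 + sum(other > s for other in scores) for s in scores]


def average_ranks(rows):
    """Return {method: (mean rank, population std of rank)} across all splits.
    Using pstdev (ddof=0) rather than stdev is an assumption."""
    per_split = [min_ranks(row) for row in rows]
    summary = {}
    for j, name in enumerate(METHODS):
        ranks = [split_ranks[j] for split_ranks in per_split]
        summary[name] = (mean(ranks), pstdev(ranks))
    return summary


if __name__ == "__main__":
    for name, (m, s) in average_ranks(BH_RANDOM).items():
        print(f"{name}: {m:.2f}±{s:.2f}")
```

Because several methods tie at 0.974 in the 70/30 split, a tie rule that averaged ranks instead would move MolDescPred away from 1.00±0.00. This is why the sketch uses shared best ranks.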