From: Improving chemical reaction yield prediction using pre-trained graph neural networks
Method columns are grouped under three spanning headers: Previous studies, Existing GNN pre-training methods, and Proposed method.

| Dataset | Split | MFF [4] | YieldBERT [6] | YieldBERT-DA [7] | YieldMPNN [8] | From-Scratch | MolCLR [13] | DGI [18] | ContextPred [21] | AttrMasking [21] | MolDescPred-MPNN | MolDescPred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Buchwald-Hartwig (Random Split) | 70/30 | 4.694±0.116 | 3.990±0.153 | 3.090±0.118 | 2.920±0.056 | 3.038±0.096 | 2.896±0.060 | 2.909±0.060 | 2.888±0.060 | 2.905±0.049 | 2.921±0.054 | 2.899±0.061 |
|  | 50/50 | 5.370±0.134 | 4.792±0.124 | 3.744±0.150 | 3.497±0.090 | 3.957±0.796 | 3.420±0.054 | 3.488±0.074 | 3.465±0.057 | 3.485±0.078 | 3.463±0.082 | 3.439±0.054 |
|  | 30/70 | 6.471±0.183 | 6.075±0.222 | 4.833±0.167 | 4.483±0.165 | 4.873±0.244 | 4.400±0.152 | 4.489±0.150 | 4.462±0.132 | 4.496±0.160 | 4.439±0.137 | 4.408±0.147 |
|  | 20/80 | 7.271±0.200 | 6.862±0.212 | 5.781±0.252 | 5.311±0.154 | 6.119±0.415 | 5.197±0.169 | 5.345±0.203 | 5.309±0.146 | 5.392±0.170 | 5.240±0.170 | 5.196±0.187 |
|  | 10/90 | 8.962±0.308 | 8.607±0.387 | 7.705±0.236 | 7.196±0.274 | 9.077±0.809 | 7.158±0.269 | 7.304±0.268 | 7.286±0.209 | 7.269±0.359 | 7.266±0.250 | 7.061±0.262 |
|  | 5/95 | 11.085±0.322 | 12.117±0.789 | 9.651±0.338 | 9.677±0.408 | 14.043±2.879 | 9.932±0.408 | 9.688±0.467 | 9.614±0.393 | 9.716±0.392 | 9.434±0.418 | 9.058±0.463 |
|  | 2.5/97.5 | 13.592±0.950 | 15.979±0.817 | 12.243±0.631 | 11.747±1.005 | 16.003±2.434 | 11.903±0.815 | 11.870±0.823 | 12.512±1.239 | 11.775±0.647 | 12.075±0.622 | 11.304±0.952 |
|  | avg. rank | 10.29±0.88 | 9.86±0.35 | 7.43±1.50 | 4.71±1.58 | 9.71±1.16 | 3.00±2.39 | 5.71±0.88 | 4.29±2.05 | 5.43±1.50 | 4.00±1.69 | 1.57±0.73 |
| Suzuki-Miyaura (Random Split) | 70/30 | 7.904±0.169 | 8.128±0.344 | 6.598±0.270 | 6.116±0.223 | 6.323±0.245 | 6.038±0.264 | 6.096±0.263 | 6.053±0.253 | 6.037±0.243 | 6.038±0.226 | 6.045±0.218 |
|  | 50/50 | 8.522±0.118 | 8.922±0.235 | 7.539±0.153 | 6.725±0.089 | 7.053±0.133 | 6.676±0.088 | 6.729±0.138 | 6.661±0.119 | 6.702±0.141 | 6.629±0.112 | 6.667±0.101 |
|  | 30/70 | 9.502±0.106 | 10.094±0.346 | 8.804±0.249 | 7.847±0.094 | 8.502±0.295 | 7.778±0.134 | 7.953±0.109 | 7.822±0.120 | 7.887±0.116 | 7.751±0.082 | 7.793±0.147 |
|  | 20/80 | 10.360±0.212 | 11.229±0.247 | 10.017±0.338 | 8.793±0.191 | 10.008±0.613 | 8.785±0.181 | 9.022±0.194 | 8.890±0.227 | 8.918±0.207 | 8.691±0.213 | 8.775±0.161 |
|  | 10/90 | 11.890±0.268 | 13.528±0.395 | 11.954±0.443 | 10.739±0.211 | 12.839±1.154 | 10.863±0.249 | 11.017±0.304 | 10.948±0.320 | 11.171±0.330 | 10.591±0.233 | 10.781±0.182 |
|  | 5/95 | 13.545±0.281 | 15.695±0.618 | 14.294±0.507 | 13.451±0.353 | 15.307±1.530 | 14.691±1.191 | 13.381±0.301 | 13.543±0.248 | 14.120±0.513 | 12.934±0.364 | 13.236±0.299 |
|  | 2.5/97.5 | 15.640±0.813 | 17.666±0.496 | 17.587±0.690 | 17.189±0.813 | 18.289±2.538 | 18.129±2.291 | 16.928±0.737 | 16.817±0.467 | 16.997±0.716 | 16.324±0.593 | 16.114±0.697 |
|  | avg. rank | 7.86±3.14 | 10.71±0.70 | 8.71±0.45 | 5.00±1.69 | 9.00±1.20 | 4.86±3.04 | 5.86±1.36 | 4.29±1.03 | 5.43±1.92 | 1.43±0.73 | 2.71±0.70 |
| Buchwald-Hartwig (Out-Of-Sample Split) | Test 1 | 6.682±0.101 | 7.351±0.099 | 7.015±0.758 | 8.082±0.827 | 10.941±1.385 | 6.358±0.605 | 7.955±0.344 | 8.357±1.108 | 6.609±0.411 | 7.020±0.173 | 5.980±0.231 |
|  | Test 2 | 9.459±0.112 | 7.266±0.724 | 6.588±0.328 | 6.300±0.647 | 6.359±0.524 | 6.412±0.637 | 7.649±0.893 | 6.421±0.607 | 5.997±0.499 | 6.398±0.785 | 5.469±0.396 |
|  | Test 3 | 10.282±0.150 | 9.129±0.745 | 11.052±0.950 | 8.986±0.314 | 11.021±1.509 | 11.154±0.596 | 10.240±0.546 | 9.780±1.087 | 10.106±0.268 | 10.639±0.576 | 8.340±0.351 |
|  | Test 4 | 14.874±0.050 | 13.671±1.067 | 18.422±0.620 | 13.190±0.754 | 14.414±2.982 | 13.231±0.266 | 16.719±0.598 | 16.084±1.174 | 13.910±0.320 | 13.616±0.597 | 13.870±0.393 |
|  | avg. rank | 7.50±2.50 | 5.75±2.38 | 8.50±2.29 | 3.75±3.11 | 7.75±2.59 | 5.25±3.70 | 8.50±1.66 | 7.50±2.29 | 4.00±1.58 | 5.50±1.80 | 2.00±1.73 |
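Each avg. rank row can be reproduced from the printed means. Within every split, the 11 methods are ranked with rank 1 going to the lowest value. Each method's ranks are then averaged over the splits of that block, and the ± is the population standard deviation of those ranks. This procedure is inferred by recomputing the table's numbers; the source does not state it. The sketch below assumes that procedure, plus tied values sharing the smaller rank, and recomputes the Buchwald-Hartwig random-split row. The helper names `min_ranks` and `average_ranks` are hypothetical.

```python
from statistics import mean, pstdev

# Method order follows the table columns, left to right.
METHODS = [
    "MFF", "YieldBERT", "YieldBERT-DA", "YieldMPNN", "From-Scratch",
    "MolCLR", "DGI", "ContextPred", "AttrMasking",
    "MolDescPred-MPNN", "MolDescPred",
]

# Mean values (the part before "±") for the Buchwald-Hartwig random splits.
BH_RANDOM = {
    "70/30": [4.694, 3.990, 3.090, 2.920, 3.038, 2.896, 2.909, 2.888, 2.905, 2.921, 2.899],
    "50/50": [5.370, 4.792, 3.744, 3.497, 3.957, 3.420, 3.488, 3.465, 3.485, 3.463, 3.439],
    "30/70": [6.471, 6.075, 4.833, 4.483, 4.873, 4.400, 4.489, 4.462, 4.496, 4.439, 4.408],
    "20/80": [7.271, 6.862, 5.781, 5.311, 6.119, 5.197, 5.345, 5.309, 5.392, 5.240, 5.196],
    "10/90": [8.962, 8.607, 7.705, 7.196, 9.077, 7.158, 7.304, 7.286, 7.269, 7.266, 7.061],
    "5/95": [11.085, 12.117, 9.651, 9.677, 14.043, 9.932, 9.688, 9.614, 9.716, 9.434, 9.058],
    "2.5/97.5": [13.592, 15.979, 12.243, 11.747, 16.003, 11.903, 11.870, 12.512, 11.775, 12.075, 11.304],
}


def min_ranks(values):
    """Hypothetical helper: rank 1 for the lowest value; tied values share the smaller rank."""
    return [1 + sum(other < v for other in values) for v in values]


def average_ranks(block):
    """Hypothetical helper: {method: (mean rank, population std of ranks)} over a block's splits."""
    per_method = {m: [] for m in METHODS}
    for values in block.values():
        for method, rank in zip(METHODS, min_ranks(values)):
            per_method[method].append(rank)
    return {m: (mean(r), pstdev(r)) for m, r in per_method.items()}


if __name__ == "__main__":
    for method, (avg, sd) in average_ranks(BH_RANDOM).items():
        print(f"{method:>18}: {avg:.2f}±{sd:.2f}")
```

Running it prints, for example, `10.29±0.88` for MFF and `1.57±0.73` for MolDescPred, matching the table. The shared-smaller-rank tie rule is an assumption. It appears consistent with the Suzuki-Miyaura avg. rank row, where MolCLR and MolDescPred-MPNN both print 6.038 at 70/30.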