Improving chemical reaction yield prediction using pre-trained graph neural networks

Han, Jongmin; Kwon, Youngchun; Choi, Youn-Suk; Kang, Seokho

doi:10.1186/s13321-024-00818-z

Journal of Cheminformatics

Table 1 Comparison of predictive performance in terms of RMSE

Dataset	Split	Previous studies				Existing GNN pre-training methods					Proposed method
		MFF [4]	YieldBERT [6]	YieldBERT-DA [7]	YieldMPNN [8]	From-Scratch	MolCLR [13]	DGI [18]	ContextPred [21]	AttrMasking [21]	MolDescPred-MPNN	MolDescPred
Buchwald-Hartwig (Random Split)	70/30	7.116±0.327	6.014±0.272	4.799±0.261	4.433±0.085	4.616±0.163	4.405±0.091	4.408±0.097	4.388±0.092	4.386±0.125	4.430±0.104	4.407±0.089
	50/50	8.051±0.322	7.288±0.198	5.877±0.348	5.387±0.202	6.088±0.982	5.279±0.167	5.364±0.222	5.327±0.183	5.328±0.216	5.326±0.231	5.263±0.181
	30/70	9.492±0.364	9.338±0.424	7.822±0.463	6.970±0.403	7.557±0.473	6.837±0.387	6.963±0.403	6.947±0.400	6.944±0.407	6.899±0.394	6.850±0.400
	20/80	10.487±0.259	10.306±0.303	9.164±0.668	8.204±0.372	9.317±0.713	8.040±0.399	8.271±0.498	8.175±0.333	8.268±0.398	8.093±0.365	8.043±0.426
	10/90	12.450±0.357	12.393±0.499	11.633±0.293	10.875±0.448	13.232±0.880	10.816±0.537	10.935±0.553	10.982±0.473	10.912±0.672	10.945±0.466	10.648±0.544
	5/95	14.994±0.593	16.740±0.950	14.073±0.687	14.041±0.492	18.188±2.789	13.873±0.485	14.068±0.728	13.911±0.601	14.250±0.537	13.542±0.681	13.117±0.792
	2.5/97.5	17.731±0.970	20.463±0.623	17.151±0.677	16.586±1.364	21.081±3.116	16.414±1.134	16.845±1.334	17.526±1.680	16.722±0.938	16.798±0.935	15.817±1.250
	avg. rank	10.29±0.88	9.86±0.35	8.00±0.76	5.29±1.67	9.57±1.29	2.00±0.76	5.86±0.64	4.86±1.88	4.57±1.99	4.00±1.51	1.71±1.03
Suzuki-Miyaura (Random Split)	70/30	11.428±0.341	12.073±0.463	10.524±0.482	9.467±0.459	9.742±0.489	9.289±0.516	9.430±0.474	9.297±0.462	9.225±0.465	9.271±0.446	9.333±0.478
	50/50	12.208±0.169	13.148±0.270	11.797±0.250	10.225±0.135	10.691±0.171	10.155±0.142	10.222±0.191	10.091±0.164	10.156±0.183	10.097±0.157	10.133±0.164
	30/70	13.347±0.148	14.614±0.381	13.337±0.357	11.593±0.136	12.449±0.450	11.542±0.190	11.771±0.181	11.569±0.194	11.654±0.159	11.507±0.175	11.550±0.222
	20/80	14.347±0.335	15.966±0.381	14.851±0.576	12.734±0.347	14.404±0.902	12.736±0.322	13.051±0.351	12.837±0.363	12.911±0.345	12.650±0.324	12.717±0.225
	10/90	16.062±0.445	18.734±0.530	17.129±0.683	15.164±0.344	17.813±1.236	15.239±0.399	15.520±0.444	15.371±0.452	15.739±0.523	14.973±0.395	15.050±0.256
	5/95	17.927±0.484	21.181±0.724	20.016±0.661	18.511±0.392	20.665±1.823	18.982±1.000	18.332±0.421	18.487±0.431	19.430±0.760	17.720±0.466	17.891±0.351
	2.5/97.5	20.199±1.096	22.967±0.804	23.780±0.793	22.943±0.887	23.878±3.170	22.692±2.048	22.495±0.965	22.519±0.762	23.088±0.806	21.829±0.774	21.338±0.908
	avg. rank	7.14±3.40	10.57±1.05	9.29±0.45	5.43±1.68	9.14±1.12	4.29±1.58	5.71±1.16	4.14±1.36	6.00±2.39	1.57±0.73	2.71±1.03
Buchwald-Hartwig (Out-Of-Sample Split)	Test 1	9.369±0.151	11.441±0.342	11.761±1.398	13.746±1.175	16.956±1.913	9.559±0.871	13.484±0.636	13.398±1.480	10.219±0.646	11.343±0.346	9.320±0.376
	Test 2	14.163±0.155	11.144±1.267	9.886±0.741	9.476±1.027	9.474±0.829	9.274±1.016	11.511±1.711	9.439±1.103	8.883±0.697	9.860±1.349	8.002±0.472
	Test 3	16.629±0.141	14.276±0.820	18.041±1.395	14.939±0.622	17.471±1.777	17.681±0.757	17.053±0.429	16.404±1.127	16.608±0.310	16.659±0.616	13.726±0.814
	Test 4	20.698±0.135	19.679±1.397	24.279±0.494	18.774±0.566	19.954±3.058	19.044±0.370	23.295±0.244	22.858±1.064	19.229±0.587	19.507±0.745	20.780±0.767
	avg. rank	6.50±3.20	5.50±2.50	9.25±1.79	5.00±3.39	7.75±2.38	4.50±3.20	9.25±0.83	6.25±2.28	3.50±1.12	5.75±1.30	2.75±3.03

The best and second-best cases are highlighted in bold and underlined font, respectively
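
The "avg. rank" rows are not defined on this page. One plausible reading, sketched below as an assumption rather than the authors' stated procedure, is that the methods are ranked by mean RMSE within each split (rank 1 = lowest RMSE), and the row reports the mean and population standard deviation of those ranks across splits. The sketch uses only the mean RMSE values of the Buchwald-Hartwig random-split rows copied from Table 1; the helper `ranks` and the variable names are illustrative, and ties (absent from this data) are not handled.

```python
from statistics import mean, pstdev

methods = ["MFF", "YieldBERT", "YieldBERT-DA", "YieldMPNN", "From-Scratch",
           "MolCLR", "DGI", "ContextPred", "AttrMasking",
           "MolDescPred-MPNN", "MolDescPred"]

# Mean RMSE per split, Buchwald-Hartwig (Random Split), copied from Table 1.
# Rows: 70/30, 50/50, 30/70, 20/80, 10/90, 5/95, 2.5/97.5
rmse = [
    [7.116, 6.014, 4.799, 4.433, 4.616, 4.405, 4.408, 4.388, 4.386, 4.430, 4.407],
    [8.051, 7.288, 5.877, 5.387, 6.088, 5.279, 5.364, 5.327, 5.328, 5.326, 5.263],
    [9.492, 9.338, 7.822, 6.970, 7.557, 6.837, 6.963, 6.947, 6.944, 6.899, 6.850],
    [10.487, 10.306, 9.164, 8.204, 9.317, 8.040, 8.271, 8.175, 8.268, 8.093, 8.043],
    [12.450, 12.393, 11.633, 10.875, 13.232, 10.816, 10.935, 10.982, 10.912, 10.945, 10.648],
    [14.994, 16.740, 14.073, 14.041, 18.188, 13.873, 14.068, 13.911, 14.250, 13.542, 13.117],
    [17.731, 20.463, 17.151, 16.586, 21.081, 16.414, 16.845, 17.526, 16.722, 16.798, 15.817],
]


def ranks(row):
    """Return 1-based ranks for one split (1 = lowest RMSE); assumes no ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    result = [0] * len(row)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


# Rank the methods within each split, then summarize each method across splits.
per_split = [ranks(row) for row in rmse]
avg_rank = {
    name: (mean(column), pstdev(column))
    for name, column in zip(methods, zip(*per_split))
}

for name, (mu, sigma) in avg_rank.items():
    print(f"{name}: {mu:.2f}±{sigma:.2f}")
```

Under this reading, the MolDescPred and MolCLR columns come out as 1.71±1.03 and 2.00±0.76, matching the reported "avg. rank" entries for this dataset. The other columns and datasets were not checked here.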

ISSN: 1758-2946
