Table 4 Summary statistics for the five best-performing XGB models for chemicals with acidic and basic pKas

From: Open-source QSAR models for pKa prediction using multiple machine learning approaches

| Data option | Dataset | Feature set | Number of features | Train R² | Train RMSE | Test R² | Test RMSE |
|---|---|---|---|---|---|---|---|
| 1 | Acidic | Fingerprints (D1) | 4901 | 0.684 | 1.865 | 0.754 | 1.679 |
| 1 | Acidic | Fingerprints (D2) | 4234 | 0.673 | 1.897 | 0.739 | 1.728 |
| 1 | Acidic | MACCS (D2) | 145 | 0.658 | 1.951 | 0.725 | 1.775 |
| 1 | Acidic | Fingerprints (D3) | 1663 | 0.655 | 1.948 | 0.710 | 1.825 |
| 1 | Acidic | MACCS (D1) | 153 | 0.657 | 1.953 | 0.706 | 1.834 |
| 2 | Basic | Fingerprints (D2) | 4009 | 0.752 | 1.540 | 0.728 | 1.694 |
| 2 | Basic | Fingerprints (D1) | 4665 | 0.749 | 1.551 | 0.723 | 1.709 |
| 2 | Basic | PUBCHEM (D2) | 488 | 0.727 | 1.622 | 0.720 | 1.718 |
| 2 | Basic | MACCS (D3) | 98 | 0.714 | 1.663 | 0.714 | 1.736 |
| 2 | Basic | MACCS (D1) | 153 | 0.734 | 1.601 | 0.712 | 1.744 |
  1. Each group of statistics is ordered by test set RMSE, with the best-performing models listed first. D1 is the data set with variables that are all 0’s or all 1’s removed. D2 is the D1 data set with highly correlated variables removed. D3 is the D2 data set with low-variance features removed.
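
The nested D1, D2, D3 feature sets described in the footnote can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the correlation cutoff (`max_abs_corr=0.95`), the variance cutoff (`min_variance=0.1`), the rule of keeping the first feature of a correlated pair, and the toy bit vectors are all assumptions made for illustration; the table does not state the thresholds that were actually used.

```python
from math import sqrt
from statistics import mean, pvariance


def drop_constant(columns):
    """D1-style step: drop features that are all 0's or all 1's (constant)."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def drop_correlated(columns, max_abs_corr=0.95):
    """D2-style step: keep a feature only if it is not highly correlated
    with a feature kept earlier (cutoff and tie-breaking are assumptions)."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) < max_abs_corr for other in kept.values()):
            kept[name] = vals
    return kept


def drop_low_variance(columns, min_variance=0.1):
    """D3-style step: drop features whose variance is below an assumed cutoff."""
    return {name: vals for name, vals in columns.items() if pvariance(vals) >= min_variance}


# Hypothetical fingerprint bits for ten chemicals (illustrative values only).
features = {
    "bit_a": [0] * 10,                        # all 0's, removed in D1
    "bit_b": [1] * 10,                        # all 1's, removed in D1
    "bit_c": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "bit_d": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # duplicate of bit_c, removed in D2
    "bit_e": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # rarely set, removed in D3
    "bit_f": [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
}

d1 = drop_constant(features)
d2 = drop_correlated(d1)
d3 = drop_low_variance(d2)
print(list(d1), list(d2), list(d3))
```

Each step takes the output of the previous one, which mirrors the footnote's definition of D2 as a subset of D1 and D3 as a subset of D2, and is consistent with the decreasing feature counts for the same descriptor type in the table (for example, 4901, 4234, and 1663 acidic fingerprint features).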