Table 4 Summary statistics for the five best-performing XGB models for chemicals with acidic and basic pKas

From: Open-source QSAR models for pKa prediction using multiple machine learning approaches

| Data option | Dataset | Feature sets | Number of features | Train R² | Train RMSE | Test R² | Test RMSE |
|---|---|---|---|---|---|---|---|
| 1 | Acidic | Fingerprints (D1) | 4901 | 0.684 | 1.865 | 0.754 | 1.679 |
| 1 | Acidic | Fingerprints (D2) | 4234 | 0.673 | 1.897 | 0.739 | 1.728 |
| 1 | Acidic | MACCS (D2) | 145 | 0.658 | 1.951 | 0.725 | 1.775 |
| 1 | Acidic | Fingerprints (D3) | 1663 | 0.655 | 1.948 | 0.710 | 1.825 |
| 1 | Acidic | MACCS (D1) | 153 | 0.657 | 1.953 | 0.706 | 1.834 |
| 2 | Basic | Fingerprints (D2) | 4009 | 0.752 | 1.540 | 0.728 | 1.694 |
| 2 | Basic | Fingerprints (D1) | 4665 | 0.749 | 1.551 | 0.723 | 1.709 |
| 2 | Basic | PUBCHEM (D2) | 488 | 0.727 | 1.622 | 0.720 | 1.718 |
| 2 | Basic | MACCS (D3) | 98 | 0.714 | 1.663 | 0.714 | 1.736 |
| 2 | Basic | MACCS (D1) | 153 | 0.734 | 1.601 | 0.712 | 1.744 |

Within each group (acidic, basic), models are ordered by test-set RMSE, with the best-performing models listed first. D1 is the data set with variables that are all 0s or all 1s removed. D2 is the D1 data set with highly correlated variables removed. D3 is the D2 data set with low-variance features removed.
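
The three nested filtering steps named in the note (D1, D2, D3) can be sketched roughly as follows. This is only an illustration on toy binary features, not the authors' pipeline: the function names, the greedy correlation filter, and both thresholds (`max_abs_corr`, `min_variance`) are assumptions, since the note does not state which cut-offs or procedure were used.

```python
from math import sqrt
from statistics import pvariance


def drop_constant(columns):
    """D1-style step: drop features whose values are all identical (all 0s or all 1s)."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, max_abs_corr=0.95):
    """D2-style step (assumed greedy rule): keep a feature only if its absolute
    correlation with every feature already kept is below max_abs_corr."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) < max_abs_corr for other in kept.values()):
            kept[name] = vals
    return kept


def drop_low_variance(columns, min_variance=0.15):
    """D3-style step: drop features whose population variance is below min_variance."""
    return {name: vals for name, vals in columns.items() if pvariance(vals) >= min_variance}


# Toy fingerprint bits for eight hypothetical molecules.
bits = {
    "bit_a": [0, 0, 0, 0, 0, 0, 0, 0],  # constant: removed in D1
    "bit_b": [1, 0, 1, 0, 1, 0, 1, 0],
    "bit_c": [1, 0, 1, 0, 1, 0, 1, 0],  # duplicate of bit_b: removed in D2
    "bit_d": [1, 1, 0, 0, 1, 1, 0, 0],
    "bit_e": [1, 0, 0, 0, 0, 0, 0, 0],  # rarely set: removed in D3
}

d1 = drop_constant(bits)
d2 = drop_correlated(d1)
d3 = drop_low_variance(d2)

for label, data in (("D1", d1), ("D2", d2), ("D3", d3)):
    print(label, len(data), list(data))
```

Because each step is applied to the output of the previous one, the feature count can only shrink from D1 to D2 to D3, which matches the pattern in the "Number of features" column for each feature set.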