Table 1 Performance of SVM models using three data options with continuous descriptors, fingerprints and fragment counts

From: Open-source QSAR models for pKa prediction using multiple machine learning approaches

| Data option | Data set | Feature sets | Number of features | Train R² | Train RMSE | Fivefold CV Q² | Fivefold CV RMSE | Test R² | Test RMSE |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Acidic | Continuous | 870 | 0.96 | 0.65 | 0.58 | 2.18 | 0.68 | 1.91 |
| 1 | Acidic | Fingerprints | 1548 | 0.91 | 1.00 | 0.64 | 2.02 | 0.71 | 1.81 |
| 1 | Acidic | Fingerprints + counts | 2104 | 0.94 | 0.80 | 0.64 | 2.02 | 0.72 | 1.80 |
| 1 | Basic | Continuous | 876 | 0.96 | 0.64 | 0.65 | 1.94 | 0.65 | 1.93 |
| 1 | Basic | Fingerprints | 1535 | 0.91 | 0.99 | 0.69 | 1.84 | 0.69 | 1.83 |
| 1 | Basic | Fingerprints + counts | 2079 | 0.93 | 0.87 | 0.72 | 1.73 | 0.70 | 1.80 |
| 2 | Acidic | Continuous | 913 | 0.98 | 0.49 | 0.61 | 2.10 | 0.69 | 1.89 |
| 2 | Acidic | Fingerprints | 1552 | 0.9 | 1.05 | 0.63 | 2.04 | 0.69 | 1.87 |
| 2 | Acidic | Fingerprints + counts | 2141 | 0.94 | 0.85 | 0.63 | 2.05 | 0.71 | 1.81 |
| 2 | Basic | Continuous | 913 | 0.97 | 0.52 | 0.67 | 1.88 | 0.66 | 1.88 |
| 2 | Basic | Fingerprints | 1534 | 0.90 | 1.02 | 0.68 | 1.83 | 0.75 | 1.63 |
| 2 | Basic | Fingerprints + counts | 2085 | 0.93 | 0.88 | 0.71 | 1.76 | 0.78 | 1.53 |
| 3 | Acidic | Continuous | 510 | 0.96 | 0.66 | 0.59 | 2.17 | 0.57 | 2.20 |
| 3 | Acidic | Fingerprints | 1580 | 0.91 | 1.00 | 0.64 | 2.01 | 0.68 | 1.91 |
| 3 | Acidic | Fingerprints + counts | 2395 | 0.93 | 0.86 | 0.65 | 1.99 | 0.69 | 1.87 |
| 3 | Basic | Continuous | 510 | 0.95 | 0.75 | 0.61 | 2.01 | 0.6 | 2.09 |
| 3 | Basic | Fingerprints | 1543 | 0.91 | 0.94 | 0.72 | 1.72 | 0.67 | 1.90 |
| 3 | Basic | Fingerprints + counts | 2358 | 0.93 | 0.84 | 0.73 | 1.67 | 0.71 | 1.79 |
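
The table reports R², Q² and RMSE without defining them. As a minimal sketch, assuming the conventional definitions (R² as the coefficient of determination, RMSE as the root-mean-square error, and Q² as the same R² formula applied to cross-validated predictions), the metrics could be computed as follows. The function names and the sample values are hypothetical and are not taken from the article.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: applying this to held-out fold predictions gives the
    cross-validated Q2; the article may define Q2 differently.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical pKa values, invented for illustration only.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

Under these definitions, a high training R² paired with a much lower Q² or test R², as in several rows of the table, means the model fits the training data more closely than it predicts unseen compounds.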