Table 1 Amino acid descriptor sets compared in the current study

From: Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

| Descriptor set | Type | Derived by | # of components | Variance explained | Amino acids covered |
|---|---|---|---|---|---|
| BLOSUM | Physicochemical and substitution matrix | VARIMAX | 10 | n/a | 20 |
| FASGAI | Physicochemical | Factor analysis | 6 | 84% | 20 |
| MSWHIM | 3D electrostatic potential | PCA | 3 | 61% | 20 |
| ProtFP (PCA3) | Physicochemical | PCA | 3 | 75% | 20 |
| ProtFP (PCA5) | Physicochemical | PCA | 5 | 83% | 20 |
| ProtFP (PCA8) | Physicochemical | PCA | 8 | 92% | 20 |
| ProtFP (Feature) | Feature based | Hashing | n/a | n/a | 20 |
| ST-scales | Topological | PCA | 8 | 91% | 167 |
| T-scales | Topological | PCA | 5 | 72% | 135 |
| VHSE | Physicochemical | PCA | 8 | 77% | 20 |
| Z-scales (3) | Physicochemical | PCA | 3 | n/a | 87 |
| Z-scales (5) | Physicochemical | PCA | 5 | 87% | 87 |
| Z-scales (Binned) | Physicochemical | PCA followed by binning | n/a | n/a | 20 |
| ProtFP (Feature) and Z-scales (3) | Physicochemical and feature based | PCA and hashing | n/a | n/a | 20 |
| Z-scales (3) and Z-scales (Avg) | Physicochemical | PCA and target average | n/a | n/a | 20 |
| ProtFP (PCA3) and Z-scales (Binned) | Physicochemical | PCA and binning | n/a | n/a | 20 |

  1. The first column gives the name of the descriptor set as used in the main text. The following columns list the descriptor type, the method used to derive the set (e.g. PCA, factor analysis, or hashing), the number of components, and the fraction of the variance of the original property matrix explained by those components. The last column gives the number of amino acids covered, distinguishing sets that cover only the 20 natural amino acids from those that also cover non-natural ones. n/a: not available.
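The "Derived by", "# of components" and "Variance explained" columns can be illustrated with a minimal PCA sketch. It assumes a hypothetical property matrix filled with random placeholder values (not any of the published property sets) and autoscales each property before PCA, so the reported fraction refers to the variance of the scaled matrix. The published sets differ in their raw properties and preprocessing, and BLOSUM and FASGAI were derived with VARIMAX rotation and factor analysis rather than plain PCA. The function name `pca_descriptors` is illustrative only.

```python
import numpy as np


def pca_descriptors(props: np.ndarray, n_components: int):
    """Derive per-amino-acid descriptor scores by PCA (illustrative sketch).

    props: (n_amino_acids, n_properties) matrix of raw property values.
    Returns (scores, cumulative_variance): scores has shape
    (n_amino_acids, n_components); cumulative_variance is the fraction of
    the total (autoscaled) variance captured by the retained components.
    """
    # Autoscale each property (zero mean, unit variance) so that properties
    # measured on different scales contribute equally.
    x = (props - props.mean(axis=0)) / props.std(axis=0, ddof=1)
    # SVD of the centred matrix: squared singular values are proportional
    # to the variance captured by each principal component.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Component scores for each amino acid (rows) on the first k components.
    scores = u[:, :n_components] * s[:n_components]
    return scores, float(np.sum(explained[:n_components]))


# Hypothetical placeholder data: 20 amino acids x 30 random "properties".
rng = np.random.default_rng(0)
props = rng.normal(size=(20, 30))

for k in (3, 5, 8):
    scores, var = pca_descriptors(props, k)
    print(f"{k} components: scores {scores.shape}, {var:.0%} variance explained")
```

Retaining more components always explains at least as much variance of the same matrix, which is the pattern seen for ProtFP (PCA3), (PCA5) and (PCA8); comparisons across sets built from different property matrices do not follow this rule.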