Table 1 Amino acid descriptor sets analyzed in the current study

From: Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets

| Descriptor set | Type | Derived by | # of components | Variance explained | Amino acids covered |
|---|---|---|---|---|---|
| BLOSUM | Physicochemical and substitution matrix | VARIMAX | 10 | n/a | 20 |
| FASGAI | Physicochemical | Factor Analysis | 6 | 84% | 20 |
| MSWHIM | 3D electrostatic potential | PCA | 3 | 61% | 20 |
| ProtFP (PCA3) | Physicochemical | PCA | 3 | 75% | 20 |
| ProtFP (PCA5) | Physicochemical | PCA | 5 | 83% | 20 |
| ProtFP (PCA8) | Physicochemical | PCA | 8 | 92% | 20 |
| ProtFP (Feature) | Feature based | Hashing | n/a | n/a | 20 |
| ST-scales | Topological | PCA | 5 | 91% | 167 |
| T-scales | Topological | PCA | 8 | 72% | 135 |
| VHSE | Physicochemical | PCA | 8 | 77% | 20 |
| Z-scales (3) | Physicochemical | PCA | 3 | n/a | 87 |
| Z-scales (5) | Physicochemical | PCA | 5 | 87% | 87 |
| Z-scales (Binned) | Physicochemical | PCA followed by binning | n/a | n/a | 20 |

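The first column gives the name of each descriptor set as used in the main text. The following columns list the descriptor type, the method used to derive the descriptors (dimensionality reduction, or hashing in the case of ProtFP (Feature)), the number of components, and the percentage of variance of the original matrix explained by those components. The last column gives the number of amino acids covered: 20 means the set covers only the natural amino acids, while larger values mean it also covers non-natural amino acids. n/a, not available.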
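For readers who want to filter or compare these descriptor sets programmatically, the following is a minimal sketch that transcribes Table 1 into Python records, with n/a stored as `None`. The names `DescriptorSet`, `beyond_natural`, `highest_variance` and `concatenated_length` are hypothetical helpers introduced only for illustration. `concatenated_length` also assumes a simple scheme in which every residue position is described by all components of a set. This scheme is meant to show what the "# of components" column implies for feature size, not to reproduce the encoding used in the study.

```python
from typing import List, NamedTuple, Optional


class DescriptorSet(NamedTuple):
    name: str
    type: str
    derived_by: str
    n_components: Optional[int]          # None where Table 1 lists n/a
    variance_explained: Optional[float]  # percent; None where Table 1 lists n/a
    aas_covered: int


# Rows transcribed from Table 1
TABLE_1: List[DescriptorSet] = [
    DescriptorSet("BLOSUM", "Physicochemical and substitution matrix", "VARIMAX", 10, None, 20),
    DescriptorSet("FASGAI", "Physicochemical", "Factor Analysis", 6, 84.0, 20),
    DescriptorSet("MSWHIM", "3D electrostatic potential", "PCA", 3, 61.0, 20),
    DescriptorSet("ProtFP (PCA3)", "Physicochemical", "PCA", 3, 75.0, 20),
    DescriptorSet("ProtFP (PCA5)", "Physicochemical", "PCA", 5, 83.0, 20),
    DescriptorSet("ProtFP (PCA8)", "Physicochemical", "PCA", 8, 92.0, 20),
    DescriptorSet("ProtFP (Feature)", "Feature based", "Hashing", None, None, 20),
    DescriptorSet("ST-scales", "Topological", "PCA", 5, 91.0, 167),
    DescriptorSet("T-scales", "Topological", "PCA", 8, 72.0, 135),
    DescriptorSet("VHSE", "Physicochemical", "PCA", 8, 77.0, 20),
    DescriptorSet("Z-scales (3)", "Physicochemical", "PCA", 3, None, 87),
    DescriptorSet("Z-scales (5)", "Physicochemical", "PCA", 5, 87.0, 87),
    DescriptorSet("Z-scales (Binned)", "Physicochemical", "PCA followed by binning", None, None, 20),
]

NATURAL_AA_COUNT = 20


def beyond_natural(rows: List[DescriptorSet]) -> List[str]:
    """Names of sets that cover more than the 20 natural amino acids."""
    return [r.name for r in rows if r.aas_covered > NATURAL_AA_COUNT]


def highest_variance(rows: List[DescriptorSet]) -> DescriptorSet:
    """Set with the largest reported variance explained (n/a entries skipped)."""
    reported = [r for r in rows if r.variance_explained is not None]
    return max(reported, key=lambda r: r.variance_explained)


def concatenated_length(row: DescriptorSet, n_residues: int) -> int:
    """Feature-vector length under a hypothetical scheme where each of
    n_residues positions is described by every component of the set."""
    if row.n_components is None:
        raise ValueError(f"{row.name} has no fixed number of components")
    return row.n_components * n_residues


if __name__ == "__main__":
    print(beyond_natural(TABLE_1))
    print(highest_variance(TABLE_1).name)
    print(concatenated_length(TABLE_1[4], 10))  # ProtFP (PCA5), 10 residues
```

Run as a script, this prints the four sets that extend beyond the natural amino acids (ST-scales, T-scales and both numeric Z-scales variants), then ProtFP (PCA8) as the set with the highest reported variance explained (92%), then 50 for ProtFP (PCA5) over 10 residues. Storing n/a as `None` rather than 0 keeps unreported values from being silently treated as real numbers in comparisons.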