Table 1 Amino acid descriptor sets compared in the current study

From: Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

| Descriptor set | Type | Derived by | Number of components | Variance explained | Amino acids covered |
| --- | --- | --- | --- | --- | --- |
| BLOSUM | Physicochemical and substitution matrix | VARIMAX | 10 | n/a | 20 |
| FASGAI | Physicochemical | Factor Analysis | 6 | 84% | 20 |
| MSWHIM | 3D electrostatic potential | PCA | 3 | 61% | 20 |
| ProtFP (PCA3) | Physicochemical | PCA | 3 | 75% | 20 |
| ProtFP (PCA5) | Physicochemical | PCA | 5 | 83% | 20 |
| ProtFP (PCA8) | Physicochemical | PCA | 8 | 92% | 20 |
| ProtFP (Feature) | Feature based | Hashing | n/a | n/a | 20 |
| ST-scales | Topological | PCA | 5 | 91% | 167 |
| T-scales | Topological | PCA | 8 | 72% | 135 |
| VHSE | Physicochemical | PCA | 8 | 77% | 20 |
| Z-scales (3) | Physicochemical | PCA | 3 | n/a | 87 |
| Z-scales (5) | Physicochemical | PCA | 5 | 87% | 87 |
| Z-scales (Binned) | Physicochemical | PCA followed by binning | n/a | n/a | 20 |
| ProtFP (Feature) and Z-Scales (3) | Physicochemical and Feature Based | PCA and Hashing | n/a | n/a | 20 |
| Z-Scales (3) and Z-Scales (Avg) | Physicochemical | PCA and target average | n/a | n/a | 20 |
| ProtFP (PCA3) and Z-Scales (Binned) | Physicochemical | PCA and binning | n/a | n/a | 20 |
  1. The first column gives the name of each descriptor set as used in the main text. The remaining columns list the descriptor type, the dimensionality reduction method used to derive it, the number of components, and the share of variance in the original matrix that those components explain. The last column gives the number of amino acids covered, which distinguishes sets covering only the 20 natural amino acids from sets covering additional ones. n/a: not available.
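
The distinction drawn by the last column can be made concrete with a short Python sketch. It transcribes the name, number of components, variance explained, and amino acid coverage of each row of Table 1 into records and selects the sets that cover more than the 20 natural amino acids. The names `DescriptorSet`, `TABLE_1`, and `covers_beyond_natural` are hypothetical and chosen for illustration only; they do not come from the study. Entries listed as n/a are stored as `None`.

```python
from typing import List, NamedTuple, Optional


class DescriptorSet(NamedTuple):
    """One row of Table 1 (hypothetical record type, for illustration)."""
    name: str
    components: Optional[int]    # None where the table lists n/a
    variance_pct: Optional[int]  # None where the table lists n/a
    aas_covered: int


# Values transcribed from Table 1; the Type and Derived by columns are omitted.
TABLE_1 = [
    DescriptorSet("BLOSUM", 10, None, 20),
    DescriptorSet("FASGAI", 6, 84, 20),
    DescriptorSet("MSWHIM", 3, 61, 20),
    DescriptorSet("ProtFP (PCA3)", 3, 75, 20),
    DescriptorSet("ProtFP (PCA5)", 5, 83, 20),
    DescriptorSet("ProtFP (PCA8)", 8, 92, 20),
    DescriptorSet("ProtFP (Feature)", None, None, 20),
    DescriptorSet("ST-scales", 5, 91, 167),
    DescriptorSet("T-scales", 8, 72, 135),
    DescriptorSet("VHSE", 8, 77, 20),
    DescriptorSet("Z-scales (3)", 3, None, 87),
    DescriptorSet("Z-scales (5)", 5, 87, 87),
    DescriptorSet("Z-scales (Binned)", None, None, 20),
    DescriptorSet("ProtFP (Feature) and Z-Scales (3)", None, None, 20),
    DescriptorSet("Z-Scales (3) and Z-Scales (Avg)", None, None, 20),
    DescriptorSet("ProtFP (PCA3) and Z-Scales (Binned)", None, None, 20),
]

NATURAL_AA_COUNT = 20  # the 20 natural amino acids


def covers_beyond_natural(rows: List[DescriptorSet]) -> List[str]:
    """Return the names of sets covering more than the natural amino acids."""
    return [row.name for row in rows if row.aas_covered > NATURAL_AA_COUNT]


print(covers_beyond_natural(TABLE_1))
# ['ST-scales', 'T-scales', 'Z-scales (3)', 'Z-scales (5)']
```

Under this reading of the table, only the ST-scales, T-scales, and the two unbinned Z-scales variants extend beyond the natural amino acids; every other set, including the three combinations, covers exactly 20.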