
Table 5 Binary classification results of task II

From: Development of Natural Compound Molecular Fingerprint (NC-MFP) with the Dictionary of Natural Products (DNP) for natural product-based drug development

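Performance (a) of each molecular fingerprint, obtained by averaging ten external validation tasks (b). Columns give ACC (c) in %, F1 (d) in % and MCC (e) for each fingerprint.

| Protein target | NC-MFP ACC (%) | NC-MFP F1 (%) | NC-MFP MCC | MACCS ACC (%) | MACCS F1 (%) | MACCS MCC | PubChemFP ACC (%) | PubChemFP F1 (%) | PubChemFP MCC | GraphFP ACC (%) | GraphFP F1 (%) | GraphFP MCC | APFP ACC (%) | APFP F1 (%) | APFP MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Protein-tyrosine phosphatase 1B (NPT 178) | 78.98 | 80.65 | 0.57 | 66.90 | 72.56 | 0.32 | 69.66 | 74.40 | 0.36 | 67.24 | 71.88 | 0.33 | 61.03 | 58.07 | 0.29 |
| Acetylcholinesterase (NPT 204) | 73.42 | 76.42 | 0.49 | 70.79 | 75.75 | 0.42 | 70.00 | 76.15 | 0.41 | 66.58 | 72.05 | 0.30 | 59.74 | 63.94 | 0.18 |
| Aldose reductase (NPT 68) | 83.20 | 83.51 | 0.76 | 76.00 | 77.35 | 0.56 | 75.60 | 75.03 | 0.59 | 69.60 | 71.01 | 0.41 | 59.20 | 47.03 | 0.24 |
| Beta-secretase (NPT 740) | 87.20 | 88.64 | 0.83 | 77.20 | 80.48 | 0.55 | 73.20 | 77.46 | 0.45 | 77.20 | 81.44 | 0.53 | 71.20 | 74.78 | 0.48 |
| Cyclooxygenase-2 (NPT 31) | 84.76 | 86.37 | 0.78 | 74.28 | 79.30 | 0.56 | 69.52 | 74.69 | 0.45 | 73.33 | 77.36 | 0.45 | 63.81 | 60.26 | 0.35 |
| Butyrylcholinesterase (NPT 439) | 87.89 | 88.82 | 0.88 | 78.95 | 81.53 | 0.64 | 71.05 | 75.05 | 0.51 | 74.74 | 77.13 | 0.55 | 77.35 | 78.57 | 0.56 |
| Cyclooxygenase-1 (NPT 324) | 88.33 | 89.42 | 0.76 | 79.45 | 82.93 | 0.63 | 78.89 | 83.32 | 0.65 | 77.78 | 82.39 | 0.65 | 73.89 | 73.73 | 0.52 |
| Average | 83.40 | 84.83 | 0.72 | 74.80 | 78.56 | 0.53 | 72.56 | 76.59 | 0.49 | 72.35 | 76.18 | 0.46 | 66.60 | 65.20 | 0.37 |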
  1. The seven target proteins of task II and their compounds are summarized in Table 1
  2. (a) The performance indices are accuracy (ACC), F1-score (F1) and the Matthews Correlation Coefficient (MCC)
  3. (b) Results for binary classification task II. For each target protein, the external validation data set was randomly selected ten times, taking 20% of both the active and the inactive compound sets of that target. “NC-MFP” stands for Natural Compound Molecular Fingerprint, “MACCS” for Molecular ACCess System keys fingerprint, “PubChemFP” for PubChem fingerprint, “GraphFP” for GraphOnlyFingerprint and “APFP” for AtomPairs2DFingerprint
  4. (c) Accuracy (ACC) is the proportion of all predictions that are correct
  5. (d) F1-score (F1) is the harmonic mean of precision and sensitivity (recall)
  6. (e) Matthews Correlation Coefficient (MCC) is used to evaluate binary classification performance. MCC ranges from −1 to 1, where −1 indicates a completely wrong binary classifier and 1 an entirely correct one
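
The footnotes above define the three performance indices and the repeated 20% hold-out in words. The following is a minimal Python sketch of how such values could be computed. It is not the authors' code: the function names `binary_metrics` and `holdout_split`, the `fraction` and `seed` parameters, and the use of confusion-matrix counts (`tp`, `fp`, `tn`, `fn`) as inputs are assumptions made for illustration. The metrics are returned as fractions, so ACC and F1 would be multiplied by 100 to match the percentages in the table.

```python
import math
import random


def binary_metrics(tp, fp, tn, fn):
    """Return (ACC, F1, MCC) from confusion-matrix counts.

    ACC and F1 are fractions in [0, 1]; MCC lies in [-1, 1].
    Undefined ratios (zero denominator) are reported as 0.0,
    which is a convention chosen for this sketch.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return acc, f1, mcc


def holdout_split(actives, inactives, fraction=0.2, seed=0):
    """Randomly hold out `fraction` of the actives and of the inactives.

    Returns ((train_actives, train_inactives), (test_actives, test_inactives)).
    Sampling each class separately keeps the active/inactive ratio
    of the external validation set close to that of the full set.
    """
    rng = random.Random(seed)

    def split(items):
        items = list(items)
        rng.shuffle(items)
        k = round(len(items) * fraction)
        return items[k:], items[:k]

    train_a, test_a = split(actives)
    train_i, test_i = split(inactives)
    return (train_a, train_i), (test_a, test_i)
```

Under these assumptions, calling `holdout_split` with ten different `seed` values and averaging the `binary_metrics` results over the ten held-out sets would mirror the averaging described in footnote b.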