Fig. 3 | Journal of Cheminformatics

Fig. 3

From: A multi-label approach to target prediction taking ligand promiscuity into account

Workflow and datasets of the study. The workflow of the study and the datasets used to test the hypothesis. The multi-label dataset, model (MMM) and evaluation procedures are shown in blue; the single-label dataset, model (SMM) and evaluation steps are shown in green. The ChEMBL 17 dataset consists of single-label and multi-label compounds. It was randomly split into two portions: 70 % as a training set and 30 % as a test set. The MMM was trained on the available training set, whereas the SMM was trained only on a single-label training set. This single-label training set was extracted from the multi-label training set by simply assuming that each compound belongs to only one target. Of the 19,676 test set compounds, 16,344 were single-label and 3,332 were multi-label. Hence, the single-label test set was built from the 16,344 single-label test compounds and the multi-label test set from the 3,332 multi-label test compounds. Both SMM and MMM were tested on the single-label and the multi-label test sets. On the single-label test set, “Recall-Precision” and McNemar’s test were used to evaluate the performance of the two models. On the multi-label test set, a ranking scheme was used to compare the generalisation ability of the two models.
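The data handling described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy compound IDs and target labels, and the random seed are all hypothetical. The caption does not say which form of McNemar's test was used, so the continuity-corrected chi-square statistic shown here is an assumption. The extraction of the single-label training set is not sketched, because the caption does not state which target is kept for each compound.

```python
import random


def split_train_test(compounds, test_fraction=0.3, seed=0):
    """Randomly split compound IDs into (train, test) lists.

    test_fraction=0.3 mirrors the 70 % / 30 % split in the caption;
    the seed is an arbitrary choice made here for reproducibility.
    """
    ids = sorted(compounds)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def partition_by_label_count(compound_ids, targets):
    """Separate compounds with exactly one target from those with several."""
    single = [c for c in compound_ids if len(targets[c]) == 1]
    multi = [c for c in compound_ids if len(targets[c]) > 1]
    return single, multi


def mcnemar_statistic(correct_a, correct_b):
    """Continuity-corrected McNemar chi-square statistic (assumed variant).

    correct_a and correct_b are per-compound booleans stating whether
    model A and model B predicted that compound correctly.
    """
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    if only_a + only_b == 0:
        return 0.0  # the models never disagree
    return (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)


# Toy data (hypothetical): compound ID -> set of annotated targets.
toy_targets = {
    "c01": {"T1"}, "c02": {"T1", "T2"}, "c03": {"T3"}, "c04": {"T2"},
    "c05": {"T1", "T3"}, "c06": {"T4"}, "c07": {"T2"}, "c08": {"T4", "T5"},
    "c09": {"T5"}, "c10": {"T3"},
}

train_ids, test_ids = split_train_test(toy_targets)
single_test, multi_test = partition_by_label_count(test_ids, toy_targets)
```

With the study's figures, the same partition step would turn the 19,676 test compounds into 16,344 single-label and 3,332 multi-label compounds.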
