Fig. 3 | Journal of Cheminformatics

Fig. 3

From: Machine intelligence-driven framework for optimized hit selection in virtual screening

a Chemical space module architecture for hit/lead identification. The first module of A-HIOT identifies hit/lead molecules with an emphasis on chemical space (CS). The chemical structures of known CXCR4 inhibitors were collected, transformed into feature vectors, and preprocessed into a machine-readable dataset. The CS module uses random forest (RF), extreme gradient boosting (XGB), and deep neural network or deep learning (DNN/DL) algorithms to construct a predictive classification model. These models are combined into a stacked ensemble in which RF and XGB serve as tier-0 learners: they receive the feature vectors as input, train the predictive models h1…ht, and produce the predictions z1…zt. The tier-0 predictions serve as input for the tier-1 learner, a DNN (H), which is termed the meta-learner. Here wb (b = 1,…,B) denotes the weights assigned to the base learners, h(x) (ht(x)…hT(x)) denotes the base-learner vectors, and ε is a normally distributed error term. The true positives produced by the CS-driven stacked ensemble framework are the identified hits/leads, because the framework learned the representative inhibitor-like features, which resulted in a high-performance predictive classification model. This step reduces the large and complex dataset to a meaningful subset that still requires further optimization. Thus, the CS-driven stacked ensemble in A-HIOT achieves hit identification and is represented here as the red ring.
b Protein space module workflow for hit/lead optimization. The second module of A-HIOT, the protein space (PS) module, optimizes hit/lead molecules with an emphasis on protein–ligand interaction patterns. Initially, the protein structure is obtained and explored for potential binding sites and for binding residues within the binding pocket. The balanced dataset collected from the chemical space, comprising true positives and true negatives, is then used as input. Interaction patterns between the protein and the identified molecules are established by docking simulation. Binary fingerprints are computed for each protein–ligand complex to assess the binding pattern. These fingerprints serve as input to deep neural networks, yielding a robust predictive model (the PS-driven DNNs framework). The true positives produced by the model are then concatenated with the protein–ligand interaction profile (PLIP) score (di) and re-ranked according to a binding interaction threshold. The resulting molecules, termed optimized leads in the A-HIOT framework, are represented as the blue ring. This module was devised using CXCR4 as the protein case under study. \(D\) represents the DNN-ready dataset, and f(\(\alpha\)) is the DNN output of the classification model. Further concatenation with di yields β, which produces the optimized hit molecules.
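The symbols named in panel a (the weights wb, the base-learner outputs h(x), and the error term ε) are consistent with a weighted-combination view of stacking. The caption does not state the equation itself, so the following is only one plausible reading, written under the assumptions that the meta-learner combines the base predictions additively and that the error has zero mean and variance \(\sigma^2\) (a symbol not used in the caption):

\[
H(x) = \sum_{b=1}^{B} w_b \, h_b(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
\]

In the described framework the tier-1 learner is a DNN, so the actual mapping from the tier-0 predictions z1…zt to the final output is nonlinear; the linear form above should be read as a first-order illustration of how the base learners are weighted, not as the authors' exact formulation.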
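The final step of panel b (keep the molecules that the PS-driven DNN predicts as positives, attach the interaction score di, and re-rank against a threshold) can be sketched in a few lines. This is a minimal illustration only: the `Candidate` class, the `rerank` function, the cutoff values, the molecule names and scores, and the convention that a higher interaction score is better are all assumptions made for the example and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One docked molecule (hypothetical container, not from the article)."""
    name: str
    dnn_prob: float           # stands in for the DNN output f(alpha)
    interaction_score: float  # stands in for the interaction score d_i


def rerank(candidates, prob_cutoff=0.5, interaction_threshold=0.0):
    """Keep predicted positives, then filter and sort by interaction score.

    Assumes a higher interaction score means a better binding pattern.
    """
    # Step 1: molecules the classifier labels as positive.
    positives = [c for c in candidates if c.dnn_prob >= prob_cutoff]
    # Step 2: apply the binding interaction threshold.
    kept = [c for c in positives if c.interaction_score >= interaction_threshold]
    # Step 3: re-rank, best interaction score first.
    return sorted(kept, key=lambda c: c.interaction_score, reverse=True)


# Made-up values purely for illustration.
candidates = [
    Candidate("mol_A", 0.91, 7.0),
    Candidate("mol_B", 0.40, 9.0),  # rejected by the classifier
    Candidate("mol_C", 0.75, 3.0),  # below the interaction threshold
    Candidate("mol_D", 0.88, 8.5),
]
optimized = rerank(candidates, prob_cutoff=0.5, interaction_threshold=5.0)
print([c.name for c in optimized])
```

Filtering on the classifier output before applying the interaction threshold mirrors the order described in the caption, where only the model's true positives are combined with di.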
