Fig. 1 | Journal of Cheminformatics

Fig. 1

From: Sequence-based prediction of protein binding regions and drug–target interactions

HoTS model overview. We first constructed a DTI dataset from DrugBank, KEGG, and IUPHAR. We also collected 3D complexes and their binding information (BI) from scPDB and PDBBind to construct a BR dataset. From the collected BI, we generated true BRs to train the BR prediction model. HoTS takes as input the sequences of individual proteins and the Morgan/circular fingerprints of drug compounds. Subsequences are extracted from each protein sequence by a CNN, and the maximum values are pooled within each protein grid. Compound and protein grids are passed to transformers as queries, keys, and values to model interactions between subsequences and individual compounds. Closely related subsequences and compounds receive high attention, and the values of related subsequences/compounds are merged into new values in proportion to that attention. After the transformers, the compound token is used to predict DTIs, and the individual protein grids are used to predict BRs. For DTI prediction, HoTS calculates a prediction score P_DTI ranging from 0 to 1; for BRs, it calculates center (C), length (W), and confidence (P) scores. We evaluated DTI prediction performance using PubChem BioAssay and BR prediction performance using the COACH and HOLO4K datasets.
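
The pooling and attention steps named in the caption can be illustrated with a minimal sketch. This is not the HoTS implementation: it assumes non-overlapping grids, a single attention head, and identity projections (queries, keys, and values are the tokens themselves), whereas a real transformer learns separate projections. The function names, the grid size, and the toy feature values are all hypothetical.

```python
import math

def grid_max_pool(features, grid_size):
    """Max-pool per-residue feature vectors over non-overlapping grids (assumed layout)."""
    pooled = []
    for start in range(0, len(features), grid_size):
        window = features[start:start + grid_size]
        # Take the maximum of each feature channel inside the grid.
        pooled.append([max(channel) for channel in zip(*window)])
    return pooled

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    weights, outputs = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # New value = attention-weighted sum of all token values.
        outputs.append([sum(w * value[d] for w, value in zip(row, tokens))
                        for d in range(dim)])
    return weights, outputs

# Toy stand-in for CNN outputs: 6 residues, 2 feature channels each.
residue_features = [
    [0.1, 0.0], [0.9, 0.2],
    [0.0, 0.3], [0.2, 0.8],
    [0.5, 0.5], [0.4, 0.1],
]
protein_grids = grid_max_pool(residue_features, grid_size=2)

# Hypothetical compound embedding, placed first as the compound token.
compound_token = [1.0, 0.0]
tokens = [compound_token] + protein_grids
weights, outputs = attend(tokens)
```

In this toy setup, `weights[0]` holds the compound token's attention over itself and the three protein grids, and `outputs[0]` is the merged compound representation that a downstream layer would score. The remaining rows of `outputs` correspond to the per-grid representations from which center, length, and confidence values would be predicted.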
