 Review
Structure-based, deep-learning models for protein-ligand binding affinity prediction
Journal of Cheminformatics volume 16, Article number: 2 (2024)
Abstract
The launch of the AlphaFold series has brought deep-learning techniques into molecular structural science. As another crucial problem, structure-based prediction of protein-ligand binding affinity urgently calls for advanced computational techniques. Is deep learning ready to decode this problem? Here we review mainstream structure-based, deep-learning approaches to this problem, focusing on molecular representations, learning architectures and model interpretability. A model taxonomy has been generated. To compensate for the lack of valid comparisons among those models, we realized and evaluated representatives on a uniform basis, and discuss their advantages and shortcomings. This review will potentially benefit structure-based drug discovery and related areas.
Graphical Abstract
Introduction
Proteins, which frequently interact with other molecules to perform their functions, are key participants in a wide spectrum of cellular processes. Interactions may occur between proteins and diverse ligand types, such as small organic molecules, nucleic acids and protein peptides. In particular, inhibitors that bind to specific proteins to mediate disease progression (e.g. Gefitinib to the EGFR protein in cancer therapies [1]) are examples of small-molecule ligands, making the interactions between such ligands and their target proteins a valuable objective of drug-development research.
Studies of protein-ligand interactions mainly focus on the sites, modes or affinities of binding [2]. A drug-like ligand typically interacts with the target protein in a specific binding site (mostly a deep pocket), through a favorable binding orientation. Ligands that bind to the protein with high affinities are the initial aim of a drug-discovery pipeline. Determining the binding poses (site and orientation) of ligands for a target protein and estimating the binding affinities have therefore become two essential problems in computational drug discovery (CDD). Molecular docking is a well-developed class of computational methods that determine ligand-binding poses by efficiently searching the structural space and scoring the candidate poses [3]. Current docking methods can rapidly produce binding poses that are quite close to the X-ray conformations (RMSD within 2\(\mathring{A}\)) [4], offering a possible alternative to experimentally resolved binding poses (e.g. by X-ray crystallography [5] and NMR spectroscopy [6]). A docking method commonly leverages a force field [7,8,9,10,11] to estimate the intermolecular forces (e.g. electrostatic interactions, van der Waals forces and desolvation effects), and recommends the binding poses with better force-field scores. Although such scoring schemes are capable of assessing binding poses, they often fail in further tasks like distinguishing binders from non-binders and ranking the ligands for target proteins. Binding affinities, commonly quantified by the dissociation constant (\(K_d\)) or inhibition constant (\(K_i\)), are more competent scores in these tasks. Effectively predicting such binding affinities is thus crucial, but has long been an open challenge in CDD.
Although a group of models for protein-ligand binding affinity prediction (PLBAP) rely on simple protein sequences and their evolutionary information (e.g. DeepDTA [12], DeepFusionDTA [13], GraphDTA [14] and CAPLA [15]), decoding the affinities from a deeper, structural perspective is always of high interest. The rapid release of protein-ligand binding structures (poses), by either docking engines or experimental techniques, provides a structural basis for rational PLBAP. Alongside the structural data, the increasingly available experimental affinity data (e.g. \(K_{d/i}\) and IC50) [16, 17] have further facilitated supervised learning for PLBAP. Earlier machine-learning PLBAP models place a heavy emphasis on feature engineering, where protein-ligand interactions are estimated by rules driven by domain expertise [18] or represented by exhaustive sets of relevant factors [19, 20]. Later, there has been a trend towards simplified feature engineering [21,22,23,24] and more powerful learning processes in PLBAP. Nevertheless, traditional machine-learning models (e.g. random forests and shallow neural networks) often have limited learning capabilities that hardly achieve favorable predictions.
In recent decades, deep neural networks (DNNs), which are credited with strong learning capability on less-engineered and unstructured data, have come into play in PLBAP. DNNs can absorb simple inputs, like atom coordinates and types [25] or statistical summaries of pairwise atom contacts [26], and learn from them to predict protein-ligand binding affinity in an end-to-end manner. Beyond that, DNNs are prevalently used to learn geometric representations of protein-ligand complex structures [27, 28], such as voxelized grids [29] or molecular graphs [30], to provide high-quality PLBAP. Notably, most of these works involve heterogeneous data processing, coding platforms and validation procedures, calling for a comprehensive review and evaluation. On the other hand, although showing great potential in predictive accuracy for PLBAP, most DNNs are frequently questioned for their low interpretability. A reasonable discussion of their interpretability at the model level or in the post-hoc analysis stage [31,32,33,34,35] is another goal of this work. Last but not least, current works rarely explore the screening performances of these deep-learning models, which limits their practical value and calls for a study of their screening power. In what follows, we review mainstream deep-learning PLBAP models with a focus on feature representations, learning architectures and interpretability. To compensate for the lack of valid and fair comparisons among them, a series of evaluations of the scoring and screening power of those models has been accomplished.
Deep-learning PLBAP models
According to their feature representations and learning architectures, deep-learning PLBAP models are roughly categorized as in Table 1.
PLBAP based on \(T_{ACNN}\) models
Gomes and coworkers have devised Atomic Convolutional Neural Networks (ACNNs), which absorb the coordinates \(\varvec{{\mathcal {C}}}=\{\varvec{{\mathcal {C}}}_i \mid i=1,\ldots ,N\}=\{(x_i, y_i, z_i) \mid i=1,\ldots ,N\}\) and types \(\varvec{\mathcal {ATP}}=\{atp_i \mid i=1,\ldots ,N\}\) of atoms in a molecular structure (Fig. 1A) and output the estimated energy E of this molecule [25]. A molecule is represented by a feature tensor \(\textbf{T}(i,j,k)\) outlining the local chemical environment of each atom. \(\textbf{T}(i,j,k)\) is generated by applying atom-type convolutions to the distance matrix (\(\in {\mathbb {R}}^{N\times M}\)) [45] and atom-type matrix (\(\in {\mathbb {R}}^{N\times M}\)), which are derivatives of \(\varvec{{\mathcal {C}}}\) and \(\varvec{\mathcal {ATP}}\). It can be expressed as:
where \({\mathcal {C}}_i\) represents the coordinates of the ith atom \(\textbf{a}_i\) (\(i=1,\ldots ,N\)), \(\textbf{a}_{i_j}\) (\(j=1,\ldots ,M\)) is the jth nearest spatial neighbor of \(\textbf{a}_i\), and \(\omega _k\in \Omega\) (\(k=1,\ldots ,K\)) indicates a specific atom type (e.g. C, O and N). Such a feature tensor (\(\in {\mathbb {R}}^{N\times M\times K}\)) is fed into a radial-pooling layer to prevent overfitting and reduce parameters. A pooling filter \(f_q\) (\(q=1,\ldots ,Q\)) combines the pairwise interactions between an atom \(\textbf{a}_i\) and its neighbors of a specific type \(\omega _k\) as:
where \(R_c\) is a distance threshold (e.g. \(12\mathring{A}\)), and \(r_q\) and \(\sigma _q\) are learnable parameters. The feature tensor after pooling (\(\in {\mathbb {R}}^{N\times K\times Q}\)) is flattened and fed row-wise into several atomistic dense layers. The output for each row indicates the estimated atomic energy (\(E_i\)), and combining them yields the total estimated energy (E) of the molecule.
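The radial pooling described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian-times-cosine-cutoff filter form follows the original ACNN paper, while the function names and toy tensor layout are ours.

```python
import numpy as np

def cutoff(r, R_c=12.0):
    """Smooth cosine cutoff: 1 at r = 0, decaying to 0 at r = R_c (ACNN form)."""
    return np.where(r < R_c, 0.5 * np.cos(np.pi * r / R_c) + 0.5, 0.0)

def radial_pool(T, r_q, sigma_q, R_c=12.0):
    """Apply one radial filter f_q to the (N, M, K) feature tensor T.

    T[i, j, k] holds the distance from atom i to its j-th nearest neighbor
    if that neighbor has atom type k, and 0 otherwise (the atom-type
    convolution output).  Returns an (N, K) matrix: the pooled interaction
    of each atom with each neighbor type; stacking Q such filters yields
    the (N, K, Q) tensor fed to the atomistic dense layers.
    """
    mask = T > 0                                    # real neighbor entries only
    g = np.exp(-((T - r_q) ** 2) / sigma_q ** 2) * cutoff(T, R_c)
    return np.sum(np.where(mask, g, 0.0), axis=1)   # sum over the M neighbors
```

With \(r_q=4.0\) and \(\sigma _q^2=2.5\) (values from the reviewed work), a neighbor at exactly \(4\mathring{A}\) contributes the bare cutoff value, since the Gaussian term is 1 there.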
ACNN-based PLBAP adopts a learning architecture that implies a ligand-binding thermodynamic cycle (Fig. 1B). The binding affinity in this architecture is estimated as the energy difference between the complex and the two binding molecules (\(y=\Delta G = G_{complex}-G_{protein}-G_{ligand}\)). As reported in this work, simply employing 15 atom types (C, N, O, F, Na, Mg, P, S, Cl, Ca, Mn, Zn, Br, I and others regarded as a single type), 3 radial filters (\(r_q\) = 0, 4.0 or 8.0, \(\sigma _q^2\) = 2.5) and 3 atomistic dense layers (sizes of 32, 32 and 16) can yield state-of-the-art prediction performances (validated on PDBbind benchmarks).
Model Interpretability: \(T_{ACNN}\) models possess a hierarchical structure of model-level interpretability. The atom-type convolutions and radial-pooling operations lead to the estimation of atomic pairwise interactions, providing interpretability at an elementary level. The atomistic fully connected layers then raise this interpretability to the molecular level, by accumulating pairwise interaction energies into the total energy of a molecule. At the top level, a thermodynamic cycle of the ligand-binding process is imposed to achieve an overall interpretability in terms of physicochemical mechanisms.
PLBAP based on \(T_{IMCCNN}\) models
This category represents protein-ligand interactions with intermolecular contacts (IMCs), and feeds the reorganized features (e.g. matrices) to 2-dimensional convolutional neural networks (2D-CNNs) for learning the data relationships. An intermolecular contact is defined as a pair of atoms, one from the protein \({\textbf{a}}_i^P\) and the other from the ligand \({\textbf{a}}_j^L\), within a distance threshold \(d_{cut}\) [21]. Considering all atom types for the protein (\(\Omega ^P\)) and ligand (\(\Omega ^L\)), this leads to \(M=|\Omega ^P|\times |\Omega ^L|\) types of IMCs. These IMCs can be further refined using the concept of shell space [26]. Regarding \(\textbf{a}_j^L\) as a spherical center, the space between two spherical boundaries (with radii of \(d_{cut1}\) and \(d_{cut2}\)) forms a shell, and any protein atom \(\textbf{a}_i^P\) within this shell forms a refined IMC with \(\textbf{a}_j^L\). For a protein-ligand complex, M IMC types \(\Omega ^{IMC}=\{\omega ^{IMC}_m\}=\{(\omega _1^m,\omega _2^m) \mid \omega _1^m \in \Omega ^P,\omega _2^m\in \Omega ^L,m=1,\ldots ,M\}\) and K distance shells \(\Delta =\{\delta _k\}=\{(d_{cut1}^k,d_{cut2}^k] \mid k=1,\ldots ,K\}\) result in a feature matrix (\(\in {\mathbb {R}}^{M\times K}\)) exhibiting multi-range intermolecular interactions (Eq. 3).
OnionNet employs \(K=60\) shells spanning from 0 to \(30\mathring{A}\) (\(\delta _1=(0,1\mathring{A}]\), \(\delta _2\sim \delta _{60}\) with fixed intervals of \(0.5\mathring{A}\)), and 8 types for both protein and ligand atoms (\(\Omega ^P=\Omega ^L=\{\)C, N, O, H, P, S, HAX and Du\(\}\)) to identify IMCs. Similarly, OnionNet-2 profiles the contacts between protein residues and ligand atoms in different distance shells [36]. Regarding each type of IMC (\(\omega ^{IMC}_m\)) within a distance shell (\(\delta _k\)) as a specific type of interaction, we can profile these interactions using counts, average contact distances and other properties (e.g. pharmacophoric features). IMCPScore [37] simply profiles such interactions by the number of contacts and their average atomic distances (Eq. 4).
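A minimal sketch can clarify how the \(M\times K\) feature matrix of Eq. 3 is assembled. This is an illustrative reimplementation with our own names and inputs, not OnionNet's code, and it profiles each (IMC type, shell) cell by contact counts only.

```python
import numpy as np
from itertools import product

def imc_matrix(prot_xyz, prot_types, lig_xyz, lig_types, type_pairs, shells):
    """Build an (M, K) contact-count matrix in the spirit of Eq. 3.

    type_pairs : list of (protein_type, ligand_type) tuples  -> M rows
    shells     : list of (d_low, d_high] distance intervals  -> K columns
    """
    M, K = len(type_pairs), len(shells)
    F = np.zeros((M, K))
    # all pairwise protein-ligand distances, shape (n_prot, n_lig)
    d = np.linalg.norm(prot_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)
    row = {pair: m for m, pair in enumerate(type_pairs)}
    for i, j in product(range(len(prot_types)), range(len(lig_types))):
        m = row.get((prot_types[i], lig_types[j]))
        if m is None:
            continue                          # atom-type pair not profiled
        for k, (lo, hi) in enumerate(shells):
            if lo < d[i, j] <= hi:            # left-open shells, as in Eq. 3
                F[m, k] += 1
                break
    return F
```

Extending each cell with the average contact distance, as IMCPScore does, only requires accumulating the distances alongside the counts.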
IMC-based features can be arranged as matrices or tensors (Fig. 2A) to be fed into 2D-CNNs. Conventional 2D-CNN architectures are commonly adopted for learning these features, and Fig. 2B presents the one used by OnionNet [26]. It includes 3 consecutive convolution layers (\(4\times 4\) kernels with stride 1), 1 flattening layer, 3 consecutive dense layers (400, 200 and 100 units) and 1 output layer. In the model-training phase, a customized loss function, involving both Pearson's correlation coefficient and the root-mean-square error, is adopted by OnionNet. This category of models is easy to generate, and has led to competitive PLBAP (validated on PDBbind benchmarks).
Model Interpretability: Although neither model-level nor post-hoc interpretability was provided in the original works on \(T_{IMCCNN}\) models, they can be partly explained in a post-hoc manner, such as by measuring the feature importance in affinity predictions.
PLBAP based on \(T_{GridCNN}\) models
This category leverages molecular grids to represent protein-ligand complexes, and employs three-dimensional CNNs (3D-CNNs) to learn the grids. The molecular grid representation of a protein-ligand complex structure \(\varvec{{\mathcal {S}}}\) emphasizes the binding area instead of the whole structure, in order to ease the computational burden. It captures the features of the binding area at regularly spaced intervals (the resolution). Suppose the binding area of \(\varvec{{\mathcal {S}}}\) is represented as a grid with a size of \(X\mathring{A}\times Y\mathring{A}\times Z\mathring{A}\) and a resolution of \(r\mathring{A}\). Each cell \(\textbf{c}\) (\(r\mathring{A}\times r\mathring{A}\times r\mathring{A}\)) in the grid is delineated as a feature vector \(\textbf{f}^\textbf{c} = (f_1^\textbf{c}, f_2^\textbf{c}, \ldots , f_K^\textbf{c})\), indicating a multi-channel voxel. Integrating all these voxels leads to a 4D tensor as follows,
Here (x, y, z) indicates the center of \(\textbf{c}\). Given a complex structure and the grid size (e.g. \(X=Y=Z=24\) and \(r=1\) in KDEEP [29]), the key to constructing a molecular grid is properly assigning features to each cell.
All \(T_{GridCNN}\) models start from atom-level features. They mostly cover general properties (e.g. atom types) [29, 38, 39, 41, 46], physicochemical properties (e.g. excluded volume, partial charge, heavy-atom neighbors, heteroatom neighbors, and hybridization) [29, 38, 46] and pharmacophoric properties (e.g. hydrophobicity, aromaticity, H-bond donor/acceptor, and ring membership) [29, 38,39,40, 46]. These properties are commonly estimated by SMARTS patterns [47, 48] or simple geometric rules [48, 49]. Each atom \(\textbf{a}_i\) is characterized by K properties as \(\textbf{p}^{\textbf{a}_i} = (p_1^{\textbf{a}_i}, p_2^{\textbf{a}_i}, \ldots , p_K^{\textbf{a}_i})\), which can be used to fill in the molecular grid whose center coincides with the ligand center. There are two common strategies for filling information into the grids. KDEEP, DeepAtom and CNNScore adopt an expensive method that measures the contribution of each atom \(\textbf{a}_i\) to each cell \(\textbf{c}_j\) and accumulates the contributions for \(\textbf{c}_j\). As an instance, KDEEP quantifies the contributions by Euclidean distances and calculates the kth channel feature of cell \(\textbf{c}_j\) as Eq. 6.
where \(r_{VDW}^{\textbf{a}_i}\) is the van der Waals radius of \(\textbf{a}_i\), and \({\mathcal {C}}_i^A\) and \({\mathcal {C}}_j^C\) are the coordinates of the centers of \(\textbf{a}_i\) and \(\textbf{c}_j\). Another strategy is simply aggregating the features of the atoms located in each cell. Pafnucy, DeepFusionNet [46] and Sfcnn employ this strategy, which is efficient but may lead to low interpretability (e.g. for categorical features). Given a grid-filling strategy, a complex can be represented by one filled grid covering all protein and ligand atoms (Fig. 3A), or two concatenated grids treating protein and ligand atoms separately (Fig. 3B). Due to the lack of rotation invariance of grid representations, data augmentation by rotating the grids is frequently adopted to strengthen the data (Fig. 3C).
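The first filling strategy can be sketched as follows, assuming the distance-based voxel occupancy \(n(d)=1-\exp (-(r_{VDW}/d)^{12})\) reported for KDEEP; the summed accumulation over atoms and all names here are our illustrative choices, not the original implementation.

```python
import numpy as np

def voxel_channel(atom_xyz, atom_vdw, atom_props_k, grid_centers):
    """Fill one property channel of a molecular grid (KDEEP-style sketch).

    Each atom contributes n(d) = 1 - exp(-(r_vdw / d)^12) to every voxel,
    where d is the atom-to-voxel-center distance; contributions are weighted
    by the atom's k-th property and accumulated (here: summed) per voxel.
    Returns a flat (n_vox,) channel that can be reshaped to the 3D grid.
    """
    # distances from every voxel center to every atom, shape (n_vox, n_atoms)
    d = np.linalg.norm(grid_centers[:, None, :] - atom_xyz[None, :, :], axis=-1)
    d = np.maximum(d, 1e-6)                       # avoid division by zero
    n = 1.0 - np.exp(-(atom_vdw[None, :] / d) ** 12)
    return (n * atom_props_k[None, :]).sum(axis=1)
```

The occupancy is essentially 1 inside the van der Waals radius and decays steeply outside it, so each atom only influences nearby voxels.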
The learning architectures employed by this category include simple (similar to Fig. 2B) [38], self-developed [41] or well-developed architectures from other fields (e.g. SqueezeNet [29, 50], ShuffleNet [40, 51] and Caffe [39, 52]). As demonstrated in the work of Sfcnn, going deeper in CNN architectures did not improve the predictions. Considering the large resources (augmented grids) consumed here, a lightweight learning architecture like SqueezeNet (used by KDEEP) is a fine option. SqueezeNet was first developed to compress the learnable parameters of earlier architectures like AlexNet [53], and strongly inspired the architecture of KDEEP (Fig. 3D). The grid representations first go through a convolution layer (\(7\times 7\times 7\) kernels with stride 2) and a series of fire modules before the final output layer. Each fire module is composed of a squeeze layer (n \(1\times 1\times 1\) kernels) and an expand layer (4n \(1\times 1\times 1\) and 4n \(3\times 3\times 3\) kernels). For instance, the Fire2 module involves 16 kernels in the squeeze layer and 128 kernels (64 \(1\times 1\times 1\) and 64 \(3\times 3\times 3\) kernels) in the expand layer. The pooling layers combine \(3\times 3\times 3\) voxels at strides of 2. This category plays a major role in deep-learning PLBAP models (validated on PDBbind benchmarks), but may be limited by its expensive computations.
Model Interpretability: KDEEP and DeepAtom lack both model-level and post-hoc interpretability [29, 40]. CNNScore provides a visualization strategy for evaluating prediction-level post-hoc interpretability. It applies masking [54] to various regions in a grid, and the masking-induced differences in predicted scores yield a heatmap revealing important regions. Crucial residues in the binding area are often highlighted in such analyses, implying that CNNScore predicts binding affinities based on key features of protein-ligand interactions. Pafnucy adopts two approaches to post-hoc interpretability analysis. L2-regularized model training provides a profile of feature importance through the weight distributions of the first-hidden-layer convolutional filters. Filters with wider-ranging weights are proposed to pass more information to the deeper layers and therefore have greater impact on the predictions. Aside from the above dataset-level interpretations, Pafnucy also provides a voxel-removal strategy for prediction-level interpretations. By removing voxels (\(5\mathring{A}\times 5\mathring{A}\times 5\mathring{A}\)) at different positions in the featurization area (\(20\mathring{A}\times 20\mathring{A}\times 20\mathring{A}\)), the resulting prediction changes were investigated. Key intermolecular interactions (e.g. hydrogen bonds, \(\pi\)-\(\pi\) interactions and hydrophobic contacts) were revealed by such analysis. Sfcnn was explained at the prediction level, by hotspot areas of the input features that are closely related to the predictions [41]. These hotspot areas or heatmaps were generated based on gradient-weighted class activation mapping (Grad-CAM) analysis [55] and visualized using Mayavi [56]. As uncovered in the work of Sfcnn, such hotspot areas corresponded well to important protein-ligand interactions like hydrophobic contacts and hydrogen bonds.
PLBAP based on \(T_{GraphGCN}\) models
This group of models represents a protein-ligand complex by a graph \(\{\textbf{V}, \textbf{E}\}\), where \(\textbf{V}\) indicates the nodes and \(\textbf{E}\) the edges. (i) For PLBAP, \(\textbf{V}=\{\textbf{a}_i \mid i=1,\ldots ,N\}\) generally covers all the ligand atoms and the atoms in the ligand-binding site of the protein (e.g. those within a predefined distance from any ligand atom). Practically, a fixed number N for a set of complexes, such as \(N=200\) adopted by GraphBAR [30], is required for batch computations. Each \(\textbf{a}_i\in \textbf{V}\) is characterized by M atom-level features that resemble those in grid representations (Sect. PLBAP based on \(T_{GridCNN}\) models), leading to a node-feature matrix \({\mathcal {M}}_V\in {\mathbb {R}}^{N\times M}\) for each complex. (ii) Originally, \(\textbf{E}\) of a molecular graph encompasses all the covalent bonds, which can be encoded in an adjacency matrix \(\textbf{A}\in {\mathbb {R}}^{N\times N}\) with \(\textbf{A}_{ij}=1\) signifying a chemical bond between atoms \(\textbf{a}_i\) and \(\textbf{a}_j\). As an instance, APMNet [42] considers the covalent bonds as \(\textbf{E}\) in its graph representations for PLBAP. However, the binding between a protein and its ligands counts heavily on noncovalent interactions, such as hydrogen bonds and \(\pi\)-\(\pi\) stacking. This necessitates the generalization of \(\textbf{A}\) to an adjacency tensor (\({\mathbb {R}}^{N\times N\times N_{et}}\)) as below.
where \(N_{et}\) is the number of edge types, and any slice of the tensor \(\textbf{A}_{::k}\) indicates a specific type of adjacency. Different from chemical bonds, noncovalent interactions are commonly determined by pairwise atomic distances below some threshold values. PotentialNet [43] uses the first slice \(\textbf{A}_{::1}\) to encode covalent adjacency, and the following \(\textbf{A}_{::k}\) \((k\ge 2)\) to indicate noncovalent interactions identified by distance thresholds (e.g. \(<3\mathring{A}\)). GraphBAR [30] relies on \(N_{et}\) distance shells \(\Delta =\{\delta _k\}=\{(\frac{4(k-1)}{N_{et}}, \frac{4k}{N_{et}}] \mid k=1,\ldots ,N_{et}\}\), and assigns \(\textbf{A}_{ijk}=1\) if the distance between \(\textbf{a}_i\) and \(\textbf{a}_j\) falls in the kth shell. DeepFusionNet [46] adopts two distance shells \(\Delta =\{\delta _1,\delta _2\}=\{(0, 1.5], (1.5, 4.5]\}\) to discriminate between covalent and noncovalent adjacencies, and directly utilizes the atomic distances as the adjacency values (Eq. 8).
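The distance-shell construction above can be sketched in a few lines. This follows the GraphBAR shell definition with a \(4\mathring{A}\) maximum distance, but the function itself is our illustration, not the original code.

```python
import numpy as np

def shell_adjacency(xyz, n_et=3, d_max=4.0):
    """Adjacency tensor (N, N, n_et) in the GraphBAR style.

    Slice k is 1 where the pairwise distance falls in the left-open shell
    (k*d_max/n_et, (k+1)*d_max/n_et].  The diagonal (d = 0) falls in no
    left-open shell, so no self-loops arise.
    """
    n = len(xyz)
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    A = np.zeros((n, n, n_et))
    edges = np.linspace(0.0, d_max, n_et + 1)     # shell boundaries
    for k in range(n_et):
        A[..., k] = ((d > edges[k]) & (d <= edges[k + 1])).astype(float)
    return A
```

Replacing the 0/1 entries with the distances themselves (per shell) gives the DeepFusionNet-style weighted variant of Eq. 8.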
Similarly, GraphDTI [44] represents the covalent adjacency by the first slice \(\textbf{A}_{::1}\) (logical), while combining the covalent and noncovalent interactions within \(5\mathring{A}\) in \(\textbf{A}_{::2}\) (Eq. 9).
Here the adjacency values for noncovalent interactions are weaker than those for covalent bonds. Beyond the above, some models (e.g. APMNet [42]) further characterize the edges by one-hot encoding of multiple bond types (e.g. single, double and triple bonds), leading to an edge-feature matrix \({\mathcal {M}}_E\). A schematic diagram of graph representations is displayed in Fig. 4A. Models like PLANET [57] and GraphscoreDTA [58] treat protein residues as nodes and connect consecutive residues by edges, which results in simple 1D graphs and is regarded as sequence-based. Accordingly, they are out of scope for this review.
Molecular graph representations, which are invariant to rotations [27, 28], can be learned by Graph Convolutional Networks (GCNs) [59,60,61]. Most GCNs adopt a message-passing mechanism, which iteratively updates the features of each node (\(h_i^{t+1}\)) by gathering information from its neighborhood (\(r_i^{t+1}\)) and generates a graph-level feature vector (\(\hat{f}\)) based on the updated node features. This process can be expressed as follows.
where \(h_i^0\) comes from the initial node features \({\mathcal {M}}_V\), \(Nr(\textbf{a}_i)\) indicates all the neighboring atoms of \(\textbf{a}_i\) under a specific type of adjacency, T is the number of iterations, and \(MP_t\), \(U_t\) and Gr (permutation-invariant) are learned functions that differentiate among the various GCN models. GraphBAR relies on a spectral GCN architecture (Fig. 4B) to learn the molecular graphs. The node-feature matrix \({\varvec{{\mathcal {M}}}}_V\) is preprocessed (by a dense layer with 128 units and a dropout rate of 0.5) before going into the graph convolutional blocks \(GCB_k\) (\(k=1,\ldots ,N_{et}\)). The fundamental propagation rule for layers in \(GCB_k\) is \(\textbf{H}_k^{t+1} =\sigma (\textbf{L}_k\textbf{H}^{t}_k\Theta ^t_k)\), where \(\textbf{H}^{t}_k\) is the node-feature matrix of the tth layer, \(\Theta _k\) is a matrix of trainable parameters (\(\in {\mathbb {R}}^{N_{in}\times N_{out}}\)), \(\sigma (\cdot )\) indicates an activation function (e.g. ReLU) and \(\textbf{L}_k\) concerns the kth type of adjacency (\(\textbf{L}_k=\textbf{D}^{-\frac{1}{2}}\tilde{\textbf{A}}^k\textbf{D}^{-\frac{1}{2}}=\textbf{D}^{-\frac{1}{2}}(\textbf{A}_{::k}+\textbf{I}_N)\textbf{D}^{-\frac{1}{2}}\) with \(\textbf{D}_{ii}=\sum _{j}\tilde{\textbf{A}}^k_{ij}\)). Each \(GCB_k\) includes three convolutional layers (128, 128 and 32 filters) and three dense layers (128, 128 and \(16N_{et}\) units) with a dropout rate of 0.5. Aggregating all node features in \(GCB_k\) (\(\hat{f}_k\)), concatenating them (\(\mathbin \Vert _{k}\hat{f}_k\)) and connecting them to a dense layer (128 units with dropout) finally leads to the output of binding affinity. APMNet primarily involves two message-passing modules in its learning architecture. Module 1 includes a series of graph convolutional skip blocks \(GCSB_k\), with each block considering the initial node-feature matrix (\({\mathcal {M}}_V\)) and sharing the weights during feature propagation.
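The spectral propagation rule \(\textbf{H}_k^{t+1}=\sigma (\textbf{L}_k\textbf{H}^{t}_k\Theta ^t_k)\) can be written directly from its definition. The NumPy sketch below is a single-layer illustration with ReLU as \(\sigma\), not GraphBAR's implementation.

```python
import numpy as np

def gcn_layer(H, A_k, Theta):
    """One spectral graph-convolution step H' = ReLU(L_k @ H @ Theta),
    with L_k = D^{-1/2} (A_k + I) D^{-1/2} and D_ii the row sums of A_k + I.

    H     : (N, N_in) node-feature matrix of the current layer
    A_k   : (N, N) adjacency slice for edge type k
    Theta : (N_in, N_out) trainable weight matrix
    """
    A_tilde = A_k + np.eye(A_k.shape[0])          # add self-loops
    d = A_tilde.sum(axis=1)                       # degree per node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ A_tilde @ D_inv_sqrt         # normalized adjacency
    return np.maximum(L @ H @ Theta, 0.0)         # ReLU activation
```

The symmetric normalization keeps the propagated features on a comparable scale regardless of node degree, which is why the self-loop term \(\textbf{I}_N\) is added before normalization.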
The outputs \(\textbf{H}^{T}_k\) from \(GCSB_k\) (\(k=1,\ldots ,K\)) in module 1 are averaged (\(\bar{\textbf{H}}\)) and fed into module 2 for further learning, with \({\mathcal {M}}_E\) taken into consideration. The outputs of module 2 are aggregated at the node level and connected to the dense/output layer for PLBAP. PotentialNet connects two gated graph neural network (GGNN) modules in a cascade, and gathers the graph features at the node level (ligand atoms only) to feed them into a number of dense layers. GraphDTI [44] leverages gated graph attention (distance-aware) layers to update node features and learn noncovalent interactions at the binding site. The updated features after T layers for all ligand atoms are aggregated and fed to dense layers for predictions. Favorable PLBAP performances have been yielded by this category of models (validated on PDBbind benchmarks).
Model Interpretability: GraphBAR is to some extent explainable at the model level. Each filter corresponding to \(\textbf{A}_{::k}\) convolves the first-order neighborhood of a node and generates related node features. The summed features of all nodes (row-wise aggregation of \(\textbf{H}_k^T\)) imply specific protein-ligand interactions in the binding site, and concatenating various interactions for a protein-ligand pair then leads to the total binding affinity. Analogously, other models such as APMNet and GraphDTI can also be interpreted at the model level from the perspective of energies. Beyond that, these models can also be explained by measuring the feature importance in the predictions, as a post-hoc analysis.
Evaluation of models
Evaluation of scoring performances
To generally evaluate the four types of models (\(T_{ACNN}\), \(T_{IMCCNN}\), \(T_{GridCNN}\) and \(T_{GraphGCN}\)), we have constructed representatives using uniform training data and property-generation rules.

Training and validation data. The frequently accessed PDBbind Refined Set (V2020) [16, 62] was employed for model training, with the Core Set used for hyperparameter tuning. Two CSAR-HiQ data sets [63, 64] from another source were adopted for testing the models. These sets (details in Additional file 1: Table S1) are all comprised of experimentally determined protein-ligand complex structures with their binding constants (\(K_{d/i}\)). Their original sizes are 5,316 for the Refined Set, 285 for the Core Set, 175 for CSAR-HiQ Set 1 and 167 for CSAR-HiQ Set 2. 460 complexes overlapping between the Refined Set and the other sets were removed from the Refined Set, resulting in a final training set of 4,856 complexes. A PLBAP model attempts to correlate the structure of a protein-ligand complex with the binding affinity (\(-\log K_{d/i}\) in this study).

Atomic property generation. General and pharmacophoric properties of atoms in the protein-ligand complexes were generated by OpenBabel [65] and RDKit [66]. Based on the atomic properties, the different molecular representations for \(T_{ACNN}\), \(T_{IMCCNN}\), \(T_{GridCNN}\) and \(T_{GraphGCN}\) models can be generated.

Model training. Given a feature representation (e.g. atom coordinates/types, IMC matrix, grid or graph), we mainly tuned the parameters related to the training process (e.g. batch size bs and number of epochs epc), with the majority of model parameters fixed (from well-validated architectures). The learning architectures were realized using Tensorflow with a mean-squared-error loss function and the Adam optimizer. Hyperparameters were tuned by KerasTuner, and all computations were GPU-accelerated. Model construction details can be found in the Additional file.

Evaluation rules. Pearson's correlation (PC) and root-mean-squared error (RMSE) between the predicted and true binding affinities were adopted as the evaluation indices. A higher PC and a lower RMSE indicate a better prediction performance.
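The two evaluation indices are straightforward to compute; the helper below is our own utility, not taken from any of the reviewed works.

```python
import numpy as np

def scoring_metrics(y_true, y_pred):
    """Pearson's correlation (PC) and RMSE between true and predicted
    binding affinities; higher PC and lower RMSE mean better scoring."""
    pc = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    return pc, rmse
```

Note that PC is insensitive to any affine shift of the predictions, while RMSE is not, which is why the two indices are reported together.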
By combining different feature representations with various model architectures, we have trained 26 representatives (\(M_1\sim M_{26}\)) belonging to the four types of models (\(T_{ACNN}\): \(M_1\sim M_6\), \(T_{IMCCNN}\): \(M_7\sim M_{10}\), \(T_{GridCNN}\): \(M_{11}\sim M_{18}\) and \(T_{GraphGCN}\): \(M_{19}\sim M_{26}\)). The scoring performances of these models (details in Additional file 2: Table S2) are presented in Fig. 5, where a band covers the performances of all the models in each group and a line shows the median performance of each model group.
Considering both the training and testing phases, \(\mathbf {T_{GridCNN}}\) models are more prone to overfitting the training data (a high training PC, with a median of 0.9899, but moderate testing PCs, with medians of 0.6128/0.7090 for the two CSAR-HiQ sets). In the testing phase, \(\mathbf {T_{IMCCNN}}\) and \(\mathbf {T_{GraphGCN}}\) models stand out as two strong competitors (median testing PCs of 0.6396/0.6847 for \(\mathbf {T_{IMCCNN}}\) and 0.6424/0.7054 for \(\mathbf {T_{GraphGCN}}\)), while \(\mathbf {T_{ACNN}}\) models generally perform inadequately (median testing PCs of 0.5363/0.6785). The \(\mathbf {T_{GridCNN}}\) models have a wider span in PC, mostly because of the marked difference between augmented grids and the original data. However, the large computational resources consumed in learning the augmented data strongly hinder the further development of such models. As shown in our experiments, quadrupled grids led to an approximately four-fold growth in training time and storage (Additional file 1: Table S3). Taking into account both prediction accuracy and the required computational resources, \(\mathbf {T_{GraphGCN}}\) models are arguably the most promising and refinable methods for current PLBAP tasks.
Regarding the 26 representative models, the best performers in terms of the validation PC (\(M_5\), \(M_9\), \(M_{12}\) and \(M_{26}\) in Additional file 1: Table S2) were selected to stand for the four types of models. These models are described as follows.

\(M_5\) is a \(T_{ACNN}\) model. It employs 12 neighbors and 15 atom types in the atomtype convolution layer. A distance threshold of \(R_c=12\mathring{A}\), 6 filters (interval of \(2\mathring{A}\) for \(r_q\)) and \(\sigma _q^2=2.5\) are adopted for radial pooling. 3 atomistic dense layers (sizes of 32, 32 and 16) are stacked to yield the molecular energy. The whole model was trained with 200 epochs and a batch size of 24.

\(M_9\) is a \(T_{IMCCNN}\) model. Its feature representation (\(64\times 60\) matrix) concerns 64 IMCs and 60 distance shells (from OnionNet). The model, with a similar architecture as OnionNet (\(conv1 = 16\), \(conv2 = 64\) and \(conv3 = 128\)), was trained with 200 epochs and a batch size of 128.

\(M_{12}\) is a \(T_{GridCNN}\) model. Its feature representation (\(21\times 21\times 21\times 16\) tensor) emphasizes a \(20\mathring{A}\times 20\mathring{A}\times 20\mathring{A}\) grid with a resolution of \(1\mathring{A}\), and captures the properties of protein and ligand atoms separately (8 properties each, from KDEEP) at each voxel. The final model, with a lightweight architecture from KDEEP, was trained with 100 epochs, a batch size of 64 and a learning rate of \(10^{-5}\) (with L2 regularization adopted to prevent overfitting).

\(M_{26}\) is a \(T_{GraphGCN}\) model. A threshold of \(6\mathring{A}\), which crops a binding area of \(<400\) atoms for each complex, is adopted by this model. Its feature representation then involves a nodefeature matrix (\(400\times 18\)) concerning 18 atomic properties from Pafnucy, and an adjacency tensor (\(400\times 400\times 3\)) with each slice indicating intermolecular contacts in a certain range (\(0\sim 2\mathring{A}\), \(2\mathring{A}\sim 4\mathring{A}\) or \(4\mathring{A}\sim 6\mathring{A}\)). The model, with a similar architecture as GraphBAR (4 layers in each convolutional block), was trained with 200 epochs and a batch size of 64.
The scoring performances of these models are exhibited in Table 2.
Model interpretability
\(T_{ACNN}\) models can be explained, to some extent, at the model level (Fig. 6A), while the other three types of models (\(T_{IMCCNN}\), \(T_{GridCNN}\) and \(T_{GraphGCN}\)) can be interpreted in a post-hoc manner, mostly by revealing the feature significance and detecting hotspot areas. Based on the three best performers in Table 2 (\(M_9\) for \(T_{IMCCNN}\), \(M_{12}\) for \(T_{GridCNN}\) and \(M_{26}\) for \(T_{GraphGCN}\)), we leveraged a dataset-level masking technique to uncover important features for each model. We first evaluated each model on the validation set (PDBbind Core Set), yielding a PC of \(pc_0\) and an RMSE of \(rmse_0\). Then specific features were masked (set to zero) for all complexes in the validation set, and the masked data were fed into the model for a re-evaluation (yielding \(pc_i\) and \(rmse_i\)). A larger PC drop (\(\Delta pc_i = pc_i - pc_0\)) or RMSE increase (\(\Delta rmse_i = rmse_i - rmse_0\)) implies higher importance of the masked features.
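The masking procedure can be written compactly as follows (a NumPy-only sketch; `model_predict` stands in for any trained PLBAP model, and features are assumed to be indexed along axis 1 of the validation array):

```python
import numpy as np

def masking_importance(model_predict, X, y, feature_ids):
    """Dataset-level masking: zero one feature over the whole validation set,
    re-evaluate, and report (pc_i - pc_0, rmse_i - rmse_0) per feature."""
    def scores(Xm):
        p = model_predict(Xm)
        pc = np.corrcoef(p, y)[0, 1]
        rmse = float(np.sqrt(np.mean((p - y) ** 2)))
        return pc, rmse

    pc0, rmse0 = scores(X)
    importance = {}
    for fid in feature_ids:
        Xm = X.copy()
        Xm[:, fid] = 0.0                  # mask this feature for every complex
        pc_i, rmse_i = scores(Xm)
        importance[fid] = (pc_i - pc0, rmse_i - rmse0)
    return importance
```

Features whose masking inflates the RMSE (or depresses the PC) the most are ranked as the most important.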
\(\mathbf {T_{IMCCNN}}\). \(M_9\) represents a complex by an IMC matrix (\(64\times 60\)), where each position (j, k) in this matrix is a specific feature and its importance can be measured through the masking scheme. By collecting the importance data with respect to all the positions, the heatmaps regarding PC drops and RMSE increases were generated (Fig. 6B). Here intermolecular contacts in distance shells \(s_{20}\sim s_{26}\) (\(11\mathring{A}\sim 14\mathring{A}\)) are more highlighted for a PC drop, and those in \(s_{44}\sim s_{52}\) (\(23\mathring{A}\sim 27\mathring{A}\)) are more important for an RMSE increase. Another model \(M_7\) in this category can be explained similarly, as displayed in Additional file 1: Figure S1.

\(\mathbf {T_{GridCNN}}\). \(M_{12}\) characterizes a complex by a molecular grid (\(21\times 21\times 21\times 16\)), and we masked the features in two ways. First, each position (j, k, l) (\(1\le j,k,l\le 21\)) in the grid was masked for importance investigation (Fig. 6C). Here the origin is the ligand center and the protein atoms around this center show higher importance in PC drops or RMSE increases. Due to the various protein-ligand binding orientations, this dataset-level study can only show a rough picture of the position importance. Second, we masked each property channel of the grid voxels (total of 16 channels), leading to an importance plot in Fig. 6E. Apparently, the ligand-related channels play a more important role than the protein-related channels, and the increase in RMSE is more correlated with the excluded volume of ligand atoms. A similar interpretation for \(M_{11}\) in this category is shown in Additional file 1: Figures S2\(\sim\)3.

\(\mathbf {T_{GraphGCN}}\). \(M_{26}\) represents a complex by a node-feature matrix (\(400\times 18\)) and an adjacency tensor (\(400\times 400\times 3\)). Each node feature (total of 18 features) was examined according to the masking technique, generating an importance plot in Fig. 6D.
As shown here, features like partial charge, ring membership, hydrophobicity and hydrogen-bond donor are more important for a PC drop. The hybridization type stands out for an increase in RMSE, followed by partial charge and ring membership. As another example, \(M_{23}\) in this category can be interpreted by Additional file 1: Figure S4.
Evaluation of screening performances
As another evaluation of the above models, the screening powers, which show the capability of identifying active binders (actives) among non-binders (decoys), were estimated.

Validation data. As a frequently accessed database in molecular docking tasks, the enhanced directory of useful decoys (DUD-E) provides challenging decoys for active compounds binding to specific target proteins. Two targets, muscle glycogen phosphorylase (PYGM) and epidermal growth factor receptor (EGFR), from DUD-E were considered. PYGM concerns 114 actives and 4045 decoys, leading to a small set of 4159 PYGM-ligand pairs. EGFR has 832 actives and 35,441 decoys, constituting a large set of 36,273 EGFR-ligand pairs. These two sets (details in Additional file 1: Table S1) were used to contrastively investigate the screening powers of the deep-learning PLBAP models. The decoy-to-active ratios (\(r_{DTA}=\frac{n_{decoy}}{n_{active}}\)) of these two sets are approximately 35.5 and 42.6.

Generating protein-ligand complexes. Due to the lack of complex structures, the data in DUD-E could not be fed into deep-learning PLBAP models directly. As such, AutoDock Vina was leveraged to generate the protein-ligand complex structures (binding poses), each with a docking grid of \(20\mathring{A}\times 20\mathring{A}\times 20\mathring{A}\) placed at the ligand-center position of the template structure (PDB:1C8K for PYGM-ligand pairs and PDB:2RGP for EGFR-ligand pairs). When docking each pair of molecules using Vina, 32 consecutive Monte-Carlo samplings were conducted and the best pose was output during the search. These parameters are commonly adopted in docking applications.

Evaluation rules. Relying on a deep-learning PLBAP model, the binding affinities for target-ligand complexes can be predicted and ranked. The proportion of actives in the top \(X\%\) of ranked ligands, namely the enrichment factor (\(EF^{X}\)), is a crucial indicator showing the screening power of the model.
Given an \(r_{DTA}\) (\(1,2,\ldots ,r_{DTA}^{max}\)), the decoys can be randomly selected from the decoy pool, and \(EF^{X}\) can be calculated for the actives coupled with the selected decoys. The top \(1\sim 5\%\) of ranked ligands (\(X=1,2,\ldots ,5\)) were investigated in the enrichment analysis. To reduce the effect of randomness, 10 selections were drawn and averaged to produce the final \(EF^{X}\) for each pair of \(r_{DTA}\) and X values. A higher \(EF^{X}\) normally indicates a better screening performance.
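Using the common normalized definition of the enrichment factor (hit rate in the top \(X\%\) divided by the overall hit rate, so \(EF=1\) means no enrichment), the averaging protocol above can be sketched as:

```python
import numpy as np

def enrichment_factor(scores, is_active, top_pct):
    """EF^X: fraction of actives among the top X% of ligands ranked by
    predicted affinity, divided by the fraction expected at random."""
    order = np.argsort(scores)[::-1]              # higher predicted affinity first
    n_top = max(1, int(round(len(scores) * top_pct / 100.0)))
    hit_rate_top = np.mean(np.asarray(is_active)[order[:n_top]])
    return hit_rate_top / np.mean(is_active)

def averaged_ef(active_scores, decoy_scores, r_dta, top_pct, n_draws=10, seed=0):
    """Average EF^X over n_draws random decoy selections at a fixed
    decoy-to-active ratio, mirroring the protocol described above."""
    rng = np.random.default_rng(seed)
    n_decoys = int(r_dta * len(active_scores))
    efs = []
    for _ in range(n_draws):
        sel = rng.choice(len(decoy_scores), size=n_decoys, replace=False)
        scores = np.concatenate([active_scores, decoy_scores[sel]])
        labels = np.concatenate([np.ones(len(active_scores)), np.zeros(n_decoys)])
        efs.append(enrichment_factor(scores, labels, top_pct))
    return float(np.mean(efs))
```

For a perfect scorer at \(r_{DTA}=40\) and \(X=1\), this definition yields an EF of about 41, which is why the EF values of \(10\sim 20\) reported below represent substantial enrichment.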
The enrichment analysis was conducted to reveal the screening powers of PLBAP models on the PYGM and EGFR datasets (Figs. 7\(\sim\)8). Here \(M_5\), \(M_9\) and \(M_{26}\) (Table 2) were selected to represent \(T_{ACNN}\), \(T_{IMCCNN}\) and \(T_{GraphGCN}\) models. Since \(T_{GridCNN}\) models suffer from a more severe overfitting problem (as shown in Table 2), we adopted model \(M_{14}\), which is computationally more expensive (built on augmented data) but has a better testing performance (Additional file 1: Tables S2\(\sim\)3), to represent \(T_{GridCNN}\). Generally speaking, in Figs. 7\(\sim\)8, \(EF^X\) decreases dramatically as \(r_{DTA}\) increases. Real applications often involve a high \(r_{DTA}\), as actives are always the minority in the broad compound space, which poses a major obstacle to current PLBAP works. For the small PYGM dataset, the \(T_{GridCNN}\) model performs marginally better as \(r_{DTA}\) increases, particularly for the top \(1\%\) of complexes. For the larger EGFR set, which is closer to real-world settings, \(T_{GraphGCN}\) and \(T_{IMCCNN}\) models are more competitive. In particular, the \(T_{GraphGCN}\) model retains an EF of \(10\sim 20\) as \(r_{DTA}\) reaches 40, for the top \(1\%\) of complexes. As such, \(T_{GraphGCN}\) models have better potential to be developed into more powerful screening machines.
Conclusions
Deep-learning PLBAP models have their pros and cons that need to be weighed up for specific scoring tasks. \(\mathbf {T_{ACNN}}\) models can be explained from the perspective of energy and the thermodynamic cycle, and they are friendly to large-scale computations. However, they often have insufficient learning abilities for scoring or screening tasks. \(\mathbf {T_{IMCCNN}}\) models count on the learning of multi-range intermolecular-contact features by 2D-CNN models. The feature representations are simple and can be efficiently learned. But such representations oversimplify the protein-ligand interactions and ignore the spatial information of the molecules, making explanation from the structural and physicochemical perspectives more difficult. \(\mathbf {T_{GridCNN}}\) models leverage molecular structural information and voxelization techniques, laying a foundation for the structural interpretation of protein-ligand interactions. But the generation of such voxel features is resource-intensive, rendering the generalization to large-scale computations impractical. The lack of rotational invariance puts even more obstacles in the way of such models, particularly in screening tasks. \(\mathbf {T_{GraphGCN}}\) models have demonstrated great potential recently. They are less resource-intensive but can capture molecular topologies more flexibly than \(\mathbf {T_{GridCNN}}\) models, making them competitive in scoring and screening tasks. Refining the graph representations, developing neat but powerful learning architectures, and enhancing the interpretability are promising ways to explore the potential of such models more deeply. Devising more powerful machines, which are accurate in scoring tasks and also robust to tough screening tasks (with high \(r_{DTA}\)), will be a key direction for future developments of PLBAP research.
Availability of data and materials
The data for PLBAP-model construction (training and hyperparameter tuning) are from the PDBbind database (http://www.pdbbind.org.cn/). The test sets for evaluating the scoring performances of the constructed models stem from CSAR (http://csardock.org/). The screening powers of those models were measured using the PYGM and EGFR targets from DUD-E (https://dude.docking.org/). The coding and experiment guidelines can be found in the online GitHub repository (https://github.com/debbydanwang/DLPLBAP).
Abbreviations
 CDD:

Computational drug discovery
 RMSD:

Root-mean-square deviation
 PLBAP:

Protein-ligand binding affinity prediction
 DNN:

Deep neural network
 ACNN:

Atomic convolutional neural network
 2D-CNN:

Two-dimensional convolutional neural network
 3D-CNN:

Three-dimensional convolutional neural network
 GCN:

Graph convolutional network
 GGNN:

Gated graph neural network
 IMC:

Intermolecular contact
 IMCP:

Intermolecular contact profile
 PC:

Pearson's correlation
 RMSE:

Root-mean-squared error
 DUD-E:

Enhanced directory of useful decoys
 PYGM:

Muscle glycogen phosphorylase
 EGFR:

Epidermal growth factor receptor
References
Kobayashi Susumu, Boggon Titus J, Dayaram Tajhal, Jänne Pasi A, Kocher Olivier, Meyerson Matthew, Johnson Bruce E, Eck Michael J, Tenen Daniel G, Halmos Balázs (2005) Egfr mutation and resistance of non-small-cell lung cancer to gefitinib. New England J Med 352(8):786–792
Ashwin Dhakal, Cole McKay, Tanner John J, Jianlin Cheng (2022) Artificial intelligence in the prediction of protein-ligand interactions: recent advances and future directions. Briefings Bioinform 23(1):bbab476
Morris Garrett M, Lim-Wilby Marguerita (2008) Molecular docking. Mol Model Proteins. https://doi.org/10.1007/978-1-59745-177-2_19
Pagadala Nataraj S, Khajamohiddin Syed, Jack Tuszynski (2017) Software for molecular docking: a review. Biophys Rev 9:91–102
Ladd Marcus Frederick Charles, Palmer Rex Alfred (1977) Structure determination by X-ray crystallography. Springer, Berlin
Wüthrich Kurt (1990) Protein structure determination in solution by nmr spectroscopy. J Biol Chem 265(36):22059–22062
Wang Junmei, Wolf Romain M, Caldwell James W, Kollman Peter A, Case David A (2004) Development and testing of a general amber force field. J Comput Chem 25(9):1157–1174
Yin Shuangye, Biedermannova Lada, Vondrasek Jiri, Dokholyan Nikolay V (2008) Medusascore: an accurate force field-based scoring function for virtual drug screening. J Chem Inform Model 48(8):1656–1662
Huang Sheng-You, Grinter Sam Z, Zou Xiaoqin (2010) Scoring functions and their evaluation methods for protein-ligand docking: recent advances and future directions. Phys Chem Chem Phys 12(40):12899–12908
Grosdidier Aurélien, Zoete Vincent, Michielin Olivier (2011) Fast docking using the charmm force field with eadock dss. J Comput Chem 32(10):2149–2159
Eberhardt Jerome, Santos-Martins Diogo, Tillack Andreas F, Forli Stefano (2021) Autodock vina 1.2.0: new docking methods, expanded force field, and python bindings. J Chem Inform Model 61(8):3891–3898
Öztürk Hakime, Özgür Arzucan, Ozkirimli Elif (2018) Deepdta: deep drug-target binding affinity prediction. Bioinformatics 34(17):i821–i829
Yuqian Pu, Li Jiawei, Tang Jijun, Guo Fei (2021) Deepfusiondta: drug-target binding affinity prediction with information fusion and hybrid deep-learning ensemble model. IEEE/ACM Trans Comput Biol Bioinform 19(5):2760–2769
Nguyen Thin, Le Hang, Quinn Thomas P, Nguyen Tri, Le Thuc Duy, Venkatesh Svetha (2021) Graphdta: predicting drug-target binding affinity with graph neural networks. Bioinformatics 37(8):1140–1147
Jin Zhi, Tingfang Wu, Chen Taoning, Pan Deng, Wang Xuejiao, Xie Jingxin, Quan Lijun, Lyu Qiang (2023) Capla: improved prediction of protein-ligand binding affinity by a deep learning approach based on a cross-attention mechanism. Bioinformatics 39(2):btad049
Wang Renxiao, Fang Xueliang, Yipin Lu, Wang Shaomeng (2004) The pdbbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J Med Chem 47(12):2977–2980
Liegi Hu, Benson Mark L, Smith Richard D, Lerner Michael G, Carlson Heather A (2005) Binding moad (mother of all databases). Proteins Structure Function Bioinform 60(3):333–340
Liu Qian, Kwoh Chee Keong, Li Jinyan (2013) Binding affinity prediction for protein-ligand complexes based on \(\beta\) contacts and b factor. J Chem Inform Model 53(11):3076–3085
Li Guo-Bo, Yang Ling-Ling, Wang Wen-Jing, Li Lin-Li, Yang Sheng-Yong (2013) Id-score: a new empirical scoring function based on a comprehensive set of descriptors related to protein-ligand interactions. J Chem Inform Model 53(3):592–600
Zilian David, Sotriffer Christoph A (2013) Sfcscore rf: a random forest-based scoring function for improved affinity prediction of protein-ligand complexes. J Chem Inform Model 53(8):1923–1933
Ballester Pedro J, Mitchell John BO (2010) A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics 26(9):1169–1175
Durrant Jacob D, Andrew McCammon J (2010) Nnscore: a neural-network-based scoring function for the characterization of protein-ligand complexes. J Chem Inform Model 50(10):1865–1871
Xuchang Ouyang, Daniel Handoko Stephanus, Keong Kwoh Chee (2011) Cscore: a simple yet effective scoring function for protein-ligand binding affinity prediction using modified cmac learning architecture. J Bioinform Comput Biol 9(supp01):1–14
Sánchez-Cruz Norberto, Medina-Franco José L, Mestres Jordi, Barril Xavier (2021) Extended connectivity interaction features: improving binding affinity prediction through chemical description. Bioinformatics 37(10):1376–1382
Gomes Joseph, Ramsundar Bharath, Feinberg Evan N, Pande Vijay S (2017) Atomic convolutional networks for predicting protein-ligand binding affinity. arXiv preprint arXiv:1703.10603
Zheng Liangzhen, Fan Jingrong, Yuguang Mu (2019) Onionnet: a multiple-layer intermolecular-contact-based convolutional neural network for protein-ligand binding affinity prediction. ACS Omega 4(14):15956–15965
Atz Kenneth, Grisoni Francesca, Schneider Gisbert (2021) Geometric deep learning on molecular representations. Nature Machine Intell 3(12):1023–1032
Isert Clemens, Atz Kenneth, Schneider Gisbert (2023) Structure-based drug design with geometric deep learning. Current Opin Struct Biol 79:102548
Jiménez José, Skalic Miha, Martinez-Rosell Gerard, De Fabritiis Gianni (2018) Kdeep: protein-ligand absolute binding affinity prediction via 3d-convolutional neural networks. J Chem Inform Model 58(2):287–296
Son Jeongtae, Kim Dongsup (2021) Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities. PLoS ONE 16(4):e0249404
Perner Petra (2011) How to interpret decision trees? In: Advances in Data Mining. Applications and Theoretical Aspects: 11th Industrial Conference, ICDM 2011, New York, NY, USA, August 30–September 3, 2011. Proceedings 11, pages 40–55. Springer
Mengnan Du, Liu Ninghao, Xia Hu (2019) Techniques for interpretable machine learning. Commun ACM 63(1):68–77
James Murdoch W, Chandan Singh, Karl Kumbier, Abbasi-Asl Reza, Yu Bin (2019) Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci 116(44):22071–22080
Samek Wojciech, Montavon Grégoire, Lapuschkin Sebastian, Anders Christopher J, Müller Klaus-Robert (2021) Explaining deep neural networks and beyond: a review of methods and applications. Proc IEEE 109(3):247–278
Burkart Nadia, Huber Marco F (2021) A survey on the explainability of supervised machine learning. J Artif Intell Res 70:245–317
Wang Zechen, Zheng Liangzhen, Liu Yang, Qu Yuanyuan, Li Yong-Qiang, Zhao Mingwen, Mu Yuguang, Li Weifeng (2021) Onionnet-2: a convolutional neural network model for predicting protein-ligand binding affinity based on residue-atom contacting shells. Front Chem. https://doi.org/10.3389/fchem.2021.753002
Wang Debby D, Chan Moon-Tong (2022) Protein-ligand binding affinity prediction based on profiles of intermolecular contacts. Comput Struct Biotechnol J 20:1088–1096
Stepniewska-Dziubinska Marta M, Zielenkiewicz Piotr, Siedlecki Pawel (2018) Development and evaluation of a deep learning model for protein-ligand binding affinity prediction. Bioinformatics 34(21):3666–3674
Ragoza Matthew, Hochuli Joshua, Idrobo Elisa, Sunseri Jocelyn, Koes David Ryan (2017) Protein-ligand scoring with convolutional neural networks. J Chem Inform Model 57(4):942–957
Rezaei Mohammad A, Li Yanjun, Dapeng Wu, Li Xiaolin, Li Chenglong (2020) Deep learning in drug design: protein-ligand binding affinity prediction. IEEE/ACM Trans Comput Biol Bioinform 19(1):407–417
Wang Yu, Wei Zhengxiao, Xi Lei (2022) Sfcnn: a novel scoring function based on 3d convolutional neural network for accurate and stable protein-ligand affinity prediction. BMC Bioinform 23(1):222
Shen Huimin, Zhang Youzhi, Zheng Chunhou, Wang Bing, Chen Peng (2021) A cascade graph convolutional network for predicting protein-ligand binding affinity. Int J Mol Sci 22(8):4023
Feinberg Evan N, Sur Debnil, Zhenqin Wu, Husic Brooke E, Mai Huanghao, Li Yang, Sun Saisai, Yang Jianyi, Ramsundar Bharath, Pande Vijay S (2018) Potentialnet for molecular property prediction. ACS Central Sci 4(11):1520–1530
Lim Jaechang, Ryu Seongok, Park Kyubyong, Choe Yo Joong, Ham Jiyeon, Kim Woo Youn (2019) Predicting drug-target interaction using a novel graph neural network with 3d structure-embedded graph representation. J Chem Inform Model 59(9):3981–3988
Yip Virginia, Elber Ron (1989) Calculations of a list of neighbors in molecular dynamics simulations. J Comput Chem 10(7):921–927
Jones Derek, Kim Hyojin, Zhang Xiaohua, Zemla Adam, Stevenson Garrett, Bennett WF Drew, Kirshner Daniel, Wong Sergio E, Lightstone Felice C, Allen Jonathan E (2021) Improved protein-ligand binding affinity prediction with structure-based deep fusion inference. J Chem Inform Model 61(4):1583–1592
Stepniewska-Dziubinska Marta M, Zielenkiewicz Piotr, Siedlecki Pawel (2017) Decaf: discrimination, comparison, alignment tool for 2d pharmacophores. Molecules 22(7):1128
Jubb Harry C, Higueruelo Alicia P, Ochoa-Montaño Bernardo, Pitt Will R, Ascher David B, Blundell Tom L (2017) Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures. J Mol Biol 429(3):365–371
Desaphy Jeremy, Raimbaud Eric, Ducrot Pierre, Rognan Didier (2013) Encoding protein-ligand interaction patterns in fingerprints and graphs. J Chem Inform Model 53(3):623–637
Iandola Forrest N, Han Song, Moskewicz Matthew W, Ashraf Khalid, Dally William J, Keutzer Kurt (2016) Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360
Ma Ningning, Zhang Xiangyu, Zheng Hai-Tao, Sun Jian (2018) Shufflenet v2: practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pages 116–131
Cengil Emine, Çınar Ahmet, Özbay Erdal (2017) Image classification with caffe deep learning framework. In: 2017 International Conference on Computer Science and Engineering (UBMK), pages 440–444. IEEE
Krizhevsky Alex, Sutskever Ilya, Hinton Geoffrey E (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Szegedy Christian, Toshev Alexander, Erhan Dumitru (2013) Deep neural networks for object detection. Adv Neural Inform Process Syst 26
Selvaraju Ramprasaath R, Cogswell Michael, Das Abhishek, Vedantam Ramakrishna, Parikh Devi, Batra Dhruv (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pages 618–626
Ramachandran Prabhu, Varoquaux Gaël (2011) Mayavi: 3d visualization of scientific data. Comput Sci Eng 13(2):40–51
Xiangying Zhang, Haotian Gao, Haojie Wang, Zhihang Chen, Zhe Zhang, Xinchong Chen, Yan Li, Yifei Qi, Renxiao Wang (2023) Planet: a multi-objective graph neural network model for protein-ligand binding affinity prediction. J Chem Inform Model. https://doi.org/10.1021/acs.jcim.3c00253
Wang Kaili, Zhou Renyi, Tang Jing, Li Min (2023) Graphscoredta: optimized graph neural network for protein-ligand binding affinity prediction. Bioinformatics 39(6):btad340
Kipf Thomas N, Welling Max (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907
Zhang Si, Tong Hanghang, Jiejun Xu, Maciejewski Ross (2019) Graph convolutional networks: a comprehensive review. Comput Soc Networks 6(1):1–23
Sun Mengying, Zhao Sendong, Gilvary Coryandar, Elemento Olivier, Zhou Jiayu, Wang Fei (2020) Graph convolutional networks for computational drug development and discovery. Briefings Bioinform 21(3):919–935
Wang Renxiao, Fang Xueliang, Yipin Lu, Yang Chao-Yie, Wang Shaomeng (2005) The pdbbind database: methodologies and updates. J Med Chem 48(12):4111–4119
Dunbar Jr James B, Smith Richard D, Damm-Ganamet Kelly L, Ahmed Aqeel, Esposito Emilio Xavier, Delproposto James, Chinnaswamy Krishnapriya, Kang You-Na, Kubish Ginger, Gestwicki Jason E et al (2013) Csar data set release 2012: ligands, affinities, complexes, and docking decoys. J Chem Inform Model 53(8):1842–1852
Gabel Joffrey, Desaphy Jérémy, Rognan Didier (2014) Beware of machine learning-based scoring functions: on the danger of developing black boxes. J Chem Inform Model 54(10):2807–2815
O'Boyle Noel M, Morley Chris, Hutchison Geoffrey R (2008) Pybel: a python wrapper for the openbabel cheminformatics toolkit. Chem Central J 2(1):1–7
Landrum Greg et al (2013) Rdkit: a software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8:31
Acknowledgements
Not applicable.
Funding
This work is supported by National Natural Science Foundation of China (Grants No.62203304, No.62176160 and No.62376162) and the Guangdong Basic and Applied Basic Research Foundation (Grant 2022A1515010791).
Author information
Authors and Affiliations
Contributions
DDW conceived the original idea. DDW and WW planned and carried out the experiment. DDW wrote the manuscript with support from RW. All authors discussed the results and contributed to the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors report no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Table S1.
Description of the datasets in this study. Table S2. Scoring performances of deep-learning PLBAP models. Table S3. Training times of some good-performing PLBAP models. To make a fair comparison, a 20-trial random search for hyperparameter tuning was adopted for each model to yield the time costs. The higher time costs for each type of models are highlighted. Figure S1. Heatmaps showing the importance of features, in terms of PC drop and RMSE increase, for the M_{7} model. These features concern 30 distance shells (s0 ∼ s29) and 36 types of intermolecular contacts (c0 ∼ c35). Figure S2. Heatmaps showing the importance of positions, in terms of PC drop and RMSE increase, for the M_{11} model. Each position is a voxel, characterized by 9 channels (hydrophobicity, hydrogen-bond donor, hydrogen-bond acceptor, aromaticity, positively-ionizable, negatively-ionizable, metallicity, excluded volume, and sign for a protein/ligand atom). Figure S3. Importance of voxel channels, in terms of PC drop and RMSE increase, for the M_{11} model. Figure S4. Importance of node features, in terms of PC drop and RMSE increase, for the M_{23} model.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, D.D., Wu, W. & Wang, R. Structure-based, deep-learning models for protein-ligand binding affinity prediction. J Cheminform 16, 2 (2024). https://doi.org/10.1186/s13321-023-00795-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13321-023-00795-9