Predicting drug–drug interactions through drug structural similarities and interaction networks incorporating pharmacokinetics and pharmacodynamics knowledge

Drug–drug interactions (DDIs) may lead to adverse effects and can even result in a drug's withdrawal from the market. Predicting DDIs during drug development would help reduce development costs and time by enabling rigorous evaluation of drug candidates. The primary mechanisms of DDIs are based on pharmacokinetics (PK) and pharmacodynamics (PD). This study examines the effects of 2D structural similarities of drugs on DDI prediction through interaction networks that include both PD and PK knowledge. Our assumption was that a query drug (Dq) and a drug to be examined (De) are likely to have a DDI if the drugs in the interaction network of De are structurally similar to Dq. The network of De describes the associations between drugs and the proteins relating to the PK and PD of De, including its target proteins, proteins interacting with those targets, enzymes, and transporters. We constructed logistic regression models for DDI prediction using only 2D structural similarities between each Dq and the drugs in the network of De. The results indicated that our models could effectively predict DDIs. Integrating the structural similarity scores of drugs relating to both the PK and PD of De was crucial for model performance; in particular, combining the target- and enzyme-related scores provided the largest increase in predictive power. Electronic supplementary material: the online version of this article (doi:10.1186/s13321-017-0200-8) contains supplementary material, which is available to authorized users.


Background
A DDI occurs when a drug affects the efficacy of another, co-administered drug. Between 2009 and 2012, 38.1% of U.S. adults aged 18-44 used three or more prescription drugs during a 30-day period [1]. The percentage increased substantially with age, to 67.2% for ages 45-64 and 89.8% for ages 65 and older. The number of adverse drug reaction incidents increases exponentially when a patient takes four or more drugs [2]. Although DDIs can be beneficial, they can also cause serious adverse effects and sometimes lead to drug withdrawal [3]. Predicting such DDIs during drug development would help reduce time and costs by prioritizing drug candidates.
The main types of DDI are based on pharmacokinetics (PK) and pharmacodynamics (PD). PK describes what the body does to a drug, namely its absorption, distribution, metabolism, and excretion (ADME). A significant number of studies on PK-based DDIs have been carried out at the molecular level involving enzymes and transporters, producing a large amount of experimental data [5]. PK-based DDIs can arise at each ADME stage. For example, changes in gastric pH caused by one drug can affect the gastrointestinal absorption of a co-administered drug [4]. If two drugs that bind to the same plasma protein are co-administered, the concentration of free drug in plasma may change [4]. Various drugs are substrates, inhibitors, or inducers of the cytochrome P450 (CYP) enzymes, the dominant metabolic enzymes, so a DDI can occur when an inhibitor and a substrate of the same CYP enzyme are co-administered. A DDI can also occur when two drugs share the same excretion mechanism [4]. PD-based DDIs are found at the receptor level, the signal transduction level, and the physiological system level [6]. The most common occur at the receptor level, where drugs compete for binding to the same receptor.
Many studies on DDI prediction have been reported, based on approaches such as physiologically based pharmacokinetic (PBPK) modeling, molecular structural similarity analysis, ontology- and annotation-based analysis, network modeling, QSAR modeling, and data mining of clinical data. A PBPK model consists of mathematical equations describing ADME in the human body. For example, a PBPK model was developed from the results of a clinical pharmacokinetic study under single- and multiple-dose conditions to predict the DDIs of crizotinib with ketoconazole or rifampin [7]. Structural similarity has been employed for DDI prediction based on the idea that if there is a DDI between drug A and drug B, and drug C is structurally similar to drug A, then there is likely a DDI between drug C and drug B [8]. Vilar et al. predicted DDIs with a matrix transformation approach using structural similarities computed from molecular fingerprints [8]. In subsequent studies, the authors reported prediction methods using integrated similarity measures, including interaction profile, adverse effect, and target similarities [9]. Based on a similar idea, INDI (INferring Drug Interactions) was developed to predict CYP-related and PD-related DDIs using similarities of chemical structure, side effects, ATC (Anatomical Therapeutic Chemical classification system) codes, target sequences, protein-protein interactions, and Gene Ontology annotations [10]. In addition, 3D pharmacophoric similarity was used for DDI prediction, demonstrating that 3D structural data capture characteristics that are missed when only 2D data are used [11]. Luo et al. developed a web server for DDI prediction based on chemical-protein interaction profiles created by docking chemicals into the ligand-binding pockets of collected PDB structures [12].
DDI prediction using machine learning approaches was implemented in DDInetworks through integrated phenotypic, therapeutic, structural, and genomic similarities [13]. QSAR models for DDI prediction were constructed for CYP1A2, 2C9, 2D6, and 3A4 using two types of chemical descriptors, with balanced accuracies ranging from 72 to 79% [14].
There are also knowledge-based studies of DDI prediction. Herrero-Zazo et al. inferred DDIs from knowledge of DDI types, mechanisms, and applications using the Semantic Web Rule Language [15]. Huang et al. predicted DDIs using a protein-protein interaction network, achieving an accuracy of 0.82 and a recall of 0.62 [16]. Cami et al. predicted DDIs using networks of known DDIs [17]. More recently, a computational model for predicting DDIs was developed by integrating clinical side-effect information from drug labels and the FDA Adverse Event Reporting System [18]. Electronic health records (EHRs) have also been used to identify or prioritize drug-drug-adverse event associations [19,20].
Here, we propose models for predicting DDIs using the structural similarities of drugs in PK and PD networks, and investigate the factors influencing DDI prediction to guide further improvement. Our assumption is that a query drug (Dq) and a drug to be examined (De) tend to interact if Dq is structurally similar to the drugs in De's network that interact with the enzymes, transporters, or target proteins of De. The results of the model assessment and two case studies are reported.

Characteristics of each score type in the network
The distributions of structural similarities between Dq and the drugs in the network of De for DDI pairs and non-DDI pairs are shown in Fig. 1. The construction of the network of De and the score types (S_d, S_e, S_eg, S_tr, S_trg, S_ta, and S_tag) are described and defined in the Methods section. Note that score values of −10 were excluded from Fig. 1. Panel (A) shows that, with all score types pooled, the median similarity for DDI pairs is generally larger than that for non-DDI pairs, and the interquartile range for DDI pairs is slightly narrower. The same trends were observed in the score distributions of each individual score type [panel (B)]. This suggests that structural similarity scores based on an integrated PK and PD interaction network can be used to predict DDIs. The distributions of S_e, S_tr, and S_ta are shifted toward higher values compared with those of the corresponding pharmacogenetic-association scores (S_eg, S_trg, and S_tag), which implies that pharmacogenetic associations are less structurally dependent than physical interactions.
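The medians and interquartile ranges compared in Fig. 1 can be computed for any score vector. The minimal sketch below assumes, as the note about Fig. 1 suggests, that −10 is a placeholder for an unavailable score and drops it before summarizing; the function name `summarize` and the toy values are illustrative, not taken from the study.

```python
import statistics

# Assumption: -10 marks an unavailable score (excluded from Fig. 1).
MISSING = -10.0


def summarize(scores):
    """Return (median, q1, q3, iqr) of similarity scores, ignoring MISSING."""
    valid = [s for s in scores if s != MISSING]
    if len(valid) < 2:
        raise ValueError("need at least two valid scores")
    q1, _, q3 = statistics.quantiles(valid, n=4)  # quartile cut points
    return statistics.median(valid), q1, q3, q3 - q1


# Toy usage: the -10 entry is dropped before the statistics are computed.
median, q1, q3, iqr = summarize([0.2, 0.4, -10, 0.6, 0.8])
```

Python's `statistics.quantiles` uses the "exclusive" interpolation method by default, so quartiles may differ slightly from those drawn by a particular box-plot tool.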
The averages of S_d for both DDI and non-DDI pairs (0.416 and 0.346, respectively) were the lowest among the scores, while the averages of the other scores (S_e, S_eg, S_tr, S_trg, S_ta, and S_tag) ranged from 0.556 to 0.800 (Additional file 1: Table S1). Even when the structure of Dq is dissimilar to that of De, other drugs interacting with the enzymes, transporters, or targets in De's network can be structurally similar to Dq. In this case, Dq may still interact with those proteins of De, and a DDI between Dq and De may be observed. S_e showed the highest average score, which may be explained by the fact that a single enzyme can metabolize many drugs, so the probability of finding a drug structurally similar to Dq in the network is higher.
The correlations between scores for DDI and non-DDI pairs are shown in Fig. 2. For both DDI and non-DDI pairs, the scores generally fall into three correlated groups: S_d; (S_e, S_eg, S_tr, and S_trg); and (S_ta and S_tag). The enzyme-related (S_e and S_eg) and transporter-related (S_tr and S_trg) scores correlated with each other to some degree, which may reflect the interplay between metabolizing enzymes and transporters. Metabolizing enzymes and transporters have been reported to influence each other in the ADME of drugs and may therefore affect DDIs [21]; for example, many drugs metabolized by CYP3A4 are also transported by P-glycoprotein [22]. In addition, each physical-interaction score correlated strongly with its pharmacogenetic-association counterpart, i.e. (S_e and S_eg), (S_tr and S_trg), and (S_ta and S_tag). However, the correlations among the scores for DDI pairs are slightly weaker than those for non-DDI pairs.
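Fig. 2 reports pairwise correlations between score types. This section does not state which correlation coefficient was used, so the sketch below assumes Pearson's r and, as a further assumption, skips any pair in which either score is the −10 placeholder. The function name is hypothetical.

```python
import math

# Assumption: -10 marks an unavailable score; such pairs are skipped.
MISSING = -10.0


def pearson(x, y):
    """Pearson correlation between two score vectors over complete pairs.

    Raises ValueError with fewer than two complete pairs; a constant
    vector (zero variance) raises ZeroDivisionError.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a != MISSING and b != MISSING]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)
```

Computing this for every pair of score columns, separately for DDI and non-DDI pairs, yields two correlation matrices of the kind shown in Fig. 2.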

Prediction results
Average area under the receiver operating characteristic curve (AUC) values for the 4-fold cross-validations with a series of score combination schemes are shown in Table 1. Generally, combining similarity scores that carry both PK- and PD-related information resulted in stronger predictions. Although the AUCs of the regression models using Set 1 through Set 6 were not significantly different, with average values between 0.83 and 0.84, an ANOVA test revealed the importance of considering multiple scores. This implies that information integration benefits our network-based DDI prediction model. The results for Set 8 and Set 9, which both integrate transporter-related PK information with PD information (PKtr + PD), showed lower AUCs than those for Set 1 through Set 6, all of which included enzyme information. Interestingly, using the maximum score over the entire network (Set 7) resulted in an AUC of 0.786 with a standard deviation of 0.012, which was close to the AUCs of the models using Set 1 through Set 6. As shown in Fig. 3, the interquartile ranges of the distributions of this maximum score for DDI pairs and for non-DDI pairs hardly overlap, unlike the case in which all scores were pooled (Fig. 1a). These observations imply that the drug in the network most structurally similar to Dq is quite important for DDI prediction, but it is not the decisive factor, as the AUC of this model is still smaller than that of the model using Set 1. Using only a single information type (enzymes, transporters, or targets) together with the corresponding pharmacogenetic associations resulted in lower prediction performance, with AUC values ranging from 0.587 to 0.613 (Set 13, Set 14, and Set 20).
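The evaluation behind Table 1 (split the pairs into four folds, compute an AUC on each held-out fold, then average) can be sketched without external libraries. This is an illustration, not the authors' code: the section does not say how pairs were assigned to folds, so a seeded random split that is not stratified by label is assumed, and the function names are hypothetical. The AUC uses its Mann-Whitney interpretation, the probability that a random DDI pair is scored above a random non-DDI pair, with ties counted as one half; the quadratic loop favors clarity over speed.

```python
import random
import statistics


def cross_validation_splits(n, k=4, seed=0):
    """Yield (train_indices, test_indices) for k folds after a seeded shuffle.

    Assumption: a plain random split, not stratified by DDI label.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def auc(scores, labels):
    """ROC AUC: probability a DDI pair (label 1) outscores a non-DDI pair (0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one DDI and one non-DDI pair")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs):
    """Mean and sample standard deviation of the per-fold AUCs."""
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)
```

Each fold's model would be fit on `train` and scored on `test`. Whether the reported standard deviations are sample or population values is not stated here; the sketch assumes the sample version.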
There was a large increase in AUC for Set 1 through Set 6 when PD-related information was integrated with the enzyme information (Set 20). The AUC rose from 0.593 (Set 20) to 0.827 (Set 6), a 39% increase, and from 0.627 (Set 14) to 0.827 (Set 6), a 32% increase. The second largest improvement in AUC was observed When combined with the target information, the enzyme information contributed more to the prediction than the transporter information did. When the transporter data (PKtr) were replaced by the enzyme data (PKe), the AUC increased from 0.741 (Set 8) to 0.827 (Set 6), an 11% change. The structural space of drugs covered by Set 6 (PKe + PD information) may be larger than that covered by Set 8 (PKtr + PD information), possibly because more drugs structurally similar to Dq are found in the enzyme-related sub-network of De than in the transporter-related sub-network. Prediction performance might therefore improve as more drugs become associated with transporters. These results could also be due to the correlation between S_e and S_eg being lower than that between S_tr and S_trg (Fig. 2).
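The percentage gains quoted above are relative to the baseline AUC, as this worked calculation with the reported values shows:

```latex
\Delta_{\mathrm{rel}} = \frac{\mathrm{AUC}_{\mathrm{new}} - \mathrm{AUC}_{\mathrm{base}}}{\mathrm{AUC}_{\mathrm{base}}},
\qquad
\frac{0.827 - 0.593}{0.593} = \frac{0.234}{0.593} \approx 0.395 \; (39\%),
\qquad
\frac{0.827 - 0.627}{0.627} = \frac{0.200}{0.627} \approx 0.319 \; (32\%).
```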
Comparing the results for (Set 2 and Set 3), (Set 20 and Set 21), (Set 13 and Set 18), and (Set 14 and Set 15) revealed that pharmacogenetic associations did not contribute much to DDI prediction in terms of AUC, although the ANOVA test indicated the importance of integrating pharmacogenetic association information into the models. This might be because the scores S_e, S_tr, and S_ta are generally higher than the corresponding scores S_eg, S_trg, and S_tag, based on the distributions shown in Fig. 1b. In addition, S_e and S_eg, S_tr and S_trg, and S_ta and S_tag correlated with each other to some degree in both DDI and non-DDI cases (Fig. 2), so the pharmacogenetic scores add little information beyond their physical-interaction counterparts and therefore have less effect on DDI prediction.

Case studies
Two case studies of DDI prediction are presented, for warfarin and simvastatin. Warfarin is an anticoagulant (blood thinner); one of its drawbacks is that it interacts with many co-administered medications. Simvastatin is a drug for lowering the levels of low-density lipoprotein cholesterol and fats, and for raising the level of high-density lipoprotein cholesterol, in the blood. It is on the WHO Model List of Essential Medicines [23]. Instead of applying the models constructed during the 4-fold cross-validations, for each case study we rebuilt the model using the entire dataset while leaving out all warfarin-drug pairs or all simvastatin-drug pairs, respectively. The model constructed with Set 1 was used for prediction because of its superior performance in the ANOVA test results.
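The case-study protocol (refit on every pair that does not involve the drug of interest, then score that drug's pairs) amounts to a leave-one-drug-out split. A minimal sketch, assuming the dataset is a list of `(dq, de, label)` tuples; the tuple layout, the function name, and the drug identifiers in the usage line are hypothetical.

```python
def leave_drug_out(pairs, drug):
    """Split (dq, de, label) tuples into a training set with no pair that
    involves `drug` and a held-out set of the pairs that do involve it."""
    train, held_out = [], []
    for dq, de, label in pairs:
        # A pair is held out if the drug appears on either side.
        (held_out if drug in (dq, de) else train).append((dq, de, label))
    return train, held_out


# Hypothetical usage: refit on `train`, then rank the held-out pairs.
train, held_out = leave_drug_out(
    [("drug_x", "drug_a", 1), ("drug_b", "drug_c", 0), ("drug_d", "drug_x", 0)],
    "drug_x",
)
```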

Warfarin
The top ten drugs predicted to have DDIs with warfarin are listed in Additional file 1: Table S2-1. Four of them are reported in DrugBank to have DDIs with warfarin. The newly predicted DDI candidates for warfarin were dronabinol, quercetin, genistein, salicylic acid, fluorescein, and doxepin. Among them, the DDI between doxepin and warfarin is reported in Micromedex [24] as a moderate interaction that increases the risk of bleeding. Quercetin is reported in Drugs.com as having a moderate interaction with warfarin that reduces the efficacy of warfarin [25]. Drugs.com defines a "moderate" interaction as a moderately clinically significant combination that should usually be avoided, or used only under special circumstances. Genistein is listed on rxlist.com as having a significant interaction with warfarin; rxlist.com defines a "significant" interaction as a combination that can potentially cause a dangerous DDI and should be used with caution and close monitoring. Quercetin has been reported to displace warfarin bound to human serum albumin through competitive binding [26], and genistein also shares binding sites on human serum albumin with warfarin [27]. Special precautions are reported to be necessary when taking dronabinol together with warfarin [28]. Overall, DDIs between warfarin and eight of the top ten predicted drugs were supported by reports in the literature and databases. Compared with the predictions of Vilar et al. [8], which also used drug structural similarities, two of the top ten drugs predicted by our model (salicylic acid and estrone) were also predicted to interact with warfarin in their study. Their predictions are based on the structural similarity between De and the drugs known to have DDIs with warfarin (Dq), and therefore the chemical space searched is limited.
In contrast, our approach is based on the structural similarity between Dq and the drugs that interact with the proteins in the interaction network of De. Our approach therefore explores a larger chemical space and can pick up DDIs with drugs that may not be structurally similar to drugs with known DDIs with warfarin.

Simvastatin
The top ten drugs predicted to have DDIs with simvastatin are shown in Additional file 1: Table S2-2. None of them is reported in DrugBank to interact with simvastatin. However, lovastatin, prednisolone, dexamethasone, prednisone, and tacrolimus are listed by Drugs.com as having moderate interactions with simvastatin [29–33]. It is not surprising that drugs structurally similar to simvastatin, such as lovastatin, were predicted. However, our model also predicted tacrolimus, whose structure is not similar to that of simvastatin. One study reported that lovastatin and simvastatin likely had DDIs through the P-glycoprotein (MDR1) transporter [34]. Six of the top ten drugs were steroid hormones: several from the glucocorticoid family (prednisolone, dexamethasone, and prednisone), testosterone, aldosterone, and norethisterone. Dehydroepiandrosterone sulfate is a metabolite of a steroid hormone, dehydroepiandrosterone. A recent report found that simvastatin influenced plasma steroid hormone levels in female patients with non-classic congenital adrenal hyperplasia who were taking metformin [35]. Overall, five of the top ten predicted drugs were supported by reports in the literature and databases. No false negative predictions of DDIs were made for simvastatin: all 31 drugs known from DrugBank to have DDIs with simvastatin were picked up by our model. In comparison, only four of our top ten drugs (testosterone, prednisolone, prednisone, and lovastatin) were predicted to have DDIs with simvastatin in the study by Vilar et al. [8].
These two case studies suggest that our approach could also be used, more generally, to suggest possible enzymes and transporters for a drug (Dq). Cytochrome P450 2C9 metabolizes warfarin and seven of the top ten drugs predicted to have DDIs with warfarin. Similarly, multidrug resistance protein 1 interacts with warfarin and transports eight of the top ten drugs. Cytochrome P450 3A4 metabolizes simvastatin and seven of the top ten predicted drugs, and multidrug resistance protein 1 transports simvastatin and eight of the top ten drugs. On the other hand, the warfarin case study revealed a limitation of this approach. Whereas no false negative predictions were made for simvastatin, the DDI prediction for warfarin resulted in 18 false negatives out of 150 known DDIs. This is possibly due to the lack of relevant enzyme and transporter information for those drugs in the network. This limitation may be overcome over time as additional experimental PK data become available.
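The false-negative counts translate directly into recall (sensitivity) over the known DDIs, assuming every known DDI that is not a false negative counts as a true positive:

```latex
\text{recall} = \frac{TP}{TP + FN},
\qquad
\text{warfarin: } \frac{150 - 18}{150} = \frac{132}{150} = 0.88,
\qquad
\text{simvastatin: } \frac{31}{31} = 1.00.
```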

Conclusions
In this study, we investigated the factors for predicting DDIs through structural similarities and interaction networks that contain PK and PD knowledge. Our work demonstrated that: (1) structural similarities between Dq and the drugs in the network of De can be used to predict DDIs between Dq and De; (2) integrating structural similarity scores relating to both PK and PD was crucial for DDI prediction; and (3) including pharmacogenetic association knowledge (scores S_eg, S_trg, and S_tag) made only a minor contribution to DDI prediction. Two case studies showed the ability of this approach to predict DDIs: eight of the top ten predicted DDIs with warfarin, and five of the top ten predicted DDIs with simvastatin, were supported by reports in the literature and multiple databases.
A limitation of the current prediction method is that it requires enzyme or transporter information for De; imputing enzymes or transporters for such drugs may be a possible direction for future study. Another limitation is that the method applies only to small-molecule drugs (i.e. not to peptides or nucleic acids). Integrating other knowledge may further improve prediction. For example, the abundance of a transporter protein may depend on the cell type and the type of intracellular membrane [36], so tissue-specific transporter abundance data might help further distinguish DDI from non-DDI pairs. For enzymes, knowing whether a drug is an inducer, inhibitor, or substrate might also enhance DDI prediction. Furthermore, incorporating information on absorption, signal transduction pathways, physiological agonism/antagonism, or excretion (e.g. half-life) might help improve prediction performance and clarify the mechanisms of DDIs.
The modeling process consists of four steps: first, the interaction network of each De was constructed; second, the structural similarities between Dq and all drugs in the network of De, including De itself, were computed; third, DDI prediction models were constructed from the structural similarities using logistic regression; and finally, the models were evaluated by 4-fold cross-validation. Figure 4 illustrates a network of De, which consists of two sub-networks representing simplified PK and PD information (circled by orange and black lines in Fig. 4, respectively). The short terms for the respective PK and PD protein types and associated drugs are given in Fig. 4 and are used throughout the manuscript. A sub-network representing PD relationships was previously used by Hansen et al. [40]. Here, our assumption is that Dq and De tend to interact if the structure of Dq is similar to the structures of the drugs in De's interaction network (D1 through D12).
The PK-related sub-network represents relationships between: De and the related enzymes (E1 and E2); De and its transporters (Tr1 and Tr2); E1, E2 and the drugs that interact with them (D1, D2, and D4); E1, E2 and the drugs that have pharmacogenetic associations with them (D3); Tr1, Tr2 and the drugs that they transport (D5 and D7); and Tr1, Tr2 and the drugs that have pharmacogenetic associations with them (D6). The PD-related sub-network of De represents relationships between: De and its target proteins (T1 in Fig. 4); T1 and other drugs that also target T1 (D9); T1 and the drugs that have pharmacogenetic associations with T1 (D10); T1 and the proteins that physically interact with T1 (P1 and P2); P1, P2 and the drugs that target them (D8 and D12); and P1, P2 and the drugs that have pharmacogenetic associations with them (D11).
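The drug groups listed above map onto the score types defined in the "Score type definitions" subsection, where each score is the maximum similarity between Dq and a group. A minimal sketch of that mapping for the Fig. 4 example follows. It assumes the similarities to Dq are already computed and uses −10 when a group is empty (our reading of the −10 values mentioned for Fig. 1). Because the S_ta definition is cut off in this section, the assignment of D8, D9, and D12 to S_ta and of D10 and D11 to S_tag is inferred from the network description above. All names are hypothetical.

```python
# Assumption: -10 is the placeholder score when a drug group is empty.
MISSING = -10.0

# Hypothetical encoding of the Fig. 4 example: score type -> drugs in its group.
FIG4_GROUPS = {
    "S_d": {"De"},                 # De itself
    "S_e": {"D1", "D2", "D4"},     # drugs interacting with De's enzymes
    "S_eg": {"D3"},                # pharmacogenetic associations with the enzymes
    "S_tr": {"D5", "D7"},          # drugs transported by De's transporters
    "S_trg": {"D6"},               # pharmacogenetic associations with the transporters
    "S_ta": {"D8", "D9", "D12"},   # assumed: drugs targeting T1, P1, or P2
    "S_tag": {"D10", "D11"},       # assumed: pharmacogenetic associations with T1, P1, P2
}


def network_scores(sim_to_dq, groups):
    """For each score type, the maximum similarity between Dq and the drugs
    in that group (sim_to_dq maps drug id -> similarity to Dq); MISSING if
    the group is empty."""
    return {
        score: max((sim_to_dq[d] for d in drugs), default=MISSING)
        for score, drugs in groups.items()
    }
```

The resulting seven values form one row of the feature matrix for the (Dq, De) pair.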
Our approach requires only structural similarities as input to predict DDIs. In this work, we used the PubChem 2D fingerprint [41] and the Tanimoto coefficient to calculate structural similarities. Seven structural similarity scores (S_d, S_e, S_eg, S_tr, S_trg, S_ta, and S_tag, defined below), each computed from a different subset of drugs in De's network, were used to build DDI prediction models with logistic regression. The scores were the independent variables of the regression models, and the dependent variable was 1 for DDI pairs and 0 for non-DDI pairs.
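The two computational ingredients named here, the Tanimoto coefficient and logistic regression, can be sketched without external libraries. This is an illustration under stated assumptions, not the authors' code: fingerprints are represented as sets of "on" bit indices (computing actual PubChem fingerprints is out of scope), two empty fingerprints are assigned a similarity of 0 by convention, and the regression is fit by plain, unregularized batch gradient descent with a hypothetical learning rate and epoch count.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets
    of on-bit indices; two empty fingerprints get 0.0 by convention."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit P(DDI = 1 | scores) = sigmoid(w . x + b) by batch gradient descent
    on the mean log-loss. X: list of score vectors, y: list of 0/1 labels."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the pair with score vector x is a DDI."""
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In practice a library implementation would normally be used; such implementations may apply regularization by default, which changes the fitted coefficients.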

Score type definitions
S_d: the similarity score between Dq and De.
S_e: the maximum similarity score between Dq and the drugs in the network of De that interact with the enzymes of De (D1, D2, and D4); S_e = max(S_e1, S_e2, S_e3) in Fig. 4.
S_eg: the maximum similarity score between Dq and the drugs in the network of De that have pharmacogenetic associations with the genes of the enzymes of De (D3); S_eg = S_eg1 in Fig. 4.
S_tr: the maximum similarity score between Dq and the drugs in the network of De that are transported by the same transporters as De (D5 and D7); S_tr = max(S_tr1, S_tr2) in Fig. 4.
S_trg: the maximum similarity score between Dq and the drugs in the network of De that have pharmacogenetic associations with the genes of the transporters of De (D6); S_trg = S_trg1 in Fig. 4.
S_ta: the maximum similarity score between Dq and the drugs in the network of De that have physical interac-