 Research article
 Open Access
Detecting drug communities and predicting comprehensive drug–drug interactions via balance regularized semi-nonnegative matrix factorization
Journal of Cheminformatics volume 11, Article number: 28 (2019)
Abstract
Background
Because drug–drug interactions (DDIs) may cause adverse drug reactions or contribute to complex disease treatments, it is important to identify DDIs before multiple-drug medications are prescribed. As an alternative to high-cost experimental identification, computational approaches provide a much cheaper way to screen for potential DDIs on a large scale. Nevertheless, most of them only predict whether or not one drug interacts with another, and neglect the enhancive (positive) and degressive (negative) changes in pharmacological effects. Moreover, these comprehensive DDIs do not occur at random, but exhibit a weakly balanced relationship (a structural property of the DDI network), which would help explain how high-order DDIs work.
Results
This work exploits this intrinsic structural relationship to solve two tasks: drug community detection and comprehensive DDI prediction in the cold-start scenario. Accordingly, we first design a balance regularized semi-nonnegative matrix factorization (BRSNMF) to partition the drugs into communities. Then, to predict enhancive and degressive DDIs in the cold-start scenario, we develop a BRSNMF-based predictive approach, which leverages drug-binding proteins (DBP) as features to associate new drugs (having no known DDI) with other drugs (having known DDIs). Our experiments demonstrate that BRSNMF generates drug communities that have more reasonable sizes, exhibit the property of weak balance, and carry pharmacological significance. Moreover, they demonstrate the superiority of DBP features and the promising ability of the BRSNMF-based predictive approach on comprehensive DDI prediction, with 94% accuracy among the top-50 predicted enhancive DDIs and 86% accuracy among the bottom-50 predicted degressive DDIs.
Conclusions
By incorporating the weak balance property of the comprehensive DDI network into semi-nonnegative matrix factorization as a regularizer, our proposed BRSNMF not only generates better drug communities but also provides promising comprehensive DDI predictions in the cold-start scenario.
Introduction
When two or more drugs are taken together, their pharmacological effects or behaviors may be unexpectedly influenced by each other [1]. Such an influence is termed a Drug–Drug Interaction (DDI), which may reduce drug efficacy, increase unexpected toxicities, or induce other adverse drug reactions among the co-prescribed drugs. Unidentified DDIs occur frequently in clinical practice. On average, there are ~ 15 DDIs out of every 100 drug pairs among the approved small-molecule drugs in DrugBank [2]. These DDIs may put patients who are treated with multiple-drug medications in an unsafe situation [3,4,5,6]. Moreover, understanding DDIs is the first step towards drug combination, which usually involves high-order DDIs [7] and has become one of the promising treatments for multifactorial complex diseases [8]. Consequently, there is an urgent need to analyze and identify DDIs before clinical co-medications are administered. However, traditional experimental approaches for DDI identification (e.g. testing cytochrome P450 [9] or transporter-associated interactions [10]) are costly and time-consuming [11]. So far, only a few DDIs have been identified during drug development (usually in the clinical trial phase); some are reported after the drugs are approved, and many are found in post-market surveillance.
Computational approaches provide a promising alternative for discovering potential DDIs on a large scale for further screening, and have recently gained a lot of attention from both academia and industry [12, 13]. Data-mining-based approaches have been developed for detecting DDIs from different sources [11], such as the scientific literature [14, 15], electronic medical records [16], and the FDA Adverse Event Reporting System (http://www.fda.gov). Even though these approaches can collect and report known DDIs, they cannot provide an early warning of potential DDIs before clinical medications are administered. In contrast, machine learning-based approaches (e.g. naïve similarity-based [17], network recommendation-based [11], and classification-based [18, 19] approaches) are able to provide such alerts by utilizing pre-marketed or post-marketed drug attributes [20], such as chemical structures [17, 21], targets [22], hierarchical classification codes [18] and side effects [11, 23].
Most of these existing machine learning-based approaches are designed for conventional binary prediction, which only indicates how likely a pair of drugs is to interact. However, two interacting drugs may change each other's pharmacological behaviors or effects (e.g. increasing or decreasing serum concentration) in vivo [21, 23]. For example, the serum concentration of Quinine (DrugBank Id: DB00468) increases when it is taken with Aprepitant (DrugBank Id: DB00673), whereas it decreases when Quinine is taken with Mitotane (DrugBank Id: DB00648). We refer to these two cases as an enhancive DDI and a degressive DDI respectively, and to both of them as comprehensive DDIs, which capture the direction of the change in pharmacological effects. Knowing whether a DDI is enhancive or degressive is considerably more informative, especially when optimizing patient care, establishing drug dosage, or detecting drug resistance to therapy [24].
On the other hand, the occurrence of enhancive and degressive DDIs is not random, but exhibits a structural relationship among the drugs in the corresponding DDI network [21, 23]. Existing approaches have not yet exploited this structural property, even though doing so is one of the most important steps towards understanding the high-order drug interactions involved in treating complex diseases [7]. Two of our recent works [21, 23] attempted to investigate two issues: (1) predicting comprehensive DDIs instead of making binary predictions; and (2) investigating the structural relationship of drugs in a DDI network. One of them proposed a model to predict enhancive and degressive DDIs in different prediction scenarios involving new drugs (those with no known DDI) [23]. The other observed that the numbers of enhancive and degressive DDIs of drugs, as well as their sum and difference, are correlated with drug communities [21]. More importantly, this latter work also revealed that the number of balanced triads (defined and explained in Fig. 1) is significantly larger than the number of unbalanced triads in a comprehensive DDI network. This observation is similar to that in signed social networks, which commonly exhibit global structural balance [25]. Based on the fundamental theorems of Strong Balance [26] and Weak Balance [27], this property can be leveraged to predict signed links in social networks [28].
Inspired by signed social networks, the current work exploits the weakly balanced relationship among drugs to solve two tasks: drug community detection and comprehensive DDI prediction in the cold-start scenario. The paper is organized as follows. The "Methods" section first formulates community partition in a comprehensive DDI network based on the weak balance theory [27] for signed networks. Then, for the first task, it presents a novel clustering algorithm, balance regularized semi-nonnegative matrix factorization (BRSNMF), which integrates a low-rank matrix decomposition with a weak balance regularization. After that, it describes a BRSNMF-based predictive approach for the cold-start scenario, which requires predicting potential comprehensive DDIs for new drugs having no known DDI. In the "Results and discussions" section, after introducing the weakly balanced phenomenon in a real DDI network, we investigate the advantages of BRSNMF through two comparative experiments. In the first experiment, we compare BRSNMF with the traditional semi-nonnegative matrix factorization in terms of drug numbers, balance, and pharmacological significance across drug communities. In the second, we compare our features based on drug-binding proteins (DBP) with the popular features based on drug chemical structures (e.g. PubChem fingerprints) under cross-validation. Furthermore, using our DBP features in a version-independent test, we compare our BRSNMF-based approach with the state-of-the-art approach DDINMF [21], which does not consider the weakly balanced relationship among drugs. In the last section, we draw our conclusions and discuss the results.
Methods
Community partition in comprehensive DDI network
Without loss of generality, let \({\mathbf{D}} = \left\{ {d_{i} } \right\},i = 1,2, \ldots ,m\) be a set of m approved drugs. Their interactions can be accordingly represented as an \(m \times m\) symmetric interaction matrix \({\mathbf{A}}_{m \times m} = \left\{ {a_{ij} } \right\}\). For the conventional DDIs, \(a_{ij} = 1\) if \(d_{i}\) interacts with \(d_{j}\), and \(a_{ij} = 0\) otherwise. For the comprehensive DDIs, \(a_{ij} \in \{ -1, 0, +1\}\). Again, if \(d_{i}\) and \(d_{j}\) do not interact with each other, \(a_{ij} = 0\). When there is an enhancive DDI or a degressive DDI between \(d_{i}\) and \(d_{j}\), \(a_{ij} = +1\) or \(a_{ij} = -1\) respectively. The conventional binary DDI matrix \({\mathbf{A}}_{b}\) can be obtained from the comprehensive DDI matrix by setting \({\mathbf{A}}_{b} = Binary({\mathbf{A}})\) (taking the absolute values of all elements). The comprehensive DDI matrix characterizes a signed network \(G(N,E)\), in which drugs are nodes and their interactions are edges.
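As a minimal illustration of this encoding, the Python sketch below (using NumPy) builds the symmetric comprehensive DDI matrix A from lists of enhancive and degressive drug pairs and derives A_b = Binary(A). The function names and the pair-list input format are hypothetical and chosen only for illustration; drugs are assumed to be indexed 0, …, m − 1.

```python
import numpy as np

def build_signed_ddi_matrix(m, enhancive_pairs, degressive_pairs):
    """Hypothetical helper: symmetric m x m matrix with entries in {-1, 0, +1}."""
    A = np.zeros((m, m), dtype=int)
    for i, j in enhancive_pairs:
        A[i, j] = A[j, i] = 1       # enhancive DDI -> +1
    for i, j in degressive_pairs:
        A[i, j] = A[j, i] = -1      # degressive DDI -> -1 (assumes no pair is in both lists)
    return A

def binary(A):
    """Binary(A): the conventional DDI matrix, i.e. element-wise absolute values."""
    return np.abs(A)

# Toy example with 4 drugs (indices are illustrative only).
A = build_signed_ddi_matrix(4, enhancive_pairs=[(0, 1), (1, 2)], degressive_pairs=[(0, 3)])
A_b = binary(A)
print(A)
print(A_b)
```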
According to Weak Balance Theory [27], the nodes of a weakly balanced signed network can ideally be clustered into k groups such that the edges within groups are positive (enhancive) and the edges between groups are negative (degressive). In such a weakly balanced network, all l-cycles are strongly or weakly balanced. Here, an l-cycle is defined as a closed simple path of length l from a node back to itself. We mainly consider the case of l = 3, where a 3-cycle is called a triad. There are four kinds of triads, labelled PPP, NNP, NNN, and PPN respectively, where P denotes a positive edge and N a negative edge in the triad (Fig. 1). The first two triads are strongly balanced, the third is weakly balanced, and the last is unbalanced. Real-world signed networks (e.g. Epinions and Slashdot) are not perfectly balanced because they contain some (although far fewer) unbalanced triads (Fig. 1), which are caused by negative edges within groups or positive edges between groups.
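To make the four triad types concrete, the following sketch enumerates every 3-node subset of a small signed network and classifies each closed triad by its number of negative edges (none gives PPP, one gives PPN, two give NNP, three give NNN). It assumes A is a symmetric NumPy array with entries in {-1, 0, +1}; the function and dictionary names are hypothetical.

```python
from itertools import combinations

import numpy as np

# Number of negative edges in a closed triad -> triad type, and type -> balance status.
TRIAD_TYPES = {0: "PPP", 1: "PPN", 2: "NNP", 3: "NNN"}
STATUS = {"PPP": "strongly balanced", "NNP": "strongly balanced",
          "NNN": "weakly balanced", "PPN": "unbalanced"}

def triad_census(A):
    """Count PPP, NNP, NNN and PPN triads in a symmetric signed matrix A."""
    counts = {t: 0 for t in ("PPP", "NNP", "NNN", "PPN")}
    for i, j, k in combinations(range(A.shape[0]), 3):
        signs = (A[i, j], A[j, k], A[i, k])
        if 0 in signs:
            continue                  # the three drugs do not form a closed 3-cycle
        n_negative = sum(1 for s in signs if s < 0)
        counts[TRIAD_TYPES[n_negative]] += 1
    return counts

# Toy signed network with 4 drugs.
A = np.array([[ 0,  1,  1, -1],
              [ 1,  0,  1, -1],
              [ 1,  1,  0,  1],
              [-1, -1,  1,  0]])
counts = triad_census(A)
balanced = sum(c for t, c in counts.items() if STATUS[t] != "unbalanced")
print(counts, balanced / sum(counts.values()))
```

The brute-force loop over all 3-subsets grows as O(m³), so it is meant only for small examples; for networks with more than a thousand drugs a matrix-based count would be preferable.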
Our DDI network is also such a network: it contains significantly more balanced triads than unbalanced triads [21]. We verify this observation using the real data in DrugBank (see "Dataset" section). We also observe that our DDI network may contain a community in which most edges are negative (i.e. most triads in the community are weakly balanced). Considering these observations, we generalize the weak balance theory as follows: the nodes of a weakly balanced network can ideally be clustered into k groups such that most edges within a group are either positive (strongly balanced groups) or negative (weakly balanced groups), while most edges between groups are negative. In the context of such a comprehensive DDI network, a drug community refers to a cluster in which the number of balanced l-cycles is significantly greater than that of unbalanced l-cycles. A real example of a small DDI subnetwork illustrates this idea (Fig. 1).
Given a DDI network, our problem can be formulated as a k-way clustering problem (i.e. finding k communities \(\{ C_{1} , \ldots ,C_{k} \}\)). We expect that (1) the clustering partitions the network into k evenly sized drug clusters, each of which contains a sufficient number of drug nodes; and (2) more importantly, most interactions within a cluster are of one type (enhancive or degressive) while most interactions between clusters are degressive. This clustering problem is NP-hard [29]. To solve it, we present an approximate solution by designing a low-rank matrix decomposition, which maps the network into a low-dimensional space so as to reveal its underlying weakly balanced structure.
Clustering by balance regularized semi-nonnegative matrix factorization
For a nonnegative matrix A, nonnegative matrix factorization (NMF) decomposes it into two low-rank nonnegative factor matrices W and H such that \({\mathbf{A}} \approx {\mathbf{WH}}^{T}\). The nonnegativity of NMF makes both W and H easier to interpret and provides an inherent clustering, in which the columns of W act as cluster centroids and the rows of H can be viewed as cluster indicators for the columns of A. Because NMF requires A itself to be nonnegative, it cannot be applied to many problems, including ours, where A contains negative entries. To accommodate more scenarios, one of its extensions, semi-nonnegative matrix factorization (SemiNMF), was proposed for a real matrix A, with nonnegativity imposed only on H [30]. Motivated by SemiNMF, we design a variant of SemiNMF that not only inherits its advantages but also represents the underlying weakly balanced structure of the comprehensive DDI network. This novel SemiNMF on DDI networks is formally stated as a k-way clustering problem in the following.
Given a comprehensive DDI matrix \({\mathbf{A}}_{m \times m} \in {\mathbb{R}}\), we aim to find a community centroid matrix \({\mathbf{W}}_{m \times k} = [{\mathbf{w}}_{1} ,{\mathbf{w}}_{2} , \ldots ,{\mathbf{w}}_{k} ] \in {\mathbb{R}}\) and a community indicator matrix \({\mathbf{H}}_{m \times k} = \{ h_{ij} \} \in {\mathbb{R}}^{ + }\) whose product approximates the original matrix well, \({\mathbf{A}}^{ \pm } \approx {\mathbf{W}}^{ \pm } ({\mathbf{H}}^{ + } )^{T}\), where \(k \ll rank({\mathbf{A}})\) and the element \(h_{ij}\) denotes the likelihood that node i belongs to the jth community.
Furthermore, we expect most interactions within drug communities to be enhancive and most edges between drug communities to be degressive. To avoid partitions in which most clusters contain only a few nodes, we also prefer that each cluster contains a substantial number of nodes. We therefore introduce two graph regularization terms, a within-community criterion \(Gr_{1}\) and a between-community criterion \(Gr_{2}\), to encode the balanced structure of the DDI network. They are defined as follows:
where \({\mathbf{h}}_{\cdot c}\) is the cth column vector of H, \({\mathbf{L}}^{ + } = {\mathbf{D}}^{ + } - {\mathbf{A}}^{ + }\), \({\mathbf{D}}^{ + }\) is the diagonal degree matrix of \({\mathbf{A}}^{ + }\), and \(\forall i,j:\; a_{ij}^{ + } = (|a_{ij}| + a_{ij})/2\), \(a_{ij}^{ - } = (|a_{ij}| - a_{ij})/2\).
Inspired by [31], we combine them and obtain
when \({\mathbf{W}} = {\mathbf{I}}\) and \({\hat{\mathbf{K}}} = \sigma {\mathbf{I}} - \eta ({\mathbf{A}}^{ - } + {\mathbf{L}}^{ + } )\), it becomes
where \(\sigma ,\eta > 0\) control the sizes of clusters [32].
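The explicitly stated quantities above can be computed directly from the signed matrix. The sketch below, assuming a dense symmetric NumPy array, splits A into its positive part A⁺ = (|A| + A)/2 and negative part A⁻ = (|A| − A)/2, forms the positive Laplacian L⁺ = D⁺ − A⁺, and assembles K̂ = σI − η(A⁻ + L⁺). Because the full expressions for Gr₁ and Gr₂ are not reproduced in this text, they are not implemented here; the function names are hypothetical.

```python
import numpy as np

def split_signed(A):
    """Positive part (|A| + A)/2 and negative part (|A| - A)/2; note A = A_pos - A_neg."""
    A = np.asarray(A, dtype=float)
    A_pos = (np.abs(A) + A) / 2.0
    A_neg = (np.abs(A) - A) / 2.0
    return A_pos, A_neg

def positive_laplacian(A):
    """L+ = D+ - A+, where D+ is the diagonal degree matrix of A+."""
    A_pos, _ = split_signed(A)
    return np.diag(A_pos.sum(axis=1)) - A_pos

def balance_kernel(A, sigma=1.0, eta=1.0):
    """K_hat = sigma * I - eta * (A- + L+)."""
    _, A_neg = split_signed(A)
    return sigma * np.eye(A_neg.shape[0]) - eta * (A_neg + positive_laplacian(A))

A = np.array([[ 0, 1, -1],
              [ 1, 0,  0],
              [-1, 0,  0]])
A_pos, A_neg = split_signed(A)
K_hat = balance_kernel(A, sigma=1.0, eta=1.0)
print(K_hat)
```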
In addition, we introduce another regularization term \(Sr\) to control the sparsity of H so that each drug node in the DDI network belongs to as few communities as possible. It is defined as
where \({\mathbf{1}}\) is the \(k \times k\) matrix whose elements are all 1.
Integrating all the regularization terms into the low-rank matrix decomposition, we design the balance regularized semi-nonnegative matrix factorization (BRSNMF) as
Because H is constrained to be nonnegative (\({\mathbf{H}} \in {\mathbb{R}}^{ + }\)), we use the Lagrangian function and the Karush–Kuhn–Tucker (KKT) conditions to derive the following update rules
where the operators are defined as \({\mathbf{X}}_{ + } = (|{\mathbf{X}}| + {\mathbf{X}})/2\) and \({\mathbf{X}}_{ - } = (|{\mathbf{X}}| - {\mathbf{X}})/2\), \(|{\mathbf{X}}|\) is the element-wise absolute value of X, and \(\odot\) and \(\div\) denote element-wise multiplication and division between two matrices. The solution of BRSNMF is presented in Algorithm 1. Without Sr and Gr, BRSNMF reduces exactly to SemiNMF. More technical details about SemiNMF can be found in [21, 33]. Like NMF and SemiNMF, BRSNMF provides an intrinsic clustering, where the columns of W serve as cluster centroids and the rows of H can be viewed as cluster indicators.
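Since the BRSNMF update rules themselves are not reproduced in this text, the following sketch shows only the unregularized special case mentioned above, i.e. plain SemiNMF with the widely used alternating updates of Ding, Li and Jordan: W is refit by least squares, W = AH(HᵀH)⁻¹ (a pseudo-inverse is used here for stability), and H is updated multiplicatively with the X₊/X₋ operators so that it stays nonnegative. It is an illustrative NumPy sketch, not the authors' Algorithm 1; the function names and the toy matrix are hypothetical.

```python
import numpy as np

def pos_part(X):
    return (np.abs(X) + X) / 2.0      # X_+

def neg_part(X):
    return (np.abs(X) - X) / 2.0      # X_-

def semi_nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Plain SemiNMF: A ~= W H^T with W real-valued and H >= 0 (no Gr or Sr terms)."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], k)) + 0.1          # strictly positive initialization
    errors = []
    for _ in range(n_iter):
        W = A @ H @ np.linalg.pinv(H.T @ H)        # least-squares update of W
        errors.append(np.linalg.norm(A - W @ H.T) ** 2)
        AtW, WtW = A.T @ W, W.T @ W
        numer = pos_part(AtW) + H @ neg_part(WtW)
        denom = neg_part(AtW) + H @ pos_part(WtW) + eps
        H = H * np.sqrt(numer / denom)             # multiplicative update keeps H >= 0
    W = A @ H @ np.linalg.pinv(H.T @ H)
    return W, H, errors

# Toy symmetric signed matrix standing in for a DDI network.
rng = np.random.default_rng(1)
A = np.triu(rng.choice([-1.0, 0.0, 1.0], size=(30, 30)), 1)
A = A + A.T
W, H, errors = semi_nmf(A, k=3)
labels = H.argmax(axis=1)                          # community index for each drug
```

Here `labels` assigns each drug to the community with the largest indicator value, mirroring the reading of the rows of H as cluster indicators; the Gr and Sr regularizers would add further terms to the H update and are not modelled.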
To reflect how well a signed network is partitioned into communities, the clustering is measured globally by a community balance index (CBI), which is the community size-weighted average fraction of balanced triads within communities. It is defined as
where \(\# PPN_{c}\) is the number of unbalanced triads and \(\# triads_{c}\) is the total number of triads in community c, \(n_{c}\) denotes the community size and k is the total number of communities in the clustering. The greater the value of CBI, the better the clustering.
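Formula 10 itself is not reproduced in this text. Reading the verbal definition literally, one plausible form (an assumption, not the paper's equation) is CBI = Σ_c n_c (1 − #PPN_c / #triads_c) / Σ_c n_c, i.e. the balanced-triad fraction of each community weighted by its size. The sketch below computes that assumed form from per-community counts, which could come, for example, from a triad census of each community's induced subnetwork; the function name and the counts are hypothetical.

```python
def community_balance_index(sizes, ppn_counts, triad_counts):
    """Assumed form: size-weighted average of (1 - #PPN_c / #triads_c) over communities."""
    weighted_sum = 0.0
    total_size = 0
    for n_c, ppn_c, triads_c in zip(sizes, ppn_counts, triad_counts):
        if triads_c == 0:
            continue                      # assumption: communities without triads are skipped
        weighted_sum += n_c * (1.0 - ppn_c / triads_c)
        total_size += n_c
    return weighted_sum / total_size if total_size else float("nan")

# Hypothetical counts for two communities with 10 and 30 drugs.
cbi = community_balance_index(sizes=[10, 30], ppn_counts=[2, 6], triad_counts=[10, 60])
print(cbi)
```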
In addition, we define two local metrics, Community-Within Difference (Δ_{w}) and Community-Between Difference (Δ_{b}), as \(\Delta_{w} = \ln (R_{e}^{w} ) - \ln (R_{d}^{w} )\) and \(\Delta_{b} = \ln (R_{e}^{b} ) - \ln (R_{d}^{b} )\), where \(R_{e}^{w}\) and \(R_{d}^{w}\) are the ratios of enhancive and degressive DDIs, respectively, to all the drug pairs within a community. Similarly, \(R_{e}^{b}\) and \(R_{d}^{b}\) are the two corresponding ratios between two communities. The larger the difference, the more enhancive DDIs dominate; the smaller the difference, the more degressive DDIs dominate.
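The two local metrics can be tabulated for a given partition as a k × k matrix, with Δ_w on the diagonal and Δ_b off the diagonal (the layout used for Table 3). In the sketch below, within-community pairs are unordered pairs of distinct drugs and between-community pairs are all cross pairs; because R_e and R_d share the same denominator, each Δ equals ln(#enhancive / #degressive). Returning NaN when either count is zero is an assumption of this sketch, since the logarithm is undefined there; the function name and toy network are hypothetical.

```python
import numpy as np

def delta_matrix(A, labels):
    """Diagonal: Delta_w per community; off-diagonal: Delta_b for each pair of communities."""
    A, labels = np.asarray(A), np.asarray(labels)
    communities = np.unique(labels)
    k = len(communities)
    delta = np.full((k, k), np.nan)
    for a, ca in enumerate(communities):
        ia = np.flatnonzero(labels == ca)
        for b, cb in enumerate(communities):
            ib = np.flatnonzero(labels == cb)
            block = A[np.ix_(ia, ib)]
            if a == b:
                n_pairs = len(ia) * (len(ia) - 1) / 2
                n_enh = np.triu(block == 1, 1).sum()   # count each unordered pair once
                n_deg = np.triu(block == -1, 1).sum()
            else:
                n_pairs = len(ia) * len(ib)
                n_enh = (block == 1).sum()
                n_deg = (block == -1).sum()
            if n_pairs > 0 and n_enh > 0 and n_deg > 0:
                delta[a, b] = np.log(n_enh / n_pairs) - np.log(n_deg / n_pairs)
    return delta

# Toy network: drugs 0-2 form one community, drugs 3-5 another.
A = np.zeros((6, 6), dtype=int)
edges = {(0, 1): 1, (0, 2): 1, (1, 2): -1,        # within community 0
         (3, 4): -1, (3, 5): -1, (4, 5): 1,       # within community 1
         (0, 3): -1, (1, 4): -1, (2, 5): 1}       # between the two communities
for (i, j), s in edges.items():
    A[i, j] = A[j, i] = s
delta = delta_matrix(A, labels=[0, 0, 0, 1, 1, 1])
print(np.round(delta, 4))
```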
BRSNMF-based approaches for predicting potential comprehensive DDIs of new drugs
In this section, we show how to use BRSNMF to predict potential comprehensive DDIs, focusing on the scenario of DDI prediction between ‘new drugs’ (without known DDIs) and ‘approved drugs’ (drugs with known DDIs), because prediction is known to be difficult when new drugs are involved (Fig. 2a). New drugs can be regarded as isolated nodes in the DDI network [21]. This prediction scenario is analogous to the well-known cold-start problem in social recommendation [34]. Such a prediction requires additional properties (or features) to relate new drugs to approved drugs. Unlike most protein–protein interactions or drug–target interactions [35], pharmacological DDIs are not physical interactions between drugs (which would usually be related to their chemical structures), but indirect interactions mediated by proteins. Thus, we use drug-protein binding information as drug features to relate new drugs to approved drugs in the cold-start scenario. In addition, such features can capture the particular pharmacological meanings of the drug communities detected by BRSNMF (see the next section for more details).
We formally state the cold-start prediction problem as follows. Let \({\mathbf{D}} = \left\{ {d_{i} } \right\},i = 1,2, \ldots ,m\) be a set of m approved drugs, let the interaction matrix of their DDI network be \({\mathbf{A}}_{m \times m} = \left\{ {a_{ij} } \right\}\), and let \({\mathbf{D}}_{x} = \left\{ {d_{x} } \right\},x = 1,2, \ldots ,n\) be a set of n new drugs. Each drug, whether in \({\mathbf{D}}\) or in \({\mathbf{D}}_{x}\), is represented as a p-dimensional feature vector \({\mathbf{f}}_{i} = [f_{1} ,f_{2} , \ldots ,f_{p} ]\). All the drugs in \({\mathbf{D}}\) are stacked row by row as an \(m \times p\) feature matrix \({\mathbf{F}}\). Similarly, the drugs in \({\mathbf{D}}_{x}\) are stacked as an \(n \times p\) feature matrix \({\mathbf{F}}_{x}\). Adopting the framework for cold-start prediction in [21], our BRSNMF-based approach for predicting DDIs of new drugs consists of a training phase and a predicting phase, described below and illustrated in Fig. 2b.

1.
In the training phase, the approach obtains a matrix factorization \({\mathbf{A}}_{m \times m} \approx {\mathbf{W}}_{m \times k} \times ({\mathbf{H}}_{m \times k} )^{T}\) by BRSNMF and a linear regression \({\mathbf{H}}_{m \times k} = {\mathbf{F}}_{m \times p} \times {\mathbf{B}}_{p \times k}\) by Partial Least Squares Regression (PLSR).

2.
In the predicting phase, the learned \({\mathbf{B}}_{p \times k}\) first maps \({\mathbf{F}}_{x}\) into the \(n \times k\) latent space by \({\mathbf{H}}_{x} = {\mathbf{F}}_{x} \times {\mathbf{B}}\). Then the \(n \times m\) predicted interactions between the new drugs and the approved drugs are obtained by
$${\mathbf{A}}_{x} = {\mathbf{H}}_{x} {\mathbf{W}}^{T} = ({\mathbf{F}}_{x} {\mathbf{B}}){\mathbf{W}}^{T} . \tag{11}$$
Specifically, PLSR combines properties of PCA and multiple regression: instead of finding hyperplanes of maximum variance between the response and independent variables, it projects both the predicted variables (the drug cluster indicator matrix H) and the observable variables (features) into a new space. Thus, our BRSNMF-based approach, which contains PLSR, implicitly performs feature reduction, and it has only one parameter, k, to be tuned in the training phase (see also "Comprehensive DDI prediction in the cold-start scenario" section).
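Once W and H are available, the two phases reduce to a few matrix products. The following NumPy sketch is illustrative only: it takes W and H as given (in the approach described here they would come from BRSNMF), and it deliberately swaps PLSR for ordinary ridge-regularized least squares to learn B, so it does not reproduce the implicit feature reduction of PLSR. The function names, the ridge parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_feature_map(F, H, ridge=1e-6):
    """Training phase (regression part): B with H ~= F B. Ridge least squares replaces PLSR."""
    p = F.shape[1]
    return np.linalg.solve(F.T @ F + ridge * np.eye(p), F.T @ H)

def predict_new_drugs(F_x, B, W):
    """Predicting phase: H_x = F_x B, then A_x = H_x W^T (n x m interaction scores)."""
    H_x = F_x @ B
    return H_x @ W.T

rng = np.random.default_rng(0)
m, n, p, k = 40, 5, 12, 3
F = rng.integers(0, 2, size=(m, p)).astype(float)    # hypothetical binary DBP features
F_x = rng.integers(0, 2, size=(n, p)).astype(float)  # features of new drugs
B_true = rng.random((p, k))
H = F @ B_true                                        # stand-in for the BRSNMF indicator matrix
W = rng.standard_normal((m, k))                       # stand-in for the BRSNMF centroid matrix
B = fit_feature_map(F, H)
A_x = predict_new_drugs(F_x, B, W)
print(A_x.shape)
```

If PLSR itself is wanted, scikit-learn's `PLSRegression` (in `sklearn.cross_decomposition`) is one off-the-shelf implementation whose `fit` and `predict` could replace `fit_feature_map` and the `F_x @ B` product; the number of PLS components would then need to be chosen.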
As shown in Fig. 2, the cold-start scenario requires predicting interactions between newly given drugs having no known DDI and a set of drugs that interact with each other in the form of a DDI network. To mimic this scenario, in each round of cross-validation (CV) we remove a subset of drugs together with their interactions from the dataset and attempt to predict their interactions by Step 2, while using the remaining drugs and their interactions to train the model by Step 1. The two typical CVs differ only slightly: leave-one-out CV (LOOCV) removes one drug in each round, whereas n-fold CV (nCV) randomly removes 1/n of the drugs. Their results do not differ significantly when the number of samples is large.
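A drug-wise split is what distinguishes this cold-start CV from an ordinary pair-wise CV: the held-out drugs lose all of their DDIs. The sketch below, with hypothetical function names, shuffles drug indices into n folds (setting n equal to the number of drugs gives LOOCV) and, for each fold, returns the training network among the remaining drugs and the held-out block of interactions between test and training drugs that Step 2 would try to recover. Interactions among the held-out drugs themselves are simply dropped here, which is an assumption of this sketch.

```python
import numpy as np

def drug_folds(m, n_folds, seed=0):
    """Randomly partition drug indices 0..m-1 into n_folds groups (n_folds = m gives LOOCV)."""
    perm = np.random.default_rng(seed).permutation(m)
    return np.array_split(perm, n_folds)

def cold_start_split(A, test_idx):
    """Training network among the remaining drugs and the held-out test-vs-train block."""
    train_idx = np.setdiff1d(np.arange(A.shape[0]), test_idx)
    A_train = A[np.ix_(train_idx, train_idx)]     # input to Step 1 (training phase)
    A_heldout = A[np.ix_(test_idx, train_idx)]    # ground truth for Step 2 (predicting phase)
    return train_idx, A_train, A_heldout

rng = np.random.default_rng(2)
A = np.triu(rng.choice([-1, 0, 1], size=(10, 10)), 1)
A = A + A.T
folds = drug_folds(10, n_folds=5)
for test_idx in folds:
    train_idx, A_train, A_heldout = cold_start_split(A, test_idx)
    print(len(test_idx), A_train.shape, A_heldout.shape)
```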
The performance of DDI prediction under CV is illustrated by both the receiver operating characteristic (ROC) curve and the precision–recall (PR) curve, and measured by the areas under them, denoted AUROC and AUPR respectively. As suggested by [36], AUPR is more appropriate than AUROC when positive instances are far fewer than negative instances. The greater the values of AUROC and AUPR, the better the prediction. See [21] for their detailed calculation. In addition, because non-interacting drug pairs may include interactions that are simply not yet known, Mean Percentile Ranking (MPR) is used as an extra performance metric for DDI prediction. The smaller the MPR, the better the prediction. More technical details about MPR can be found in [37, 38].
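For concreteness, the sketch below computes AUROC through its rank-statistic form (the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, counting ties as one half) and MPR as it is commonly defined in implicit-feedback recommendation: for each new drug, approved drugs are ranked by predicted score, each known interaction receives a percentile rank between 0 (top) and 1 (bottom), and MPR averages these percentiles. This is an illustrative NumPy version with hypothetical function names; the calculations actually used are described in [21, 37, 38], and AUPR is not shown.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def mean_percentile_ranking(S, Y):
    """S: n x m predicted scores; Y: n x m known interactions (1/True). Lower is better."""
    S = np.asarray(S, dtype=float)
    Y = np.asarray(Y, dtype=bool)
    n, m = S.shape
    order = np.argsort(-S, axis=1, kind="stable")  # best-scored approved drug first
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(m)[None, :]
    percentile = ranks / (m - 1)                   # 0 = top of the list, 1 = bottom
    return percentile[Y].mean()

S = np.array([[0.9, 0.1, 0.5],
              [0.2, 0.8, 0.4]])
Y = np.array([[1, 0, 0],
              [0, 0, 1]])
print(auroc(S.ravel(), Y.ravel()), mean_percentile_ranking(S, Y))
```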
Results and discussions
Dataset
We collect approved small-molecule drugs and their DDIs from DrugBank [2, 39]. After collecting DDIs, we label enhancive DDIs by the keyword ‘increase’ or its synonyms and degressive DDIs by the keyword ‘decrease’ or its synonyms, according to the DDI descriptions. Two datasets, DB_V4 and DB_V5_Ex, are built according to the DrugBank version, because we need known DDIs to validate the accuracy of our predictions. All the drugs and DDIs in DB_V4 are included in DrugBank Version 4 [2], while all the drugs in DB_V5_Ex are newly included in DrugBank Version 5 and not found in DB_V4. The DDIs between the drugs in DB_V5_Ex and the drugs in DB_V4 are also extracted from DrugBank Version 5. These two datasets are summarized in Table 1.
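The keyword-based labelling can be sketched with regular expressions. The synonym patterns below are hypothetical placeholders, since the exact synonym lists are not given here, and the example sentences are illustrative rather than quoted from DrugBank. Descriptions matching both keyword groups or neither are left unlabelled (0), which is an assumption of this sketch.

```python
import re

# Hypothetical synonym patterns; the actual keyword lists are not specified in the text.
ENHANCIVE_PATTERNS = [r"\bincreas\w*", r"\benhanc\w*"]
DEGRESSIVE_PATTERNS = [r"\bdecreas\w*", r"\breduc\w*"]

def label_ddi(description):
    """Return +1 (enhancive), -1 (degressive) or 0 (ambiguous or no keyword)."""
    text = description.lower()
    is_enh = any(re.search(p, text) for p in ENHANCIVE_PATTERNS)
    is_deg = any(re.search(p, text) for p in DEGRESSIVE_PATTERNS)
    if is_enh and not is_deg:
        return 1
    if is_deg and not is_enh:
        return -1
    return 0

examples = [
    "The serum concentration of Quinine can be increased when it is combined with Aprepitant.",
    "The serum concentration of Quinine can be decreased when it is combined with Mitotane.",
    "Quinine may interact with this drug.",
]
print([label_ddi(s) for s in examples])
```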
For all the drugs, we also collect their drug-binding proteins (DBP), including 1213 drug targets and 429 non-target proteins, which play important roles in the pharmacodynamic and pharmacokinetic processes of drugs. These proteins are used to investigate pharmacological significance and are leveraged as features to associate new drugs having no known DDIs with drugs having known DDIs in comprehensive DDI prediction. In the following sections, DB_V4 is first used to detect pharmacological communities ("Drug community partition" section). It is then used to validate the effectiveness of DBP features and to train a predictive model of comprehensive DDIs, while DB_V5_Ex is used only to validate the predictive model of our BRSNMF-based prediction method ("Comprehensive DDI prediction in the cold-start scenario" section).
Moreover, to verify our observation on the weakly balanced relationship among drugs, we first compute statistics of triad types. In total, the DDI network in DB_V4 contains 50.96% PPP, 18.56% NNP, 7.11% NNN and 23.37% unbalanced PPN triads. Then, we investigate whether subsampling drugs influences the composition of the four triad types. After randomly removing a set of drugs (e.g. 1/20, 1/8, 1/4 or 1/2 of the drugs) and the DDIs involving them from DB_V4, we observe no significant change in triad composition. For instance, after we remove 1/8 of the drugs and their DDIs, the DDI subnetwork contains 51.18% PPP, 18.34% NNP, 7.16% NNN and 23.28% PPN triads. Last, we compare the DDI network with a randomized network, which is generated by randomly shuffling enhancive and degressive DDIs among the drugs. In this randomized network, we observe a significantly different triad composition: 55.6% balanced triads (33.1% PPP, 19.6% NNP, 2.9% NNN) and 44.4% unbalanced triads (PPN). These pieces of evidence reveal that the real DDI network has an intrinsic property of weakly balanced relationship among drugs.
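The randomization control can be sketched as follows. The text does not spell out the shuffling procedure, so this sketch assumes one plausible reading: the set of interacting drug pairs is kept fixed and only the enhancive/degressive labels are permuted among them, which preserves the numbers of enhancive and degressive DDIs. Triad composition is computed with matrix traces rather than explicit enumeration: with P and N the 0/1 indicator matrices of positive and negative edges, #PPP = tr(P³)/6, #NNN = tr(N³)/6, #PPN = tr(P²N)/2 and #NNP = tr(N²P)/2, because each triangle yields six closed 3-walks, of which exactly two follow a given two-of-one-kind edge pattern. Function names are hypothetical.

```python
import numpy as np

def triad_composition(A):
    """Fractions of PPP, NNP, NNN and PPN triads in a symmetric signed matrix (zero diagonal)."""
    P = (A > 0).astype(float)
    N = (A < 0).astype(float)
    counts = {
        "PPP": np.trace(P @ P @ P) / 6,
        "NNP": np.trace(N @ N @ P) / 2,
        "NNN": np.trace(N @ N @ N) / 6,
        "PPN": np.trace(P @ P @ N) / 2,
    }
    total = sum(counts.values())       # assumes the network contains at least one triad
    return {t: c / total for t, c in counts.items()}

def shuffle_signs(A, seed=0):
    """Null model (assumed): keep interacting pairs, permute +1/-1 labels among them."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(A, k=1)
    values = A[iu].copy()
    edge = values != 0
    values[edge] = rng.permutation(values[edge])
    R = np.zeros_like(A)
    R[iu] = values
    return R + R.T

A = np.array([[ 0,  1,  1, -1],
              [ 1,  0,  1, -1],
              [ 1,  1,  0,  1],
              [-1, -1,  1,  0]])
print(triad_composition(A))
print(triad_composition(shuffle_signs(A, seed=3)))
```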
Drug community partition
In this section, we investigate the communities generated by BRSNMF and compare them with those generated by SemiNMF. As with the traditional clustering algorithm k-means, both BRSNMF and SemiNMF require a parameter (k) that specifies the anticipated number of clusters in advance. In fact, virtually all clustering algorithms need some parameter to be specified. For example, centroid-based clustering algorithms (e.g. k-means, k-medoids, fuzzy c-means) need the number of clusters (k); connectivity-based clustering algorithms (e.g. UPGMA) provide a hierarchical clustering but still need a cutoff to determine the final clusters; distribution-based clustering algorithms (e.g. Gaussian mixture models) use a fixed number of distributions corresponding to the number of clusters; and density-based clustering algorithms (e.g. DBSCAN and Mean-shift) define clusters as areas of high density, which depends on a density criterion. Like k-means, once the number of communities, k, is given, BRSNMF splits samples into k non-overlapping groups. In the context of the comprehensive DDI network, BRSNMF partitions drugs into k communities.
Parameter tuning in community partition
Before performing the comparison, we check how the tuning parameters (α, β, η, and σ) in Formula 5 influence the clustering. Since β controls both σ and η simultaneously (see Formula 5), we tune only α and β, each over the values 0.05, 0.25, 0.5, 1, and 5, while fixing σ = 1 and η = 1.
First, we measure the influence globally by CBI (defined in Formula 10). Running a grid search over α and β yields 25 CBI values, one for each pair of α and β, for a given number of drug communities. Moreover, we measure the influence by two interaction ratio-derived quantities, \(SR^{w} = R_{e}^{w} + R_{d}^{w}\) and \(DR^{w} = R_{e}^{w} - R_{d}^{w}\). The first denotes how dense a community is, while the second reflects whether enhancive or degressive DDIs are dominant. Again, we obtain 25 pairs of \(SR^{w}\) and \(DR^{w}\) values for each drug community for a given number of drug communities. The influence of these parameters on the drug partition is measured by the standard deviations of these values. The smaller the standard deviation, the less sensitive the partition is to the parameters.
In the case of k = 3, for example, BRSNMF splits samples into 3 non-overlapping drug communities, where the first and third communities are strongly balanced while the second is weakly balanced. Overall, BRSNMF achieves CBI = 0.8958 ± 0.0080, which shows that the overall balance of the partition varies little across parameter settings. As for the two strongly balanced communities, their \(SR^{w}\) values are 0.3318 ± 0.0151 and 0.1422 ± 0.0015, and their \(DR^{w}\) values are 0.3057 ± 0.0181 and 0.1161 ± 0.0023. For the weakly balanced community, its \(SR^{w}\) and \(DR^{w}\) are 0.2938 ± 0.0225 and − 0.2307 ± 0.0282 respectively. These small standard deviations indicate that both the community density and the dominant type of DDI within each community change only trivially. Similar results are observed for other values of k during the grid search over α and β. The experiments show that the communities generated under all combinations of α and β are consistent.
To summarize, BRSNMF is robust to different parameter values. Thus, for simplicity, we fix all the tuning parameters to 1 (α = β = η = σ = 1).
Better drug community partitions achieved by BRSNMF
To demonstrate the superiority of BRSNMF, we run BRSNMF and SemiNMF separately to partition the comprehensive DDI network into communities. First, we investigate the community sizes (numbers of drugs in each community) for different numbers of communities, k = 2, 3, 4, 5, 6, 7, and 8 (Table 2). The range and the standard deviation of community sizes are used to measure the community partition (clustering); the smaller the values, the better the partition. Compared with SemiNMF, BRSNMF tends to generate communities with significantly smaller ranges and smaller standard deviations of community size (Fig. 3 and Table 2). Specifically, all the communities generated by BRSNMF contain a substantial number of drugs, especially when k is large. For instance, in the case of k = 8, the smallest community generated by SemiNMF contains only 7 drugs, whereas the smallest one generated by BRSNMF contains 45 drugs. In short, BRSNMF is able to partition drugs into communities that each contain enough drugs and whose sizes are less dispersed.
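As a small worked example of the two dispersion measures, the sketch below applies them to the k = 3 community sizes reported in the next paragraph (1115, 151 and 296 drugs for SemiNMF; 469, 281 and 812 for BRSNMF). It uses the population standard deviation (NumPy's default, ddof = 0); whether Table 2 uses the population or the sample form is not stated here, so the standard-deviation values may differ from the table. The ranges are 1115 − 151 = 964 for SemiNMF and 812 − 281 = 531 for BRSNMF.

```python
import numpy as np

def size_dispersion(sizes):
    """Range and population standard deviation (ddof=0) of community sizes."""
    sizes = np.asarray(sizes, dtype=float)
    return sizes.max() - sizes.min(), sizes.std()

semi_range, semi_std = size_dispersion([1115, 151, 296])   # SemiNMF, k = 3
brs_range, brs_std = size_dispersion([469, 281, 812])      # BRSNMF, k = 3
print(f"SemiNMF: range={semi_range:.0f}, std={semi_std:.1f}")
print(f"BRSNMF:  range={brs_range:.0f}, std={brs_std:.1f}")
```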
Moreover, we choose the case of three communities for a deeper analysis, where SemiNMF generates three communities containing 1115, 151 and 296 drugs respectively, while BRSNMF generates communities containing 469, 281 and 812 drugs respectively. We measure the communities generated by the two approaches using the global metric CBI, defined in Formula 10. With 3 communities, BRSNMF achieves 89.58% while SemiNMF achieves 82.54%. We also measure them by the two proposed local metrics: the community-within difference for each community and the community-between difference for each pair of communities. The differences are arranged in matrices (Table 3), in which the diagonal entries list the values of Δ_{w} and the off-diagonal entries list the values of Δ_{b}. The results show that the average Δ_{w} of the strongly balanced communities achieved by SemiNMF and BRSNMF are 2.4437 and 2.7676 respectively, and the average Δ_{b} are 1.0207 and 0.0952 respectively. According to our criteria for Δ_{w} and Δ_{b}, BRSNMF is significantly superior to SemiNMF (see also "Clustering by balance regularized semi-nonnegative matrix factorization" section). In particular, in addition to two strongly balanced communities, BRSNMF is able to detect a weakly balanced community (its Δ_{w} < 0), whereas SemiNMF cannot. Compared with the whole DDI network, this weakly balanced community shows a distinctive triad composition: 0.85% PPP, 27.44% NNP, 67.19% NNN, and only 4.52% unbalanced PPN triads. In addition, after reordering the DDI matrix according to the communities generated by SemiNMF and BRSNMF respectively, we visualize these communities as two pseudo-color images, which are consistent with Δ_{w} and Δ_{b} (Fig. 4b, c). For comparison, the image of the original DDI matrix is also shown (Fig. 4a).
In short, by capturing the intrinsic weakly balanced relationship among drugs, BRSNMF generates a better drug partition than SemiNMF: drugs within a cluster (drug community) tend to exhibit strongly or weakly balanced relationships, whereas drugs belonging to two different clusters tend to show unbalanced relationships.
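For readers who want to reproduce the baseline, the sketch below implements plain Semi-NMF with the multiplicative updates of Ding et al. [33]. It factorizes a signed matrix X ≈ F Gᵀ with G ≥ 0 and assigns each drug to the community with the largest entry in its row of G. The sketch omits the balance regularizer that distinguishes BRSNMF. The random initialization, the iteration count and the argmax assignment are our assumptions, not the settings used in this study.

```python
import numpy as np

def _pos(M):
    return (np.abs(M) + M) / 2.0      # positive part of M

def _neg(M):
    return (np.abs(M) - M) / 2.0      # negative part of M, as nonnegative numbers

def semi_nmf(X, k, n_iter=300, seed=0, eps=1e-12):
    """Plain Semi-NMF, X ~ F @ G.T with G >= 0 (no balance regularizer)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    G = rng.random((X.shape[1], k)) + 0.1          # strictly positive start
    errors = []
    for _ in range(n_iter):
        # F-step: closed-form least-squares solution for fixed G
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # G-step: multiplicative update that keeps G nonnegative
        XtF, FtF = X.T @ F, F.T @ F
        numer = _pos(XtF) + G @ _neg(FtF)
        denom = _neg(XtF) + G @ _pos(FtF) + eps
        G = G * np.sqrt(numer / denom)
        errors.append(np.linalg.norm(X - F @ G.T))  # Frobenius reconstruction error
    labels = G.argmax(axis=1)                      # hard community per drug
    return F, G, labels, errors

# Toy signed network: two groups, enhancive inside, degressive between.
block = np.ones((3, 3))
A = np.block([[block, -block], [-block, block]])
np.fill_diagonal(A, 0)
F, G, labels, errors = semi_nmf(A, k=2)
print(labels, errors[-1])
```

With the exact F-step, Ding et al. show that the G-step does not increase the reconstruction error, so `errors` should be non-increasing up to rounding.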
Pharmacological significance of balanced clusters
The generated clusters are valuable in clinical practice. Drugs used together in a multiple-drug treatment can cause pharmacological changes through their interactions. These changes can be deduced if the drugs come from the same balanced community (usually forming balanced l-cycles), whereas they cannot be inferred if the drugs come from different communities. Such pharmacological changes directly influence clinical medication, including dosage, dosing interval, therapeutic window, synergistic combination and so on.
Furthermore, the clusters are important in biology. The interaction between two drugs is usually caused by their binding to common or functionally related proteins, i.e. drug-binding proteins (DBPs), which can be roughly grouped into target proteins and non-target proteins. Drug targets are proteins that drugs bind to in order to produce a desired therapeutic effect, whereas non-target proteins play various roles, such as catalyzing chemical reactions involving a specific drug, shuttling drugs across cell membranes, or increasing the effectiveness of drug delivery to the target sites of pharmacological action.
Their meaning, potential application and biological implication are described as follows.

1.
Meaning of balanced clusters
Assume that three drugs are used in a three-drug treatment and that every pairwise interaction between them changes their serum concentration (SC). Although in reality DDIs trigger various pharmacological changes (e.g. changes in bioavailability, distribution, …), this single type of change is enough to elucidate the meaning of a balanced cluster. In this context, an enhancive interaction reflects an increase in SC, whereas a degressive interaction indicates a decrease. We now show theoretically, in terms of drug triads, how the pharmacological changes among drugs in a balanced cluster can be deduced.
In a strongly balanced cluster, a pharmacological change (e.g. in dose) of any drug in a triad (ideally a PPP or NNP triad) always exerts a consistent influence on the triad. Let d_{i}, d_{j} and d_{k} be three drugs in a triad. When the triad is a PPP triad, slightly increasing the dose of any of them increases the SCs of all of them, because each boosts the others. When the triad is an NNP triad, in which the interactions d_{i}–d_{j} and d_{i}–d_{k} are degressive and the interaction d_{j}–d_{k} is enhancive, slightly increasing the dose of d_{i} decreases the SCs of d_{j} and d_{k}, while slightly decreasing the dose of d_{j} or d_{k} increases the SC of d_{i}. Hence, the changes propagated along both sides of the NNP triad are also consistent.
In a weakly balanced cluster, only simultaneous changes to all drugs in a triad (ideally an NNN triad) can produce a consistent influence on the triad; otherwise, the influence is unpredictable. When d_{i}, d_{j} and d_{k} form an NNN triad, slightly increasing the dose of d_{i} decreases the SCs of d_{j} and d_{k}; however, the degressive interaction between d_{j} and d_{k} can trigger an opposite influence on d_{j} or d_{k}. These two conflicting influences make the final outcome unpredictable. The only way to produce a consistent influence on the triad is to increase the doses of d_{i}, d_{j} and d_{k} in the right proportion to their original doses.
Notably, a dose change of the drugs in an unbalanced triad (ideally a PPN triad) spanning two balanced clusters always triggers unpredictable influences on the triad. An interpretation similar to that for NNN triads applies, except that there is no condition under which the influence on the triad becomes consistent.
Similarly, this interpretation extends naturally to balanced l-cycles, which follow the naïve multiplication rule: the product of the signs of all edges in the cycle is positive.

2.
Potential application of balanced clusters
The clusters can be applied directly when drug intolerance is a concern. When multiple drugs in a therapy are distributed throughout the body, any change triggered by their DDIs in the ADME (absorption, distribution, metabolism and excretion) process alters their concentrations in the blood.
In a strongly balanced cluster, for example, Cyclosporine, Pravastatin and Lovastatin form a PPP triad, in which the interactions increase their serum concentrations. Meanwhile, the first two each have a degressive interaction with another drug, Efavirenz, forming an NNP triad. Since a pharmacological change in even one drug of these balanced triads inevitably influences the other drugs, a multiple-drug treatment involving them (e.g. the prophylaxis of graft rejection) should examine whether their interactions push any drug outside its therapeutic window, i.e. the range between its minimum effective concentration (MEC) and its minimum toxic concentration (MTC). When the blood concentration of a drug is below its MEC, the drug cannot produce the intended therapeutic effect; when its concentration exceeds its MTC, the drug can trigger an unintended adverse drug event.
In addition, the clusters can be used to find synergistic drugs. For example, Fluvoxamine (an antidepressant), Pregabalin (an anticonvulsant drug used for epilepsy and generalized anxiety disorder) and Magnesium sulfate (an anticonvulsant for preeclampsia and eclampsia) form a PPP triad in a strongly balanced cluster, so their pairwise interactions can boost their therapeutic efficacies. Their combination is therefore a potential synergistic multiple-drug treatment.
In general, once pharmacological knowledge of DDIs is integrated, these drug clusters can be applied to guide multiple-drug treatments, e.g. by optimizing drug doses, flagging risks and discovering synergistic drugs.

3.
Biological implication of balanced clusters
To understand the biological implication of the balanced clusters, we finally investigate both the drugs within clusters and those between clusters by exploiting DBPs, which play important roles in the pharmacodynamic and pharmacokinetic processes of drugs. After counting the non-target proteins and target proteins binding to each drug, we calculate the averages of these counts in each cluster. The average numbers of non-target proteins binding to a drug (a_{n}) are 2.35, 5.16 and 3.17, while the average numbers of target proteins binding to a drug (a_{t}) are 4.12, 2.87 and 2.64 in the three clusters respectively. One-way analysis of variance across clusters on the two groups of counts (p = 2.22e−16 and 1.68e−07 respectively) shows that drugs in different communities bind significantly different numbers of non-target and target proteins on average. In particular, the investigation reveals two interesting aspects: (1) drugs in the only weakly balanced community (the second one) tend to bind more non-target proteins than target proteins (a_{n} = 5.16 vs. a_{t} = 2.87); (2) drugs in the first strongly balanced community (containing 93.95% strongly balanced triads) tend to bind fewer non-target proteins but more target proteins (a_{n} = 2.35 vs. a_{t} = 4.12). This observation, which sheds light on the underlying mechanism by which DDIs form, motivates us to propose a predictive model of comprehensive DDIs in the cold-start scenario.
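The triad reasoning in the "Meaning of balanced clusters" discussion reduces to a sign rule. A single-drug dose change propagates consistently when the direct effect of drug i on drug k agrees with the indirect effect through drug j. For signs of ±1, that happens exactly when the product of the three edge signs is positive. The sketch below encodes this rule and flags NNN triads as weakly balanced, following weak balance theory. The function names and the ±1 encoding are illustrative choices.

```python
from math import prod

def triad_type(s_ij, s_ik, s_jk):
    """Classify a triad from its three DDI signs (+1 enhancive, -1 degressive)."""
    if s_ij * s_ik * s_jk > 0:
        return "strongly balanced"       # PPP or NNP
    if s_ij == s_ik == s_jk == -1:
        return "weakly balanced"         # NNN
    return "unbalanced"                  # PPN

def single_change_consistent(s_ij, s_ik, s_jk):
    """Does the direct effect i -> k match the indirect path i -> j -> k?"""
    return s_ik == s_ij * s_jk

def cycle_is_balanced(signs):
    """Naive multiplication rule for an l-cycle: product of edge signs > 0."""
    return prod(signs) > 0

print(triad_type(-1, -1, 1), single_change_consistent(-1, -1, -1))
```

For ±1 signs, `single_change_consistent` gives the same answer whichever drug is chosen as i, because the check is equivalent to the sign product being positive.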
Comprehensive DDI prediction in the cold-start scenario
Recall that we use drug-binding proteins (DBPs) as features for DDI prediction (see also the "BRSNMF-based approaches for predicting potential comprehensive DDIs of new drugs" section). From these proteins, we generate a protein-profile feature. Each drug is represented as a \(1 \times 1642\) binary vector, whose elements indicate whether a specific protein binds to the drug. Unlike community detection, which focuses on the balance structure of DDIs, DDI prediction places more emphasis on the reconstruction error.
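The protein-profile encoding can be sketched as a binary drug × protein matrix. The mapping `binding` and the protein list below are hypothetical toy data. In the real feature the column set is the 1642 DBPs, and the column order simply has to be fixed once and shared by all drugs.

```python
import numpy as np

def protein_profiles(drug_to_proteins, protein_ids):
    """Binary drug x protein matrix: entry is 1 if the protein binds the drug."""
    col = {p: c for c, p in enumerate(protein_ids)}
    drugs = sorted(drug_to_proteins)
    X = np.zeros((len(drugs), len(protein_ids)), dtype=np.int8)
    for r, drug in enumerate(drugs):
        for p in drug_to_proteins[drug]:
            X[r, col[p]] = 1
    return drugs, X

# Hypothetical toy data; the real profile has 1642 protein columns.
proteins = ["CYP3A4", "ABCB1", "ALB", "P4"]
binding = {"drugA": {"CYP3A4", "ABCB1"}, "drugB": {"CYP3A4"}, "drugC": {"ALB"}}
drugs, X = protein_profiles(binding, proteins)
print(drugs, X.tolist())
```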
Parameter tuning in prediction
Before the subsequent comparison, we investigate how the parameter k (the dimension of the latent space) influences the prediction by selecting its value from the list rank(A) × {1/64, 1/32, 1/16, 1/10, 1/8, 1/6, 1/4, 1/2, 1}, where A is the training DDI matrix. The investigation on DB_V4 under 10-CV shows that the prediction is best when k = rank(A)/10 (Fig. 5). This value also satisfies the low-rank requirement of matrix factorization. We therefore use this value of k in the following cold-start prediction tasks, which infer the interactions between new drugs and approved drugs.
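The candidate grid for k can be generated directly from the training matrix. The text does not say how non-integer values of rank(A) × fraction are rounded, so rounding to the nearest integer (and keeping k ≥ 1) is an assumption of this sketch.

```python
import numpy as np

FRACTIONS = (1/64, 1/32, 1/16, 1/10, 1/8, 1/6, 1/4, 1/2, 1)

def candidate_ks(A):
    """Latent dimensions k = rank(A) * fraction, rounded and kept >= 1."""
    r = np.linalg.matrix_rank(np.asarray(A, dtype=float))
    return sorted({max(1, int(round(r * f))) for f in FRACTIONS})

ks = candidate_ks(np.eye(64))   # a full-rank 64 x 64 toy matrix
print(ks)                       # [1, 2, 4, 6, 8, 11, 16, 32, 64]
```

The setting chosen in the text, k = rank(A)/10, corresponds to the 1/10 entry of the grid.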
Cold-start DDI prediction boosted by DBP-based features
To demonstrate the effectiveness of DBP, we compare the DBP feature with the popular PubChem fingerprint feature under both LOOCV and 10-CV on DB_V4. Here, we adopt PubChem fingerprints (V1.3) to represent each drug as a \(1 \times 881\) ordered binary vector, each element of which denotes whether the drug contains a specific chemical substructure (fingerprint). These substructures cover hierarchic element counts, rings in a canonic extended smallest set of smallest rings, simple atom pairs, simple atom nearest neighbors, detailed atom neighborhoods, simple SMARTS patterns and complex SMARTS patterns.
The ROC and PR curves under LOOCV are shown in Fig. 6. We also compare the two features under 10-CV, measuring the prediction by the average AUROC and the average AUPR over all rounds of 10-CV. DBP features achieve AUROC = 0.801 ± 0.019, AUPR = 0.634 ± 0.033 and MPR = 0.021 ± 0.017, whereas PubChem fingerprints achieve only AUROC = 0.720 ± 0.018, AUPR = 0.455 ± 0.029 and MPR = 0.026 ± 0.018. The comparisons under both LOOCV and 10-CV show that DBP substantially outperforms PubChem fingerprints.
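For reference, the sketch below computes the two metrics. AUROC is taken as the probability that a randomly chosen positive pair outscores a randomly chosen negative pair, with ties counted as one half. AUPR is computed as average precision over the ranked list. Per-fold values are then summarized as mean ± standard deviation. The sketch assumes binary labels (interaction vs. none). The text does not state how the signed enhancive/degressive labels were binarized for these numbers or which AUPR interpolation was used, so treat this as an approximation. MPR is not covered.

```python
import numpy as np

def auroc(y_true, scores):
    """Mann-Whitney estimate of the area under the ROC curve."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]          # every positive vs every negative
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def aupr(y_true, scores):
    """Average precision: mean precision at the rank of each positive."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true, dtype=bool)[order]
    precision = np.cumsum(y) / np.arange(1, y.size + 1)
    return precision[y].mean()

def summarize(fold_values):
    """Mean and population standard deviation across CV folds."""
    return float(np.mean(fold_values)), float(np.std(fold_values))

y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
print(auroc(y, s), aupr(y, s))   # 0.75 and about 0.833
```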
The results demonstrate the superiority of DBP features. We attribute this to the nature of pharmacological DDIs: they are not direct physical bindings, which would usually depend on drug structures, but indirect interactions in which DBPs act as mediators. This differs markedly from drug–target interactions [35], which rely heavily on direct binding between drug structures and protein pockets.
For example, the interaction between Ritonavir and Saquinavir is mediated by intestinal CYP3A4. Specifically, Ritonavir increases the bioavailability (the fraction of an administered dose that reaches the systemic circulation) of HIV protease inhibitors (e.g. Saquinavir) because it strongly inhibits the activity of intestinal CYP3A4 (an enzyme DBP), which metabolizes these protease inhibitors and thereby influences their absorption [40]. Furthermore, we calculated the Pearson correlation coefficient (PCC) between Ritonavir and Saquinavir using DBP-based features (PCC = 0.5961) and fingerprint-based features (PCC = 0.3624). For an interacting pair, a higher PCC indicates that the features better capture the association between the two drugs. The result shows that DBP captures the association between Ritonavir and Saquinavir better than PubChem fingerprints do.
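The PCC comparison amounts to correlating two drugs' feature vectors. The binary vectors below are hypothetical stand-ins, not the real DBP or fingerprint profiles. Note that a constant vector (e.g. a drug with no recorded DBPs) has zero variance, which makes the PCC undefined (NaN).

```python
import numpy as np

def feature_pcc(u, v):
    """Pearson correlation between two drugs' feature vectors."""
    return float(np.corrcoef(np.asarray(u, float), np.asarray(v, float))[0, 1])

# Hypothetical binary profiles for two drugs.
drug_x = [1, 1, 0, 1, 0, 0]
drug_y = [1, 1, 0, 0, 0, 1]
print(round(feature_pcc(drug_x, drug_y), 4))   # 0.3333
```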
On the other hand, we check whether a higher feature dimension yields better prediction. First, a PCA analysis shows that the effective dimension of DBP (426) is actually smaller than that of PubChem fingerprints (576), even though the original dimension of DBP (1642) is larger than that of the fingerprints (881). Second, we perform an extra experiment under 10-CV using the concatenation of DBP-based and PubChem fingerprint-based features. Compared with DBP-based features alone (AUROC = 0.801 ± 0.019 and AUPR = 0.634 ± 0.033), the concatenated features (AUROC = 0.804 ± 0.020 and AUPR = 0.636 ± 0.039) show no significant improvement in DDI prediction. This suggests that PubChem fingerprints contain little information for identifying DDIs beyond what DBP already provides. In short, prediction performance depends not on the feature dimension but on the discriminative ability of the features, i.e. how well they characterize DDIs. Therefore, we believe that the proposed DBP-based feature outperforms the popular fingerprint-based feature because it captures the nature of DDIs.
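The text does not state the criterion behind the PCA "effective dimension". One common choice, assumed here purely for illustration, is the number of principal components needed to explain a fixed share (95%) of the total variance. The sketch computes this from the singular values of the centered feature matrix. It also shows the feature concatenation used in the extra experiment, done with `np.hstack`. The feature matrices are random stand-ins.

```python
import numpy as np

def effective_dimension(X, threshold=0.95):
    """Number of principal components explaining `threshold` of the variance."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)                        # center each feature
    var = np.linalg.svd(Xc, compute_uv=False) ** 2   # PCA variances up to a constant
    ratio = np.cumsum(var) / var.sum()
    return int(min(np.searchsorted(ratio, threshold) + 1, var.size))

rng = np.random.default_rng(0)
X_dbp = (rng.random((100, 40)) < 0.1).astype(float)   # hypothetical binary features
X_fp = (rng.random((100, 20)) < 0.5).astype(float)    # hypothetical fingerprints
X_both = np.hstack([X_dbp, X_fp])                     # concatenated feature set
print(effective_dimension(X_dbp), effective_dimension(X_both))
```

Changing `threshold` changes the reported dimension, so effective dimensions are only comparable across feature sets when computed with the same criterion.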
Accurate DDI prediction for new drugs by the BRSNMF-based approach
To test the effectiveness of our BRSNMF-based approach in the realistic scenario of newly given drugs, we perform a version-independent validation, which uses the drugs in DB_V4 as training drugs and those in DB_V5_Ex as independent testing drugs. The drug pairs within DB_V4 are taken as training pairs, while the testing pairs are the pairs between the drugs in DB_V4 and the drugs in DB_V5_Ex.
According to DrugBank, both the training pairs and the testing pairs have real labels indicating their interactions; in other words, the labels of the interactions between the drugs in DB_V4 and those in DB_V5_Ex are known. We therefore use the labels in DB_V5_Ex (V5.0, updated on 2017-7-6) to measure the prediction. In total, the DDIs between DB_V4 and DB_V5_Ex contain 78.8% balanced triads (PPP, NNP and NNN) and 21.2% unbalanced triads (PPN). Again, DBP is used as the drug feature when running both our BRSNMF-based approach and the state-of-the-art approach DDINMF [21].
To measure the predictions, we first sort the testing drug pairs by the prediction scores (which can be positive, negative or near zero) generated by the predictive approaches. Because the labels of enhancive DDIs, degressive DDIs and non-interactions are +1, −1 and 0 respectively, there are three expectations on the prediction results: (1) enhancive DDIs tend to have positive scores, and the greater the score, the more likely the drug pair is an enhancive DDI; (2) degressive DDIs tend to have negative scores, and the smaller the score, the more likely the drug pair is a degressive DDI; (3) non-interactions tend to have scores near zero, and the closer the score is to zero, the more likely the drug pair is non-interactive. In addition, the range of prediction scores depends mainly on the value of the parameter k. For example, the scores generated by the BRSNMF-based approach range over [−0.2873, 0.4691] when k = 1 but over [−1.6770, 2.5097] when k = rank(A)/2: the greater k, the wider the range. It is therefore inappropriate to set fixed score cutoffs to discriminate enhancive from degressive DDIs. Instead, we use the position in the sorted list of testing drug pairs as the cutoff.
Then, the top-n predicted DDIs are selected and checked for enhancive DDIs. Using their real labels in DrugBank, we count the drug pairs with positive labels among the top-n candidates; the accuracy of predicting enhancive DDIs is this count divided by n. Similarly, the number of drug pairs with negative labels among the bottom-n candidates, divided by n, is the accuracy of predicting degressive DDIs. In addition, since DrugBank is updated about every six months and some entries in DB_V5_Ex have since been updated, we double-check the prediction against the labels in the latest version of DrugBank (V5.1.1, updated on 2018-8-8).
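The rank-based evaluation can be sketched as follows. Pairs are sorted by score in descending order, with no fixed score cutoff. Top-n accuracy is then the fraction of +1 labels among the first n pairs, and bottom-n accuracy is the fraction of −1 labels among the last n. The scores and labels are hypothetical. Restricting n to at most half the list, so that the two windows do not overlap, is our own safeguard.

```python
import numpy as np

def top_bottom_accuracy(scores, labels, n):
    """Accuracy of the top-n (enhancive, +1) and bottom-n (degressive, -1) pairs."""
    if not 1 <= n <= len(scores) // 2:
        raise ValueError("n must be between 1 and half the number of pairs")
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")  # descending
    ranked = np.asarray(labels)[order]
    top_acc = float(np.mean(ranked[:n] == 1))
    bottom_acc = float(np.mean(ranked[-n:] == -1))
    return top_acc, bottom_acc

# Hypothetical scores and DrugBank-style labels (+1, -1, 0) for six test pairs.
scores = [0.9, 0.5, 0.1, -0.2, -0.8, 0.0]
labels = [1, 1, 0, -1, -1, 0]
print(top_bottom_accuracy(scores, labels, 2))   # (1.0, 1.0)
```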
Finally, the prediction performance is measured for n = 5, 10, 20, 30, 40 and 50 (see the detailed results in Additional file 1: VersionIndTest.xslx). The ratios of correctly predicted DDIs are reported as the performance of the test (Table 4). Our BRSNMF-based approach achieves 94% accuracy among the top-50 enhancive candidates and 86% accuracy among the bottom-50 degressive candidates. As with the metrics used in LOOCV, we also report AUROC and AUPR as the overall performance of the novel prediction: BRSNMF achieves AUROC/AUPR = 0.645/0.346, whereas DDINMF achieves AUROC/AUPR = 0.597/0.299. In summary, BRSNMF outperforms DDINMF in this real application.
Although the predictions of BRSNMF among the top-50 and bottom-50 candidates are encouraging, the overall prediction performance can still be improved. We therefore examined the incorrectly predicted DDIs case by case and identified three causes of wrong predictions.
The first cause is false positive drug pairs, which are inaccurately labeled as DDIs in DrugBank Version 4 but correctly labeled as non-DDIs in DrugBank Version 5. For example, the older version of DrugBank records that Apraclonidine (a sympathomimetic used in glaucoma therapy) increases the atrioventricular blocking activities of Alprenolol and Bevantolol, whereas the newer version removes these records.
The second cause, conversely, is false negative drug pairs, which are wrongly labeled as non-DDIs in DrugBank Version 4 but reported as DDIs in DrugBank Version 5 (e.g. the pair of Valrubicin and Cyclosporine and the pair of Ergocalciferol and Calcitriol). According to the newer version of DrugBank, Valrubicin (used to treat bladder cancer) increases the nephrotoxic activities of Cyclosporine (a powerful immunosuppressant with a specific action on T-lymphocytes), while the combined therapy of Calcitriol and Ergocalciferol increases the risk or severity of adverse effects in multiple-drug therapy.
The last cause is missing DBPs. Some DBPs are not collected in DrugBank, so two interacting drugs (e.g. Ritonavir and Darunavir; Amiodarone and Sofosbuvir) have no common DBPs in the dataset. In fact, Ritonavir increases the bioavailability of Darunavir because it strongly inhibits the activity of intestinal CYP3A4 (a DBP), which metabolizes Darunavir and thereby influences its absorption in HIV therapy [40]. Similarly, the preferential binding of Amiodarone to Albumin (a plasma protein) forces Sofosbuvir to redistribute and bind to other, unexpected proteins, so that an unexpected adverse effect (severe symptomatic bradycardia) occurs when Amiodarone is added to Sofosbuvir-containing HCV therapy [41].
We therefore anticipate improving the current prediction in two ways in the future: building a better dataset with fewer false positive and false negative drug pairs, and recovering missing DBPs or updating the DBPs of drugs.
Conclusions
It is more useful to know whether a drug pair is an enhancive or a degressive DDI than merely whether it is a DDI. Because they ignore the pharmacological changes caused by DDIs, most existing approaches only report binary predictions. Furthermore, enhancive and degressive DDIs do not occur at random but follow a weakly balanced relationship. However, no existing approach investigates and leverages this intrinsic property, which is one of the most crucial steps toward understanding high-order DDIs (involving three or more drugs) in the treatment of complex diseases [7].
In this work, after representing the comprehensive DDI network, which contains pharmacological changes, as a signed network, we have leveraged its weakly balanced structure to design a novel algorithm, balance regularized semi-nonnegative matrix factorization (BRSNMF). First, the proposed algorithm has been applied directly to detect drug communities. The comparison with the traditional SemiNMF shows that each drug community found by BRSNMF contains a substantial number of drugs and that the community sizes are less dispersed. More importantly, these communities exhibit the weakly balanced relationship among drugs as well as pharmacokinetic and pharmacodynamic significance in terms of drug-binding proteins. This finding helps to understand how high-order DDIs work.
Secondly, focusing on the scenario of predicting DDIs for newly given drugs, we have used BRSNMF to design a predictive approach for comprehensive DDI prediction. The experiments under LOOCV and 10-CV show that our DBP features are much better than the popular PubChem fingerprints, because pharmacological DDIs are not structure-derived interactions between drugs but indirect protein-mediated interactions. Moreover, the version-independent test demonstrates that our BRSNMF-based predictive approach achieves encouraging prediction of comprehensive DDIs and outperforms the state-of-the-art approach DDINMF, likely owing to its explicit modeling of the weakly balanced relationship among drugs. This predictive approach helps screen DDIs together with their changes in pharmacological effects.
Finally, we anticipate that the BRSNMF-based approach can achieve better DDI prediction in the future with a better dataset containing fewer false positive and false negative drug pairs, and with more drug features derived from other drug attributes, especially protein-related properties (e.g. protein–protein interaction networks), as well as side effects and ATC codes.
Abbreviations
DDI: drug–drug interaction
BRSNMF: balance regularized semi-nonnegative matrix factorization
DBP: drug-binding protein
NMF: nonnegative matrix factorization
SemiNMF: semi-nonnegative matrix factorization
PLSR: partial least-squares regression
AUROC: the area under the receiver operating characteristic curve
AUPR: the area under the precision–recall curve
CV: cross-validation
LOOCV: leave-one-out cross-validation
References
 1.
Wienkers LC, Heath TG (2005) Predicting in vivo drug interactions from in vitro drug discovery data. Nat Rev Drug Discov 4(10):825–833
 2.
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V et al (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res 42(Database issue):D1091–D1097
 3.
Leape LL, Bates DW, Cullen DJ, Cooper J, Demonaco HJ, Gallivan T, Hallisey R, Ives J, Laird N, Laffel G et al (1995) Systems analysis of adverse drug events. ADE Prevention Study Group. JAMA 274(1):35–43
 4.
Businaro R (2013) Why we need an efficient and careful pharmacovigilance. J Pharmacovigil 1:4
 5.
Karbownik A, Szałek E, Sobańska K, Grabowski T, Wolc A, Grześkowiak E (2017) Pharmacokinetic drug–drug interaction between erlotinib and paracetamol: a potential risk for clinical practice. Eur J Pharm Sci 102:55–62
 6.
Mulroy E, Highton J, Jordan S (2017) Giant cell arteritis treatment failure resulting from probable steroid/antiepileptic drug–drug interaction. N Z Med J 130(1450):102–104
 7.
Cokol M, Kuru N, Bicak E, Larkins-Ford J, Aldridge BB (2017) Efficient measurement and factorization of high-order drug interactions in Mycobacterium tuberculosis. Sci Adv 3(10):e1701881
 8.
Zhao XM, Iskar M, Zeller G, Kuhn M, van Noort V, Bork P (2011) Prediction of drug combinations by integrating molecular and pharmacological data. PLoS Comput Biol 7(12):e1002323
 9.
Veith H, Southall N, Huang R, James T, Fayne D, Artemenko N, Shen M, Inglese J, Austin CP, Lloyd DG et al (2009) Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries. Nat Biotechnol 27(11):1050–1055
 10.
Huang SM, Temple R, Throckmorton DC, Lesko LJ (2007) Drug interaction studies: study design, data analysis, and implications for dosing and labeling. Clin Pharmacol Ther 81(2):298–304
 11.
Zhang P, Wang F, Hu J, Sorrentino R (2015) Label propagation prediction of drug–drug interactions based on clinical side effects. Sci Rep 5:12339
 12.
Wiśniowska B, Polak S (2016) The role of interaction model in simulation of drug interactions and QT prolongation. Curr Pharmacol Rep 2(6):339–344
 13.
Zhou D, Bui K, Sostek M, Al-Huniti N (2016) Simulation and prediction of the drug–drug interaction potential of naloxegol by physiologically based pharmacokinetic modeling. CPT Pharmacomet Syst Pharmacol 5(5):250–257
 14.
Bui QC, Sloot PMA, van Mulligen EM, Kors JA (2014) A novel feature-based approach to extract drug–drug interactions from biomedical text. Bioinformatics 30(23):3365–3371
 15.
Zhang Y, Wu HY, Xu J, Wang J, Soysal E, Li L, Xu H (2016) Leveraging syntactic and semantic graph kernels to extract pharmacokinetic drug–drug interactions from biomedical literature. BMC Syst Biol 10(Suppl 3):67
 16.
Duke JD, Han X, Wang ZP, Subhadarshini A, Karnik SD, Li XC, Hall SD, Jin Y, Callaghan JT, Overhage MJ et al (2012) Literature based drug interaction prediction with clinical assessment using electronic medical records: novel myopathy associated drug interactions. PLoS Comput Biol 8(8):e1002614
 17.
Vilar S, Uriarte E, Santana L, Lorberbaum T, Hripcsak G, Friedman C, Tatonetti NP (2014) Similarity-based modeling in large-scale prediction of drug–drug interactions. Nat Protoc 9(9):2147–2163
 18.
Cheng F, Zhao Z (2014) Machine learning-based prediction of drug–drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. J Am Med Inform Assoc JAMIA 21(e2):e278–e286
 19.
Ryu JY, Kim HU, Lee SY (2018) Deep learning improves prediction of drug–drug and drug–food interactions. Proc Natl Acad Sci USA 115:E4304–E4311
 20.
Shi JY, Li JX, Mao KT, Cao JB, Lei P, Lu HM, Yiu SM (2019) Predicting combinative drug pairs via multiple classifier system with positive samples only. Comput Methods Programs Biomed 168:1–10
 21.
Yu H, Mao KT, Shi JY, Huang H, Chen Z, Dong K, Yiu SM (2018) Predicting and understanding comprehensive drug–drug interactions via semi-nonnegative matrix factorization. BMC Syst Biol 12(s1):14
 22.
Luo H, Zhang P, Huang H, Huang J, Kao E, Shi L, He L, Yang L (2014) DDI-CPI, a server that predicts drug–drug interactions through implementing the chemical–protein interactome. Nucleic Acids Res 42(Web Server issue):46–52
 23.
Shi JY, Huang H, Li JX, Lei P, Zhang YN, Dong K, Yiu SM (2018) TMFUF: a triple matrix factorization-based unified framework for predicting comprehensive drug–drug interactions of new drugs. BMC Bioinform 19(S14):411
 24.
Koch-Weser J (1981) Serum drug concentrations in clinical perspective. Ther Drug Monit 3(1):3–16
 25.
Facchetti G, Iacono G, Altafini C (2011) Computing global structural balance in large-scale signed social networks. Proc Natl Acad Sci USA 108(52):20953–20958
 26.
Harary F (1953) On the notion of balance of a signed graph. Mich Math J 2(2):143–146
 27.
Davis JA (1967) Clustering and structural balance in graphs. Hum Relat 20(2):181–187
 28.
Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: The 19th international conference on world wide web. ACM, New York, pp 641–650
 29.
Shi JB, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal 22(8):888–905
 30.
Lee DD, Seung HS (2001) Algorithms for nonnegative matrix factorization. Adv Neural Inf Process Syst 13:556–562
 31.
Traag VA, Bruggeman J (2009) Community detection in networks with positive and negative links. Phys Rev E 80(3):036115
 32.
Dhillon IS, Guan YQ, Kulis B (2007) Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans Pattern Anal 29(11):1944–1957
 33.
Ding C, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal 32(1):45–55
 34.
Camacho LAG, Alves-Souza SN (2018) Social network data to alleviate cold-start in recommender system: a systematic review. Inf Process Manag 54(4):529–544
 35.
Shi JY, Yiu SM, Li YM, Leung HCM, Chin FYL (2015) Predicting drug–target interaction for new drugs using enhanced similarity measures and super-target clustering. Methods 83:98–104
 36.
Jiao Y, Du P (2016) Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant Biol 4(4):320–330
 37.
Hao M, Bryant SH, Wang Y (2018) A new chemoinformatics approach with improved strategies for effective predictions of potential drugs. J Cheminform 10(1):50
 38.
Hao M, Bryant SH, Wang Y (2018) Open-source chemogenomic data-driven algorithms for predicting drug–target interactions. Brief Bioinform. https://doi.org/10.1093/bib/bby010
 39.
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z et al (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46(D1):D1074–D1082
 40.
Hill A, van der Lugt J, Sawyer W, Boffito M (2009) How much ritonavir is needed to boost protease inhibitors? Systematic review of 17 dose-ranging pharmacokinetic trials. AIDS 23(17):2237–2245
 41.
Back DJ, Burger DM (2015) Interaction between amiodarone and sofosbuvir-based treatment for hepatitis C virus infection: potential mechanisms and lessons to be learned. Gastroenterology 149(6):1315–1317
Authors’ contributions
JYS and HY conceived and designed the experiments and drafted the manuscript. KTM collected the dataset and performed the experiments. JYS and SMY analysed the results. JYS contributed materials/analysis tools and developed the code used in the analysis. SMY helped to draft the manuscript. JYS is the corresponding author. All authors read and approved the final manuscript.
Acknowledgements
The authors would like to thank the reviewers for their constructive comments, which helped make the paper much clearer.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
The code for this work is available at https://github.com/JustinShi2016/BRSNMF.
Funding
This work has been supported by the National Natural Science Foundation of China (No. 61872297), the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University (Nos. ZZ2018170, ZZ2018235), China National Training Programs of Innovation and Entrepreneurship for Undergraduates (No. 201710699330), and the Program of Peak Experience of Northwestern Polytechnical University (2016).
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Shi, J., Mao, K., Yu, H. et al. Detecting drug communities and predicting comprehensive drug–drug interactions via balance regularized semi-nonnegative matrix factorization. J Cheminform 11, 28 (2019). https://doi.org/10.1186/s13321-019-0352-9
Keywords
 Drug–drug interaction
 Weak balance theory
 Semi-nonnegative matrix factorization
 Regularization
 Community