 Methodology
 Open Access
MDDI-SCL: predicting multi-type drug-drug interactions via supervised contrastive learning
Journal of Cheminformatics volume 14, Article number: 81 (2022)
Abstract
The joint use of multiple drugs may cause unintended drug-drug interactions (DDIs) and result in adverse consequences for patients. Accurate identification of DDI types can not only provide hints for avoiding these accidental events, but also elucidate the mechanisms by which DDIs occur. Several computational methods have been proposed for multi-type DDI prediction, but room remains for improvement in prediction performance. In this study, we propose a supervised contrastive learning based method, MDDI-SCL, implemented with three-level loss functions, to predict multi-type DDIs. MDDI-SCL is mainly composed of three modules: a drug feature encoder and mean squared error loss module, a drug latent feature fusion and supervised contrastive loss module, and a multi-type DDI prediction and classification loss module. The drug feature encoder and mean squared error loss module uses a self-attention mechanism and an autoencoder to learn drug-level latent features. The drug latent feature fusion and supervised contrastive loss module uses multi-scale feature fusion to learn drug pair-level latent features. The prediction and classification loss module predicts the DDI type of each drug pair. We evaluate MDDI-SCL on three different tasks of two datasets. Experimental results demonstrate that MDDI-SCL achieves performance better than or comparable to that of the state-of-the-art methods. Furthermore, the effectiveness of supervised contrastive learning is validated by an ablation experiment, and the feasibility of MDDI-SCL is supported by case studies. The source codes are available at https://github.com/ShenggengLin/MDDISCL.
Introduction
The use of multiple drugs, often termed polypharmacy, is a therapeutic approach to treating various complex diseases [1, 2]. However, polypharmacy can lead to drug-drug interactions (DDIs), in which the pharmacological effect of one drug is altered by another drug [3,4,5]. It has been estimated that DDIs are associated with 30% of all reported adverse drug events (ADEs), which may lead to substantial morbidity and mortality and even to drug withdrawal from the market, incurring huge medical expense due to the stringent demands on drug development [6]. Therefore, it is necessary to reliably identify DDIs and understand their underlying mechanisms, which will benefit drug development in pharmaceutical companies and provide important information on polypharmacy prescription for clinicians and patients. In vitro experiments and clinical trials can be conducted to identify DDIs, but systematic combinatorial screening of DDI candidates from a large pool of drugs by experimental techniques remains challenging, time-consuming and resource-consuming.
In recent decades, scientific literature, electronic medical records, population-based reports of adverse events, drug labels, and other related sources have become increasingly available [7]. Researchers have attempted to extract DDIs from scientific literature and electronic medical records via natural language processing (NLP) techniques [8, 9], infer potential DDIs by similarity-based methods based on known DDIs [10], and predict DDIs by leveraging machine learning [11], network modelling [12, 13], and knowledge graphs [14, 15]. However, most of these computational methods (except the extraction of DDIs via NLP methods) only consider whether or not a DDI occurs for a given pair of drugs.
To facilitate the understanding of the causal mechanisms of DDIs, recent studies have developed multi-type DDI prediction methods that provide details beyond the chance of DDI occurrence [16]. The pioneering study by Ryu et al. constructed the gold standard DDI dataset from DrugBank [17], which covers 192,284 DDIs associated with 86 DDI types (changes in pharmacological effects and/or the risk of ADEs as a result of DDI) from 191,878 drug pairs [18]. They then formulated multi-type DDI prediction as a multi-label classification task and proposed DeepDDI, which uses a deep neural network (DNN) based on the structural information of the chemical compounds in a drug pair. This architecture became a baseline for several other state-of-the-art multi-type DDI prediction methods. These methods improved multi-type DDI prediction by representing a drug pair with various types of biological information, such as drug targets and enzymes, in addition to the structural information of drugs, using an autoencoder or the encoder module of a transformer to learn low-dimensional latent features and DNN algorithms for classification [19,20,21]. It should be noted that those methods represent the feature vector of a drug by its similarity profile, which is generated by the similarity (i.e., structural similarity) of a given drug against each of the other drugs across the entire dataset. More recently, Deng et al. used few-shot learning based on the latent features from a pair of drug structures to improve the prediction performance on rare DDI types, which have few samples [22]. Liu et al. proposed the method CSMDDI, which first generates embedding representations of drugs and DDI types and then learns a mapping function from drug attributes to their embeddings to predict multi-type DDIs [23]. Feng et al. proposed deepMDDI, which consists of an encoder built from deep relational graph convolutional networks constrained by similarity regularization to capture the topological features of the DDI network, and a tensor-like decoder for multi-label prediction of DDI types [24]. Yang et al. proposed a substructure-aware graph neural network, utilizing a message-passing neural network with a novel substructure attention mechanism and a substructure-substructure interaction module for DDI prediction [25].
With the increasing availability of large biomedical knowledge graphs (KGs), some studies have attempted to combine KGs with other data (e.g., drug molecular structures) for multi-type DDI prediction via graph neural networks (GNNs) [26, 27]. However, large KGs contain redundant and noisy data, and only a small subgraph is relevant to a given prediction target [28, 29]. Thus, the KG-based prediction methods for DDIs are still in their infancy.
Although these published methods have achieved some success in multi-type DDI prediction, some limitations remain. First, datasets of DDI types are extremely unbalanced, and these methods perform poorly in predicting rare types with fewer samples. Second, most methods perform well in predicting unknown DDI types between known drugs, but they often fail to do so for new drugs. It would be useful to develop new methods that resolve these problems and further improve the prediction performance.
Since labelled data is limited and expensive to obtain, contrastive learning has recently become a popular and powerful strategy for obtaining high-quality representations of samples in a self-supervised way. It aims at embedding augmented versions of the same sample close to each other while pushing away embeddings from different samples [30]. Contrastive learning is used not only for self-supervised tasks, but also for supervised tasks. Khosla et al. extended the self-supervised batch contrastive approach to the fully-supervised setting, allowing models to effectively leverage label information [31]. In supervised contrastive learning, samples belonging to the same class are pulled together in the embedding space, while samples from different classes are simultaneously pushed apart [31, 32].
Contrastive learning has been successfully applied in the field of bioinformatics [33,34,35,36,37,38]. In this study, we propose a new method named MDDI-SCL for multi-type DDI prediction, which is based on Supervised Contrastive Learning (SCL) and three-level loss functions. MDDI-SCL (Fig. 1) mainly includes three parts: a drug feature encoder and mean squared error (MSE) loss module, a drug latent feature fusion and supervised contrastive loss module, and a DDI type prediction and classification loss module. Specifically, we first input the drugs into the drug encoder, which is trained with the MSE loss, to obtain the lower-dimensional latent features of each drug. Then, the latent features of two drugs are combined as input to the feature fusion module to obtain the latent features of the drug pair. Supervised contrastive loss makes the features of DDIs of the same type more similar, and the features of DDIs of different types more different. Therefore, by using contrastive loss in the feature fusion module, we can obtain features that are more discriminative for classification. Finally, we input the latent features of each drug pair into the multi-type DDI prediction module to predict DDI types, and update the model parameters by the classification loss.
Experimental results demonstrate that MDDI-SCL achieves performance better than or comparable to that of several state-of-the-art methods across the three tasks on two different datasets. Additionally, we demonstrate the effectiveness of supervised contrastive learning for multi-type DDI prediction. More importantly, the results of the case studies support the feasibility of our method in practice.
Materials and methods
Datasets
In this study, we use two datasets of different scales. The first dataset (Dataset1) is the benchmark dataset collected by Deng et al. [20]. Dataset1 contains 572 drugs with 74,528 pairwise DDIs, which are associated with 65 DDI types. Each drug in Dataset1 has four types of features: chemical substructures, targets, pathways and enzymes, which are extracted from DrugBank [39]. The second dataset (Dataset2) is the dataset from the study of Lin et al. [21]. Dataset2 contains 1,258 drugs with 323,539 pairwise DDIs, which are associated with 100 DDI types. Each drug in Dataset2 has three types of features: substructures, targets and enzymes.
Drug feature representation
Each feature type of a drug corresponds to a set of descriptors, so a drug can be represented by a binary feature vector in which each value (1 or 0) indicates the presence or absence of the corresponding descriptor.
These feature vectors are high-dimensional and sparse, with most entries equal to 0. Therefore, we represent the feature vector of a drug by its similarity profile, which is generated by the similarity of drug A against each of the other drugs (i.e., drug B) in the dataset [18]. The Jaccard similarity is calculated by the following equation,
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where A and B are the original bit vectors of two drugs; |A ∩ B| is the number of elements in the intersection of A and B; |A ∪ B| is the number of elements in the union of A and B.
Based on the Jaccard similarity, each feature type of a drug in Dataset1 is represented as a 572-dimensional vector. Therefore, each drug with four types of features is represented by a 4 × 572-dimensional vector. Similarly, each drug in Dataset2 is represented as a 3 × 1258-dimensional vector.
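The similarity-profile construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three toy bit vectors are hypothetical, and self-similarity is included so that each profile has one entry per drug in the dataset (which matches the 572-dimensional profile reported for 572 drugs).

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length binary vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here (an assumption): two all-zero vectors score 0.
    return inter / union if union else 0.0


def similarity_profiles(bit_vectors):
    """Row i is the similarity profile of drug i against every drug."""
    return [[jaccard(a, b) for b in bit_vectors] for a in bit_vectors]


# Toy example: three drugs described by four binary descriptors each.
toy_drugs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
profiles = similarity_profiles(toy_drugs)
```

With several feature types per drug, one such profile would be computed per feature type and the profiles stacked, giving the 4 × 572 representation described for Dataset1.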
Drug feature encoder and mean squared error loss
The drug feature encoder module mainly includes multi-head self-attention layers and an autoencoder. The multi-head self-attention layers can focus on the more important drug features [40, 41], and the autoencoder then performs feature dimensionality reduction [42, 43]. Consequently, lower-dimensional and better drug representations can be obtained through the drug feature encoder module. We use mean squared error loss to update the parameters of the feature encoder module.
Multi-head self-attention mechanism and autoencoder
A detailed description of the multi-head self-attention mechanism and autoencoder is provided in Additional file 1 [41]. In the model, the hidden features obtained through the multi-head self-attention layers are denoted as DA1 and DB1 for a pair of drugs (i.e., drug A and drug B), as shown in Fig. 1A. The encoder of the autoencoder has two linear layers. The output vectors of the first linear layer are denoted as DA2 and DB2, and the output vectors of the second linear layer are denoted as DA3 and DB3.
Mean squared error
Mean squared error is commonly used as a regression loss function, which calculates the average squared difference between the observed and predicted values. In our model, MSE is the sum of squared distances between the drug feature vector and the output vector of the decoder, divided by the feature dimensionality. The MSE is calculated by the following formula,
$$\mathrm{MSE} = \frac{1}{fea\_dim} \sum_{i=1}^{fea\_dim} \left( val_i - \widetilde{val}_i \right)^2$$
where fea_dim is the feature dimensionality of the drug, val_{i} is the value of the i-th dimension of the drug feature vector, and \widetilde{val}_{i} is the value of the i-th dimension of the output vector of the decoder.
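As a small worked example of this reconstruction loss, the following Python sketch computes the MSE between a drug feature vector and a decoder output. The function name and the toy vectors are illustrative assumptions, not taken from the published code.

```python
def mse(features, reconstruction):
    """Mean squared error between a feature vector and its reconstruction."""
    if len(features) != len(reconstruction):
        raise ValueError("vectors must have the same dimensionality")
    fea_dim = len(features)
    squared = sum((v - r) ** 2 for v, r in zip(features, reconstruction))
    return squared / fea_dim


# Toy example: only the first of three dimensions is reconstructed imperfectly,
# so the loss is (0.5 ** 2) / 3.
example_loss = mse([1.0, 0.0, 0.5], [0.5, 0.0, 0.5])
```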
Drug latent feature fusion and supervised contrastive loss
The drug latent feature fusion module mainly includes two submodules: multi-scale feature fusion and latent feature dimensionality reduction. The multi-scale feature fusion submodule can simultaneously combine the low-level features and high-level features of a drug pair, and the feature dimensionality reduction submodule can further fuse latent features and reduce the feature dimensionality. The supervised contrastive learning loss function is utilized to update the parameters of the drug latent feature fusion module.
Multi-scale feature fusion submodule
A drug pair contains two drugs (i.e., drug A and drug B). Through the drug feature encoder module, three latent features of drug A are obtained: DA1, DA2, and DA3, as shown in Fig. 1A. Similarly, we can acquire three latent features of drug B: DB1, DB2, and DB3. DA1 and DB1 are low-level features, which usually contain more detailed information but also more noise [44, 45]. DA3 and DB3 are high-level features. Normally, high-level features have more semantic information and less noise but lose a lot of detailed information [45,46,47,48]. Thus, in order to better integrate the advantages of low-level and high-level features, we concatenate DA1 and DB3, DA2 and DB2, and DA3 and DB1, respectively, to represent a drug pair. Then, we input the concatenated features into the fully connected layer to obtain the fused drug pair features FD1, FD2, and FD3, as shown in Fig. 1B.
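The cross-scale pairing can be illustrated with plain Python lists. The sketch below shows only the concatenation step; the fully connected layer that maps each concatenated vector to FD1, FD2 and FD3 is omitted, and the function name and toy dimensions are assumptions made for illustration.

```python
def multiscale_concat(drug_a, drug_b):
    """Pair low-level features of one drug with high-level features of the other.

    drug_a and drug_b are (D1, D2, D3) tuples of latent vectors, ordered from
    low-level (attention output) to high-level (second encoder layer).
    """
    da1, da2, da3 = drug_a
    db1, db2, db3 = drug_b
    # List "+" concatenates: DA1|DB3, DA2|DB2, DA3|DB1.
    return [da1 + db3, da2 + db2, da3 + db1]


# Toy latent features with shrinking dimensionality (4 -> 2 -> 1).
drug_a = ([0.1, 0.2, 0.3, 0.4], [0.5, 0.6], [0.7])
drug_b = ([1.1, 1.2, 1.3, 1.4], [1.5, 1.6], [1.7])
pair_inputs = multiscale_concat(drug_a, drug_b)
```

For comparison, a single-scale variant would return `[da1 + db1, da2 + db2, da3 + db3]`, pairing features of the same level only.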
Latent feature dimensionality reduction submodule
When a neural network becomes deep, residual connections can be used to avoid the vanishing gradient problem [49]. In this submodule, the output of the encoder (DA3 and DB3) and the output of the multi-scale feature fusion submodule (FD1, FD2 and FD3) are concatenated as input to the latent feature dimensionality reduction submodule, which mainly includes multi-head self-attention layers and linear layers. Each linear layer has half as many neurons as the preceding layer. Multi-head self-attention is introduced in the “Multi-head self-attention mechanism and autoencoder” section. The output vector of the latent feature dimensionality reduction submodule is named CFV, as shown in Fig. 1B.
Supervised contrastive loss
Contrastive learning includes unsupervised contrastive learning and supervised contrastive learning. The latent features of samples obtained by unsupervised contrastive learning have the following property: the features of samples from the same source are more similar, whereas the features of samples from different sources are more different [50]. However, one significant disadvantage of unsupervised contrastive learning is that it does not consider the correlation of features between samples that come from different sources yet belong to the same class. Supervised contrastive learning was proposed to overcome this drawback. The latent features of samples obtained by supervised contrastive learning have the following property: the features of samples belonging to the same class are more similar, while the features of samples of different classes are more different [31, 51].
Considering that the DDI type prediction task is a multi-class classification task, supervised contrastive learning is better suited to it. Accordingly, our model employs supervised contrastive learning. Following the formulation of Khosla et al. [31], the supervised contrastive loss in our model can be calculated by the following formulas,
$$l_{con} = \sum_{i=1}^{N_{batchsize}} l_i^{con}$$
$$l_i^{con} = -\frac{1}{N_{y_i} - 1} \sum_{j=1}^{N_{batchsize}} \mathbb{1}_{i \neq j}\, \mathbb{1}_{y_i = y_j} \log \frac{\exp\left(sim(CFV_i, CFV_j)/\tau\right)}{\sum_{k=1}^{N_{batchsize}} \mathbb{1}_{i \neq k} \exp\left(sim(CFV_i, CFV_k)/\tau\right)}$$
where N_{batchsize} is the number of samples in each batch, y_{i} is the class label of sample i, and y_{j} is the class label of sample j. N_{y_i} is the number of samples of class y_{i} in the same batch. sim is a function that measures the similarity of two vectors, such as cosine similarity. CFV_{i}, CFV_{j} and CFV_{k} are the latent feature vectors of samples i, j, and k, respectively, i.e., the output vectors of the latent feature dimensionality reduction submodule. τ ∈ R^{+} is a scalar temperature parameter. According to the above formulas, making the loss l_{i}^{con} smaller requires the value of sim(CFV_{i}, CFV_{j}) to be larger, so the latent vectors CFV_{i} and CFV_{j} must become more similar. Because CFV_{i} and CFV_{j} are the latent vectors of samples of the same type, the latent features of same-type samples become more similar.
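To make the behaviour of this loss concrete, the following pure-Python sketch implements a batch supervised contrastive loss in the style of Khosla et al. [31], using cosine similarity. It is an illustration rather than the published implementation: the function names are hypothetical, anchors without a same-class partner in the batch are simply skipped (an assumption), and per-anchor losses are summed over the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(cfvs, labels, tau=0.05):
    """Sum over anchors of the mean negative log-probability of their positives."""
    n = len(cfvs)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # assumption: no same-class partner, no contribution
        # Denominator runs over every other sample in the batch.
        denom = sum(
            math.exp(cosine(cfvs[i], cfvs[k]) / tau) for k in range(n) if k != i
        )
        log_probs = [
            cosine(cfvs[i], cfvs[j]) / tau - math.log(denom) for j in positives
        ]
        total += -sum(log_probs) / len(positives)
    return total


# Toy batch: two tight clusters of 2-D latent vectors.
toy_cfvs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(toy_cfvs, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(toy_cfvs, [0, 1, 0, 1])
```

When the labels agree with the clusters (`loss_aligned`), the loss is small; when same-class samples lie far apart (`loss_mixed`), it is much larger, which is the pressure that pulls same-type drug pairs together during training.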
Multi-type DDI prediction and classification loss
The module employs two fully connected layers to predict DDI types, and the number of neurons in the second fully connected layer equals the number of DDI types. DDI type prediction is a multi-class classification task, and the sample sizes of the classes are not balanced. Since focal loss can partially solve the problem of sample imbalance [21], we use focal loss [52] and cross-entropy loss as our classification loss functions. In detail, we choose the cross-entropy loss as our classification loss function in the first third of the training steps, and apply focal loss in the last two thirds. Therefore, the total loss function of the model is as follows:
$$L = l_{MSE}(x, \tilde{x}) + l_{con}(CFV, y) + l_{cla}(y, \tilde{y})$$
where x is the feature vector of the drug pair, \tilde{x} is the output vector of the decoder, CFV is the output vector of the latent feature dimensionality reduction submodule, y is the class label of the sample, and \tilde{y} is the predicted value of the sample. l_{MSE} is the MSE loss function, l_{con} is the supervised contrastive learning loss function and l_{cla} is the classification loss function. l_{cla} is the cross-entropy loss in the first third of the training steps and the focal loss in the last two thirds.
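The switch from cross-entropy to focal loss can be sketched as follows for a single sample. This is a minimal illustration under stated assumptions: the focusing parameter `gamma=2.0` is a commonly used default rather than a value given in the text, the `switch_epoch` argument and its default are hypothetical, and whether the switch epoch itself uses focal loss is a convention chosen here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one sample with an integer class label."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Cross-entropy down-weighted by (1 - p_t)^gamma for well-classified samples."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def classification_loss(logits, target, epoch, switch_epoch=40):
    """Cross-entropy before switch_epoch, focal loss from switch_epoch onwards."""
    if epoch < switch_epoch:
        return cross_entropy(logits, target)
    return focal_loss(logits, target)


logits = [2.0, 0.5, -1.0]
early = classification_loss(logits, target=0, epoch=0)
late = classification_loss(logits, target=0, epoch=40)
```

Because the factor (1 - p_t)^gamma is at most 1, the focal loss of a sample never exceeds its cross-entropy, and it shrinks fastest for samples the model already classifies confidently.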
In order to prevent overfitting, the label smoothing strategy is implemented [53]. For multi-class classification problems, the class label is often converted into a one-hot vector. However, training on one-hot vectors may weaken the generalization ability of the model and result in overfitting. Label smoothing uses a smoothing parameter to add noise to the one-hot encoding, making the model less confident about its predictions. Therefore, it can partially solve the problem of overfitting.
We utilize the Gaussian error linear unit (GELU) activation function and the RAdam optimizer [54]. Dropout and batch normalization layers are placed between the fully connected layers [55].
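One common way to realise label smoothing is to mix the one-hot vector with a uniform distribution over all classes. The sketch below uses that variant as an assumption; the text does not specify which variant is used, and the function name and the example values are illustrative.

```python
def smooth_labels(target, num_classes, smoothing):
    """Mix a one-hot target with a uniform distribution over all classes."""
    off_value = smoothing / num_classes
    smoothed = [off_value] * num_classes
    # The true class keeps most of the probability mass.
    smoothed[target] = 1.0 - smoothing + off_value
    return smoothed


# Four classes, true class 2, smoothing parameter 0.3:
# the true class gets 0.775 and every other class gets 0.075.
soft_target = smooth_labels(target=2, num_classes=4, smoothing=0.3)
```

The smoothed vector still sums to 1, so it can be used directly as the target distribution of a cross-entropy loss.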
Results and discussion
Experimental settings of prediction tasks
This study evaluated multi-type DDI prediction under three experimental settings: (i) prediction of unobserved interaction types between known drugs (Task1); (ii) prediction of interaction types between known drugs and new drugs (Task2); and (iii) prediction of interaction types between new drugs (Task3). New drugs in the corresponding task are absent from the training set but present in the test set.
For Task1, we apply five-fold cross-validation (5-CV) to DDI types and split all DDI types into five subsets. We train models on the DDI types in the training set, and then make predictions for the DDI types in the test set. For Task2 and Task3, we apply 5-CV to drugs instead of DDI types. We randomly split the drugs into five subsets and use four of them as training drugs, leaving the remaining one as test drugs. For Task2, prediction models are constructed on the DDI types between two training drugs and then used to predict the DDI types between training drugs and test drugs. For Task3, prediction models are built on the DDI types between two training drugs and used to predict the DDI types between two test drugs.
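The drug-level split used for Task2 and Task3 can be sketched as below. This is an illustrative reading of the protocol, not the published code: the function names, the `(drug_a, drug_b, ddi_type)` triple layout and the toy identifiers are assumptions.

```python
import random


def drug_level_folds(drugs, n_folds=5, seed=0):
    """Randomly partition drugs (not drug pairs) into n_folds disjoint subsets."""
    shuffled = list(drugs)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def split_pairs(ddis, test_drugs):
    """Route each (drug_a, drug_b, ddi_type) triple by how many new drugs it has.

    0 new drugs -> training pairs, 1 -> Task2 test pairs, 2 -> Task3 test pairs.
    """
    test_drugs = set(test_drugs)
    train, task2, task3 = [], [], []
    for a, b, ddi_type in ddis:
        n_new = (a in test_drugs) + (b in test_drugs)
        (train, task2, task3)[n_new].append((a, b, ddi_type))
    return train, task2, task3


folds = drug_level_folds([f"d{i}" for i in range(10)])
toy_ddis = [("d1", "d2", 0), ("d1", "d3", 1), ("d3", "d4", 2)]
train, task2, task3 = split_pairs(toy_ddis, test_drugs={"d3", "d4"})
```

In each of the five rounds, one fold would play the role of `test_drugs` and the remaining four folds would supply the training drugs.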
For model evaluation, accuracy (ACC), area under the precision-recall curve (AUPR), area under the ROC curve (AUC), F1 score, precision and recall are adopted as evaluation metrics. On highly imbalanced datasets, AUPR and F1 score give a more informative picture of model performance than accuracy. Consequently, in the following discussion, we will focus on these two metrics.
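A toy calculation shows why accuracy can be misleading on imbalanced data. The sketch below assumes macro-averaged F1 (the averaging scheme is not stated in the text), and the labels are invented for illustration: a model that always predicts the majority class reaches 0.9 accuracy while its macro F1 stays below 0.5.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (a class with no hits scores 0)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Nine samples of a frequent class and one sample of a rare class;
# the model ignores the rare class entirely.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```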
Hyperparameters setting
The choice of hyperparameters influences the performance of the model. We first examined how six hyperparameters affect the prediction performance on Task2 of Dataset1: the smoothing parameter in the label smoothing strategy, the temperature parameter in contrastive learning, the learning rate, the batch size, the number of training epochs, and the epoch at which the cross-entropy loss is switched to focal loss. Task1 is a relatively simple task, while Task3 is a relatively difficult task. Thus, to ensure the versatility of the hyperparameters, we chose Task2 to tune them. For Task1 and Task3, we used the optimal parameters tuned on Task2. The performance metrics under different settings are shown in Fig. 2.
According to Fig. 2, the performance of the model does not change drastically as the hyperparameters change. Almost all metric scores vary within a range of 0.01, which illustrates the stability of our model. In the end, we chose 0.3 for the smoothing parameter, 0.05 for the temperature parameter, 2e-5 for the learning rate, 512 for the batch size, 120 for the number of training epochs, and the 40th epoch for switching from cross-entropy loss to focal loss.
The prediction effect of multi-scale feature fusion
In the drug latent feature fusion module, we tried three types of feature fusion methods. The first method is single-scale feature fusion, which concatenates DA1 and DB1, DA2 and DB2, and DA3 and DB3 as three assemblies. The second method is multi-scale feature fusion, in which we concatenate DA1 and DB3, DA2 and DB2, and DA3 and DB1 as three assemblies. The third method uses only DA3 and DB3 without feature fusion. We compared these three feature fusion methods on the three tasks of Dataset1, as shown in Fig. 3.
On all three tasks, the multi-scale feature fusion method achieved the highest AUPR and AUC. In general, the performance of the multi-scale feature fusion method is slightly better than that of the other two methods. Therefore, multi-scale feature fusion is incorporated into the final model.
The prediction effect of supervised contrastive learning
In order to verify the effectiveness of supervised contrastive learning, we compared the performance of the model with and without supervised contrastive learning on the three tasks of Dataset1, as shown in Table 1. The model with supervised contrastive learning achieved better ACC, AUPR, and AUC on all three tasks. On Task2, the AUPR of the model with supervised contrastive learning is 0.6947, whereas that of the model without it is 0.6765. On Task3, the AUC of the model with supervised contrastive learning is 0.0313 higher than that of the model without it. In general, the model with supervised contrastive learning achieves better prediction performance.
The prediction effect of focal loss
Focal loss can alleviate the problems of imbalanced sample sizes across categories and of unequal classification difficulty. Focal loss improves the classification ability of the model by forcing the model to focus on categories with a small sample size. In order to examine whether focal loss improves the prediction for categories with a small sample size, we selected the 20 categories with the smallest sample sizes (from DDI type 46 to DDI type 65) on Task1 of Dataset1 for comparison, as shown in Fig. 4.
On categories with a small sample size, focal loss can boost the classification performance of the model. Among the 20 categories with a small sample size, the F1 score of the model with focal loss is higher than that of the model without focal loss on 19 categories. On DDI type 52, 63, and 64, the F1 score of the model without focal loss is 0, while the F1 score of the model with focal loss is 0.2222, 0.5, and 0.25, respectively. Among the 20 categories with a small sample size, the AUPR of the model with focal loss is higher than the AUPR of the model without focal loss on 16 categories. On DDI type 63, the AUPR of the model without focal loss is 0.0001, while the AUPR of the model with focal loss is 0.5334.
The prediction effect of label smoothing strategy
We verified the effectiveness of the label smoothing strategy on three tasks of Dataset1. The experimental results are shown in Table 2.
On all three tasks, the AUPR of the model using label smoothing is higher than that of the model which does not utilize label smoothing. The AUPR of the model using label smoothing on Task2 is 0.0242 higher than that without label smoothing. The AUPR of the model using label smoothing on Task3 is 0.0302 higher than that without label smoothing.
Comparison with state-of-the-art DDI type prediction and baseline methods
We compared MDDI-SCL with four other state-of-the-art DDI type prediction methods: DeepDDI [18], Lee et al.’s method [19], DDIMDL [20] and MDF-SA-DDI [21], as well as several baseline classification methods: fully connected DNN, random forest (RF), k-nearest neighbor (KNN) and logistic regression (LR). The performance comparison of all prediction models on Dataset1 and Dataset2 is shown in Table 3 and Table 4, respectively.
We evaluated the performance of all prediction methods for Task1. Experimental results show that MDDI-SCL and MDF-SA-DDI perform much better than the other methods on Task1 of Dataset1, with MDDI-SCL achieving the best AUPR of 0.9782. On Dataset2, the performance of MDDI-SCL is better than that of the other methods. The AUPR, F1 score and ACC of MDDI-SCL are 0.9862, 0.9321 and 0.9516, respectively. These evaluation scores of MDDI-SCL are higher than those of the other methods.
We also compared the state-of-the-art methods on Task2 and Task3 of the two datasets. Experimental results show that our method MDDI-SCL achieves performance better than or comparable to that of the state-of-the-art methods on some evaluation metrics. On Dataset1, the AUPR of MDDI-SCL is 0.6947 and 0.3938 on Task2 and Task3, respectively. The AUC of MDDI-SCL is 0.6767 and 0.4589 on Task2 and Task3, respectively. These evaluation scores of MDDI-SCL are higher than those of the other methods. The F1 score of MDDI-SCL is slightly worse than that of the state-of-the-art methods. It should be emphasized that we used the same hyperparameters on different tasks and different datasets. We did not optimize the hyperparameters of the model across all the datasets and tasks. The hyperparameters of a deep learning model may affect its performance, so the experimental results presented here may not reflect the optimal performance of our model.
In general, our model achieves better or similar performance on Task1 of both datasets compared to the state-of-the-art methods. Our model also achieves performance better than or similar to that of the state-of-the-art methods on Task2 and Task3 of Dataset1. Our model performs slightly worse than the state-of-the-art models on Task2 and Task3 of Dataset2. This may be explained by the fact that the hyperparameters of our model were tuned on Dataset1. Inappropriate hyperparameters may affect the performance of the model.
Case studies
The evaluation metrics support the effectiveness of our model. We conducted case studies to further validate the effectiveness of MDDI-SCL in practice.
We used all the DDI type samples in Dataset1, originally obtained from DrugBank [17], to train the prediction model, and then made predictions for the drug-drug pairs that do not exist in Dataset1. We focused on the five most frequent DDI types and checked the top 20 predictions related to each type. We used the interactions checker tool provided by https://go.drugbank.com/drugs to verify these predictions.
Among 100 samples, 43 DDI type samples were confirmed, which are shown in Additional file 1: Table S1. For example, the interaction between Donepezil and Armodafinil is predicted to cause the DDI type #0, which means that metabolism of Donepezil can be decreased when combined with Armodafinil.
Under the same experimental setup, 43 of the 100 DDI samples predicted by MDDI-SCL were confirmed, whereas 35 of the 100 DDI samples predicted by MDF-SA-DDI were confirmed. This suggests that MDDI-SCL is more effective than MDF-SA-DDI in practice. In Additional file 1: Table S2, we list the other 57 drug pairs among the 100 DDI samples. These drug pairs may not have been reported in the literature, but the predicted DDIs may occur when the drugs are taken together, which may be helpful for pharmaceutical research.
Conclusions
We proposed a multi-type DDI prediction model based on supervised contrastive learning and three-level loss functions, and demonstrated the effectiveness and robustness of our model. In addition, we demonstrated the contributions of supervised contrastive learning, focal loss and the label smoothing strategy to prediction performance. Experimental results demonstrate that our proposed model achieves performance better than or comparable to that of the state-of-the-art models. Case studies were also performed to identify new DDIs that are not included in the current datasets, and their results support the effectiveness of our model in practice.
Availability of data and materials
The source codes are available at https://github.com/ShenggengLin/MDDISCL. The datasets are available at https://github.com/ShenggengLin/MDFSADDI.
Abbreviations
DDIs: Drug-drug interactions
ADEs: Adverse drug events
NLP: Natural language processing
DNN: Deep neural network
KGs: Knowledge graphs
GNNs: Graph neural networks
SCL: Supervised contrastive learning
MSE: Mean squared error
ATT: Attention
ACC: Accuracy
AUPR: Area under the precision-recall curve
AUC: Area under the ROC curve
LS: Label smoothing
RF: Random forest
KNN: K-nearest neighbor
LR: Logistic regression
References
Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R et al (2014) A community computational challenge to predict the activity of pairs of compounds. Nat Biotechnol 32(12):1213–1222
Zitnik M, Agrawal M, Leskovec J (2018) Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34(13):457–466
Vilar S, Harpaz R, Uriarte E, Santana L, Rabadan R, Friedman C (2012) Drug-drug interaction through molecular structure similarity analysis. J Am Med Inform Assoc 19(6):1066–1074
Xiong G, Yang Z, Yi J, Wang N, Wang L, Zhu H, Wu C, Lu A, Chen X, Liu S et al (2022) DDInter: an online drug-drug interaction database towards improving clinical decision-making and patient safety. Nucleic Acids Res 50(D1):D1200–D1207
Su XR, Hu L, You ZH, Hu PW, Wang L, Zhao BW (2022) A deep learning method for repurposing antiviral drugs against new viruses via multi-view nonnegative matrix factorization and its application to SARS-CoV-2. Brief Bioinform. 23(1):bbab526
Tatonetti NP, Fernald GH, Altman RB (2012) A novel signal detection algorithm for identifying hidden drug-drug interactions in adverse event reports. J Am Med Inform Assoc 19(1):79–85
Tatonetti NP, Ye PP, Daneshjou R, Altman RB (2012) Data-driven prediction of drug effects and interactions. Sci Transl Med. https://doi.org/10.1126/scitranslmed.3003377
Vilar S, Friedman C, Hripcsak G (2018) Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Brief Bioinform 19(5):863–877
Zhang Y, Zheng W, Lin H, Wang J, Yang Z, Dumontier M (2018) Drug-drug interaction extraction via hierarchical RNNs on sequence and shortest dependency paths. Bioinformatics 34(5):828–835
Vilar S, Uriarte E, Santana L, Lorberbaum T, Hripcsak G, Friedman C, Tatonetti NP (2014) Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc 9(9):2147–2163
Cheng F, Zhao Z (2014) Machine learning-based prediction of drug-drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. J Am Med Inform Assoc 21(e2):e278–e286
Park K, Kim D, Ha S, Lee D (2015) Predicting pharmacodynamic drug-drug interactions through signaling propagation interference on protein-protein interaction networks. PLoS ONE 10(10):e0140816
Zhang P, Wang F, Hu J, Sorrentino R (2015) Label propagation prediction of drug-drug interactions based on clinical side effects. Sci Rep 5:12339
Lin X, Quan Z, Wang ZJ, Ma TF, Zeng XX. KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020:2739–2745.
Dai YF, Guo CH, Guo WZ, Eickhoff C (2021) Drug-drug interaction prediction with Wasserstein Adversarial Autoencoder-based knowledge graph embeddings. Brief Bioinform. 22(4):bbaa256
Zhang XD, Wang G, Meng XY, Wang S, Zhang Y, Rodriguez-Paton A, Wang JM, Wang X (2022) Molormer: a lightweight self-attention-based method focused on spatial structure of molecular graph for drug-drug interactions prediction. Brief Bioinform. 23(5):bbac296
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z et al (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46(D1):D1074–D1082
Ryu JY, Kim HU, Lee SY (2018) Deep learning improves prediction of drug-drug and drug-food interactions. Proc Natl Acad Sci USA 115(18):E4304–E4311
Lee G, Park C, Ahn J (2019) Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinformatics. 20(1):415
Deng YF, Xu XR, Qiu Y, Xia JB, Zhang W, Liu SC (2020) A multimodal deep learning framework for predicting drug-drug interaction events. Bioinformatics 36(15):4316–4322
Lin SG, Wang YJ, Zhang LF, Chu YY, Liu YT, Fang YT, Jiang MM, Wang QK, Zhao BW, Xiong Y et al (2022) MDF-SA-DDI: predicting drug-drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Brief Bioinform. 23(1):bbab421
Deng Y, Qiu Y, Xu X, Liu S, Zhang Z, Zhu S, Zhang W (2022) META-DDIE: predicting drug-drug interaction events with few-shot learning. Brief Bioinform. 23(1):bbab421
Liu Z, Wang XN, Yu H, Shi JY, Dong WM (2022) Predict multi-type drug-drug interactions in cold start scenario. BMC Bioinformatics 23(1):75
Feng YH, Zhang SW, Zhang QQ, Zhang CH, Shi JY (2022) deepMDDI: a deep graph convolutional network framework for multi-label prediction of drug-drug interactions. Anal Biochem 646:114631
Yang ZD, Zhong WH, Lv QJ, Chen CYC (2022) Learning size-adaptive molecular substructures for explainable drug-drug interaction prediction by substructure-aware graph neural network. Chem Sci 13(29):8693–8703
Chen YJ, Ma TF, Yang XX, Wang JM, Song BS, Zeng XX (2021) MUFFIN: multi-scale feature fusion for drug-drug interaction prediction. Bioinformatics 37(17):2651–2658
Yu Y, Huang KX, Zhang C, Glass LM, Sun JM, Xiao C (2021) SumGNN: multi-typed drug interaction prediction via efficient knowledge graph summarization. Bioinformatics 37(18):2988–2995
Su XR, Hu L, You ZH, Hu PW, Zhao BW (2022) Attention-based knowledge graph representation learning for predicting drug-drug interactions. Brief Bioinform 23(3):bbac140
Hu L, Zhang J, Pan XY, Yan H, You ZH (2021) HiSCF: leveraging higher-order structures for clustering analysis in biological networks. Bioinformatics 37(4):542–550
Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2021) A survey on contrastive self-supervised learning. Technologies. https://doi.org/10.3390/technologies9010002
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2021) Supervised contrastive learning. arXiv. https://doi.org/10.48550/arXiv.2004.11362
Lopez-Martin M, Sanchez-Esguevillas A, Arribas JI, Carro B (2022) Supervised contrastive learning over prototype-label embeddings for network intrusion detection. Inform Fusion 79:200–228
Zheng L, Liu Z, Yang Y, Shen HB (2022) Accurate inference of gene regulatory interactions from spatial gene expression with deep contrastive learning. Bioinformatics 38(3):746–753
Liu X, Song C, Huang F, Fu H, Xiao W, Zhang W (2022) GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform. 23(1):bbab457
Li Y, Qiao G, Gao X, Wang G (2022) Supervised graph co-contrastive learning for drug-target interaction prediction. Bioinformatics 38(10):2847–2854
Hu H, Bindu JP, Laskin J (2021) Self-supervised clustering of mass spectrometry imaging data using contrastive learning. Chem Sci 13(1):90–98
Wang YH, Min YS, Chen X, Wu J. Multi-view Graph Contrastive Representation Learning for Drug-Drug Interaction Prediction. Proceedings of the World Wide Web Conference 2021 (WWW 2021) 2021:2921–2933.
Ciortan M, Defrance M (2021) Contrastive self-supervised clustering of scRNA-seq data. BMC Bioinformatics 22(1):280
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V et al (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 42(Database issue):1091–1097
Chu YY, Zhang Y, Wang QK, Zhang LF, Wang XH, Wang YJ, Salahub DR, Xu Q, Wang JM, Jiang X et al (2022) A transformer-based model to predict peptide-HLA class I binding and optimize mutated peptides for vaccine design. Nat Mach Intell. 4(3):300
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. Adv Neur In. https://doi.org/10.48550/arXiv.1706.03762
Dai QY, Chu YY, Li ZQ, Zhao YS, Mao XY, Wang YJ, Xiong Y, Wei DQ (2021) MDA-CF: predicting miRNA-disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med. 136:104706
Rao JH, Zhou X, Lu YT, Zhao HY, Yang YD (2021) Imputing single-cell RNA-seq data by combining graph convolution and autoencoder neural networks. iScience 24(5):102393
Liu S, Qi L, Qin HF, Shi JP, Jia JY. Path Aggregation Network for Instance Segmentation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018:8759–8768.
Singh B, Davis LS. An Analysis of Scale Invariance in Object Detection - SNIP. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018:3578–3587.
Li YH, Chen YT, Wang NY, Zhang ZX. Scale-Aware Trident Networks for Object Detection. IEEE I Conf Comp Vis 2019:6053–6062.
Song T, Zhang XD, Ding M, Rodriguez-Paton A, Wang SD, Wang G (2022) DeepFusion: a deep learning based multi-scale feature fusion method for predicting drug-target interactions. Methods 204:269–277
Tang Q, Nie FL, Kang JJ, Chen W (2021) mRNALocater: enhance the prediction accuracy of eukaryotic mRNA subcellular localization by using model fusion strategy. Mol Ther 29(8):2617–2623
He KM, Zhang XY, Ren SQ, Sun J. Deep Residual Learning for Image Recognition. Proc CVPR IEEE 2016:770–778.
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. Pr Mach Learn Res. 119:1597
Guo DE, Xia Y, Luo XB, Feng JF (2021) Remote sensing image scene classification based on supervised contrastive learning. Acta Photonica Sinic. 50(7)
Lin TY, Goyal P, Girshick R, He KM, Dollar P (2020) Focal loss for dense object detection. IEEE T Pattern Anal 42(2):318–327
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. Proc CVPR IEEE 2016:2818–2826.
Zheng W, Zhang YX, Gong XH, Zhanghuali, Yu BY. DenseNet model with RAdam optimization algorithm for cancer image classification. 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE) 2021:771–775.
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 37:448
Acknowledgements
Not applicable.
Funding
This work is supported by grants from the National Science Foundation of China (Grant Nos. 62172274, 32070662, 61832019, 32030063), the Science and Technology Commission of Shanghai Municipality (Grant No. 19430750600), and the Joint Research Fund for Medical and Engineering and Scientific Research at Shanghai Jiao Tong University (Grant Nos. YG2021ZD02, YG2019GD01, YG2019ZDA12).
Author information
Contributions
SL: conceptualization and design, data acquisition and analysis, methodology, and writing (original draft). WC: validation. GC: investigation and visualization. SZ: writing (review and editing) and visualization. YX: writing (review and editing) and project administration. YX and DQW: funding acquisition. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Table S1. Forty-three DDI samples have been confirmed among the 100 DDI samples predicted by MDDISCL. Table S2. Fifty-seven DDI samples that may not be reported in the literature among the 100 DDI samples predicted by MDDISCL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Lin, S., Chen, W., Chen, G. et al. MDDISCL: predicting multi-type drug-drug interactions via supervised contrastive learning. J Cheminform 14, 81 (2022). https://doi.org/10.1186/s13321-022-00659-8
DOI: https://doi.org/10.1186/s13321-022-00659-8
Keywords
 Drug-drug interaction
 Multi-type classification
 Supervised contrastive learning
 Multi-scale feature fusion
 Self-attention mechanism