 Research
 Open Access
DFFNDDS: prediction of synergistic drug combinations with dual feature fusion networks
Journal of Cheminformatics volume 15, Article number: 33 (2023)
Abstract
Drug combination therapies are a promising clinical treatment strategy. However, efficiently identifying valid drug combinations remains challenging because the number of available drugs has increased rapidly. In this study, we proposed a deep learning model called the Dual Feature Fusion Network for Drug–Drug Synergy prediction (DFFNDDS), which uses a fine-tuned pretrained language model and a dual feature fusion mechanism to predict synergistic drug combinations. The dual feature fusion mechanism fuses the drug features and cell line features at both the bit-wise level and the vector-wise level. We demonstrated that DFFNDDS outperforms competitive methods and can serve as a reliable tool for identifying synergistic drug combinations.
Introduction
Drug therapy is the most commonly used method in clinical cancer treatment. To meet clinical demand, the number of anticancer drugs has increased rapidly, and many effective single drugs have been applied in cancer therapy. Although monotherapy has contributed greatly to disease treatment, it has drawbacks, such as toxicity and drug resistance, owing to the heterogeneity of drug responses [1]. Drug combinations, which involve using two or more drugs to treat a specific disease, have been proposed as an effective treatment approach [2]. Combining drugs allows them to act on different targets and pathways, thereby improving treatment efficacy, reducing side effects and decreasing drug resistance [3, 4]. Therefore, drug combinations have been suggested as a potential strategy for overcoming drawbacks such as response heterogeneity.
Various methods for identifying valid drug combinations have been proposed. The traditional approach is clinical testing; however, only a small number of drugs can be investigated through clinical trials, as they are time-consuming and expensive and might expose patients to unnecessary treatment [5]. High-throughput drug screening [6] has therefore been applied to screen for effective drug combinations. High-throughput screening allows automated testing of chemical and biological compounds against specific biological targets and accelerates the identification of synergistic drug combinations. However, high-throughput screening does not reveal the modes of action of drug molecules in vivo [7], and it is impractical to screen all possible drug combinations for all possible indications. Consequently, several computational methods have been proposed to cope with the rapid growth in the number of available drugs. These include systems biology methods [8], kinetic models [9] and machine learning methods [10]. Among them, machine learning methods have powerful modeling capabilities because they can learn latent drug features, which allows them to predict the synergistic effects of drug combinations effectively while reducing the cost of drug trials. As a result, machine learning has developed rapidly in this field.
Machine learning methods can be divided into two categories: classical machine learning and deep learning. The most commonly used classical machine learning methods are random forests [11], extreme gradient boosting (XGBoost) [12] and support vector machines [13]. Li et al. [14] proposed a random forest-based drug combination synergy prediction model that uses drug–target networks and drug-induced gene expression profiles to predict synergistic anticancer combinations. Sidorov et al. [15] proposed an XGBoost-based model. However, this approach trains a separate model for each cell line rather than a single model for all cell lines; thus, differences in the cell lines may reduce the root-mean-square error by up to 50%, thereby decreasing the reliability of the model. Julkunen et al. [16] proposed comboFM, a drug combination prediction model that represents cell context-specific drug interactions as higher-order tensors and uses factorization machines to learn the latent factors of the tensor efficiently. This model can predict the responses of new drug combinations and explore different drug combination doses.
While classical machine learning relies on handcrafted features, deep learning approaches can extract features directly from raw data. Various neural network architectures have been proposed, including the convolutional neural network (CNN) [17], the recurrent neural network (RNN) [18], and the attention mechanism [19]. These networks have been successfully applied in computer vision [20] and natural language processing (NLP) [21]. Deep learning has also gradually been applied to drug synergy prediction. Preuer et al. [5] proposed DeepSynergy, a deep learning method based on a feedforward network that uses molecular fingerprints and cell line gene expression as inputs. This was the first attempt to apply deep learning in this domain, and the model outperformed traditional machine learning methods. Yang et al. [22] proposed GraphSynergy, which adapts a spatial-based graph convolutional network to encode the higher-order structural information of the protein modules targeted by drug pairs and of the protein modules associated with specific cancer cell lines in protein–protein interaction (PPI) networks. Jiang et al. [23] proposed using a graph convolutional network to predict drug combinations in cancer cell lines. In 2021, the DeepDDS model [24] was proposed, which uses a graph neural network and an attention mechanism to identify valid drug combinations. To obtain drug structures, DeepDDS uses RDKit to convert simplified molecular-input line-entry system (SMILES) strings into molecular graphs, and it integrates these structures with gene expression patterns to identify synergistic combinations. However, some problems remain. In terms of feature extraction, these methods do not fully exploit the information contained in SMILES strings.
Moreover, in terms of feature fusion, the above-mentioned methods simply concatenate drug features and cell line features, which does not fully capture the interactions between these features.
Therefore, in this paper, we propose the dual feature fusion network for drug–drug synergy prediction (DFFNDDS), a deep learning model for predicting the synergistic effects of drug combinations. The model inputs are the SMILES representations of the drugs, the hashed atom pair fingerprints of the drugs, and the gene expression of the cell line. The model output is the synergy score of the given drug combination. To address the above problems, we exploit the SMILES representations by using a fine-tuned BERT model to extract effective drug features, and we use a double-view feature fusion mechanism to combine the drug and cell line features. Finally, we compare our method with recent deep learning prediction models, including MatchMaker [25], DeepSynergy [5], EPGCN-DS [26], GCN-BMP [27] and DeepDDS [24], on the benchmark datasets DrugComb [28] and DrugCombDB [29]. The experimental results indicate that DFFNDDS is an effective model for predicting the synergy scores of drug combinations.
Methods and pipelines
Pipeline
Figure 1 illustrates the end-to-end learning framework for predicting drug combinations. The framework consists of four modules: the SMILES encoder, the dimensional alignment module, the dual fusion module, and the predictor module. For each pairwise drug combination, the input layer receives the SMILES strings and hashed atom pair fingerprints of the two drugs, together with the gene expression profile of the cancer cell line in which the combination is tested. Each SMILES string is then encoded into a vector by a fine-tuned BERT model. Next, the cell line gene expression, the outputs of the SMILES encoder and the hashed atom pair fingerprints are fed into the dimensional alignment module, which maps all inputs to the same dimension. In the dual fusion module, two networks (a multi-head attention mechanism and a highway network) extract and combine the input features. Finally, the outputs of the two networks are concatenated to obtain the final feature representation, which is passed through a linear layer. The output of the linear layer is the predicted synergy score, which is used to determine whether the drug combination is synergistic or antagonistic.
Drug encoding based on SimCSE
In recent years, pretrained models have transformed various artificial intelligence domains, including NLP. BERT (bidirectional encoder representations from transformers) [30] is one of the best-known NLP models. In its base configuration, BERT stacks 12 Transformer encoder layers and is pretrained with a masked language modeling objective that predicts randomly masked words in a sequence. Through its self-attention mechanism, BERT learns from both the left and the right context. BERT achieved state-of-the-art performance on eleven NLP tasks. Inspired by this success, many chemical language models have been proposed in drug discovery to predict the characteristics of drug molecules and protein–protein interactions (PPIs). For instance, ChemBERT [31], DeepChem [32], and SciBERT [33] were developed to apply deep learning to drug discovery. BERT models commonly generate the embedding of an input sentence in one of three ways: [CLS] pooling, max pooling or mean pooling. However, embeddings obtained in these ways do not fully capture the textual information [34]. To enhance the encoding quality, we therefore use simple contrastive learning of sentence embeddings (SimCSE) [35] to fine-tune the original BERT model. SimCSE fine-tunes BERT with a contrastive learning objective and has achieved competitive results on NLP tasks. In this unsupervised setting, the model takes a SMILES string and predicts itself under the contrastive objective, using only standard dropout as noise; we apply this method to generate improved drug representations.
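The three pooling strategies can be illustrated with a small, framework-free sketch. The token embeddings below are toy values standing in for the last-layer hidden states of a BERT encoder, the mask marks non-padding tokens, and all function names are hypothetical rather than part of any library.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Use the hidden state of the first token ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])


def _kept(token_embeddings: List[Vector], mask: List[int]) -> List[Vector]:
    """Keep only the positions whose attention-mask value is 1 (non-padding)."""
    return [emb for emb, m in zip(token_embeddings, mask) if m == 1]


def mean_pooling(token_embeddings: List[Vector], mask: List[int]) -> Vector:
    """Average the non-padding token embeddings dimension by dimension."""
    kept = _kept(token_embeddings, mask)
    return [sum(col) / len(kept) for col in zip(*kept)]


def max_pooling(token_embeddings: List[Vector], mask: List[int]) -> Vector:
    """Take the element-wise maximum over the non-padding token embeddings."""
    kept = _kept(token_embeddings, mask)
    return [max(col) for col in zip(*kept)]


# Toy input: four token embeddings of size 3; the last position is padding.
tokens = [[0.1, 0.2, 0.3], [1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
print(cls_pooling(tokens))          # [0.1, 0.2, 0.3]
print(max_pooling(tokens, mask))    # [1.0, 0.5, 0.5]
print(mean_pooling(tokens, mask))
```

The padding position is excluded from mean and max pooling, whereas [CLS] pooling ignores the mask because it reads a single position.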
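Let \(s_i\) denote a SMILES string. The string is passed through the BERT encoder twice with different dropout masks \(z\) and \(z'\), yielding two different output vectors \(h_i^z\) and \(h_i^{z'}\). The two embeddings of the same SMILES string are treated as a positive pair, and the embeddings of the other SMILES strings in the mini-batch are used as negative samples. For a mini-batch of N pairs, the training objective for \((h_i^z, h_i^{z'})\) is:
\[ \ell_i = -\log \frac{e^{\mathrm{sim}(h_i^{z}, h_i^{z'})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i^{z}, h_j^{z'})/\tau}} \]
where \(\mathrm{sim}(\cdot ,\cdot )\) denotes cosine similarity and \(\tau \) is a temperature hyperparameter, following SimCSE [35].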
The fine-tuned BERT model encodes SMILES strings more effectively than the original BERT model. Given a drug pair, the embeddings of the corresponding SMILES strings produced by the fine-tuned BERT encoder are denoted (\(x_i\), \(x_j\)), where \(x_i \in {\mathbb {R}}^D\) and \(x_j \in {\mathbb {R}}^D\).
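The contrastive fine-tuning objective can be sketched without any deep learning framework, assuming the standard unsupervised SimCSE form: cosine similarity divided by a temperature \(\tau\), with the other strings in the mini-batch as negatives. The default \(\tau = 0.05\) follows the SimCSE paper and is an assumption here, as are the function names; in real training, the two views come from encoding the same batch twice with dropout active.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(h: List[Vector], h_prime: List[Vector], tau: float = 0.05) -> float:
    """Unsupervised SimCSE (InfoNCE) loss averaged over a mini-batch.

    h[i] and h_prime[i] are two dropout-perturbed embeddings of the i-th SMILES
    string (the positive pair); h_prime[j] with j != i are in-batch negatives.
    """
    n = len(h)
    total = 0.0
    for i in range(n):
        logits = [cosine(h[i], h_prime[j]) / tau for j in range(n)]
        # log-sum-exp with the maximum subtracted for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]  # equals -log softmax(logits)[i]
    return total / n


# Two SMILES embeddings per view; matching rows form the positive pairs.
view_1 = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
view_2 = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.0]]
print(simcse_loss(view_1, view_2))        # small: positives are the most similar
print(simcse_loss(view_1, view_2[::-1]))  # large: positives are mismatched
```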
Dimensional alignment
The model inputs include the hashed atom pair molecular fingerprints of the drugs, the SMILES string encodings and the cell line gene expression. A hashed atom pair fingerprint is a molecular representation that encodes a molecule as a bit string. However, these inputs have different dimensions, and some are high-dimensional. To reduce the computational cost and give all inputs the same dimension, we project the hashed atom pair fingerprints of a drug pair, \(f_A^{i}\) and \(f_B^{i}\), the gene expression of the cell line \(z\), and the SMILES string encodings \(x_A^{i}\) and \(x_B^{i}\) to a common dimension. Given a projection function g(\(\cdot \)), the output can be computed as:
In this equation, W is the weight matrix and b is the bias. Applying this projection yields \(f_A^{i'}\) and \(f_B^{i'}\) for the fingerprints, \(z^{'}\) for the cell line, and \(x_A^{i'}\) and \(x_B^{i'}\) for the SMILES encodings.
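The alignment step can be sketched as follows, assuming g(\(\cdot \)) is a plain affine map g(v) = Wv + b with a separate weight matrix and bias for each input type (fingerprints, SMILES encodings, gene expression), shared by the two drugs of a pair. The input sizes, the common dimension D = 16, the uniform initialization and the helper name `make_projection` are hypothetical placeholders, not the values used in DFFNDDS.

```python
import random
from typing import Callable, List

Vector = List[float]


def make_projection(in_dim: int, out_dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Build an affine projection g(v) = W v + b with random W and zero b."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    W = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim

    def g(v: Vector) -> Vector:
        if len(v) != in_dim:
            raise ValueError(f"expected {in_dim} features, got {len(v)}")
        return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

    return g


D = 16  # hypothetical common dimension
g_fp = make_projection(64, D, seed=1)       # hashed atom pair fingerprints (toy size)
g_smiles = make_projection(48, D, seed=2)   # fine-tuned BERT SMILES encodings (toy size)
g_cell = make_projection(100, D, seed=3)    # cell line gene expression (toy size)

rng = random.Random(0)
f_A = [float(rng.random() < 0.1) for _ in range(64)]   # toy bit-vector fingerprints
f_B = [float(rng.random() < 0.1) for _ in range(64)]
x_A = [rng.gauss(0, 1) for _ in range(48)]             # toy SMILES embeddings
x_B = [rng.gauss(0, 1) for _ in range(48)]
z = [rng.gauss(0, 1) for _ in range(100)]              # toy expression profile

aligned = [g_fp(f_A), g_fp(f_B), g_smiles(x_A), g_smiles(x_B), g_cell(z)]
print([len(v) for v in aligned])  # [16, 16, 16, 16, 16]
```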
Dual fusion
Most prior models simply concatenate the drug features and cell line information and feed them into a multilayer fully connected network; however, this approach does not capture the latent interactions within the concatenated features. To generate more informative representations, we propose a double-view feature fusion mechanism in the feature fusion block that reweights the input feature representations at the bit level and the vector level simultaneously. Given \(f_A^{i'}\) and \(f_B^{i'}\) as the fingerprint representations of the drug pair, \(x_A^{i'}\) and \(x_B^{i'}\) as the SMILES features of the drug pair, and \(z^{'}\) as the gene expression of the cell line, the input to the fusion mechanism is:
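One plausible reading of this input, sketched below, is a 5 × D matrix whose rows are the five aligned vectors in the order listed above. Under that assumption, the attention branch treats each row as one unit (the vector-level view), while the highway branch gates every individual entry (the bit-level view). The row order and the helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def build_fusion_input(f_a: Vector, f_b: Vector, x_a: Vector, x_b: Vector, z: Vector) -> Matrix:
    """Stack the five aligned feature vectors into the fusion input l_i (5 x D)."""
    rows = [f_a, f_b, x_a, x_b, z]
    d = len(rows[0])
    if any(len(r) != d for r in rows):
        raise ValueError("all inputs must be aligned to the same dimension first")
    return [list(r) for r in rows]


def bit_view(l: Matrix) -> Vector:
    """Bit level: every scalar entry of l_i, flattened, can be gated on its own."""
    return [value for row in l for value in row]


# Vector level: each of the five rows of l_i is one unit for attention.
l_i = build_fusion_input([1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8], [0.3, 0.1])
print(len(l_i), len(bit_view(l_i)))  # 5 10
```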
Multi-head attention mechanism
Figure 2 shows the architecture of the multi-head attention mechanism. The attention module captures interactions between features at the vector level. The core operation of the multi-head attention mechanism is the function Attention(Q, K, V), which takes three feature matrices, \(Q \in {\mathbb {R}}^{l_q \times d_k}\), \(K \in {\mathbb {R}}^{l_k \times d_k}\), and \(V \in {\mathbb {R}}^{l_v \times d_v}\), as inputs, where \(l_q\), \(l_k\) and \(l_v\) are the input lengths and \(d_k\) and \(d_v\) are the transformed dimensions. Let \(l_i\) be the input to the multi-head attention mechanism. Then, the output matrix can be obtained as follows:
where \(W^K\), \(W^V\), and \(W^Q\) are weight matrices; each is a two-dimensional matrix whose dimensions both equal the embedding size. The multi-head attention mechanism contains h heads, where the \(i\)th head can be computed as:
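Scaled dot-product and multi-head attention over the fusion input can be sketched in plain Python. This sketch assumes the standard Transformer formulation \(\mathrm{Attention}(Q,K,V)=\mathrm{softmax}(QK^\top /\sqrt{d_k})V\), with \(Q=l_iW^Q\), \(K=l_iW^K\) and \(V=l_iW^V\) as D × D projections whose columns are split evenly across the h heads and whose head outputs are concatenated. Whether DFFNDDS splits the matrices this way or adds an output projection is not stated here, and the helper names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x n) and an (n x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax with the row maximum subtracted for stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def multi_head_attention(l: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix, h: int) -> Matrix:
    """Project l with D x D weights, attend separately in h column blocks, concatenate."""
    d = len(l[0])
    if d % h != 0:
        raise ValueError("embedding size must be divisible by the number of heads")
    d_head = d // h
    q, k, v = matmul(l, w_q), matmul(l, w_k), matmul(l, w_v)
    heads = []
    for i in range(h):
        cols = slice(i * d_head, (i + 1) * d_head)
        heads.append(attention([r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]))
    # Concatenate the per-head outputs along the feature dimension, row by row.
    return [[x for head in heads for x in head[row]] for row in range(len(l))]


def random_matrix(rows: int, cols: int, rng: random.Random, std: float) -> Matrix:
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D, H = 8, 2
l_i = random_matrix(5, D, rng, 1.0)  # five aligned feature vectors as rows
w_q = random_matrix(D, D, rng, 0.3)
w_k = random_matrix(D, D, rng, 0.3)
w_v = random_matrix(D, D, rng, 0.3)
out = multi_head_attention(l_i, w_q, w_k, w_v, H)
print(len(out), len(out[0]))  # 5 8
```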
Although previous experiments indicate that the expressiveness of a network increases with its depth, it is wrong to conclude that deeper networks always perform better [36]: as the number of layers increases, the training error can also increase. To alleviate the difficulty of training deep networks and reduce the training error, a residual connection is added to the attention network. In this residual connection, \(W^R\) is a learnable parameter, namely a two-dimensional matrix whose dimensions both equal the embedding size. The ReLU activation function is applied after the residual connection, which can be computed as:
The output of the attention module is \(m_{vec}\).
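The residual step can be sketched as an element-wise ReLU over the sum of the multi-head attention output and a linear projection of the input, \(m_{vec} = \mathrm{ReLU}(\mathrm{MultiHead}(l_i) + l_i W^R)\). This form is an assumption inferred from the description of \(W^R\) as a square matrix of the embedding size, and the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def residual_relu(attn_out: Matrix, l: Matrix, w_r: Matrix) -> Matrix:
    """m_vec = ReLU(attn_out + l W^R), computed entry by entry."""
    proj = [[sum(l[i][k] * w_r[k][j] for k in range(len(w_r))) for j in range(len(w_r[0]))]
            for i in range(len(l))]
    return [[max(0.0, a + p) for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(attn_out, proj)]


identity = [[1.0, 0.0], [0.0, 1.0]]
# With W^R = I, the shortcut simply adds the input back before the ReLU.
print(residual_relu([[1.0, -3.0]], [[0.5, 1.0]], identity))  # [[1.5, 0.0]]
```

A learnable \(W^R\) instead of the identity lets the shortcut re-map the input before it is added, at the cost of D × D extra parameters.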
Highway network
In deep learning, highway networks allow information to flow unimpeded across several layers. As the number of layers increases, a network becomes more difficult to optimize; highway networks partially address this optimization problem and help prevent vanishing gradients. In the proposed model, the highway network learns feature information at the bit level. The input to the highway network module is \(l_i\). The highway layer can be formulated as follows:
In the above formula, \(t(l_i)\) denotes a nonlinear transformation, which is the ReLU function in our experiments; \(g=\sigma (l_i)\) is a sigmoid gate; \(q_i=\mathrm{linear}(l_i)\) is a linear transformation; \((1-g)\) is the carry gate; and \(m_{bit}\) is the output of the highway network. Figure 3 shows the components of the highway network.
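A single highway layer on the bit-level features can be sketched as \(m_{bit} = g \odot t(q_i) + (1-g) \odot l_i\). The text writes the gate as \(g=\sigma (l_i)\); this sketch assumes, as in standard highway networks, that the gate and the linear transformation each have their own weights and bias. The weight names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(w: Matrix, b: Vector, v: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi for row, bi in zip(w, b)]


def highway(l: Vector, w_h: Matrix, b_h: Vector, w_g: Matrix, b_g: Vector) -> Vector:
    """One highway layer applied entry by entry (bit level).

    q = W_h l + b_h            linear transformation
    t(q) = ReLU(q)             nonlinear transformation
    g = sigmoid(W_g l + b_g)   transform gate; (1 - g) is the carry gate
    output = g * t(q) + (1 - g) * l
    """
    q = affine(w_h, b_h, l)
    g = [sigmoid(x) for x in affine(w_g, b_g, l)]
    return [gi * max(0.0, qi) + (1.0 - gi) * li for gi, qi, li in zip(g, q, l)]


l = [1.0, -2.0]
eye = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# A strongly negative gate bias carries the input through almost unchanged ...
print(highway(l, eye, [0.0, 0.0], zeros, [-50.0, -50.0]))
# ... while a strongly positive one passes the ReLU-transformed features instead.
print(highway(l, eye, [0.0, 0.0], zeros, [50.0, 50.0]))  # [1.0, 0.0]
```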
Predicting the synergistic effect
The fused features, which combine the vector-level and bit-level outputs, can be represented as:
The output of the model is the synergistic prediction score of the drug pair, which can be calculated as:
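The fusion and prediction steps can be sketched as concatenating the flattened vector-level output \(m_{vec}\) with the bit-level output \(m_{bit}\) and passing the result through a linear layer with a sigmoid, so that the score can be thresholded at 0.5 as in the Results. A single linear layer and the function names are simplifying assumptions.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_synergy(m_vec: List[Vector], m_bit: Vector, w: Vector, b: float) -> float:
    """Concatenate [flatten(m_vec); m_bit] and map it to a synergy probability."""
    fused = [x for row in m_vec for x in row] + list(m_bit)
    if len(fused) != len(w):
        raise ValueError(f"expected {len(w)} fused features, got {len(fused)}")
    logit = sum(wi * xi for wi, xi in zip(w, fused)) + b
    return sigmoid(logit)


m_vec = [[0.2, 0.0], [1.0, 0.5]]   # toy attention-branch output (2 x 2)
m_bit = [0.3, -0.1, 0.0, 0.7]      # toy highway-branch output
w = [0.5, -0.2, 0.1, 0.4, 0.3, 0.2, -0.6, 0.1]
p = predict_synergy(m_vec, m_bit, w, b=0.0)
print(p, "synergistic" if p >= 0.5 else "antagonistic")
```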
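Let \({\hat{y}}\) denote the predicted synergy score of a drug pair and \(y\) the true label (1 for synergistic, 0 for antagonistic). The binary cross-entropy loss is adopted to train the model and is defined as:
\[ \mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y^{i}\log {\hat{y}}^{i} + (1-y^{i})\log \left(1-{\hat{y}}^{i}\right)\right] \]
where N is the number of samples in a batch.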
Given a sample, each input \(l_i\) is passed through the network twice, resulting in two different predictions, \({\hat{y}}_1^{i}\) and \({\hat{y}}_2^{i}\). Because dropout randomly discards some neurons during each pass, \({\hat{y}}_1^{i}\) and \({\hat{y}}_2^{i}\) are the prediction probabilities of two distinct sub-networks. Regularized dropout (R-Drop) regularizes the predictions by minimizing the Kullback–Leibler (KL) divergence between the two output distributions, which can be calculated as follows:
Moreover, both predictions \({\hat{y}}_1^{i}\) and \({\hat{y}}_2^{i}\) are included in the cross-entropy loss by averaging their two cross-entropy terms:
The final loss is calculated as:
In the above equation, \(\alpha \) is a weighting hyperparameter.
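The complete training loss can be sketched for binary labels as follows. It assumes that the KL term is the symmetric (bidirectional) KL divergence used in the original R-Drop formulation, applied here to Bernoulli output distributions, that the cross-entropy is averaged over the two forward passes, and that \(\alpha \) multiplies the KL term; \(\alpha = 1.0\) and the function names are hypothetical.

```python
import math
from typing import List

EPS = 1e-12  # clip probabilities away from 0 and 1 to keep the logarithms finite


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q))."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def rdrop_loss(p1: List[float], p2: List[float], y: List[int], alpha: float = 1.0) -> float:
    """Cross-entropy averaged over both dropout passes plus alpha times the symmetric KL."""
    n = len(y)
    ce = sum(0.5 * (bce(a, t) + bce(b, t)) for a, b, t in zip(p1, p2, y)) / n
    kl = sum(0.5 * (bernoulli_kl(a, b) + bernoulli_kl(b, a)) for a, b in zip(p1, p2)) / n
    return ce + alpha * kl


# Two forward passes of the same batch disagree slightly because of dropout.
pass_1 = [0.9, 0.2, 0.7]
pass_2 = [0.8, 0.3, 0.6]
labels = [1, 0, 1]
print(rdrop_loss(pass_1, pass_2, labels))
```

When both passes agree exactly, the KL term vanishes and the loss reduces to the ordinary cross-entropy, so \(\alpha \) only matters insofar as dropout makes the sub-networks disagree.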
Results
To evaluate the performance of our model, we compared it with several competitive deep learning methods, including DeepDDS [24], EPGCN-DS [26], GCN-BMP [27], DeepSynergy [5], MatchMaker [25] and MR-GNN [37]. To clarify the differences between our model and these methods, we summarize the comparison methods below.

DeepSynergy: DeepSynergy uses molecular chemistry and cell line genomic information as input and a deep neural network (DNN) to model drug synergy and predict the synergy score.

MR-GNN: MR-GNN uses a multi-resolution architecture to extract node features from the neighborhoods of graph nodes, applies dual graph-state long short-term memory (LSTM) networks to summarize the local features of each graph and extract the interactions between pairwise graphs, and combines the results to predict the synergy score.

GCN-BMP: GCN-BMP uses a Siamese GCN architecture to transform irregularly structured molecular data into real-valued embedding vectors, which are then fed into an interaction predictor based on a HOLE-style neural network to predict interactions between the input drug pairs.

EPGCN-DS: EPGCN-DS uses twin GCN branches to learn atom-level features. Each drug is represented as the sum of all its atom features. The interaction decoder outputs the probability that the two drugs interact.

DeepDDS: DeepDDS uses a graph neural network and attention mechanism to identify drug combinations, and its inputs are drug molecule structures and gene expression levels.

MatchMaker: MatchMaker trains two parallel subnetworks to learn specific representations: the first subnetwork is for the drug structures, and the second is for the gene expression of the cell lines. The joint representation is then input into a third subnetwork to predict drug pair synergy.
The DeepSynergy, MR-GNN and MatchMaker models predict continuous synergy scores. For comparison with the other methods, we converted these models into classifiers by adding a sigmoid activation to the last layer and replacing the MSE loss with the cross-entropy loss (CrossEntropyLoss). The hyperparameter settings of the compared methods are listed in Additional file 1: Table S9.
Dataset summary
We evaluated the performance of the models on two datasets, DrugComb [28] and DrugCombDB [29]. DrugComb is a network-based dataset that was released in 2019 and updated in March 2021 [38]. DrugComb provides experimental data on 739,964 drug combinations of 4268 drugs tested in 288 cell lines. DrugCombDB is a drug combination dataset that was released in 2019 and includes 498,865 drug combinations of 5350 drugs tested in 104 cell lines. The gene expression profiles were downloaded from the Cancer Cell Line Encyclopedia (CCLE) database [39], which contains the expression profiles of 1035 cell lines, covering 72 cell lines in DrugCombDB and 98 cell lines in DrugComb. After matching the combinations with the available gene expression profiles, the DrugCombDB dataset includes 106,709 combination experiments involving 1084 drugs, and the DrugComb dataset includes 292,005 combination experiments involving 3038 drugs. Figure 4 shows the number of occurrences of each drug in the DrugCombDB and DrugComb datasets. Both datasets are imbalanced: 11% of the drugs in DrugComb appear more than 300 times, and 7% of the drugs in DrugCombDB appear more than 200 times.
Evaluation metrics
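Nine metrics are used to measure performance: accuracy (ACC), area under the receiver operating characteristic curve (ROC-AUC), balanced accuracy (BACC), Matthews correlation coefficient (MCC), F1 score, recall (Rec), average precision (AP), precision (Prec) and the kappa coefficient. ROC-AUC and AP are computed from the predicted probabilities over all decision thresholds; the other metrics are calculated as follows:
\[ \mathrm{ACC} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad \mathrm{Prec} = \frac{TP}{TP+FP}, \qquad \mathrm{Rec} = \mathrm{TPR} = \frac{TP}{TP+FN} \]
\[ \mathrm{F1} = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec}+\mathrm{Rec}}, \qquad \mathrm{TNR} = \frac{TN}{TN+FP}, \qquad \mathrm{BACC} = \frac{\mathrm{TPR}+\mathrm{TNR}}{2} \]
\[ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}, \qquad \mathrm{Kappa} = \frac{P_o - P_e}{1 - P_e} \]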
In these equations, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. The balanced accuracy is designed for imbalanced datasets and is defined as the average of the recall obtained on each class. TPR is the recall (true positive rate), and TNR is the true negative rate, i.e., the fraction of negative samples that the model identifies correctly. The MCC is mainly used to evaluate binary classification and remains informative when the classes are imbalanced. Kappa measures agreement; here, it indicates how consistent the model predictions are with the actual classes beyond what would be expected by chance. \(P_o\) denotes the accuracy. Assuming that the numbers of true samples in each of the c classes are \(a_{1}, a_{2},..., a_{c}\), the numbers of predicted samples in each class are \(b_{1}, b_{2},..., b_{c}\), and the total number of samples is n, \(P_e\) is calculated as
\[ P_e = \frac{a_{1}b_{1} + a_{2}b_{2} + \dots + a_{c}b_{c}}{n^{2}} \]
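The threshold-based metrics can be computed from the confusion matrix with a short sketch. It binarizes the predicted probabilities at 0.5 (treating a probability of exactly 0.5 as synergistic is an assumption), counts TP, TN, FP and FN, and applies the standard definitions, including \(P_e = \sum_k a_k b_k / n^2\) for kappa. ROC-AUC and AP need the full ranking of scores and are omitted; the function name is hypothetical.

```python
import math
from typing import Dict, List


def binary_metrics(y_true: List[int], y_prob: List[float], threshold: float = 0.5) -> Dict[str, float]:
    """ACC, Prec, Rec, F1, BACC, MCC and Kappa for binary synergy labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    acc = (tp + tn) / n
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0        # TPR
    tnr = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    bacc = (rec + tnr) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Kappa: P_o is the accuracy; a_k / b_k are the true / predicted class counts.
    a = [tp + fn, tn + fp]
    b = [tp + fp, tn + fn]
    p_e = sum(ak * bk for ak, bk in zip(a, b)) / n ** 2
    kappa = (acc - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {"ACC": acc, "Prec": prec, "Rec": rec, "F1": f1,
            "BACC": bacc, "MCC": mcc, "Kappa": kappa}


scores = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.2, 0.6, 0.1])
print({k: round(v, 3) for k, v in scores.items()})
# {'ACC': 0.667, 'Prec': 0.667, 'Rec': 0.667, 'F1': 0.667, 'BACC': 0.667, 'MCC': 0.333, 'Kappa': 0.333}
```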
Experimental settings
First, we conducted 5-fold cross-validation to evaluate the predictive power of DFFNDDS. The samples are randomly divided into five subsets of approximately equal size; in each round, four subsets are used for training and the remaining subset is used as the test set. The average of each metric over the five folds is used as the final performance measure. Under the random splitting setting, the ratio of synergistic to antagonistic pairs is the same in all five folds: 0.4 in the DrugCombDB dataset and 1.5 in the DrugComb dataset.
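A label-stratified five-fold split that keeps the synergistic/antagonistic ratio identical across folds can be sketched as follows. Stratifying by label is our reading of the statement that the ratio is the same in all folds; the seed and function name are hypothetical, and the toy labels use the 0.4 ratio reported for DrugCombDB.

```python
import random
from typing import List, Tuple


def stratified_kfold(labels: List[int], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs with per-class round-robin assignment."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)  # deal each class evenly over the folds
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


labels = [1] * 40 + [0] * 100   # synergistic/antagonistic ratio of 0.4
for train, test in stratified_kfold(labels):
    pos = sum(labels[i] for i in test)
    print(len(train), len(test), pos / (len(test) - pos))  # 112 28 0.4
```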
To further verify the prediction performance of DFFNDDS, we used leave-one-out cross-validation. First, leave-one-drug-combination-out cross-validation is used to evaluate performance on unseen drug combinations. This method iteratively excludes drug pairs from the training set and uses the excluded drug combinations as the test set.
However, leaving out drug combinations does not exclude individual drugs from the training set, so the same drug may appear in both the training and test sets. Thus, the next splitting method leaves one drug out, which verifies the ability of the model to learn the features of unseen drugs from the chemical structures of known drugs.
In addition, leave-one-cell-line-out experiments are conducted to evaluate DFFNDDS. We excluded all samples of the held-out cell line from the training set and used them as the test set, ensuring that the model never saw the gene expression of the excluded cell line during training. This setting assesses the ability of the model to predict drug synergy in unseen cellular contexts. The ratios of synergistic to antagonistic pairs in cross-validation under the different leave-one-out settings on the two datasets are reported in Additional file 1: Tables S1–S8. Despite the uneven distribution of drugs, these ratios are similar across the different splitting settings.
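The three leave-one-out settings differ only in what is grouped and held out, which the following sketch makes explicit. Each sample is assumed to be a (drug A, drug B, cell line) triple; unordered pairs are grouped with a frozenset, and in the leave-one-drug-out setting a sample is held out whenever either of its drugs is the held-out drug. In practice the held-out groups may be pooled into folds rather than used one at a time, and the names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Sample = Tuple[str, str, str]        # (drug_a, drug_b, cell_line)
Split = Tuple[List[int], List[int]]  # (train_indices, test_indices)


def leave_group_out(samples: List[Sample],
                    groups_of: Callable[[Sample], List[Hashable]]) -> Dict[Hashable, Split]:
    """Hold out every sample belonging to one group at a time; train on the rest."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for i, sample in enumerate(samples):
        for group in groups_of(sample):
            members[group].append(i)
    splits = {}
    for group, test in members.items():
        held_out = set(test)
        train = [i for i in range(len(samples)) if i not in held_out]
        splits[group] = (train, sorted(held_out))
    return splits


def by_pair(s: Sample) -> List[Hashable]:       # leave-one-drug-combination-out
    return [frozenset((s[0], s[1]))]


def by_drug(s: Sample) -> List[Hashable]:       # leave-one-drug-out
    return [s[0], s[1]]


def by_cell_line(s: Sample) -> List[Hashable]:  # leave-one-cell-line-out
    return [s[2]]


samples = [("A", "B", "c1"), ("A", "C", "c2"), ("B", "C", "c1"), ("B", "A", "c2")]
print(leave_group_out(samples, by_drug)["A"])        # ([2], [0, 1, 3])
print(leave_group_out(samples, by_cell_line)["c1"])  # ([1, 3], [0, 2])
```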
Performance evaluation
We binarized the predicted probabilities with a threshold of 0.5. Tables 1 and 2 summarize the performance of DFFNDDS and the comparison methods on the two datasets. Table 1 shows that our method achieved the best overall performance. DFFNDDS achieved an ACC of 0.871, higher than all other methods. In terms of Prec, Rec, and F1, DFFNDDS achieved the best scores on the DrugCombDB dataset, with values of 0.801, 0.746, and 0.773, respectively, indicating that it recognizes synergistic drug combinations well. To prevent the class imbalance from distorting the evaluation, we also used the BACC, MCC and Kappa metrics; our method achieved BACC, MCC and Kappa scores of 0.834, 0.684, and 0.683, respectively. For a comprehensive evaluation, the ROC-AUC and AP metrics were also used; DFFNDDS achieved ROC-AUC and AP values of 0.921 and 0.859, respectively. Together, the nine metrics capture different aspects of model performance.
Table 2 shows that the models perform worse on the DrugComb dataset than on the DrugCombDB dataset; however, our method still outperformed the other approaches on 8 of the 9 metrics. DFFNDDS achieved ACC, Prec, Rec, and F1 scores of 0.768, 0.788, 0.840, and 0.813, respectively. DFFNDDS performed slightly worse than GCN-BMP, DeepDDS, EPGCN-DS and MatchMaker in terms of recall. However, DFFNDDS achieved a better F1 score, the harmonic mean of precision and recall, than the comparison methods. Regarding imbalanced data, our proposed model showed competitive performance, with BACC, MCC and Kappa scores of 0.749, 0.509, and 0.507, respectively. Moreover, DFFNDDS achieved the best ROC-AUC and AP values, 0.846 and 0.890, respectively. In general, DFFNDDS also has lower standard deviations than the other methods on these metrics. Therefore, the five-fold cross-validation results demonstrate the competitiveness of our proposed method.
Regarding the leave-one-out cross-validation results, Tables 7 and 8 show the results of the leave-one-drug-combination-out experiments on the two datasets. DFFNDDS achieved the best scores on all performance metrics on the DrugComb dataset; on the DrugCombDB dataset, it performed best on 4 metrics and ranked in the top 3 among the compared methods on the other 5 metrics. Tables 4 and 6 display the performance in the leave-one-cell-line-out experiments. On DrugCombDB, DFFNDDS achieved the best scores on 7 of the 9 metrics; in particular, it outperformed the other methods by 17% on the ROC-AUC and MCC metrics. On DrugComb, DFFNDDS ranked second, only slightly behind MR-GNN. In the leave-one-drug-out experiments, our model did not achieve state-of-the-art recall on either DrugCombDB or DrugComb; however, it achieved superior results on at least 5 metrics, as shown in Tables 3 and 5. This may be because our model classifies more synergistic drug combinations as antagonistic. Because these leave-one-out experiments use every observation in the dataset for testing in turn, they indicate that DFFNDDS is robust: it remained among the top 3 methods under all leave-one-out splitting settings. Overall, these results show that DFFNDDS performs competitively with the baselines.
Ablation analysis
We performed ablation analyses to investigate whether the attention mechanism, the highway network, the fine-tuned BERT model, and the individual inputs improve the predictive performance of the model. To assess the importance of each component, we removed components one at a time. Specifically, we compared the full DFFNDDS with (i) DFFNDDS without the attention mechanism, (ii) DFFNDDS without the highway network, (iii) DFFNDDS without SMILES string inputs, (iv) DFFNDDS without fingerprint inputs, and (v) DFFNDDS without the fine-tuned BERT. The comparison was based on 5-fold cross-validation on the training dataset. The results on the DrugCombDB dataset are summarized in Table 9.
The results show that the complete DFFNDDS framework achieves the best predictive performance on 8 of the 9 evaluation metrics, whereas DFFNDDS without fingerprints performs the worst. These results demonstrate that the fingerprint inputs and the highway network play important roles in ensuring high-quality drug synergy predictions, possibly because fingerprints contain considerable chemical information about drugs. The highway network contributes more to learning drug features than the attention mechanism, which might not capture as much SMILES information as expected. In terms of model design, the ablation experiments indicate that combining fingerprint inputs and SMILES strings is effective. The variants without the attention mechanism or without the highway network performed worse than the full DFFNDDS, indicating that both modules enhance performance, possibly because the features extracted by the different feature extractors are complementary. Moreover, the results obtained with the DeepChem encoding framework indicate that the fine-tuned BERT model is indispensable.
We also report the results of DFFNDDS without the R-Drop loss. To explore the effect of the R-Drop loss, we additionally applied R-Drop to the compared models; these results are reported in Additional file 1: Table S10, which shows that R-Drop does not consistently improve the performance of the models. We therefore conclude that the performance improvement comes mainly from the model framework itself.
Discussion
Although our model performed better overall than the other methods, its scores on the nine metrics show that it still has limitations. One possible reason is that the features of the drugs and the information about the cell lines have not been fully explored in the model. Another possible factor is the network architecture, which may not be entirely suited to predicting drug combinations. We believe that the model can be improved by using more effective representations of drugs and cell lines, as well as more appropriate network architectures. In addition, the leave-one-out cross-validation results suggest that the generalization ability of our model is limited. This matters in practice, because leave-one-out settings better reflect the real task of identifying unfamiliar drug combinations. To address this problem, we recommend exploring transfer learning and other advanced machine learning techniques to improve performance in leave-one-out cross-validation.
Conclusions
In this paper, we proposed DFFNDDS, a novel model for predicting the synergy scores of drug combinations. In the model, cell lines are represented by gene expression, and drugs are represented by SMILES strings and fingerprints. We encoded SMILES strings with a fine-tuned pretrained BERT model and fused all features at both the bit-wise level and the vector-wise level. Compared with other competitive methods, DFFNDDS achieved state-of-the-art performance on the DrugComb and DrugCombDB datasets. Moreover, DFFNDDS outperformed other methods on most evaluation metrics in strict leave-one-out cross-validation experiments. Overall, our method provides a new tool for identifying synergistic drug combinations.
Availability of data and materials
Our datasets and code are publicly available on GitHub at https://github.com/sorachel/DFFNDDS.
References
Brunner HR, Menard J, Waeber B, Burnier M, Biollaz J, Nussberger J, Bellet M (1990) Treating the individual hypertensive patient: considerations on dose, sequential monotherapy and drug combinations. J Hypertens 8(1):3–11
Csermely P, Korcsmáros T, Kiss HJ, London G, Nussinov R (2013) Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Therapeut 138(3):333–408
Huang Y, Jiang D, Sui M, Wang X, Fan W (2017) Fulvestrant reverses doxorubicin resistance in multidrugresistant breast cell lines independent of estrogen receptor expression. Oncol Rep 37(2):705–712
Kruijtzer C, Beijnen J, Rosing H, ten Bokkel Huinink W, Schot M, Jewell R, Paul E, Schellens J (2002) Increased oral bioavailability of topotecan in combination with the breast cancer resistance protein and pglycoprotein inhibitor gf120918. J Clin Oncol 20(13):2943–2950
Preuer K, Lewis RP, Hochreiter S, Bender A, Bulusu KC, Klambauer G (2018) Deepsynergy: predicting anticancer drug synergy with deep learning. Bioinformatics 34(9):1538–1546
Lehár J, Krueger AS, Avery W, Heilbut AM, Johansen LM, Price ER, Rickles RJ, Short Iii GF, Staunton JE, Jin X et al (2009) Synergistic drug combinations tend to improve therapeutically relevant selectivity. Nat Biotechnol 27(7):659–666
Ferreira D, Adega F, Chaves R (2013) The importance of cancer cell lines as in vitro models in cancer methylome analysis and anticancer drugs testing. In: Lopez-Camarillo C, Arechaga-Ocampo E (eds) Oncogenomics and cancer proteomics: novel approaches in biomarkers discovery and therapeutic targets in cancer. InTech, London
Feala JD, Cortes J, Duxbury PM, Piermarocchi C, McCulloch AD, Paternostro G (2010) Systems approaches and algorithms for discovery of combinatorial therapies. Wiley Interdiscip Rev Syst Biol Med 2(2):181–193
Sun X, Bao J, You Z, Chen X, Cui J (2016) Modeling of signaling crosstalk-mediated drug resistance and its implications on drug combination. Oncotarget 7(39):63995
Madani Tonekaboni SA, Soltan Ghoraie L, Manem VSK, Haibe-Kains B (2018) Predictive approaches for drug combination discovery in cancer. Brief Bioinform 19(2):263–276
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H, Chen K, et al. (2015) Xgboost: extreme gradient boosting. R package version 0.42 1(4):1–4
Noble WS (2006) What is a support vector machine? Nat Biotechnol 24(12):1565–1567
Li H, Li T, Quang D, Guan Y (2018) Network propagation predicts drug synergy in cancers-predict drug synergy with network propagation. Cancer Res 78(18):5446–5457
Sidorov P, Naulaerts S, Ariey-Bonnet J, Pasquier E, Ballester PJ (2019) Predicting synergism of cancer drug combinations using NCI-ALMANAC data. Front Chem 7:509
Julkunen H, Cichonska A, Gautam P, Szedmak S, Douat J, Pahikkala T, Aittokallio T, Rousu J (2020) Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nat Commun 11(1):1–11
O’Shea K, Nash R (2015) An introduction to convolutional neural networks. arXiv. arXiv:1511.08458
Zaremba W, Sutskever I, Vinyals O (2014) Recurrent neural network regularization. arXiv. arXiv:1409.2329
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inform Process Syst 30
Shapiro LG, Stockman GC et al (2001) Computer vision. Prentice Hall, New Jersey
Chowdhary K (2020) Natural language processing. Fundamentals of artificial intelligence. Springer, Berlin, pp 603–649
Yang J, Xu Z, Wu WKK, Chu Q, Zhang Q (2021) Graphsynergy: a network-inspired deep learning model for anticancer drug combination prediction. J Am Med Inform Assoc 28(11):2336–2345
Jiang P, Huang S, Fu Z, Sun Z, Lakowski TM, Hu P (2020) Deep graph embedding for prioritizing synergistic anticancer drug combinations. Comput Struct Biotechnol J 18:427–438
Wang J, Liu X, Shen S, Deng L, Liu H (2022) Deepdds: deep graph neural network with attention mechanism to predict synergistic drug combinations. Brief Bioinform 23(1):390
Kuru HI, Tastan O, Cicek E (2021) Matchmaker: a deep learning framework for drug synergy prediction. IEEE/ACM Trans Comput Biol Bioinform. https://doi.org/10.1109/TCBB.2021.3086702
Sun M, Wang F, Elemento O, Zhou J (2020) Structure-based drug-drug interaction detection via expressive graph convolutional networks and deep sets (student abstract). In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 13927–13928
Chen X, Liu X, Wu J (2020) Gcnbmp: investigating graph representation learning for ddi prediction task. Methods 179:47–54
Zagidullin B, Aldahdooh J, Zheng S, Wang W, Wang Y, Saad J, Malyutina A, Jafari M, Tanoli Z, Pessia A et al (2019) Drugcomb: an integrative cancer drug combination data portal. Nucleic Acids Res 47(W1):43–51
Liu H, Zhang W, Zou B, Wang J, Deng Y, Deng L (2020) Drugcombdb: a comprehensive database of drug combinations toward the discovery of combinatorial therapy. Nucleic Acids Res 48(D1):871–881
Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv. arXiv:1810.04805
Chithrananda S, Grand G, Ramsundar B (2020) Chemberta: large-scale self-supervised pretraining for molecular property prediction. arXiv. arXiv:2010.09885
Ramsundar B, Eastman P, Walters P, Pande V (2019) Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O’Reilly Media, Sebastopol
Beltagy I, Lo K, Cohan A (2019) Scibert: A pretrained language model for scientific text. arXiv. arXiv:1903.10676
Ma X, Wang Z, Ng P, Nallapati R, Xiang B (2019) Universal text representation from bert: an empirical study. arXiv. arXiv:1910.07973
Gao T, Yao X, Chen D (2021) Simcse: Simple contrastive learning of sentence embeddings. arXiv. arXiv:2104.08821
Boroumand M, Chen M, Fridrich J (2018) Deep residual network for steganalysis of digital images. IEEE Trans Inf Forensics Secur 14(5):1181–1193
Xu N, Wang P, Chen L, Tao J, Zhao J (2019) Mr-gnn: Multi-resolution and dual graph neural network for predicting structured entity interactions. arXiv. arXiv:1905.09558
Zheng S, Aldahdooh J, Shadbahr T, Wang Y, Aldahdooh D, Bao J, Wang W, Tang J (2021) Drugcomb update: a more comprehensive drug sensitivity data repository and analysis portal. Nucleic Acids Res 49(W1):174–184
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin A, Kim S, Wilson C, Lehar J, Kryukov G, Murray L et al (2012) The cancer cell line encyclopedia: using preclinical models to predict anticancer drug sensitivity. Eur J Cancer 48:5–6
Acknowledgements
Our deepest gratitude goes to the anonymous reviewers for their careful work and thoughtful suggestions, which substantially improved this paper.
Funding
This research was funded by the National Natural Science Foundation of China (NSFC, Grant nos. 62271174 and 62102191), the Jiangsu Province Graduate Research and Innovation Program (Grant no. JX12413925), the 2022 Nanjing Life and Health Science and Technology Special Project (Grant no. 202205053) for the Cooperative Research and Transformation of Diabetes Active Intelligent Health Management Platform, the industry prospecting and common key technology key projects of the Jiangsu Province Science and Technology Department (Grant no. BE2020721), the Industrial and Information Industry Transformation and Upgrading Special Fund of Jiangsu Province in 2021 (Grant no. [2021]92), the Key Project of Smart Jiangsu in 2020 (Grant no. [2021]1), and the Jiangsu Province Engineering Research Center of Big Data Application in Chronic Disease and Intelligent Health Service (Grant no. (020)1460).
Author information
Contributions
MX contributed to the conception and design of the study, prepared the figures and wrote the manuscript. XZ participated in revising the manuscript. JW and NW organized the database. WF contributed to the statistical analysis and interpretation. CW contributed to the interpretation and revision of the manuscript. JW contributed to the conception of the study, the statistical analysis and the revision of the manuscript. YL and LZ supervised the planning and execution of the research activities. All authors contributed to manuscript revision, and read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Additional tables for DFFNDDS.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Xu, M., Zhao, X., Wang, J. et al. DFFNDDS: prediction of synergistic drug combinations with dual feature fusion networks. J Cheminform 15, 33 (2023). https://doi.org/10.1186/s13321-023-00690-3
DOI: https://doi.org/10.1186/s13321-023-00690-3
Keywords
 Drug combination
 Synergistic effect
 Deep learning
 Dual-feature fusion