  • Research article
  • Open access

When drug discovery meets web search: Learning to Rank for ligand-based virtual screening



The rapid emergence of novel chemical substances creates a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search is introduced into drug virtual screening. It offers two distinctive capabilities: (1) it can identify compounds for novel targets when too little training data is available for these targets, and (2) it can integrate heterogeneous data when compound affinities are measured on different platforms.


A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated on two public datasets, one collected from the Binding Database and the other the newly published Community Structure-Activity Resource benchmark dataset. The results demonstrate that Learning to Rank is an efficient computational strategy for drug virtual screening, particularly because of its novel use in cross-target virtual screening and heterogeneous data integration.


To the best of our knowledge, we have introduced here the first application of Learning to Rank in virtual screening. The experiment workflow and algorithm assessment designed in this study will provide a standard protocol for other similar studies. All the datasets as well as the implementations of Learning to Rank algorithms are available at

The analogy between web search and ligand-based drug discovery


The cost of developing a new drug today is estimated to be several billion, with more than ten years of effort. A large portion of this cost results from failed molecules, that is, candidate chemical compounds that prove unsuitable for further development in preclinical and clinical testing [1]. With millions of chemical structures available in public libraries (Figure 1), more sophisticated and accurate computational screening approaches are in high demand. In particular, computational methods that “rank” chemical structures by their likelihood of clinical success are useful for large-scale compound screening. Such technologies, often termed Virtual Screening (VS) [2,3], are used to focus on a small set of highly promising candidates for further experimental testing, leading to potentially large savings in time and cost.

Figure 1 Number of CAS registry records of chemical substances.

Generally, the task of ligand-based VS is to output a ranked list of a set of molecules according to their binding affinities for a given drug target, so that the top-k molecules can be examined further through in vivo or in vitro tests. The most basic technique used in VS is similarity search, which is performed by first choosing a reference compound and then calculating the similarity between each candidate compound and that reference. For this step, many similarity measures have been developed, including the Cosine coefficient, Euclidean distance, Soergel distance, Dice coefficient and Tanimoto coefficient [4]. Based on the similarity scores, the candidate compounds are ranked and the top-k compounds are selected for further investigation. Alternatively, VS can be formulated as learning a function f : Structure → Activity (\( \mathbb{R}^d \to \mathbb{R} \)) from a set of training compounds with known affinities for the target. The learned function can then be used to predict the label (compound affinity) of any given molecule from its structural features. Traditionally, this function is learned as a regression or classification model, similar to the procedure of a Quantitative Structure-Activity Relationship (QSAR) study [5].
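The similarity-search procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fingerprints are modeled as sets of "on" bit indices, and the compound names and bit patterns are hypothetical rather than taken from the datasets of this study.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(
    reference: Set[int], candidates: Dict[str, Set[int]], k: int
) -> List[Tuple[str, float]]:
    """Score every candidate against the reference and return the top-k."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy fingerprints (not real compounds).
reference = {1, 4, 7, 9}
candidates = {
    "cmpd_A": {1, 4, 7, 9, 12},  # 4 shared bits out of 5 in the union
    "cmpd_B": {1, 4, 20},        # 2 shared bits out of 5 in the union
    "cmpd_C": {30, 31},          # no shared bits
}
top = rank_by_similarity(reference, candidates, k=2)
print(top)
```

Swapping `tanimoto` for another coefficient (Dice, Cosine, and so on) changes only the scoring function; the ranking step stays the same.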

Recently, a computational strategy called Learning to Rank (LOR) [6,7], first used in the information retrieval field and especially in web search, has gained much attention. Web search and VS can be treated as similar problems: both seek a result list in which higher-ranked candidates (web pages or compounds) have higher relevance to the underlying target (query or protein). Given this fundamental similarity, LOR should be a promising technique for solving the VS problem; however, very few studies have been performed in this area.

The basic idea of LOR is to “learn” a ranking function, instead of a traditional regression or classification function, to predict the activity of candidate compounds for the query target. The limited literature shows that LOR has only been touched on in drug discovery research. For example, in 2009, Anne Mai Wassermann et al. used a Support Vector Machine (SVM)-based ranking method to distinguish compounds [8]; in 2010, Shivani Agarwal et al. introduced a bipartite ranking method on a relatively small set of drug affinity data [1]; in the following year, Fabian Rathke et al. presented StructRank [9], which showed competitive performance with traditional VS methods. Other applications of ranking in drug discovery include drug target fishing [10], drug descriptor selection [11] and chemical entity order analysis [12,13]. Although these works applied ranking techniques in VS, no systematic benchmark study has been established for LOR in drug discovery so far, and the current methods were not generalized to cross-target screening. Basically, the goal of VS approaches is to learn a general ranking function that can be used for cross-target compound screening. Such a function should focus on molecules with high binding affinities to the target, while the predictive accuracy for the exact affinity labels is only of secondary interest [9]. Although a traditional regression or classification model can also predict different activity levels for the molecules of interest, it may not capture the intrinsic ranking order of the molecules [9]. As an illustration, traditional classification-based QSAR models are trained on a set of molecules with known classification labels for a given target. Clearly, the learned models only categorize molecule activity into known groups rather than ranking the molecules individually. Traditional regression-based QSAR models are generally trained to minimize a squared-error loss for a given group of molecules, yet models that are equal in terms of mean squared error can give rise to completely different rankings [9]. Therefore, the question arises whether the detour via regression or classification is necessary, or whether the task can be addressed more directly by deriving the ranking function in VS (Figure 2) [9]. Given these considerations, we propose in this study an LOR model that learns a ranking function focused on the ranking relationship among all compounds rather than the exact activity or class of each individual compound, which is inherently suitable for identifying the top-k-ranked compounds in VS.
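The point that two regression models with the same mean squared error can produce different rankings can be checked with a small worked example. The affinity values and the two sets of predictions below are invented purely for illustration.

```python
from typing import List


def mse(truth: List[float], pred: List[float]) -> float:
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def ranking(values: List[float]) -> List[int]:
    """Compound indices sorted from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical affinities for three compounds and two hypothetical models.
truth = [6.0, 6.5, 7.0]
pred_a = [6.5, 7.0, 7.5]  # every prediction is 0.5 too high
pred_b = [6.5, 6.0, 7.5]  # same error magnitudes, but compounds 0 and 1 swap

print(mse(truth, pred_a), ranking(pred_a))
print(mse(truth, pred_b), ranking(pred_b))
```

Both models have the same squared-error loss, yet only the first reproduces the true order, which is the behavior a ranking loss is meant to penalize directly.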

Figure 2 Different computational schemas in virtual screening.

Compared with traditional statistical-learning-based VS methods, Learning to Rank has two distinctive capabilities: (1) it can be extended to screen compounds for novel targets when not enough data are available for these targets, and (2) it can integrate heterogeneous data when compound affinities are measured on different platforms. Here, we have developed an integrated framework, which includes (1) a standard pipeline for LOR analysis in virtual drug screening, (2) a comprehensive performance assessment of different LOR algorithms, and (3) publicly available benchmark data for testing. In particular, the experimental workflow and algorithm assessments designed in this study will provide a standard protocol for other similar studies in drug discovery.

Results and discussion

Results of different testing strategies

LOR in VS aims to create a ranking function that returns the input compounds in descending order of relevance. In this study, six LOR models were comprehensively tested and compared for virtual drug screening: PRank [14], RankNet [15,16], RankBoost [15,17], SVMRank [18,19], AdaRank [15,20] and ListNet [15,21], which cover the three main categories of LOR (see Methods). Support Vector Regression (SVR) was set as the baseline VS method, and it was implemented and optimized with LibSVM [22]. A comprehensive testing pipeline was designed to compare the performance of the six LOR models on the curated molecule affinity datasets. The testing datasets (Tables 1 and 2) were collected from two public data sources, the Binding Database (BDB) and the 2012 benchmark dataset published by the Community Structure-Activity Resource (CSAR). Four testing strategies were designed to achieve a comprehensive quantitative performance evaluation of the models from different perspectives (see Experimental). Note that although there are various QSAR-modeling-based screening techniques, they are all essentially learning based, so only the typical SVR method was selected for comparison. The main purpose of this study is not to show the superiority of LOR over traditional methods, but rather to show that LOR is an alternative option in VS, with advantages in screening molecules for novel targets and in data integration. In the following sections we present the results of the different testing strategies and summarize each in turn.

Table 1 Curated Binding Database dataset
Table 2 Curated CSAR dataset

It should be noted that in the following testing strategies, Normalized Discounted Cumulative Gain (NDCG) was applied for the quantitative comparison of different VS methods. NDCG was originally introduced in the information retrieval community to quantitatively measure a ranking result based on the position of each instance in the ranked list. In the ranking performance evaluation, we keep a ground-truth ranking list, which is the ranking of the molecules for a given target based on their known efficacy. Each VS method then yields its own predicted ranking list from its prediction model. These predicted ranking lists are compared with the ground-truth ranking list to evaluate each VS method, as measured by the value of NDCG. Details on how NDCG is calculated can be found in Methods.

Strategy I

This strategy was designed to compare LOR with the traditional SVR-based VS technique. It mimics the scenario in which, for a given target, there exist compounds with known affinities that are used for training in order to screen novel compounds. In this strategy, each of the 24 curated protein targets from BDB, together with its associated compounds, was treated as a separate task. For each task, 5-fold cross-validation was performed on the six LOR models and on the SVR-based method. The NDCG value averaged over the 5 folds was calculated for each of the 24 targets for quantitative performance evaluation. As a result (Figure 3, Table 3), RankBoost and SVMRank performed the best among the six LOR models, and they were slightly better than the SVR-based method.

Figure 3 NDCG@10 in Strategy I.

Table 3 NDCG@10 of strategy I

In summary, SVMRank was the most efficient of the models tested. The superiority of SVMRank is probably due to the fact that this ranking method inherits the maximum-margin characteristics of SVM. It transforms the ranking problem into a classification problem over partially ordered pairs, and uses the maximum-margin optimization of SVM to derive the optimal ranking order. SVMRank therefore obtains robust and satisfactory performance in LOR [6,7]. This result indicates that, given proper optimization, a pair-wise LOR model may serve as a suitable option for VS. Compared with traditional SVR-based VS, LOR can serve as an alternative option that achieves acceptable performance in VS.
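The reduction from ranking to classification over ordered pairs can be illustrated with a short sketch. It is not the SVMRank implementation used in this study; it only shows the common pair-wise transformation, in which each pair of compounds with different labels becomes one classification instance built from the difference of their feature vectors. The feature values and labels are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def to_pairwise(
    features: List[List[float]], labels: List[int]
) -> List[Tuple[List[float], int]]:
    """Turn per-compound (features, label) data for ONE target into
    (difference vector, +1/-1) classification instances."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # tied compounds carry no order information
        diff = [a - b for a, b in zip(features[i], features[j])]
        pairs.append((diff, 1 if labels[i] > labels[j] else -1))
    return pairs


# Hypothetical two-dimensional features and affinity grades.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
labels = [2, 1, 1]
pairs = to_pairwise(features, labels)
print(pairs)
```

A maximum-margin linear classifier trained on such difference vectors yields a weight vector whose dot product with a compound's features serves as the ranking score.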

Taking accuracy and efficiency into consideration, SVMRank was selected for the following tests. It should be noted that in the following strategies the traditional SVR-based method is not applicable, since either no training data exist for the specific target or the training data are combined from different measurements.

Strategy II

This strategy was designed to investigate the performance of LOR in screening compounds for novel targets when little or no ligand affinity data are available for these targets. Traditional learning-based VS techniques are not suitable in this case, since there is little or no training data for the specific target. Specifically, for the 24 proteins curated from BDB, each set of 23 protein targets and their associated ligand data were combined to form the training dataset, and the model was then tested on the one remaining target. The testing procedure was performed 5 times, once on each of 5 randomly divided parts of the compounds associated with the left-out target. In this way, the testing datasets in Strategies I and II were made identical to allow a fair comparison. The NDCG value averaged over the 5 runs was calculated for each of the 24 targets for quantitative performance evaluation.
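The leave-one-target-out splitting used in this strategy can be sketched as follows. The target and compound identifiers are placeholders, and the further division of the held-out compounds into five random parts is omitted for brevity.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_target_out(
    data: Dict[str, List[str]]
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """For each target, yield (held-out target, training records pooled
    from all other targets, test records of the held-out target)."""
    for held_out in data:
        train = [
            record
            for target, records in data.items()
            if target != held_out
            for record in records
        ]
        yield held_out, train, data[held_out]


# Hypothetical toy data: target id -> compound records.
data = {"T1": ["c1", "c2"], "T2": ["c3"], "T3": ["c4", "c5", "c6"]}
splits = list(leave_one_target_out(data))
for held_out, train, test in splits:
    print(held_out, train, test)
```

With 24 targets, the same loop would produce 24 splits, each training on the pooled data of 23 targets.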

In this test, SVMRank performed differently for different targets (Figure 4). Generally, the performance is not as good as that in Strategy I, but it is still acceptable given that this test was performed in a cross-target scenario. SVMRank made satisfactory predictions on several specific targets, such as mTOR, HMGCR and MMP-8. Nevertheless, the unsatisfactory performance on other targets prompted us to investigate whether selecting phylogenetically related training targets would benefit the testing results, which leads to the next strategy.

Figure 4 NDCG@10 in Strategy II.

In summary, SVMRank can serve as an efficient method for cross-target VS, and its performance can be improved when more biological and pharmaceutical information is taken into consideration, as shown in the following strategy.

Strategy III

In contrast to Strategy II, in this test the training dataset was formed from the compound data associated with targets that belong to the same family as the test protein target, in order to test the influence of protein phylogenetic relationships on the prediction. Among the original 24 targets, PDE5 and CTSK each belong to a large protein family. For each of these two targets, the remaining family members and their corresponding compound data in BDB were selected to form the training dataset (Tables 4 and 5). This strategy was designed to check whether a training set formed from the same protein family would benefit the screening results on a novel target under the LOR schema, since the family members are phylogenetically related. The testing datasets for PDE5 and CTSK in Strategies II and III were made identical to allow a fair comparison. The NDCG values for the two targets PDE5 and CTSK were calculated for quantitative performance evaluation. As shown in Figure 5, the final predictions for these two targets were improved substantially compared with those in Strategy II.

Table 4 PDE family
Table 5 Cathepsin family
Figure 5 NDCG@10 of CTSK and PDE5 in Strategy II and Strategy III.

In summary, the results of this strategy suggest that, at least in our dataset, selecting phylogenetically related targets and their associated compound affinity data for training may benefit cross-target prediction to a certain extent. While LOR already serves as an efficient cross-target VS method, its performance can potentially be improved further when additional useful information is considered.

Strategy IV

Using SVMRank, this strategy was designed to test the performance of LOR in integrating heterogeneous data in VS. The rationale is to mimic the scenario in which compound affinity data may be measured on different platforms or with different affinity criteria. For example, in the following test the curated CSAR dataset was used, in which the compound affinities for different targets were measured with different affinity indicators, pIC50 or pKi. Traditional virtual screening methods cannot integrate such heterogeneous data directly. In this dataset, the compound affinity for target Chk1 is measured in pIC50, and that for targets Erk2 and Urokinase is measured in pKi. To test the performance of LOR on these targets, each pair of targets and their associated compound affinity data were taken as training data. The trained model was then tested on the remaining target and the corresponding NDCG value was calculated. The affinity measurements for the training data and the testing data in this procedure are therefore inconsistent, that is, heterogeneous. As a result, the performance on targets Chk1 and Erk2 is fairly good, but it is unsatisfactory on target Urokinase (Table 6). Because directly combining target and compound features has been reported to carry limited biological meaning [23], a different feature mapping was introduced, namely the cross-term [23], calculated as \( T_{147} \otimes C_{32} \) (the Kronecker product of the feature vectors of the target and the compound; see Feature representations of targets and proteins in Methods), which results in a new 4704-dimensional feature vector (147 × 32). Such a feature representation is reported to be more representative, with enhanced prediction ability in protein-ligand interaction analysis [23]. Table 6 shows that with this new feature representation SVMRank improved the prediction performance on the top-10 candidates for all 3 targets, even though the training data are heterogeneous and limited in amount. In particular, the use of the cross-term feature mapping improved the testing result on target Urokinase.

Table 6 NDCG@10 in strategy IV

In summary, the test results indicate that LOR may serve as a good choice for integrating heterogeneous compound affinity data in VS, and that the design of a proper feature mapping in LOR also influences the final ranking result. How to design an efficient feature mapping remains an open question in this field.

Discussion on various VS methods based on multiple target information

Basically, all traditional regression- or classification-based models require that the training and testing data be independent and identically distributed (i.i.d.), and they cannot handle cross-target or cross-platform data integration. Although these methods can still be run directly on such data, the results are not comparable, since the methods are theoretically unsuitable for the cross-target or cross-platform scenario in VS. LOR, in contrast, is theoretically applicable to cross-target screening for the following reasons: (1) an LOR model treats the target-compound pair as a whole instance. It does not require the distributions of the training compound data and the testing compound data to be identical, so it is inherently suitable for cross-target situations; and (2) it only considers the ranking order of the instances for a specific target rather than their exact affinity values. For a specific target, LOR, especially pair-wise LOR, transforms the compound affinity data into partially ordered pairs and treats these order pairs as the instances. Therefore, although the compound affinities associated with the target may be measured on different platforms, this has no influence on the derived order pairs. A traditional regression- or classification-based model, by contrast, commonly treats all the compound data associated with different targets as one mixed dataset, so the cross-platform effect has to be taken into consideration.
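The claim that order pairs are unaffected by the measurement scale can be illustrated with a toy example. The sketch assumes that, for the compounds of one target, the two scales are related by a monotonically increasing map; that assumption, the linear map used here, and all numeric values are illustrative and not taken from the article.

```python
from typing import Dict, Set, Tuple


def order_pairs(affinities: Dict[str, float]) -> Set[Tuple[str, str]]:
    """All (stronger, weaker) compound pairs implied by the affinity values."""
    names = sorted(affinities)
    return {
        (a, b) for a in names for b in names if affinities[a] > affinities[b]
    }


# Hypothetical affinities of three compounds for one target on "platform A".
platform_a = {"c1": 6.2, "c2": 7.9, "c3": 5.1}
# The same compounds on a hypothetical "platform B" whose readout is a
# monotonically increasing function of the platform A readout.
platform_b = {name: 0.8 * value + 1.5 for name, value in platform_a.items()}

print(order_pairs(platform_a))
print(order_pairs(platform_b))
```

The raw values differ between the two platforms, so a regression target would change, while the set of order pairs that a pair-wise method learns from is identical.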

LOR can be regarded as a form of multi-target QSAR modeling for VS. Our group has previously tested three other multi-target QSAR schemas [24,25]: multi-task-learning-based QSAR modeling [26], collaborative-filtering-based QSAR modeling [27] and Proteochemometric Modeling (PCM) [28,29]. Compared with traditional VS methods, all of these methods can integrate information from multiple targets rather than a single one. They are built on both ligand and target similarity, and can be regarded as extensions of conventional QSAR modeling that model the relationship between multiple compounds and targets simultaneously. For multi-task-learning-based and collaborative-filtering-based QSAR modeling, the target information is implicitly embedded in the computational schema and no target descriptor needs to be calculated, whereas LOR and PCM explicitly require target information. From this point of view, PCM is intrinsically the QSAR modeling approach most similar to LOR (Figure 6). However, in theory LOR directly minimizes a ranking loss function rather than a classification or regression loss, and is thus inherently suitable for VS and for the integration of cross-platform data.

Figure 6 Proteochemometric Modeling.

Another important issue for LOR is the proper design of the feature function \( \Phi(\cdot) \) (see Methods). In the current study we either concatenate the feature vectors of the protein and the compound directly to form the new feature vector, or use the cross-term feature mapping. Compared with direct concatenation, the cross-term feature mapping is more efficient. Although both representations have the advantage of simplicity, their biological meaning remains to be elucidated. Another possible way to generate the features is to define a target-compound interaction fingerprint, as applied in our previous work [30]. Such a fingerprint is biologically much more meaningful, but it is often not applicable to large-scale data since generating the fingerprint is time-consuming. We hope that more efficient and meaningful feature functions will be investigated in the future.


Testing pipeline

A comprehensive testing pipeline was designed to compare the performance of six LOR models on the curated molecule affinity datasets. Three main questions are addressed in this pipeline: (1) how does LOR perform compared with the traditional SVR method; (2) how does LOR perform when it is extended to screen compounds for novel targets for which little or no compound affinity data are available; and (3) how well does LOR integrate heterogeneous data in VS when the compound affinities are measured on different platforms. The general pipeline designed in this study is shown in Figure 7. A brief introduction to the data and testing strategies is presented below the figure.

Figure 7 Research workflow for LOR. The datasets used in this study were curated from the Binding Database and CSAR with well-designed filtering rules. The compounds and targets are each represented by a specific feature vector. With a given feature mapping function, the compound-target pair as a whole is transformed into a new feature vector. Based on four different testing strategies, the testing results of the different VS algorithms are presented and evaluated quantitatively with NDCG@10. The color bars in the test frame indicate the algorithms investigated in each test strategy.

Benchmark datasets generation

The testing datasets were collected from two public data sources, the Binding Database and the 2012 benchmark dataset published by CSAR. To make a relatively objective and balanced dataset, protein targets and their associated compound affinity data were selected from BDB based on the following criteria: (1) only human protein targets are considered; (2) redundant protein targets are eliminated; (3) the protein targets are selected to cover as many protein families as possible, and once one member of a family has been selected, further proteins from the same family are avoided as far as possible; (4) to keep the data balanced, only targets with between 500 and 1,500 non-redundant ligand records are considered; and (5) the affinity distribution of the compounds associated with a given target should be even. Taking the pIC50 value as the affinity measurement, a compound is normally considered active if its pIC50 value is at least 6 (pIC50 ≥ 6) [27], and inactive otherwise. The affinity was roughly graded into 5 categories: 0 (pIC50 < 6), 1 (6 ≤ pIC50 < 7), 2 (7 ≤ pIC50 < 8), 3 (8 ≤ pIC50 < 9) and 4 (9 ≤ pIC50), according to the literature, and we required that the associated compound affinity values cover these 5 grades evenly. Targets whose associated compound affinities fall only into grades 0 and 1, or for which the highest grade accounts for less than 5% of the data, were also removed. Based on these criteria, 24 proteins associated with 9,330 compounds were finally curated (Table 1). These data are used in the first three testing strategies in the pipeline.
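The five-level grading of pIC50 values described above maps directly onto a small function. The thresholds follow the text; the example pIC50 values are hypothetical.

```python
from collections import Counter
from typing import List


def affinity_grade(pic50: float) -> int:
    """Map a pIC50 value to the grade 0-4 used as the rank label."""
    if pic50 < 6:
        return 0
    if pic50 < 7:
        return 1
    if pic50 < 8:
        return 2
    if pic50 < 9:
        return 3
    return 4


# Hypothetical pIC50 values for the compounds of one target.
pic50_values: List[float] = [5.2, 6.0, 6.9, 7.5, 8.1, 9.3]
grades = [affinity_grade(v) for v in pic50_values]
distribution = Counter(grades)  # grade -> number of compounds
print(grades, dict(distribution))
```

A per-target grade distribution like `distribution` is what one would inspect to check the evenness criterion.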

The second dataset was curated from the published 2012 CSAR benchmark dataset, which includes six protein targets, several of which have associated compound affinity information measured in different standards, including pIC50 and pKi values. From this dataset, only the targets Chk1, Erk2 and Urokinase, which have associated compound affinity data, were tested in the fourth strategy in the pipeline (Table 2).


In this work, a comprehensive investigation of LOR was performed on benchmark datasets, and an experimental workflow and algorithm assessment were presented. The results indicate that LOR, especially pair-wise methods such as SVMRank, can serve as an alternative to traditional methods for VS. Furthermore, LOR has inherent advantages in screening molecules for novel targets and in data integration. For a novel protein target, whether or not known ligand affinity information exists for it, LOR can return a satisfactory ranking result. It is also theoretically suitable for ranking compounds based on training data measured on different platforms. Several directions for future work on LOR are: (1) the integration of multiple feature representations of the target and the compound using other descriptors or profiles. The high-dimensional pharmacogenomics information from CMAP [31,32] and PubChem BioAssay data [33,34] can be investigated extensively, and multi-view learning [35] methodology can be investigated to integrate different representations into a comprehensive target and compound description and similarity calculation; (2) transfer learning [36] methodology is needed in VS for the study of “cross-target knowledge transfer”, to leverage the information in large-scale target and compound data.


LOR model in VS

LOR in VS aims to create a ranking function that returns the input compounds in descending order of affinity for the target. Traditionally, the ranking model in VS is constructed from a purely similarity-based model or a regression/classification-based model. In the LOR framework, we learn a ranking function f(T, C), which is trained by minimizing a ranking loss function on a set of compounds \( C_{ij} \) (i = 1, 2, …, m) for a given set of targets \( (T_1, T_2, \ldots, T_m) \) [37]. Unlike a traditional machine learning model for a single target, the learned function can generalize to novel data. This means that for a novel target \( T_{m+1} \) not seen in the training dataset, as long as it can be explicitly represented in the corresponding feature space, the system can also rank the compounds for this target.

The LOR procedure is analogous to the traditional training and testing procedure in QSAR modeling. In LOR, the training data contain given targets and compounds, and each target is associated with a number of compounds. The difference between LOR and a traditional QSAR model is that the LOR model usually involves multiple targets rather than a single target. The relevance of the compounds with respect to the target is known, measured as the compound affinity either as a categorical label (high, medium, low, etc.) or as a numerical value (IC50, EC50). Suppose that for a given target \( T_i \), \( (T_i, C_{ij}) \) represents the target and one of its associated compounds; then a feature vector \( x_{ij} = \Phi(T_i, C_{ij}) \) is created for each target-compound pair \( (T_i, C_{ij}) \), where \( \Phi(\cdot) \) denotes the feature function. In the training procedure, the aim of LOR is to train a local ranking model f(T, C) = f(x) that assigns a ranking score to a given target-compound pair T and C, with the feature vector x representing the whole target-compound pair [37]. In the testing procedure, given a novel target \( T_{m+1} \) not seen during training, the ranking function f can assign scores to its compounds. This is achieved by taking the novel target \( T_{m+1} \) and an associated compound \( C_{m+1,j} \) as a pair \( (T_{m+1}, C_{m+1,j}) \), which is then represented as a feature vector by the same feature mapping, \( x_{m+1,j} = \Phi(T_{m+1}, C_{m+1,j}) \). Using the model trained on the other target-compound pairs, the ranking position of \( C_{m+1,j} \) for \( T_{m+1} \) can be predicted, and finally the molecule ranking list for this novel target is obtained (Figure 8).
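The testing step for a novel target can be sketched as follows. The sketch assumes a linear scoring function and a concatenating feature function, both chosen only for illustration; the weights stand in for a model that would come out of training, and every vector and identifier is hypothetical.

```python
from typing import Dict, List


def phi(target_vec: List[float], compound_vec: List[float]) -> List[float]:
    """Feature function for a target-compound pair (here: concatenation)."""
    return list(target_vec) + list(compound_vec)


def score(weights: List[float], x: List[float]) -> float:
    """Linear ranking score f(x) = w . x (an assumed model form)."""
    return sum(w * v for w, v in zip(weights, x))


def rank_compounds(
    weights: List[float],
    target_vec: List[float],
    compounds: Dict[str, List[float]],
) -> List[str]:
    """Rank the compounds of one target by descending score."""
    scores = {
        name: score(weights, phi(target_vec, vec))
        for name, vec in compounds.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights, as if learned from other targets' pairs.
weights = [0.1, -0.2, 0.5, 1.0]
# Hypothetical descriptors of a target unseen during training.
novel_target = [1.0, 2.0]
compounds = {"c1": [0.2, 0.1], "c2": [0.9, 0.4], "c3": [0.1, 0.9]}
ranking = rank_compounds(weights, novel_target, compounds)
print(ranking)
```

Because the model only needs a feature vector for each pair, nothing in `rank_compounds` requires that the target was present in the training data.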

Figure 8 Illustration of training and testing in LOR.

Compared with traditional QSAR modeling, LOR differs in that it focuses on multiple targets rather than a single target. LOR uses a set of targets with their associated compounds to train a generalized prediction model and makes predictions for other targets (Figure 8). LOR is therefore suitable for cross-target screening. Such an extended ranking ability for a new target cannot be achieved with the traditional classification or regression models in VS [9].

Based on how the input instances are organized, LOR approaches can be categorized into three types: point-wise, pair-wise and list-wise (Table 7, Figure 9). The point-wise and pair-wise approaches transform the ranking problem into classification, regression, or ordinal classification. The list-wise approach takes ranking lists of objects as instances in learning and learns the ranking model from these lists. Detailed information can be found in the literature [38].
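The difference between the three approaches lies in what counts as one training instance. A minimal sketch, using hypothetical compound identifiers and affinity grades for a single target, could look like this:

```python
from typing import List, Tuple

Compound = Tuple[str, int]  # (compound id, affinity grade), both hypothetical


def pointwise_instances(items: List[Compound]) -> List[Compound]:
    """Point-wise: every compound with its own label is one instance."""
    return list(items)


def pairwise_instances(items: List[Compound]) -> List[Tuple[str, str]]:
    """Pair-wise: every (preferred, less preferred) pair is one instance."""
    return [(a, b) for a, ya in items for b, yb in items if ya > yb]


def listwise_instance(items: List[Compound]) -> Tuple[List[str], List[int]]:
    """List-wise: the whole labeled list of one target is a single instance."""
    return [cid for cid, _ in items], [grade for _, grade in items]


items = [("c1", 3), ("c2", 1), ("c3", 0), ("c4", 1)]
print(len(pointwise_instances(items)))  # one instance per compound
print(pairwise_instances(items))        # ordered pairs, ties excluded
print(listwise_instance(items))         # one instance for the whole list
```

The loss function is then defined over whichever instance type is chosen: per compound, per ordered pair, or per list.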

Table 7 Six LOR algorithms
Figure 9 Three different approaches of LOR.

Feature representations of targets and proteins

As mentioned above, in the LOR framework a feature vector \( x_{ij} = \Phi(T_i, C_{ij}) \) must be defined for a given target-compound pair \( (T_i, C_{ij}) \), where \( \Phi(\cdot) \) denotes the feature function. In this study, the widely used General Descriptor (GD, 32 bit) is employed to represent each ligand as a 32-dimensional feature vector. GD describes a compound through four aspects: van der Waals surface area, log P (octanol/water), molar refractivity and partial charge [39]. Protein targets are described by the CTD (Composition, Transition, Distribution) feature, which represents the amino acid distribution patterns of a specific structural or physicochemical property along a protein or peptide sequence. Each protein target is represented as a 147-dimensional vector by the CTD feature. In this study, GD was calculated with the software Molecular Operating Environment (MOE, C.C.G., Inc. Molecular Operating Environment, 2008.10; Montreal, Quebec, Canada, 2008) and the protein CTD feature was calculated with PROFEAT [40].

After the target and the compound have been represented separately, the choice of the feature function \( \Phi(\cdot) \) is important for the performance of LOR. In strategies I, II and III, the protein feature and the compound feature were concatenated directly to form the new feature vector (179 dimensions in total). In strategy IV, the cross-term feature mapping function was also used to generate the new feature vector for the target-compound pair representation. The possibility of defining other forms of \( \Phi(\cdot) \) is discussed in Results and Discussion.
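The two kinds of feature function can be sketched as follows. Direct concatenation follows the text above (147 + 32 = 179 dimensions). The exact form of the cross-term mapping is not specified in this section, so the element-wise product of every target feature with every compound feature used here is an assumption, as are the function names and the placeholder vectors.

```python
def concat_features(target_vec, compound_vec):
    """Direct concatenation of the target and compound feature vectors."""
    return list(target_vec) + list(compound_vec)


def cross_term_features(target_vec, compound_vec):
    """Assumed cross-term mapping: the product of every target feature
    with every compound feature (a flattened outer product)."""
    return [t * c for t in target_vec for c in compound_vec]


# Placeholder vectors with the dimensions given in the text.
target_vec = [0.5] * 147    # CTD protein feature
compound_vec = [0.2] * 32   # GD ligand feature

print(len(concat_features(target_vec, compound_vec)))      # 147 + 32
print(len(cross_term_features(target_vec, compound_vec)))  # 147 * 32
```

Under this assumption the cross-term representation grows multiplicatively with the two input dimensions, which is one reason the choice of feature function matters for both accuracy and cost.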

Performance measurement

To quantitatively evaluate the VS performance under the LOR schema, Normalized Discounted Cumulative Gain (NDCG) was used. NDCG was originally introduced in the information retrieval community to score a ranking according to the positions of the instances in the ranking list. Specifically, assuming that \( \overline{y} \) is the ideal ranking and \( \widehat{y} \) is the predicted ranking, NDCG [8] for the top-k positions of the predicted ranking list is calculated as follows:

$$ \mathrm{NDCG}@k=\frac{\mathrm{DCG}@k\left(\widehat{y}\right)}{\mathrm{DCG}@k\left(\overline{y}\right)} $$
$$ \mathrm{DCG}@k=\sum_{r=1}^{k}\frac{2^{y(r)}-1}{\log_2\left(1+r\right)} $$

where \( y(r) \) is the rank label of the compound at the r-th position in the ranking list.
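The two formulas translate directly into a few lines of code. The sketch below is a minimal illustration rather than the evaluation code used in this study; it assumes that the input is the list of true rank labels ordered by the predicted ranking, and that the ideal ranking is obtained by sorting those same labels in descending order. The function names and the example labels are hypothetical.

```python
import math


def dcg_at_k(labels, k):
    """DCG@k for rank labels listed in ranked order (position r starts at 1)."""
    return sum((2 ** y - 1) / math.log2(1 + r)
               for r, y in enumerate(labels[:k], start=1))


def ndcg_at_k(labels_in_predicted_order, k):
    """NDCG@k: DCG of the predicted ranking divided by DCG of the ideal ranking."""
    ideal = sorted(labels_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no compound with a positive label, so nothing to rank
    return dcg_at_k(labels_in_predicted_order, k) / ideal_dcg


# Hypothetical rank labels of six compounds, listed in predicted order.
predicted = [3, 2, 3, 0, 1, 2]
print(ndcg_at_k(predicted, k=3))
print(ndcg_at_k(sorted(predicted, reverse=True), k=3))  # a perfect ranking
```

Because the gain \( 2^{y(r)}-1 \) grows exponentially with the label and the discount grows only logarithmically with the position, misplacing a highly active compound near the top is penalized far more than errors further down the list.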

Note that if the predicted ranking is exactly the same as the ground truth, the NDCG value is 1.0. Unlike traditional regression- or classification-based performance measures such as RMSE and accuracy, this measure evaluates LOR results directly in terms of ranking quality. Other ranking performance measures exist, such as ERR [41] and MAP [42], but they are less intuitive than NDCG.

It should also be noted that in this study only the top-10 ranking results were evaluated with NDCG, denoted as NDCG@10. This is a very strict evaluation criterion, since the ideal ranking list can only be achieved when the top-10 known candidates are all correctly predicted.


  1. Agarwal S, Dugar D, Sengupta S. Ranking Chemical Structures for Drug Discovery: A New Machine Learning Approach. J Chem Inf Model. 2010;50(5):716–31.

  2. Shoichet BK. Virtual screening of chemical libraries. Nature. 2004;432(7019):862–5.

  3. Walters WP, Stahl MT, Murcko MA. Virtual screening–an overview. Drug Discov Today. 1998;3(4):160–78.

  4. Fechner U, Schneider G. Evaluation of Distance Metrics for Ligand‐Based Similarity Searching. ChemBioChem. 2004;5(4):538–40.

  5. Nantasenamat C, Isarankura-Na-Ayudhya C, Naenna T, Prachayasittikul V. A practical overview of quantitative structure-activity relationship. EXCLI J. 2009;8:74–88.

  6. Trotman A. Learning to rank. Inf Retr. 2005;8(3):359–81.

  7. Liu T-Y. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval. 2009;3(3):225–331.

  8. Wassermann AM, Geppert H, Bajorath JR. Searching for target-selective compounds using different combinations of multiclass support vector machine ranking methods, kernel functions, and fingerprint descriptors. J Chem Inf Model. 2009;49(3):582–92.

  9. Rathke F, Hansen K, Brefeld U, Muller KR. StructRank: A New Approach for Ligand-Based Virtual Screening. J Chem Inf Model. 2011;51(1):83–92.

  10. Wale N, Karypis G. Target Fishing for Chemical Compounds Using Target-Ligand Activity Data and Ranking Based Methods. J Chem Inf Model. 2009;49(10):2190–201.

  11. Li S, Leihong W, Xiaohui F, Yiyu C. Consensus Ranking Approach to Understanding the Underlying Mechanism With QSAR. J Chem Inf Model. 2010;50(11):1941–8.

  12. Al-Sharrah G. Ranking Using the Copeland Score: A Comparison with the Hasse Diagram. J Chem Inf Model. 2010;50(5):785–91.

  13. Lerche D, Sørensen PB, Brüggemann R. Improved Estimation of the Ranking Probabilities in Partial Orders Using Random Linear Extensions by Approximation of the Mutual Ranking Probability. J Chem Inf Model. 2003;43(5):1471–80.

  14. Crammer K, Singer Y. Pranking with ranking. Adv Neur In. 2002;14:641–7.

  15. Van Dang: RankLib []

  16. Burges CJ. From ranknet to lambdarank to lambdamart: An overview. Learning. 2010;11:23–581.

  17. Freund Y, Iyer R, Schapire RE, Singer Y. An efficient boosting algorithm for combining preferences. J Mach Learn Res. 2004;4(6):933–69.

  18. Joachims T. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM; 2002: 133–142.

  19. Joachims T. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM; 2006: 217–226.

  20. Xu J, Li H. Adarank: a boosting algorithm for information retrieval. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, ACM; 2007: 391–398.

  21. Cao Z, Qin T, Liu T-Y, Tsai M-F, Li H. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on Machine learning, ACM; 2007: 129–136.

  22. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST). 2011;2(3):27.

  23. Jacob L, Vert J-P. Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics. 2008;24(19):2149–56.

  24. Liu Q, Che D, Huang Q, Cao Z, Zhu R. Multi‐target QSAR Study in the Analysis and Design of HIV‐1 Inhibitors. Chin J Chem. 2010;28(9):1587–92.

  25. Liu Q, Zhou H, Liu L, Chen X, Zhu R, Cao Z. Multi-target QSAR modelling in the analysis and design of HIV-HCV co-inhibitors: an in-silico study. BMC Bioinformatics. 2011;12(1):294.

  26. Liu Q, Xu Q, Zheng VW, Xue H, Cao Z, Yang Q. Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study. BMC Bioinformatics. 2010;11(1):181.

  27. Gao J, Che D, Zheng VW, Zhu R, Liu Q. Integrated QSAR study for inhibitors of hedgehog signal pathway against multiple cell lines: a collaborative filtering method. BMC Bioinformatics. 2012;13(1):186.

  28. Gao J, Huang Q, Wu D, Zhang Q, Zhang Y, Chen T, et al. Study on human GPCR–inhibitor interactions by proteochemometric modeling. Gene. 2013;518(1):124–31.

  29. Wu D, Huang Q, Zhang Y, Zhang Q, Liu Q, Gao J, et al. Screening of selective histone deacetylase inhibitors by proteochemometric modeling. BMC Bioinformatics. 2012;13(1):212.

  30. Shen Z, Huang Q, Kang H, Liu Q, Cao Z, Zhu R. A new fingerprint of chemical compounds and its application for virtual drug screens. ACTA CHIMICA SINICA. 2011;69(1):1845–50.

  31. Huang S. Genomics, complexity and drug discovery: insights from Boolean network models of cellular regulation. Pharmacogenomics. 2001;2(3):203–22.

  32. Adkins DE, Åberg K, McClay JL, Bukszár J, Zhao Z, Jia P, et al. Genomewide pharmacogenomic study of metabolic side effects to antipsychotic drugs. Mol Psychiatry. 2011;16(3):321–32.

  33. Wang Y, Bolton E, Dracheva S, Karapetyan K, Shoemaker BA, Suzek TO, et al. An overview of the PubChem BioAssay resource. Nucleic Acids Res. 2010;38 suppl 1:255–66.

  34. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, et al. PubChem's BioAssay database. Nucleic Acids Res. 2012;40(D1):D400–12.

  35. Muslea I, Minton S, Knoblock CA. Active + semi-supervised learning = robust multi-view learning. ICML. 2002;2:435–42.

  36. Pan SJ, Yang Q. A survey on transfer learning. Knowledge and Data Engineering, IEEE Transactions on. 2010;22(10):1345–59.

  37. Li H. Learning to rank for information retrieval and natural language processing. Synthesis Lectures Human Language Technol. 2011;4(1):1–113.

  38. Chang K.-Y. A Survey on Learning to Rank. 2010

  39. Labute P. A widely applicable set of descriptors. J Mol Graph Model. 2000;18(4):464–77.

  40. Li Z-R, Lin HH, Han L, Jiang L, Chen X, Chen YZ. PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res. 2006;34 suppl 2:32–7.

  41. Chapelle O, Metzler D, Zhang Y, Grinspan P. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on Information and knowledge management, ACM; 2009: 621–630.

  42. Yue Y, Finley T, Radlinski F, Joachims T. A support vector method for optimizing average precision. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, ACM; 2007: 271–278.


All the datasets as well as the LOR algorithm packages are available at

This work was supported by the Young Teachers for the Doctoral Program of Ministry of Education, China (Grant No. 20110072120048), Innovation Program of Shanghai Municipal Education Commission (Grant No. 20002360059), the Fundamental Research Funds for the Central Universities (Grant No. 2000219084), National Natural Science Foundation of China (Grant No. 31100956 and Grant No. 61173117), National 863 Funding (Grant No. 2012AA020405) and Zhejiang Open Foundation of the Most Important Subjects.

Author information

Corresponding author

Correspondence to Qi Liu.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Conceived and designed the experiments: QL and WZ. Performed the experiments: WZ, YNC, LJJ, HPW, KLT, JW. Analyzed the data: WZ, HPW and RXZ. Wrote the paper: QL, WZ, ZWC and JW. All authors discussed the results and commented on the manuscript. All authors have given approval to the final version of the manuscript.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit

About this article

Cite this article

Zhang, W., Ji, L., Chen, Y. et al. When drug discovery meets web search: Learning to Rank for ligand-based virtual screening. J Cheminform 7, 5 (2015).
