- Research article
- Open Access
Application of 3D Zernike descriptors to shape-based ligand similarity searching
© Venkatraman et al; licensee BioMed Central Ltd. 2009
- Received: 1 September 2009
- Accepted: 17 December 2009
- Published: 17 December 2009
The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed.
In this study, we have examined the efficacy of the alignment independent three-dimensional Zernike descriptor (3DZD) for fast shape based similarity searching. Performance of this approach was compared with several other methods including the statistical moments based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance under the situation that simulates actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparable to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability.
The 3DZD has a unique ability to compare the three-dimensional shapes of compounds rapidly. The examples analyzed illustrate both the advantages of the 3DZD and the room for improvement.
- Enrichment Factor
- Spherical Harmonic
- Virtual Screening
A crucial step in early phase drug discovery is the identification of promising drug leads, i.e. those of pharmacological interest. A guiding premise in this stage is the similarity property principle [1, 2], which states that similar structures are likely to have similar properties (although exceptions to this rule do exist [3, 4]). Proceeding along these lines, similarity based virtual screening efforts look for the nearest neighbours of a given query structure. The output of the screen is a sorted list, from which top-ranking molecules are selected for further processing.
As compound databases can hold millions of structures (spanning a large chemical space), the application of such approaches requires suitable molecular representations that aid rapid screening. An additional requirement is a numerical measure that quantifies the similarity between the compounds. Popular descriptions include fingerprints that encode the two-dimensional molecular structure as a bit string where each value indicates the presence or absence of a desired attribute (e.g. a substructural fragment) [6, 7]. Similarities between the ligands can then be obtained using the Tanimoto score, which accounts for the number of bits shared by the fingerprints. While these descriptors are extremely efficient and easy to calculate, they still have some limitations. A number of three-dimensional (3D) similarity methods [10–12] have therefore been developed to investigate whether 3D structure information improves on the existing descriptors.
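As a concrete illustration of the bit-string comparison described above, the following minimal Python sketch computes a Tanimoto score for two fingerprints stored as sets of "on" bit positions. The function name `tanimoto`, the set-based representation, and the example bit positions are illustrative assumptions and are not taken from any of the fingerprint packages cited here.

```python
from typing import Iterable


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto score: shared 'on' bits divided by bits set in either fingerprint."""
    a, b = set(bits_a), set(bits_b)
    shared = len(a & b)
    either = len(a) + len(b) - shared
    # Assumption: two empty fingerprints are treated as identical.
    return 1.0 if either == 0 else shared / either


# Hypothetical fingerprints: positions of the bits that are set to 1.
query_bits = {1, 4, 7, 9}
candidate_bits = {1, 4, 8, 9, 12}
score = tanimoto(query_bits, candidate_bits)  # 3 shared bits, 6 bits set in either
```

The score is 1 for identical fingerprints and 0 when no bits are shared, which is what makes it convenient for ranking a database against a query.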
While a number of techniques for 3D molecular comparison have been proposed [1, 13, 14], in this article we focus on similarity-based virtual screening using molecular shape [15–17] as the key feature for discrimination. Shape is known to play an important role in molecular recognition, with previous studies demonstrating successful applications to virtual screening experiments [16, 18]. However, identifying suitable shape-based encodings is far from trivial [18–21], which poses a significant hurdle to their application in fast screening of compound databases.
In order to facilitate efficient comparisons, several representations of shape have been proposed, ranging from those based on moments and surfaces [21, 23] to grid-based approaches (see Putta & Beroza for a comprehensive review). A well-established method is ROCS (Rapid Overlay of Chemical Structures), which describes the molecule as a set of atom-centered Gaussians. Shape similarity scores are then evaluated in terms of the rigid body overlap volume, with comparison timings in the millisecond range. Goldman and Wipke, on the other hand, divide the molecular surface into a series of patches (2 Å radius) centered on a set of critical points, with each patch defined by a geometrically invariant descriptor (the principal curvatures, normals, and the shape index). Points with similar geometric signatures are identified, based on which a transformation can be calculated. Proceeding along the same lines, SURFCOMP uses graph matching to identify correspondences between shape (local curvature) critical points of the molecular surfaces being compared. Although the superimpositions found using the above two methods were reasonably accurate, pairwise comparisons took more than a minute, which limits their large scale application.
Spherical harmonics based representations [29, 30] have been further applied to comparing shapes of ligand binding sites and as geometric filters for virtual high throughput screening. The use of spherical harmonics allows the surface information to be encoded in a compact form as an orthonormal one-dimensional (1D) vector of floating-point numbers, rendering it amenable to fast comparison. However, for the molecules to be compared, they have to be placed in a standard frame of reference. This has been shown to be error-prone and hence may degrade the performance of the descriptor [33, 34]. Consequently, descriptors that obviate the need for any pre-alignment have been the focus of a number of studies. Shape signatures [18, 35], for example, produce a 1D representation of the ligand or receptor site by ray-tracing the molecular volume. The geometric information is encoded as a probability distribution, which enables fast comparisons using the shape histograms. The signatures can be further extended to incorporate other properties such as electrostatics. Another method that captures shape independent of orientation is the Ultrafast Shape Recognition (USR) scheme. In this technique, the 3D molecular shape is represented as a set of statistical moments generated from all-atom distance distributions that are calculated with respect to preselected reference locations.
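The USR idea of moments of atom distance distributions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this study: the four reference locations (centroid, atom closest to the centroid, atom farthest from the centroid, and atom farthest from that atom) follow the commonly described USR recipe, the moments are taken as the mean followed by raw central moments, and published variants normalise the higher moments differently. The names `usr_descriptor` and `_moments` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(values: Sequence[float], n_moments: int) -> List[float]:
    """Mean followed by central moments of order 2..n_moments (unnormalised)."""
    mean = sum(values) / len(values)
    result = [mean]
    for k in range(2, n_moments + 1):
        result.append(sum((v - mean) ** k for v in values) / len(values))
    return result


def usr_descriptor(atoms: Sequence[Point], n_moments: int = 3) -> List[float]:
    """Return 4 * n_moments numbers describing shape without any alignment."""
    centroid = tuple(sum(axis) / len(atoms) for axis in zip(*atoms))
    dist_to_centroid = [math.dist(a, centroid) for a in atoms]
    closest = atoms[dist_to_centroid.index(min(dist_to_centroid))]
    farthest = atoms[dist_to_centroid.index(max(dist_to_centroid))]
    dist_to_farthest = [math.dist(a, farthest) for a in atoms]
    far_from_farthest = atoms[dist_to_farthest.index(max(dist_to_farthest))]

    descriptor: List[float] = []
    for reference in (centroid, closest, farthest, far_from_farthest):
        distances = [math.dist(a, reference) for a in atoms]
        descriptor.extend(_moments(distances, n_moments))
    return descriptor
```

Because only interatomic and atom-to-reference distances enter the calculation, translating or rotating the molecule leaves the descriptor unchanged, which is why no pre-alignment is needed.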
More recently, a number of articles [36–41] have advocated the use of 3D Zernike invariants as descriptors for shape comparison. An extension of spherical harmonics, the 3D Zernike descriptors (3DZD) have favourable features such as orthonormality and compactness. More importantly, the 3DZD are also invariant to rotation and translation (see section on computational details), and thus the pre-alignment step is no longer required. Represented by a 1D set of numbers (subject to a specified order of expansion), the 3DZD have facilitated rapid screening of protein databases, discrimination of proteins based on the electrostatic potential, and analysis of the shapes of ligand binding pockets [34, 37].
In this paper, we examine the efficacy of the 3DZD as a tool for shape similarity based virtual screening. Owing to its compact representation, the 3DZD enables fast comparison of compounds, which is a key property of virtual screening methods given the fast growing size of molecular databases. The performance is compared with that of several other methods: USR, SIMCOMP, EVA, Unity2D, Molprint-2D, and MACCS. These approaches and the metrics used for evaluation are briefly described in the next section. Three datasets were used: the first is a set of 47 diverse odour compounds (divided into seven classes) taken from a previous study by Takane and Mitchell, while the second is the Directory of Useful Decoys (DUD) dataset [45, 46], which contains 13 targets with 66,013 compounds. The last dataset includes 42,689 anti-HIV inhibitors categorized into active, inactive, and moderately active. With the first dataset, we test the ability of the methods to classify compounds, while the second dataset is employed to examine the ability to rank actives among decoys. The third dataset is used to simulate an actual virtual screening process against a large pharmaceutical database. Results for each dataset are assessed in terms of several evaluation metrics. Reasons for the failure of the shape based methods in specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability, with suggestions for future work.
Methods for structure comparison
Computational approaches for ligand screening used in this study are briefly introduced here. For further details, please refer to the cited articles.
3D Zernike Descriptors
The moments, however, are not rotationally invariant; but as rotations do not change the magnitudes of the functions, the invariant features are expressed in terms of the norms. Translational invariance is obtained by fixing the coordinate system with the origin coinciding with the spatial center of the molecule. From a mathematical point of view, this procedure is proven to yield an identical descriptor for an object regardless of its position and orientation in space [36, 48]. In practice, however, some variance may arise from numerical errors and from the voxelization step needed to represent the shape of the object. In our previous paper, we examined the variance caused by rotation. An advantage of the 3DZD is that it can also describe non-star-like shapes. This is a limiting factor for spherical harmonics, as they can only model single-valued surfaces.
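The step from moments to rotation-invariant norms can be illustrated with a small Python sketch. It assumes the moments are already available as complex numbers indexed by (n, l) with components m = -l..l; the toy values below are made up and do not come from any molecule. Only a rotation about the z axis is shown, for which each component is multiplied by a phase factor; a general rotation mixes the components within each (n, l) block by a unitary matrix, which preserves the norm in the same way.

```python
import cmath
import math
from typing import Dict, List, Tuple

# (n, l) -> list of complex moments for m = -l, ..., l
Moments = Dict[Tuple[int, int], List[complex]]


def invariant_norms(moments: Moments) -> Dict[Tuple[int, int], float]:
    """Collapse the 2l+1 components of each (n, l) block into a single norm."""
    return {
        nl: math.sqrt(sum(abs(c) ** 2 for c in components))
        for nl, components in moments.items()
    }


def rotate_about_z(moments: Moments, angle: float) -> Moments:
    """Apply the phase factor exp(-i*m*angle) that a z-rotation induces on component m."""
    rotated: Moments = {}
    for (n, l), components in moments.items():
        rotated[(n, l)] = [
            c * cmath.exp(-1j * m * angle)
            for m, c in zip(range(-l, l + 1), components)
        ]
    return rotated


# Hypothetical moments, for illustration only.
toy_moments: Moments = {
    (1, 1): [0.2 + 0.1j, -0.4j, 0.3 + 0j],
    (2, 0): [0.7 + 0j],
    (2, 2): [0.1j, 0.05 + 0j, -0.2 + 0.3j, 0j, 0.15 - 0.1j],
}
norms_before = invariant_norms(toy_moments)
norms_after = invariant_norms(rotate_about_z(toy_moments, 0.83))
```

The individual moments change under the rotation, but `norms_before` and `norms_after` agree to floating-point precision, which is the property the descriptor relies on.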
Note that DE is 0 for two identical molecules, while the correlation coefficient and DM give the value of 1 for that case.
Ultrafast Shape Descriptor
Here, M_Q and M_D are the one-dimensional vectors corresponding to the query and database molecules, and N is the length of the vector, determined by the number of statistical moments considered, i.e. N = 12 for the first three moments and N = 16 for the first four. In addition to DM, in this study we also employed the Euclidean distance (DE) and the correlation coefficient.
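The three comparison measures can be sketched as follows. This is a hedged illustration: the Manhattan-based score DM is assumed to take the inverse form 1 / (1 + mean absolute difference) commonly used with USR, which matches the statement that DM equals 1 for identical molecules; DE is assumed to be the plain, unnormalised Euclidean distance; and the correlation coefficient is taken to be Pearson's r. The function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(query: Sequence[float], database: Sequence[float]) -> float:
    """DE: 0 for identical vectors, growing with dissimilarity."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, database)))


def manhattan_similarity(query: Sequence[float], database: Sequence[float]) -> float:
    """DM (assumed form): 1 / (1 + mean absolute difference), so 1 means identical."""
    mean_abs_diff = sum(abs(q - d) for q, d in zip(query, database)) / len(query)
    return 1.0 / (1.0 + mean_abs_diff)


def correlation_coefficient(query: Sequence[float], database: Sequence[float]) -> float:
    """Pearson correlation between the two descriptor vectors."""
    n = len(query)
    mean_q = sum(query) / n
    mean_d = sum(database) / n
    covariance = sum((q - mean_q) * (d - mean_d) for q, d in zip(query, database))
    var_q = sum((q - mean_q) ** 2 for q in query)
    var_d = sum((d - mean_d) ** 2 for d in database)
    if var_q == 0.0 or var_d == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return covariance / math.sqrt(var_q * var_d)
```

When ranking a database, DE is sorted in ascending order while DM and the correlation coefficient are sorted in descending order, since the former is a distance and the latter two are similarities.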
The other existing programs compared in this study
In addition, we compared the 3DZD with several existing programs: EVA, UNITY2D, SIMCOMP, Molprint-2D, and MACCS. Below we provide a brief description of the characteristics of these methods.
The EVA (Eigen VAlue) descriptors are derived from vibrational frequency calculations (calculated normal modes), with each molecule represented as a vector of 761 numbers. In UNITY2D, the molecule is encoded as a Boolean array of 922 bits (available in SYBYL 7.1) that encode the presence (1) or absence (0) of substructural features. The results of these two methods are taken from the paper by Takane & Mitchell (2004), who analyzed the odour dataset.
Here, MCS is the maximal common subgraph found and |.| represents the number of vertices in the graphs. The score depends on the sizes of the graphs and ranges between 0 and 1.
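A score with the properties described above can be sketched as follows, assuming a Jaccard (Tanimoto-style) normalisation of the common subgraph size by the size of the union of the two graphs. This normalisation is an assumption made for illustration; the function name `mcs_jaccard_score` and the example vertex counts are hypothetical, and finding the maximal common subgraph itself is not shown.

```python
def mcs_jaccard_score(n_vertices_a: int, n_vertices_b: int, n_vertices_mcs: int) -> float:
    """Common-subgraph size divided by the size of the union of both graphs (assumed form)."""
    if not 0 <= n_vertices_mcs <= min(n_vertices_a, n_vertices_b):
        raise ValueError("common subgraph cannot exceed either graph")
    union = n_vertices_a + n_vertices_b - n_vertices_mcs
    return n_vertices_mcs / union


# Hypothetical graphs with 10 and 15 heavy atoms sharing a 5-atom common subgraph.
example_score = mcs_jaccard_score(10, 15, 5)
```

Under this assumption, two identical graphs score 1, graphs with no common subgraph score 0, and a small molecule embedded in a much larger one scores low even though it matches completely.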
The MACCS descriptors are a set of 166 predefined structural keys that encode patterns in the molecule (such as the presence of S-S bonds, rings of size 4, presence of halogen etc).
The Molprint-2D fingerprint also encodes the molecule as a binary vector by taking into consideration the atom environment (only heavy atoms), i.e. the counts of the types of the atoms (SYBYL atom types are used) within two bond-lengths of a central atom. The bits thus encode the presence or absence of these atom environments.
In addition, we employ a method which simply considers the number of atoms in the molecule (the atom count method). The similarity of two molecules is defined as the difference of the number of atoms. This method serves as the reference to examine the effect of using shape information by the above methods.
The aforementioned approaches were tested on three datasets chosen to demonstrate the applicability of the methods for classification and rapid screening of large databases. The first dataset was taken from an earlier study by Takane and Mitchell, who attempted a clustering of 47 odour compounds using the EVA descriptors. The compounds are divided into seven categories: amber (9 compounds), bitter almond (9 compounds), camphor (5 compounds), musk (11 compounds), jasmine (2 compounds), rose (5 compounds) and muguet (6 compounds).
Table 1. Breakdown of the DUD dataset (columns: target, decoys per active).
- angiotensin-converting enzyme (ace)
- cyclin-dependent kinase 2 (cdk2)
- epidermal growth factor receptor (egfr)
- HIV reverse transcriptase (hivrt)
- enoyl ACP reductase (inha)
- P38 mitogen activated protein (p38)
- platelet derived growth factor receptor kinase (pdgfrb)
- tyrosine kinase SRC (src)
- vascular endothelial growth factor receptor (vegfr2)
The third dataset from the National Cancer Institute (NCI) consists of 42,689 compounds derived from an assay measuring protection from HIV-1 infection of human CEM cells. The compounds were further categorized into 423 confirmed actives (100% protection), 1,081 moderately actives (> 50% protection) and 41,185 confirmed inactives (<50% protection), yielding a ratio of 97 decoys per active. More details of the dataset are available at the NCI website http://dtp.nci.nih.gov/docs/aids/aids_data.html. The coordinates for these structures were downloaded from http://ligand.info in the SDF format. The dataset not only resembles a pharmaceutical database but also enables the extraction of actives in a form akin to that of a typical virtual screening experiment. For cases with missing coordinates, the structures were rebuilt using CORINA. Examination of the compounds in the dataset revealed that a small number of cases had disconnected components. As neither the 3DZD nor USR can currently handle such structures, we chose the largest fragment to represent the compound. Following a previous work on the same dataset by von Grotthus et al., the objective was to test the retrieval capability of the actives using the 1,081 moderately actives as queries. The datasets used are available at our website, http://kiharalab.org/zernikeligand/.
- 1)Clustering and the Adjusted Rand Index - For the odour dataset, results were evaluated based on the quality of the clustering obtained. Ward's hierarchical clustering was done using software downloaded from http://www.let.rug.nl/kleiweg/clustering/. Starting with an initial number of clusters (say N), Ward's method merges two clusters at a time while minimizing the sum of squared errors at each step. To compare the quality of the clusters, the adjusted Rand index [43, 61] was applied.
- 2)Enrichment factor - This metric describes the ratio of actives retrieved relative to the percentage of the database scanned. If T_A is the total number of actives in a database of size T_D and N_a is the number of actives among the N_x compounds in the top x percent of the database, then the enrichment factor is EF = (N_a / N_x) / (T_A / T_D).
- 3)BEDROC - Although frequently used, the enrichment factor has a major drawback in the form of the "early recognition problem". It does not distinguish between schemes that rank actives at the top of the list and those that place them at the end. As actives ranked earlier in the list are desired, the Boltzmann enhanced discrimination of receiver operating characteristic (BEDROC) was proposed to evaluate the performance of ranking methods. In this metric, n is the number of actives among N compounds, r_i is the rank of the i-th active, and α is a parameter that weights compounds towards the top of the ranked list. The BEDROC metric ranges between 0 and 1 and, in this study, has been calculated for α = 160.9 and α = 32.2, which correspond to the top 1% and 5% of the relative rank accounting for 80% of the BEDROC score.
- 4)AUCROC - The area under the receiver operating characteristic curve is evaluated from the number of decoys ranked above the i-th active.
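The ranking-based metrics described above can be sketched in Python as follows. This is an illustrative sketch rather than the code used in this study: the input is assumed to be a list of booleans in ranked order (best first, True for an active), the BEDROC expression follows the commonly published closed form in terms of n, N, r_i and α, and the function names are hypothetical. The helper `top_weight_fraction` checks the statement that α = 160.9 and α = 32.2 place about 80% of the weight on the top 1% and 5% of the list.

```python
import math
from typing import Sequence


def enrichment_factor(ranked: Sequence[bool], top_fraction: float) -> float:
    """EF = (N_a / N_x) / (T_A / T_D) for the top fraction of the ranked list."""
    total = len(ranked)
    n_top = max(1, round(top_fraction * total))
    actives_in_top = sum(ranked[:n_top])
    return (actives_in_top / n_top) / (sum(ranked) / total)


def bedroc(ranked: Sequence[bool], alpha: float) -> float:
    """BEDROC from the 1-based ranks of the actives; needs both actives and decoys."""
    total = len(ranked)
    ranks = [i for i, is_active in enumerate(ranked, start=1) if is_active]
    ratio = len(ranks) / total  # fraction of actives, n / N
    observed = sum(math.exp(-alpha * r / total) for r in ranks)
    random_sum = ratio * (1.0 - math.exp(-alpha)) / (math.exp(alpha / total) - 1.0)
    scale = ratio * math.sinh(alpha / 2.0) / (
        math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ratio)
    )
    offset = 1.0 / (1.0 - math.exp(alpha * (1.0 - ratio)))
    return observed / random_sum * scale + offset


def auc_roc(ranked: Sequence[bool]) -> float:
    """AUC from the number of decoys ranked above each active."""
    n_actives = sum(ranked)
    n_decoys = len(ranked) - n_actives
    decoys_seen = 0
    decoys_above_actives = 0
    for is_active in ranked:
        if is_active:
            decoys_above_actives += decoys_seen
        else:
            decoys_seen += 1
    return 1.0 - decoys_above_actives / (n_actives * n_decoys)


def top_weight_fraction(alpha: float, top_fraction: float) -> float:
    """Share of the exponential weight that falls within the top fraction of ranks."""
    return (1.0 - math.exp(-alpha * top_fraction)) / (1.0 - math.exp(-alpha))
```

For a hypothetical list of 20 compounds with actives at ranks 1, 2, 4 and 20, three of the four actives fall in the top 25%, giving an enrichment factor of 3 at that cutoff, whereas the AUC is pulled down by the single active ranked last.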
All calculations were performed on a 2.13 GHz Intel dual processor system running Linux with 8 GB RAM. Programs for the extraction and comparison of the moments based on the 3DZD and USR were written in C++. For the USR approach, the extraction of moments takes around 4 ms on average. For the 3DZD, extraction takes about 1 s on average including surface generation, which is about 250 times that of USR. However, this step needs to be performed only once, and the descriptors can be stored directly in a database.
Software for SIMCOMP  was downloaded from the KEGG website http://web.kuicr.kyoto-u.ac.jp/simcomp/. The software executable SIMCOMP which performs a pairwise comparison of two structures was used to calculate the similarities. Prior to the calculation, all structures were converted into the required KCF (KEGG Chemical Function) format using the SOAP/WSDL http://www.genome.jp/kegg/soap/ interface provided by the KEGG database.
Adjusted Rand indices, BEDROC, and AUCROC values (columns: adjusted Rand index, BEDROC (α = 160.9), BEDROC (α = 32.2); row annotations: maximal common subgraph, σ = 100 cm-1, σ = 50 cm-1, σ = 20 cm-1, 166 bit key).
For the 3DZD, similarity matrices were built based on three different measures: the correlation coefficient (r), DE, and DM. Orders of expansion ranging from 4 to 14 were tested, with significant gains observed in the value of the adjusted Rand index as the order increases up to 12. The highest adjusted Rand index (0.487) is obtained for the 3DZD with an order 12 expansion, where the correlation and DE metrics give the same result. The value becomes worse when DM is used (0.393). Using a higher order (here we examined 14) resulted in a smaller adjusted Rand index, which suggests that expansion orders of 10-12 are appropriate for comparison. The poorer performance at this level (>12) can be attributed to noise resulting from far too detailed a description of the molecular shape. On the other hand, using a smaller order, say 4-6, results in a much lower adjusted Rand index (0.25-0.30), implying insufficient resolution for shape description.
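For readers who wish to reproduce this kind of comparison, the adjusted Rand index between a known classification and a clustering can be computed from pair counts, as in the minimal Python sketch below. It uses the standard chance-corrected pair-counting form; the function name and the example labels are illustrative and are not the odour data.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(known: Sequence[Hashable], clustered: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two partitions of the same items."""
    n_items = len(known)
    # Pairs of items placed together in both partitions, in the known classes,
    # and in the clusters, respectively.
    together_both = sum(comb(c, 2) for c in Counter(zip(known, clustered)).values())
    together_known = sum(comb(c, 2) for c in Counter(known).values())
    together_clustered = sum(comb(c, 2) for c in Counter(clustered).values())

    expected = together_known * together_clustered / comb(n_items, 2)
    maximum = 0.5 * (together_known + together_clustered)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions put every item alone
    return (together_both - expected) / (maximum - expected)


# Hypothetical example: six compounds in two known classes, clustered into three groups.
known_classes = ["musk", "musk", "musk", "rose", "rose", "rose"]
cluster_ids = [0, 0, 1, 1, 2, 2]
ari = adjusted_rand_index(known_classes, cluster_ids)
```

A value of 1 indicates that the clustering reproduces the known classes exactly (up to relabelling), while values near 0 indicate agreement no better than chance.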
The highest value achieved by the 3DZD (0.487) is higher than that of SIMCOMP (0.400), USR (0.342) and its variant USR-k (0.328), MACCS (0.364), and the atom count method (0.400). The 3DZD also marginally outperforms the EVA descriptor based classification which achieved a highest Rand index value of 0.480. However, Molprint-2D, which considers atom environments and atom types, shows the highest value among all (0.516).
The methods are also evaluated by two ranking-based scores, BEDROC and AUCROC. The performance of the 3DZD relative to the other methods becomes worse when evaluated by these two scores. When evaluated by BEDROC (α = 160.9), Molprint-2D shows the highest value (0.848), and the rest of the methods are ranked in the following order: SIMCOMP, USR (Manhattan, 0.782), MACCS (0.778), and the 3DZD (0.739 with the correlation coefficient, order = 8). In terms of BEDROC (α = 32.2) and AUCROC, SIMCOMP shows the highest value and Molprint-2D comes a close second. With the AUCROC, the 3DZD (0.748 with Manhattan, order = 14) is ranked third, followed by MACCS (0.742) and USR (0.718 with Manhattan, order = 12). Although BEDROC and AUCROC are frequently used ranking-based scores, it could be argued that they are not very appropriate for a small dataset like the odour dataset. As mentioned in the Methods section, BEDROC with α = 160.9 and α = 32.2 emphasizes ranks in the top 1% and 5%, respectively, which correspond only to the top rank and the top two ranks for the odour dataset of 47 compounds. On the other hand, the AUCROC yields unintuitively high values for many search results since the number of hits in the dataset is relatively high (5 to 11 hits among 47 total, as described in the dataset section).
Summary of the performance of the methods in the DUD dataset (column: BEDROC (α = 32.2); rows include 3DZD (order 12, correlation coefficient) and USR (order = 16, correlation coefficient)).
Overall, the clustering results based on the 3DZD for the odour dataset have been encouraging. To further evaluate this approach, we have analysed the performances with respect to much larger datasets, results for which are presented in the next section.
In the previous section, we investigated how well the methods classify compounds using a small dataset of 47 odour compounds. Next, we use a larger dataset, namely the DUD dataset (Table 1), to examine the performance of the methods in ranking and retrieving active compounds. The DUD dataset is appropriate for this task since it was developed as a virtual screening benchmark and has been used in several previous studies. For the query, the active ligand molecule crystallized with the target protein is used to retrieve the other known active molecules among decoys. We compare 3DZD, USR, MACCS, and the atom count method. Three ranking-based evaluation metrics are used, i.e. BEDROC (α = 32.2), the enrichment factor (5%), and the AUCROC.
NCI anti-HIV dataset
The third dataset, the anti-HIV dataset from the National Cancer Institute, is employed to simulate a typical virtual screening experiment. With both the actives and inactives forming the database to be searched, each of the remaining 904 moderately active molecules was used as the query. USR typically takes about 0.74 (12 terms) to 0.76 (16 terms) seconds per query. Timings for the 3DZD comparisons are about 3 times that of USR, with per query comparisons taking 2.62 seconds for order 8 expansions and up to 2.70 seconds for order 14. It must be noted that the 3DZD considers more terms, ranging from 25 (order 8) to 64 (order 14). The timings are reported with respect to the 38,352 database molecules that were compared and include the calculation of all three measures (correlation, DE, and DM). In comparison, the graph based SIMCOMP is significantly slower, with timings exceeding an hour (~4245 seconds/query) in most cases.
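The term counts quoted above can be checked with a short Python sketch, assuming the usual indexing of 3D Zernike invariants by pairs (n, l) with 0 ≤ l ≤ n ≤ order and n − l even. The function name is hypothetical.

```python
def zernike_invariant_count(order: int) -> int:
    """Number of (n, l) pairs with 0 <= l <= n <= order and (n - l) even."""
    return sum(
        1
        for n in range(order + 1)
        for l in range(n + 1)
        if (n - l) % 2 == 0
    )


counts = {order: zernike_invariant_count(order) for order in (8, 12, 14)}
```

Under this indexing an even order N gives (N/2 + 1)² invariants, i.e. 25 for order 8 and 64 for order 14, in agreement with the numbers of terms stated in the text.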
The enrichment factors, BEDROC, and AUCROC scores evaluated for the anti-HIV dataset (columns include BEDROC (α = 160.9) and BEDROC (α = 32.2); row annotation: maximal common subgraph).
The performance of the 3DZD for this dataset is, however, slightly poorer than that of the other methods tested. SIMCOMP achieves the highest values for the enrichment factor at the 1% cutoff (2.735), BEDROC (α = 160.9) (0.0383), and AUCROC (0.477). It also shows a better score than the 3DZD for BEDROC (α = 32.2). USR (Manhattan, with the order of 12) outperforms the 3DZD on all the metrics except AUCROC. However, the relatively small values obtained by all the methods (the 3DZD, USR, and SIMCOMP) suggest that none of the methods compared here would be effective from an early recognition perspective for the anti-HIV dataset. It is particularly notable that the performance of the atom count method is close to that of the other methods on this dataset and even shows the highest value for EF5% and BEDROC (α = 32.2). These results imply that molecular shape information is not as effective here as it is for the previous two datasets. We discuss the nature of this dataset, which could be a reason for these results, in the Discussion.
A case where the 3DZD is able to retrieve more hits in the top 100 than SIMCOMP and USR,
A case where SIMCOMP outperforms USR and 3DZD, and
A case where USR does better than SIMCOMP and 3DZD.
Ranks and distance values for the case shown in Figure 5 where SIMCOMP outperforms both 3DZD and USR.
On the basis of these results observed on the three datasets, it would be of interest to combine other molecular surface properties such as electrostatics and hydrophobicity that are captured implicitly by the atom environments used in SIMCOMP and Molprint-2D. While this may be a limiting factor for USR, the 3DZD facilitates their incorporation in a more convenient way as shown in our previous paper .
In this article, we have presented the 3D Zernike descriptors for ligand similarity searching. The work was motivated by previous studies showing that these rotation invariant descriptors outperformed several other shape and alignment based comparison techniques when applied to proteins. Compactness, in terms of the number of coefficients generated in comparison to their spherical harmonics counterpart, and easy extensibility to other molecular properties were additional factors in favour of this descriptor. The compactness of the 3DZD enables fast comparison of compounds, which is a key property of virtual screening methods given the fast growing size of molecular databases.
The application to ligand similarity searching was exemplified using three datasets, each of which serves a different purpose. The first, the odour dataset, is suitable for examining the ability to classify compounds into experimentally verified categories. The adjusted Rand index, used as a measure of agreement with the known classification, was found to be the second highest (0.487) for the 3DZD, which outperformed the other shape based method (USR), a chemical graph matching scheme (SIMCOMP), a 2D fingerprint-based method (UNITY2D), the vibrational frequency based method (EVA), and MACCS. The second dataset, DUD, was chosen to investigate the ability to rank compounds, as the dataset has been used for the same purpose in several previous studies. All the methods compared consistently showed better performance than the simple atom count method, and the performance of the 3DZD was comparable to that of the other methods.
For the third dataset, moderately active structures were used as queries to search an anti-HIV database of actives and inactives. This dataset is intended to simulate the actual application of virtual screening methods to a large pharmaceutical database. Rapid comparisons are facilitated by the floating point vector representation, and both the 3DZD and USR were found to be roughly three orders of magnitude faster than SIMCOMP (under 3 seconds per query versus about 4245 seconds). Although both USR and the 3DZD describe shapes using moments, they exhibit a preference for specific shape types. Examination of a few cases suggests that the 3DZD may perform better for comparing molecules of more or less spherical shape, while USR performs well for elongated asymmetric structures. However, a more detailed analysis of this would be required and is planned for the future. For this dataset, the 3DZD, USR, and SIMCOMP showed much lower values for all the evaluation metrics than for the previous two datasets and than those typically seen in the literature. Moreover, the atom count method, which simply considers the size of molecules, showed comparable results, indicating that molecular shape information did not add effective information for retrieval. To understand these results, it should be noted that this dataset may contain active compounds for multiple different molecular targets and, moreover, that the heterogeneity of actives may be further increased because the activity of compounds is measured by assays in living cell systems, where metabolism and uptake become important factors. Hence, generally speaking, this dataset is not among the most appropriate for rigorous benchmarking of virtual screening methods. In this study, however, we used it to mimic an actual virtual screening situation after testing the methods on two well curated datasets.
Compound similarity searching by the 3DZD is intrinsically sensitive to the shape of molecules. This characteristic of the 3DZD can work as an advantage but can also lead to poor performance. In Figure 5, we showed a case in which the 3DZD were able to find an active compound that SIMCOMP and USR missed. On the other hand, the two jasmine compounds in Figure 3 are a case where the 3DZD failed but a method that considers atom environments (i.e. SIMCOMP, see Figure 2) can detect their similarity. Figures 6 and 9 also exemplify compounds for which the 3DZD find an undesired similarity based on shape.
In summary, the 3DZD provide compact representations of molecular shape and can be applied to rapid screens of large compound databases. In addition to shape, other molecular properties can also be incorporated, thus enabling uniform comparison of the structures. The fact that they are surface based has other advantages such as comparison of shapes of binding pockets and the ligands bound to them, which are currently being studied in our group.
The authors acknowledge Lee Sael for developing programs for the 3DZD method. This work is supported by grants from the National Institutes of Health (R01GM075004) and in part by National Science Foundation (DMS800568, EF0850009, IIS0915801).
- Bender A, Glen RC: Molecular similarity: a key technique in molecular informatics. Org Biomol Chem. 2004, 2: 3204-3218. 10.1039/b409813g.View ArticleGoogle Scholar
- Johnson AM, Maggiora GM: Concepts and applications of molecular similarity. 1990, New York: WileyGoogle Scholar
- Martin YC, Kofron JL, Traphagen LM: Do structurally similar molecules have similar biological activity?. J Med Chem. 2002, 45: 4350-4358. 10.1021/jm020155c.View ArticleGoogle Scholar
- Kubinyi H: Similarity and dissimilarity: A medicinal chemist's view. Perspectives in Drug Discovery and Design. 1998, 9-11: 225-252. 10.1023/A:1027221424359.View ArticleGoogle Scholar
- Eckert H, Bajorath J: Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today. 2007, 12: 225-233. 10.1016/j.drudis.2007.01.011.View ArticleGoogle Scholar
- MACCS Keys. 2009, MDL Information Systems, Inc., San Leandro, CA, [http://www.mdl.com]
- Bender A, Mussa HY, Glen RC, Reiling S: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J Chem Inf Comput Sci. 2004, 44: 1708-1718.View ArticleGoogle Scholar
- Edgar SJ, Holliday JD, Willett P: Effectiveness of retrieval in similarity searches of chemical databases: a review of performance measures. J Mol Graph Model. 2000, 18: 343-357. 10.1016/S1093-3263(00)00061-9.View ArticleGoogle Scholar
- Schneider G, Schneider P, Renner S: Scaffold-hopping: How far can you jump?. Qsar & Combinatorial Science. 2009, 25: 1162-1171. 10.1002/qsar.200610091.View ArticleGoogle Scholar
- Nisius B, Goller AH: Similarity-based classifier using topomers to provide a knowledge base for hERG channel inhibition. J Chem Inf Model. 2009, 49: 247-256. 10.1021/ci800304t.
- Bender A, Mussa HY, Gill GS, Glen RC: Molecular surface point environments for virtual screening and the elucidation of binding patterns (MOLPRINT 3D). J Med Chem. 2004, 47: 6569-6583. 10.1021/jm049611i.
- Markt P, Petersen RK, Flindt EN, Kristiansen K, Kirchmair J, Spitzer G, Distinto S, Schuster D, Wolber G, Laggner C, Langer T: Discovery of novel PPAR ligands by a virtual screening approach based on pharmacophore modeling, 3D shape, and electrostatic similarity screening. J Med Chem. 2008, 51: 6303-6317. 10.1021/jm800128k.
- Chen Q, Higgs RE, Vieth M: Geometric accuracy of three-dimensional molecular overlays. J Chem Inf Model. 2006, 46: 1996-2002. 10.1021/ci060134h.
- Artymiuk PJ, Bath PA, Grindley HM, Pepperrell CA, Poirrette AR, Rice DW, Thorner DA, Wild DJ, Willett P, Allen FH: Similarity searching in databases of three-dimensional molecules and macromolecules. J Chem Inf Comput Sci. 1992, 32: 617-630.
- Putta S, Beroza P: Shapes of things: computer modeling of molecular shape in drug discovery. Curr Top Med Chem. 2007, 7: 1514-1524. 10.2174/156802607782194770.
- Rush TS, Grant JA, Mosyak L, Nicholls A: A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J Med Chem. 2005, 48: 1489-1495. 10.1021/jm040163o.
- Kirchmair J, Ristic S, Eder K, Markt P, Wolber G, Laggner C, Langer T: Fast and efficient in silico 3D screening: toward maximum computational efficiency of pharmacophore-based and shape-based approaches. J Chem Inf Model. 2007, 47: 2182-2196. 10.1021/ci700024q.
- Chekmarev DS, Kholodovych V, Balakin KV, Ivanenkov Y, Ekins S, Welsh WJ: Shape signatures: new descriptors for predicting cardiotoxicity in silico. Chem Res Toxicol. 2008, 21: 1304-1314. 10.1021/tx800063r.
- Willett P: Searching techniques for databases of two- and three-dimensional chemical structures. J Med Chem. 2005, 48: 4183-4199. 10.1021/jm0582165.
- Labute P, Williams C, Feher M, Sourial E, Schmidt JM: Flexible alignment of small molecules. J Med Chem. 2001, 44: 1483-1490. 10.1021/jm0002634.
- Hofbauer C, Lohninger H, Aszodi A: SURFCOMP: a novel graph-based approach to molecular surface comparison. J Chem Inf Comput Sci. 2004, 44: 837-847.
- Ballester PJ, Richards WG: Ultrafast shape recognition to search compound databases for similar molecular shapes. J Comput Chem. 2007, 28: 1711-1723. 10.1002/jcc.20681.
- Hudson BD, Whitley DC, Ford MG, Swain M, Essex JW: Pattern recognition based on color-coded quantum mechanical surfaces for molecular alignment. J Mol Model. 2008, 14: 49-57. 10.1007/s00894-007-0251-2.
- Tervo AJ, Ronkko T, Nyronen TH, Poso A: BRUTUS: optimization of a grid-based similarity function for rigid-body molecular superposition. 1. Alignment and virtual screening applications. J Med Chem. 2005, 48: 4076-4086. 10.1021/jm049123a.
- Grant JA, Pickup BT: A Gaussian description of molecular shape. J Phys Chem. 1995, 99: 3503-3510. 10.1021/j100011a016.
- Goldman BB, Wipke WT: Quadratic shape descriptors. 1. Rapid superposition of dissimilar molecules using geometrically invariant surface descriptors. J Chem Inf Comput Sci. 2000, 40: 644-658.
- Connolly ML: Analytical molecular surface calculation. J Appl Cryst. 1983, 16: 548-558. 10.1107/S0021889883010985.
- Lin SL, Nussinov R, Fischer D, Wolfson HJ: Molecular surface representations by sparse critical points. Proteins. 1994, 18: 94-101. 10.1002/prot.340180111.
- Max NL, Getzoff ED: Spherical harmonic molecular surfaces. IEEE Comput Graph Appl. 1988, 8: 42-50. 10.1109/38.7748.
- Ritchie DW, Kemp GJL: Fast computation, rotation, and comparison of low resolution spherical harmonic molecular surfaces. J Comp Chem. 1999, 20: 383-395. 10.1002/(SICI)1096-987X(199903)20:4<383::AID-JCC1>3.0.CO;2-M.
- Morris RJ, Najmanovich RJ, Kahraman A, Thornton JM: Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics. 2005, 21: 2347-2355. 10.1093/bioinformatics/bti337.
- Cai W, Xu J, Shao X, Leroux V, Beautrait A, Maigret B: SHEF: a vHTS geometrical filter using coefficients of spherical harmonic molecular surfaces. J Mol Model. 2008, 14: 393-401. 10.1007/s00894-008-0286-z.
- Funkhouser T, Min P, Kazhdan M, Chen J, Halderman A, Dobkin D, Jacobs D: A search engine for 3D models. ACM Transactions on Graphics. 2003, 22: 83-105. 10.1145/588272.588279.
- Venkatraman V, Sael L, Kihara D: Potential for protein surface shape analysis using spherical harmonics and 3D Zernike descriptors. Cell Biochem Biophys. 2009, 54: 23-32. 10.1007/s12013-009-9051-x.
- Zauhar RJ, Moyna G, Tian L, Li Z, Welsh WJ: Shape signatures: a new approach to computer-aided ligand- and receptor-based drug design. J Med Chem. 2003, 46: 5674-5690. 10.1021/jm030242k.
- Novotni M, Klein R: 3D Zernike descriptors for content based shape retrieval. ACM Symposium on Solid and Physical Modeling, Proceedings of the eighth ACM symposium on Solid modeling and applications. 2003, 216-225.
- Kihara D, Sael L, Chikhi R: Local surface shape-based protein function prediction using Zernike descriptors. Biophys J. 2009, 96: 650a. 10.1016/j.bpj.2008.12.3435.
- Sael L, Li B, La D, Fang Y, Ramani K, Rustamov R, Kihara D: Fast protein tertiary structure retrieval based on global surface shape similarity. Proteins. 2008, 72: 1259-1273. 10.1002/prot.22030.
- Sael L, Kihara D: Protein surface representation and comparison: New approaches in structural proteomics. Biological Data Mining. Edited by: Chen J, Lonardi S. 2009, Boca Raton, Florida, USA: Chapman & Hall/CRC Press, 89-109.
- Sael L, La D, Li B, Rustamov R, Kihara D: Rapid comparison of properties on protein surface. Proteins. 2008, 73: 1-10. 10.1002/prot.22141.
- Mak L, Grandison S, Morris RJ: An extension of spherical harmonics to region-based rotationally invariant descriptors for molecular shape description and comparison. J Mol Graph Model. 2007, 26: 1035-1045. 10.1016/j.jmgm.2007.08.009.
- Hattori M, Okuno Y, Goto S, Kanehisa M: Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003, 125: 11853-11865. 10.1021/ja036030u.
- Takane SY, Mitchell JB: A structure-odour relationship study using EVA descriptors and hierarchical clustering. Org Biomol Chem. 2004, 2: 3250-3255. 10.1039/b409802a.
- Tripos Inc: SYBYL 6.8. South Hanley Road, St. Louis, Missouri, 63114, USA. 2009, [http://tripos.com/]
- Huang N, Shoichet BK, Irwin JJ: Benchmarking sets for molecular docking. J Med Chem. 2006, 49: 6789-6801. 10.1021/jm0608356.
- Jahn A, Hinselmann G, Fechner N, Zell A: Optimal assignment methods for ligand-based virtual screening. J Cheminformatics. 2009, 1: 14. 10.1186/1758-2946-1-14.
- von Grotthuss M, Koczyk G, Pas J, Wyrwicz LS, Rychlewski L: Ligand.Info small-molecule Meta-Database. Comb Chem High Throughput Screen. 2004, 7: 757-761. 10.2174/1386207043328265.
- Canterakis N: 3D Zernike moments and Zernike affine invariants for 3D image analysis and recognition. Proc 11th Scandinavian Conference on Image Analysis. 1999, 85-93.
- Joanes DN, Gill CA: Comparing measures of sample skewness and kurtosis. The Statistician. 1998, 47: 183-189.
- Raymond JW, Willett P: Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J Comput Aided Mol Des. 2002, 16: 521-533. 10.1023/A:1021271615909.
- Bron C, Kerbosch J: Algorithm 457: finding all cliques of an undirected graph. Commun ACM. 1973, 16: 575-577. 10.1145/362342.362367.
- Irwin JJ, Shoichet BK: ZINC--a free database of commercially available compounds for virtual screening. J Chem Inf Model. 2005, 45: 177-182. 10.1021/ci049714+.
- Kinnings SL, Jackson RM: LigMatch: a multiple structure-based ligand matching method for 3D virtual screening. J Chem Inf Model. 2009, 49: 2056-2066. 10.1021/ci900204y.
- Cheeseright TJ, Mackey MD, Melville JL, Vinter JG: FieldScreen: virtual screening using molecular fields. Application to the DUD data set. J Chem Inf Model. 2008, 48: 2108-2117. 10.1021/ci800110p.
- Good AC, Oprea TI: Optimization of CAMD techniques 3. Virtual screening enrichment studies: a help or hindrance in tool selection?. J Comput Aided Mol Des. 2008, 22: 169-178. 10.1007/s10822-007-9167-2.
- Schrodinger LLC: MacroModel (ver. 9.6). New York, NY. 2008, [http://www.schrodinger.com/ProductDescription.php?mID=6&sID=8&cID=0]
- Weislow OS, Kiser R, Fine DL, Bader J, Shoemaker RH, Boyd MR: New soluble-formazan assay for HIV-1 cytopathic effects: application to high-flux screening of synthetic and natural products for AIDS-antiviral activity. J Natl Cancer Inst. 1989, 81: 577-586. 10.1093/jnci/81.8.577.
- Molecular Networks: CORINA. Erlangen, Germany. 2009, [http://www.molecular-networks.com/products/corina]
- von Grotthuss M, Pas J, Rychlewski L: Ligand-Info, searching for similar small compounds using index profiles. Bioinformatics. 2003, 19: 1041-1042. 10.1093/bioinformatics/btg117.
- Ward J: Hierarchical grouping to optimize an objective function. J Am Stat Ass. 1963, 58: 236-244. 10.2307/2282967.
- Hubert L, Arabie P: Comparing Partitions. Journal of Classification. 1985, 2: 193-218. 10.1007/BF01908075.
- Bender A, Glen RC: A discussion of measures of enrichment in virtual screening: comparing the information content of descriptors with increasing levels of sophistication. J Chem Inf Model. 2005, 45: 1369-1375. 10.1021/ci0500177.
- Truchon JF, Bayly CI: Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model. 2007, 47: 488-508. 10.1021/ci600426e.
- Fawcett T: An introduction to ROC analysis. Pattern Recognition Letters. 2006, 27: 861-874. 10.1016/j.patrec.2005.10.010.
- Mackey MD, Melville JL: Better than random? The chemotype enrichment problem. J Chem Inf Model. 2009, 49: 1154-1162. 10.1021/ci8003978.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.