- Research article
- Open Access
Structural diversity of biologically interesting datasets: a scaffold analysis approach
© Khanna and Ranganathan; licensee Chemistry Central Ltd. 2011
- Received: 1 February 2011
- Accepted: 8 August 2011
- Published: 8 August 2011
The recent public availability of the human metabolome and natural product datasets has revitalized "metabolite-likeness" and "natural product-likeness" as drug design concepts for building lead libraries that target specific pathways. Many reports have analyzed the physicochemical property space of biologically important datasets, but only a few have comprehensively characterized the scaffold diversity in public datasets of biological interest. With large collections of high-quality public data currently available, we carried out a comparative analysis of current-day leads with other biologically relevant datasets.
In this study, we note a two-fold enrichment of metabolite scaffolds in the drug dataset (42%) as compared to currently used lead libraries (23%). We also note that only a small percentage (5%) of the natural product scaffold space is shared by the lead dataset. We have identified specific scaffolds that are present in metabolites and natural products, with close counterparts in the drugs, but are missing in the lead dataset. To determine the distribution of compounds in physicochemical property space, we analyzed the molecular polar surface area, the molecular solubility, the number of rings and the number of rotatable bonds, in addition to the four well-known Lipinski properties. Here, we note that, with only a few exceptions, most of the drugs follow Lipinski's rule. The average values of the molecular polar surface area and the molecular solubility are highest in metabolites, while the number of rings is lowest. In addition, we note that natural products contain more rings and rotatable bonds than any other dataset under consideration.
Currently used lead libraries make little use of the metabolite and natural product scaffold space. We believe that metabolites and natural products are recognized by at least one protein in the biosphere; therefore, sampling the fragment and scaffold space of these compounds, along with knowledge of their distribution in physicochemical property space, can result in better lead libraries. Hence, we recommend greater use of metabolites and natural products when designing lead libraries. Nevertheless, metabolites have a limited distribution in chemical space, which limits their usage in library design.
- Chemical Space
- Tanimoto Coefficient
- Pipeline Pilot
- Scaffold System
- ChEMBL Database
An established idea of similarity-based virtual screening is that similar structures tend to have similar properties. Diversifying the compound library collection for in silico and in vitro high-throughput screening without compromising biological activity remains an active research area. Chemical space is enormous but mostly biologically insignificant and, therefore, uninteresting from a drug design perspective. Given the large number of currently available chemical compounds in one of the largest public databases, PubChem, it is impossible and irrational to screen all known compounds for potential ligands. One key methodology, fragment-based virtual screening (FBVS) or fragment-based drug discovery (FBDD), is an emerging area to identify novel small molecules for preclinical studies. In FBDD, the starting points are small, low-molecular-weight, drug-like fragments. Examples of such fragments are ring systems, functional groups, side chains, linkers and fingerprints.
Over the past decade, substructures contributing to drug-like or lead-like properties have governed library design. In one of the pioneering works to understand the distribution of common fragments in drugs, Bemis and Murcko fragmented a drug dataset (taken from the Comprehensive Medicinal Chemistry (CMC) database) into rings, linkers, frameworks and side chains. Using two-dimensional topological graph-based molecular descriptors, they found 2506 different frameworks for a set of 5120 drug compounds, with the top 32 accounting for the topologies of 50% of the database compounds. They concluded that the distribution of molecular frameworks in drugs is skewed. Metabolite-likeness is increasingly being used as a filter to design lead libraries that resemble metabolites and have better absorption, distribution, metabolism, elimination and toxicology (ADMET) properties. Many recent studies have compared the chemical space occupied by compounds of pharmaceutical interest [7–12]. Grabowski and Schneider studied the molecular properties and chemotype diversity of drugs, pure natural products (NPs), and natural product derived compounds. Following the approach described by Bemis and Murcko, they virtually dissected the molecules into frameworks, corresponding to scaffolds and side chains. The drug dataset was ranked most structurally diverse, followed by marine and plant derived NPs, respectively. However, in contrast to the observation of Bemis and Murcko that only 32 frameworks form the basis of nearly 50% of the compounds in the CMC drug database, they found that 160 graph-based frameworks are needed to explain the chemotype of 50% of the compounds in the Collection of Bioactive Reference Analogues (COBRA) dataset, which contains drug-like reference molecules for ligand-based library design.
In the same year, Siegel and Vieth examined a set of 1386 marketed drugs and found that 15% of the drugs are embedded within other larger drugs, differing by one or more chemical fragments, while 30% of drugs contain other drugs as building blocks. Recently, Franco et al. analyzed the scaffold diversity of 16 datasets of active compounds, targeting five protein classes, using an entropy-based information metric. They found that compounds targeted to the vascular endothelial growth factor receptor kinase, followed by compounds targeted to HIV reverse transcriptase and phosphodiesterase V, are maximally diverse. On the other hand, molecules in the glucocorticoid receptor, neuraminidase and glycogen phosphorylase β datasets are least diverse. Singh et al. employed multiple criteria to compare libraries of drugs, small molecules and NPs, in terms of physicochemical properties, molecular scaffolds and fingerprints. The degree of overlap between libraries was assessed using the R-NN curve technique, and the biologically relevant chemical space occupied by the various compound datasets was delineated. Hert et al. compared a comprehensive dataset of 26 million compounds (i.e. a representative sample of the full chemical space) with 25810 purchasable screening compounds, metabolites and a natural product dataset. They found that almost 1300 ring systems present in NPs are missing in current-day screening or lead libraries, and suggested biasing screening libraries towards molecules that are likely to bind protein targets. Khanna and Ranganathan compared current-day drugs with toxics and metabolites and found that drugs are more similar to toxics than to metabolites in their physicochemical property space distribution.
As discussed above, many studies have analyzed the scaffolds and physicochemical properties of various chemical datasets. However, none of these studies contains a comprehensive comparison of the compounds obtained from publicly available datasets of human metabolites, toxics, drugs, natural products and currently used lead libraries. In addition, we believe that inclusion of the experimental compounds from the National Cancer Institute open database and the recently released ChEMBL database would enhance our analysis and prove useful in recognizing fragments in biologically interesting compounds.
In this study, we aim to answer questions such as 1) What is the physicochemical property space distribution of compounds for the datasets under comparison? 2) Are there any pharmaceutically relevant scaffolds or fragments present in metabolites and natural products that are missing in current lead libraries? 3) Are there any preferred or frequently occurring fragments and scaffolds in these datasets? 4) What is the percentage similarity of the scaffolds and fragments found in drugs to those found in other datasets?
We found patterns of commonly occurring fragments using the extended connectivity functional class fingerprint (FCFP_4; details in the Methods section). FCFP is a variant of the extended connectivity atom type (ECFP) fingerprint, differing from the latter in the assignment of the initial code. The highly specific initial atom types in ECFP fingerprints are replaced in FCFP fingerprints with more general atom types that carry functional meaning. For example, a single initial code is assigned to all halogen atoms in FCFP fingerprints, as they can often substitute for each other functionally. In accord with their definition, ECFP fingerprints are a better choice for measuring diversity. Therefore, we used ECFP fingerprints for diversity analysis, while the more generic FCFP fingerprints were selected for Tanimoto analyses.
Five different types of pharmaceutically relevant public molecular datasets were selected for this study: drugs, human metabolites, toxics, natural products and a sample of currently used lead compounds. Furthermore, we have also considered two popular small molecule databases viz. National Cancer Institute (NCI) database and ChEMBL database (details in the Methods section). Our results are presented in three sections, viz. preliminary analysis (measuring diversity and Tanimoto similarity), calculating physicochemical properties and scaffold analysis.
After carefully pruning and filtering the datasets, all the datasets were clustered (see Methods section) to avoid biased results due to overrepresentation of similar molecules.
1. Preliminary analysis
1.1 Diversity analysis
1.2 Tanimoto analysis
We extend this concept to compare the different datasets used in this study. To calculate how similar two datasets are, we first calculated the SciTegic Pipeline Pilot connectivity fingerprints, FCFP_4 (details in the Methods section), for all the datasets. Subsequently, the sum of squares of the frequency of fingerprint features was calculated over the n elements for each dataset. Finally, the common features present in both datasets were counted and their frequencies multiplied, to determine T_nb.
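The three quantities described above match the terms of the frequency-weighted (continuous) form of the Tanimoto coefficient, c / (a + b - c). The sketch below assumes that this is the formula intended, since the equation itself is not reproduced in the text. The function name and the toy feature frequencies are hypothetical.

```python
def dataset_tanimoto(freq_a, freq_b):
    """Frequency-weighted Tanimoto similarity between two datasets.

    freq_a, freq_b: dicts mapping a fingerprint feature to its
    frequency in the dataset (assumed input format).
    """
    # a, b: sum of squared feature frequencies within each dataset
    a = sum(v * v for v in freq_a.values())
    b = sum(v * v for v in freq_b.values())
    # c: product of frequencies, summed over features common to both
    common = freq_a.keys() & freq_b.keys()
    c = sum(freq_a[k] * freq_b[k] for k in common)
    denominator = a + b - c
    return c / denominator if denominator else 0.0


# Toy feature frequencies (illustrative only, not taken from the study)
drugs = {"f1": 3, "f2": 1}
toxics = {"f1": 2, "f3": 2}
similarity = dataset_tanimoto(drugs, toxics)
print(similarity)
```

With this form, two datasets with identical feature frequencies score 1.0 and two datasets with no common features score 0.0.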
Tanimoto similarity values using circular connectivity fingerprint descriptors for different datasets under study.
2. Physicochemical property analysis
2.1 Lipinski's properties for "rule of five" (Ro5) compliance
Comparison of the number of molecules failing Lipinski's "rule of five" (Ro5) in clustered and randomly selected datasets.
Total no. of molecules (in clustered dataset)
% of molecules failing Ro5 in clustered datasets
% of molecules failing Ro5 in randomly selected subset
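The percentage of molecules failing Ro5 can be computed from precalculated descriptors, as in the minimal sketch below. The thresholds are the standard Lipinski cut-offs. The number of violations that counts as a failure is not stated in the text, so `max_violations=1` is an assumption, and the `Descriptors` class and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # molecular weight in Da
    alogp: float        # calculated log P
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors


def ro5_violations(d):
    """Count how many of the four Lipinski criteria a molecule violates."""
    return sum([
        d.mol_weight > 500,
        d.alogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def percent_failing(dataset, max_violations=1):
    """Percentage of molecules with more than max_violations violations.

    max_violations=1 is an assumed convention, not taken from the text.
    """
    if not dataset:
        return 0.0
    failing = sum(1 for d in dataset if ro5_violations(d) > max_violations)
    return 100.0 * failing / len(dataset)


# Illustrative descriptor values, not taken from the study
molecules = [
    Descriptors(180.2, 1.2, 1, 4),   # no violations
    Descriptors(750.0, 6.5, 2, 8),   # two violations
    Descriptors(520.0, 3.0, 2, 6),   # one violation
]
print(percent_failing(molecules))
```

Setting `max_violations=0` gives the stricter reading in which any single violation counts as a failure.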
2.2 Lipinski's properties as boxplots
2.3 Other physicochemical properties
3. Scaffold or cyclic system analysis
Scaffold analysis of various clustered datasets under study.
Occurrence of scaffolds (% relative to dataset size)
No. of singletons (% relative to number of scaffolds)
Aromatic scaffolds (% relative to number of scaffolds)
The drug dataset generates the largest proportion of non-redundant scaffolds (50.0%) relative to the dataset size, followed by the toxics (42%), ChEMBL (33.4%), leads (32%) and the NCI dataset (28%). The exceptionally low proportions of scaffolds in metabolites (14.3%) and natural products (21.2%) suggest lower scaffold diversity in these datasets. The higher scaffold diversity in drugs could be attributed to the fact that drugs are derived from various biologically relevant compounds. The drug scaffold diversity is probably also due to patenting requirements: positioning functionality in the same way as an existing drug but outside of its patent space is often achieved by a minor change in the scaffold. Similarly, the large number of scaffolds in the toxic compound set is indicative of the high diversity of compounds with toxicity potential. Further, we note that the distribution of scaffolds in all the datasets is highly skewed, with a large number of them occurring only once (singletons). In fact, almost 70% of the scaffolds in the drugs, toxics, NCI and ChEMBL datasets occur only once. We also found that natural products have the highest proportion of recurring scaffolds (100 - % of singletons = 64%), followed by metabolites (38.9%) and leads (35.7%), suggesting that the compounds in these datasets revolve around certain preferred types of scaffolds. Our results agree with a recent study using similar natural product and drug datasets. In that study, the authors found high scaffold diversity in drugs (39.7%) and low diversity in natural products (17.9%), which is in accordance with our results. By counting the number of aromatic rings in non-redundant scaffolds, we note that metabolite scaffolds are the least aromatic (only 47.3% contain one or more aromatic rings) as compared to other datasets. By contrast, 85% of the drug scaffolds contain aromatic rings. Furthermore, we note that 97.4% of the scaffolds found in the lead dataset contain aromatic rings. There seems to be a bias towards aromatic ring containing scaffolds in presently used lead libraries.
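The scaffold statistics discussed above (non-redundant scaffolds relative to dataset size, singletons and recurring scaffolds relative to the number of scaffolds) can be derived from one scaffold identifier per molecule. The sketch below assumes that scaffolds are available as canonical strings, with `None` for molecules without a ring system. The function name, dictionary keys and example input are hypothetical.

```python
from collections import Counter


def scaffold_statistics(scaffold_per_molecule):
    """Summarize scaffold diversity for one dataset.

    scaffold_per_molecule: list with one scaffold identifier per
    molecule, or None for molecules without a scaffold.
    """
    n_molecules = len(scaffold_per_molecule)
    counts = Counter(s for s in scaffold_per_molecule if s is not None)
    n_scaffolds = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "scaffolds_pct_of_dataset": pct(n_scaffolds, n_molecules),
        "singletons_pct_of_scaffolds": pct(n_singletons, n_scaffolds),
        "recurring_pct_of_scaffolds": pct(n_scaffolds - n_singletons, n_scaffolds),
    }


# Illustrative input: six molecules, three distinct scaffolds
example = ["c1ccccc1", "c1ccccc1", "C1CCCCC1", "c1ccncc1", None, "c1ccccc1"]
stats = scaffold_statistics(example)
print(stats)
```

In this toy input, three scaffolds over six molecules give 50%, and two of the three scaffolds are singletons.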
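Scaffolds shared between pairs of clustered datasets. Each entry gives the shared scaffolds as a percentage of the total non-redundant scaffolds of the pair, followed by the percentage relative to each dataset (D: drugs, M: metabolites, T: toxics, P: natural products, L: leads, N: NCI, C: ChEMBL).
D-M: (6%; D: 7%,
D-T: (7.5%; D: 10%, T: 21%)
D-P: (2.4%; D: 19%,
D-L: (1.4%; D: 17%, L: 1%)
D-N: (2%; D: 45%,
D-C: (1.0%; D: 72%, C: 1%)
M-T: (6.3%; M: 24%, T: 8%)
M-P: (1.1%; M: 47%,
M-L: (0.3%; M: 23%, L: 0.3%)
M-N: (0.5%; M: 78%,
M-C: (0.2%, M: 73%,
T-P: (1.3%; T: 19%,
T-L: (0.7%; T: 16%, L: 1%)
T-N: (1.2%, T: 59%,
T-C: (0.4%, T: 59%, C: 0.4%)
P-L: (2.1%; P: 5%,
P-N: (3.1%; P: 13%,
P-C: (1.4%, P: 15%, C: 1.5%)
L-N: (4.4%; L: 13%,
L-C: (2.4%; L: 16%, C: 3%)
N-C: (5.0%; N: 17%, C: 6%)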
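The three percentages reported for each dataset pair can be computed from two sets of scaffolds, as sketched below. Reading the first percentage as relative to the combined non-redundant scaffolds of the pair is an interpretation that is consistent with the reported values, not a definition given in the text. The function name and example sets are hypothetical.

```python
def shared_scaffold_summary(scaffolds_a, scaffolds_b):
    """Overlap between the scaffold sets of two datasets.

    Returns the number of shared scaffolds and their percentage
    relative to the union of both sets and to each set individually.
    """
    a, b = set(scaffolds_a), set(scaffolds_b)
    shared = a & b
    union = a | b

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "shared": len(shared),
        "pct_of_union": pct(len(shared), len(union)),
        "pct_of_a": pct(len(shared), len(a)),
        "pct_of_b": pct(len(shared), len(b)),
    }


# Illustrative scaffold identifiers, not taken from the study
set_a = {"s1", "s2", "s3", "s4"}
set_b = {"s3", "s4", "s5", "s6", "s7", "s8"}
summary = shared_scaffold_summary(set_a, set_b)
print(summary)
```

A small dataset whose scaffolds are mostly contained in a much larger one shows a high percentage relative to itself but a low percentage relative to the union, which is the pattern seen for the drugs and ChEMBL pair.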
In this study, we have carried out a detailed analysis of commonly occurring fragments in various datasets of biological interest. Dataset comparison using the Tanimoto coefficient shows that drugs and toxics share a large number of topological fragments, whereas drugs are less similar to metabolites than to any other dataset studied. However, in the scaffold analysis we found that drugs and metabolites share 6% of the total non-redundant scaffolds, i.e. over 42% of the metabolite scaffolds are present in drugs, whereas only 23% of the metabolite scaffolds are represented in current leads. This shows that although drugs and metabolites share many scaffolds, they differ largely in topological fragment space. Further, we conclude that current lead libraries do not cover much of the metabolite scaffold space.
Library design is a multi-class optimization problem. It often presents a trade-off between several factors, including diversity and ADMET properties. Since metabolites and NPs have already been optimized by millions of years of evolution to bind to at least one biological macromolecule, it is highly likely that libraries designed around the scaffolds and fragments occurring in metabolite and NP space will yield molecules with better ADMET properties. Hence, the use of metabolites and NPs when designing lead libraries would be beneficial. However, metabolites occupy a limited region of the chemical universe, which limits their usage in library design.
From the physicochemical property analysis, we note that there is a need to diversify present-day lead libraries in order to optimize the coverage of chemical space. We also note that, with the exception of a few compounds, most of the drug molecules follow Lipinski's rule, whereas over 68% of metabolites are outside Lipinski's universe. On closer examination of the metabolites, we found that the compounds that do not follow Lipinski's rule are mainly lipids and large molecules. Further, we note that the lipid-free metabolite dataset contains lower molecular weight and less complex molecules as compared to other datasets. Our studies on scaffold systems suggest that drugs are most diverse (50% scaffolds relative to the dataset size) and prefer aromatic to non-aromatic ring-containing scaffolds. Metabolites, on the other hand, have a very narrow distribution of scaffolds (only 14.3% scaffolds relative to the dataset size), of which 38.9% recur. The exceptionally low number of cyclic systems in metabolites implies lower scaffold diversity in metabolites. Further, we confirm earlier reports of a skewed distribution of scaffolds, with many more singletons than recurring scaffolds.
Preparation of datasets
Databases used in this study
Number of molecules
ZINC NP database
Likewise, for the toxics dataset, compounds from various public sources were integrated to make a single dataset focusing largely on carcinogenic molecules. The Distributed Structure-Searchable Toxicity (DSSTox) Carcinogenic Potency Database contains experimental results and carcinogenicity information for 1547 substances tested against different species. Contrera et al. published a dataset of 282 human pharmaceuticals obtained from the FDA database for carcinogenicity studies on mouse and rat. Of these, the 125 compounds reported as positive (44% of the 282) were used in this study. Toxicology Excellence for Risk Assessment (TERA) is an independent non-profit organization dedicated to public health. Since 1996, TERA has maintained the International Toxicity Estimates for Risk database, which provides chronic human risk assessment data from organizations around the world for over 650 chemicals. Finally, ~1000 molecules with medium and high toxicity were downloaded from the SuperToxic database. The dataset for NPs was obtained from the ZINC database. These molecules can be found under the subset tab, as "Meta subsets". For the lead dataset, we merged two independent screening sets obtained from the BioNET and Maybridge databases. The molecules in these two databases are well diversified, and we integrated them to form a dataset of lead compounds as found in pharmaceutical collections. Further, we included molecules from the NCI open database. The latest September 2003 release of the database stores 260071 organic compounds tested by NCI for anticancer activity. Since many of these compounds are experimental, have not been tested for human consumption and cover high diversity, we believe this dataset is a good choice for inclusion in our study. One other public dataset, ChEMBL, was used as the reference dataset for biologically interesting molecules. ChEMBL is a chemogenomics data resource with over 8000 targets and about 622,884 bioactive compounds.
All datasets are current as of 10-November-2010.
Cleaning and processing of the datasets
We followed a standard cleaning procedure (see additional file 1) to obtain a non-redundant dataset in each category. Finally, clustering was performed to address the issue of possible overrepresentation of the chemical space, which might bias the analysis results towards similar molecules. Clusters were generated using the Cluster "Clara" algorithm embedded in the Pipeline Pilot (PP) software, with an atom type fingerprint as the chemical descriptor and Euclidean distance as the distance metric. Cluster centers served as the representatives for clusters containing more than one molecule, while singletons were used directly as cluster centers. This resulted in a 30% decrease in the size of each dataset. Upon further analysis, we found that the clustered metabolite set contains lipids in large numbers. In order to remove the bias towards lipids and large molecules, we filtered out lipids, resulting in 2072 molecules in the "lipid-free" metabolite dataset used for analysis in this study.
To simplify the analysis, we randomly selected 2000 compounds from each of the clustered datasets (from the lipid-free metabolite dataset in the case of metabolites). The majority of the analysis was carried out using the clustered datasets and the lipid-free metabolite dataset, except for the preliminary analysis, where these randomly selected molecules were used, and the Ro5 test, where both datasets were compared.
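Random selection of a fixed-size subset can be made reproducible with a seeded generator, as in the short sketch below. The seed value, the function name and the handling of datasets smaller than the requested size are assumptions, since the text does not describe how the selection was performed.

```python
import random


def random_subset(molecule_ids, size=2000, seed=0):
    """Draw a reproducible random subset without replacement.

    Datasets with no more than `size` molecules are returned whole
    (an assumed convention).
    """
    ids = list(molecule_ids)
    if len(ids) <= size:
        return ids
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return rng.sample(ids, size)


# Hypothetical identifiers standing in for a clustered dataset
clustered = [f"mol_{i}" for i in range(5000)]
subset = random_subset(clustered)
print(len(subset))
```

Using the same seed for every dataset keeps the subsets reproducible across runs.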
All the descriptors were calculated using PP. Besides the four Lipinski properties (molecular weight, the number of hydrogen bond acceptors, AlogP (a hydrophobicity measure) and the number of hydrogen bond donors), other descriptors such as molecular polar surface area (MPSA), molecular solubility (MS), the number of rings (NR) and the number of rotatable bonds (NRB) were also computed. AlogP was calculated using the Ghose-Crippen method, which takes into account each group's contribution to log P. MPSA is defined as the sum of the surface contributions of all the polar atoms. This descriptor is correlated with drug transport capabilities and is important in penetrating the blood-brain barrier. The NRB is a direct measure of the flexibility of molecules and is thus related to MPSA. Binary descriptors (ECFP_4 and FCFP_4) were calculated using a structural property calculator embedded in PP. Initially, each atom is assigned a code based on its properties and connectivity. With each iteration, each atom code is combined with the codes of its immediate neighbours to produce the next order code. This process is repeated until the desired neighbourhood size has been reached, typically a diameter of four bonds, generating ECFP_4 or FCFP_4 fingerprints.
In addition to examining the physicochemical properties, each dataset was also explored for frequent scaffold systems. We used an inbuilt PP protocol to identify the most common fragments, by setting "FragmentType" to MurckoAssemblies and setting the "MaxFragSize" parameter to the required level.
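The PP protocol itself is proprietary, but the underlying idea of a Murcko-type framework (ring systems plus the linkers that connect them, with side chains removed) can be sketched as repeated pruning of terminal atoms from the heavy-atom graph. The code below is an illustration of that graph definition under simplifying assumptions (no atom or bond types, no special treatment of exocyclic double bonds), not a reimplementation of MurckoAssemblies. All function names and the example graph are hypothetical.

```python
def build_graph(bonds):
    """Build an adjacency mapping from a list of (atom, atom) bonds."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def murcko_framework(graph):
    """Return the atoms left after pruning all side chains.

    Atoms with at most one neighbour are removed repeatedly, so only
    rings and the linkers between them remain. Acyclic molecules
    yield an empty set.
    """
    remaining = {atom: set(nbrs) for atom, nbrs in graph.items()}
    terminal = [a for a, nbrs in remaining.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in remaining:
            continue  # already pruned
        for nbr in remaining.pop(atom):
            remaining[nbr].discard(atom)
            if len(remaining[nbr]) <= 1:
                terminal.append(nbr)
    return set(remaining)


# Two three-membered rings (0-1-2 and 4-5-6) joined by linker atom 3,
# which also carries a two-atom side chain (7-8)
bonds = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4),
         (4, 5), (5, 6), (6, 4), (3, 7), (7, 8)]
framework = murcko_framework(build_graph(bonds))
print(sorted(framework))
```

In the example, the side chain on the linker is pruned while the linker atom is kept, because it still connects the two rings.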
VK is grateful to Macquarie University for the award of MQRES research scholarship.
- Patterson DE, Cramer RD, Ferguson AM, Clark RD, Weinberger LE: Neighborhood behavior: a useful concept for validation of "molecular diversity" descriptors. J Med Chem. 1996, 39 (16): 3049-3059. 10.1021/jm960290n.
- Dobson CM: Chemical space and biology. Nature. 2004, 432 (7019): 824-828. 10.1038/nature03192.
- Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH: PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37 (Web Server): W623-633.
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 2001, 46 (1-3): 3-26. 10.1016/S0169-409X(00)00129-0.
- Bemis GW, Murcko MA: The properties of known drugs. 1. Molecular frameworks. J Med Chem. 1996, 39 (15): 2887-2893. 10.1021/jm9602928.
- Dobson PD, Patel Y, Kell DB: 'Metabolite-likeness' as a criterion in the design and selection of pharmaceutical drug libraries. Drug Discov Today. 2009, 14 (1-2): 31-40. 10.1016/j.drudis.2008.10.011.
- Grabowski K, Schneider G: Properties and Architecture of Drugs and Natural Products Revisited. Curr Chem Biol. 2007, 1: 115-127. 10.2174/187231307779814066.
- Siegel MG, Vieth M: Drugs in other drugs: a new look at drugs as fragments. Drug Discov Today. 2007, 12 (1-2): 71-79. 10.1016/j.drudis.2006.11.011.
- Medina-Franco JL, Martinez-Mayorga K, Bender A, Scior T: Scaffold Diversity Analysis of Compound Data Sets Using an Entropy-Based Measure. QSAR Comb Sci. 2009, 28 (11-12): 1551-1560.
- Singh N, Guha R, Giulianotti MA, Pinilla C, Houghten RA, Medina-Franco JL: Chemoinformatic analysis of combinatorial libraries, drugs, natural products, and molecular libraries small molecule repository. J Chem Inf Model. 2009, 49 (4): 1010-1024. 10.1021/ci800426u.
- Hert J, Irwin JJ, Laggner C, Keiser MJ, Shoichet BK: Quantifying biogenic bias in screening libraries. Nat Chem Biol. 2009, 5 (7): 479-483. 10.1038/nchembio.180.
- Khanna V, Ranganathan S: Physicochemical property space distribution among human metabolites, drugs and toxins. BMC Bioinformatics. 2009, 10 (Suppl 15): S10. 10.1186/1471-2105-10-S15-S10.
- Schneider P, Schneider G: Collection of Bioactive Reference Compounds for Focused Library Design. QSAR & Combinatorial Science. 2003, 22 (7): 713-718. 10.1002/qsar.200330825.
- Rogers D, Hahn M: Extended-connectivity fingerprints. J Chem Inf Model. 2010, 50 (5): 742-754. 10.1021/ci100050t.
- Wang Y, Bajorath J: Development of a compound class-directed similarity coefficient that accounts for molecular complexity effects in fingerprint searching. J Chem Inf Model. 2009, 49 (6): 1369-1376. 10.1021/ci900108d.
- Schuster D, Laggner C, Langer T: Why drugs fail--a study on side effects in new chemical entities. Curr Pharm Des. 2005, 11 (27): 3545-3559. 10.2174/138161205774414510.
- Gut J, Bagatto D: Theragenomic knowledge management for individualised safety of drugs, chemicals, pollutants and dietary ingredients. Expert Opin Drug Metab Toxicol. 2005, 1 (3): 537-554. 10.1517/17425255.1.3.537.
- Ganesan A: The impact of natural products upon modern drug discovery. Curr Opin Chem Biol. 2008, 12 (3): 306-317. 10.1016/j.cbpa.2008.03.016.
- Oprea TI, Davis AM, Teague SJ, Leeson PD: Is there a difference between leads and drugs? A historical perspective. J Chem Inf Comput Sci. 2001, 41 (5): 1308-1315.
- Wetzel S, Klein K, Renner S, Rauh D, Oprea TI, Mutzel P, Waldmann H: Interactive exploration of chemical space with Scaffold Hunter. Nat Chem Biol. 2009, 5 (8): 581-583.
- Krueger BA, Dietrich A, Baringhaus KH, Schneider G: Scaffold-hopping potential of fragment-based de novo design: the chances and limits of variation. Comb Chem High Throughput Screen. 2009, 12 (4): 383-396. 10.2174/138620709788167971.
- Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008, 36 (Database): D901-906.
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36 (Database): D480-484.
- Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B, Hau DD, Psychogios N, Dong E, Bouatra S, et al: HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009, 37 (Database): D603-610.
- Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, Karp PD: Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 2005, 6 (1): R2.
- Schellenberger J, Park JO, Conrad TM, Palsson BO: BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics. 2010, 11: 213. 10.1186/1471-2105-11-213.
- Richard AM, Williams CR: Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat Res. 2002, 499 (1): 27-52.
- Contrera JF, Jacobs AC, DeGeorge JJ: Carcinogenicity testing and the evaluation of regulatory requirements for pharmaceuticals. Regul Toxicol Pharmacol. 1997, 25 (2): 130-145. 10.1006/rtph.1997.1085.
- International Toxicity Estimate for Risk database (TERA). [http://www.tera.org/iter]
- Wullenweber A, Kroner O, Kohrman M, Maier A, Dourson M, Rak A, Wexler P, Tomljanovic C: Resources for global risk assessment: the International Toxicity Estimates for Risk (ITER) and Risk Information Exchange (RiskIE) databases. Toxicol Appl Pharmacol. 2008, 233 (1): 45-53. 10.1016/j.taap.2007.12.035.
- Schmidt U, Struck S, Gruening B, Hossbach J, Jaeger IS, Parol R, Lindequist U, Teuscher E, Preissner R: SuperToxic: a comprehensive database of toxic compounds. Nucleic Acids Res. 2009, 37 (Database): D295-299.
- Irwin JJ, Shoichet BK: ZINC--a free database of commercially available compounds for virtual screening. J Chem Inf Model. 2005, 45 (1): 177-182. 10.1021/ci049714+.
- BioNET. [http://www.keyorganics.co.uk/Downloads]
- Maybridge. [http://www.maybridge.com/default.aspx]
- National Cancer Institute (NCI). [http://cactus.nci.nih.gov/download/nci/]
- Overington J: ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory (EMBL-EBI). Interview by Wendy A. Warr. J Comput Aided Mol Des. 2009, 23 (4): 195-198. 10.1007/s10822-009-9260-9.
- SciTegic Pipeline Pilot. [http://accelrys.com/products/scitegic/]
- Ghose AK, Crippen GM: Atomic physicochemical parameters for three-dimensional-structure-directed quantitative structure-activity relationships. 2. Modeling dispersive and hydrophobic interactions. J Chem Inf Comput Sci. 1987, 27 (1): 21-35.
- Milne GW, Nicklaus MC, Driscoll JS, Wang S, Zaharevitz D: National Cancer Institute Drug Information System 3D database. J Chem Inf Comput Sci. 1994, 34 (5): 1219-1224.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.