Web-based 3D-visualization of the DrugBank chemical space
Journal of Cheminformatics volume 8, Article number: 25 (2016)
Similarly to the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection. Unfortunately, tools to look at chemical space on the internet are currently very limited.
Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. WebDrugCS displays 3D-clouds of color-coded grid points representing molecules, whose structural formula is displayed on mouse over with an option to link to the corresponding molecule page at the DrugBank website. The 3D-clouds are obtained by principal component analysis of high dimensional property spaces describing constitution and topology (42D molecular quantum numbers MQN), structural features (34D SMILES fingerprint SMIfp), molecular shape (20D atom pair fingerprint APfp), pharmacophores (55D atom category extended atom pair fingerprint Xfp) and substructures (1024D binary substructure fingerprint Sfp). User defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D-maps where many compounds fold onto each other, these 3D-spaces have a comparable resolution to their parent high-dimensional chemical space.
To the best of our knowledge webDrugCS is the first publicly available web tool for interactive visualization and exploration of the DrugBank chemical space in 3D. WebDrugCS works on computers, tablets and phones, and facilitates the visual exploration of DrugBank to rapidly learn about the structural diversity of small molecule drugs.
One of the defining features of organic chemistry is the extremely large diversity of possible molecules. The concept of chemical space, whereby molecules are annotated with a set of quantitative molecular properties and placed in a high-dimensional property space with each dimension corresponding to a different property, offers a practical approach to represent the structural diversity of large molecule collections [1–28]. Such high-dimensional spaces cannot be visualized directly but can be subjected to various dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection [29–32].
To make chemical space easier to inspect, we recently reported an interactive Java Applet representing databases of molecules as color-coded maps produced by projection of high-dimensional property spaces, defined by various molecular fingerprints, into two dimensions [32–37]. In these so-called Mapplets the computer screen shows a color-coded 2D-image where each pixel contains one or several molecules projected at that point. The average molecule contained in each pixel is displayed on a side-window on mouse over, with an option to open the complete list of molecules in the pixel in a secondary window, and subsequently to link selected molecules to the database entry, or to perform similarity searches in the parent high-dimensional property space. These Mapplets unfortunately suffer from the typical folding effects encountered when projecting high-dimensional property spaces into 2D [2, 6, 9, 28, 30, 32], which results in (a) many pixels containing molecules piled-up on top of each other, and (b) a poor correlation between distances on the 2D-map and distances in the original high-dimensional property space. In addition the Java Applets must be downloaded and run separately and are not platform independent.
Results and discussion
PCA of multidimensional property spaces
In a multidimensional property space, the dimensions and the position (coordinates) of any molecule are defined by a set of molecular descriptors. PCA is used as a dimensionality reduction method to obtain 3D- or 2D-representations, in which the position of any molecule is defined by its coordinates in the first three or the first two principal components (PCs), respectively. Here PCA is used to project DrugBank from each of the five property spaces defined by the fingerprints MQN, SMIfp, APfp, Xfp and Sfp onto the corresponding 3D-space or 2D-map. The cumulative coverage of data variance within the first three PCs is larger than 75 % for the fingerprints MQN, SMIfp and APfp, which are relatively simple descriptions of the molecules with a relatively low number of dimensions (Fig. 1a). In these cases a very good correlation is observed between distances in the original high-dimensional property space and distances in the 3D-projection (Fig. 1b). The situation is less favourable for the more complex and higher-dimensional fingerprints Xfp and Sfp, where only 42 % and 20 % of the data variance, respectively, is covered within the first three PCs. Nevertheless the correlation between distances in the original property space and the 3D-space resulting from PCA is still acceptable (Xfp: 0.8, Sfp: 0.6), implying that these 3D-spaces still contain relevant information about the position of molecules in the original high-dimensional Xfp and Sfp spaces. In particular, nearest neighbours in each of the 3D-spaces are for the most part closely related molecules in the corresponding high-dimensional property space.
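The distance correlation quoted above can be illustrated with a short sketch. This is not the authors' code: it assumes Euclidean distances and the Pearson coefficient, neither of which is specified in the text, and all function names are hypothetical. The toy "projection" simply keeps the first three coordinates as a stand-in for PCA.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points (assumed metric)."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def distance_correlation(original, projected):
    """Correlate pairwise distances before and after projection."""
    return pearson(pairwise_distances(original), pairwise_distances(projected))


# Toy data: four 5D points; the projection keeps the first three coordinates
# (a stand-in for PCA, used only to keep the example short).
original = [(0, 0, 0, 0, 0), (1, 0, 0, 0, 0), (0, 2, 0, 1, 0), (3, 1, 1, 0, 2)]
projected = [p[:3] for p in original]
r = distance_correlation(original, projected)
```

A value of r close to 1 means that distances between molecules in the projection track the distances in the parent space; values such as the 0.8 and 0.6 reported for Xfp and Sfp indicate a weaker but still usable relationship.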
One of the remarkable aspects of the 3D-spaces concerns the resolution of compounds into individual 3D-grid positions after assigning molecules to a 3D-grid point in a 300 × 300 × 300 box covering the range of (PC1, PC2, PC3) values. In the original multidimensional property spaces an excellent resolution is obtained for DrugBank in the sense that almost all DrugBank molecules are encoded by a unique fingerprint bit-value combination. This resolution is largely preserved upon PCA and assignment to the 3D-grid, as can be judged from the fact that the percentage of molecules appearing in singly occupied 3D-grid points is comparable to the percentage of molecules having a unique fingerprint bit-value combination. In this respect the 3D-space is clearly superior to the 2D-map, where compounds are assigned to 2D-pixels in a 300 × 300 square covering the range of (PC1, PC2) values. In the 2D case significant folding occurs and only 40–60 % of the compounds appear in singly occupied 2D-pixels (Fig. 1c).
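The resolution comparison between the 3D-grid and the 2D-map can be sketched as follows. This is a minimal illustration under stated assumptions: a single bin size shared by all axes (as described in the Methods), clamping of the maximum value into the last bin, and hypothetical function names and toy coordinates.

```python
from collections import Counter


def grid_index(coords, lo, bin_size, n_bins=300):
    """Map PC coordinates to integer grid indices; the maximum is clamped into the last bin."""
    return tuple(min(int((c - lo) / bin_size), n_bins - 1) for c in coords)


def singly_occupied_fraction(points, n_dims, n_bins=300):
    """Fraction of molecules that sit alone in their grid cell, using the first n_dims PCs."""
    used = [p[:n_dims] for p in points]
    lo = min(min(p) for p in used)
    hi = max(max(p) for p in used)
    bin_size = (hi - lo) / n_bins or 1.0  # guard against a zero value range
    cells = [grid_index(p, lo, bin_size, n_bins) for p in used]
    counts = Counter(cells)
    return sum(1 for c in cells if counts[c] == 1) / len(cells)


# Toy (PC1, PC2, PC3) values: two pairs of points differ only in PC3,
# so they are resolved in 3D but fold onto each other in 2D.
points = [
    (0.0, 0.0, 0.0),
    (0.0, 0.0, 5.0),
    (10.0, 10.0, 1.0),
    (10.0, 10.0, 9.0),
    (4.0, 7.0, 2.0),
]
frac_3d = singly_occupied_fraction(points, n_dims=3)
frac_2d = singly_occupied_fraction(points, n_dims=2)
```

In this toy set every point occupies its own 3D-grid cell, whereas only one of the five points remains alone in its 2D-pixel, which mirrors the folding effect described above.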
As an additional noticeable feature, the 3D-representations of the various property spaces display DrugBank in an intuitively logical spatial organization, which can be visualized by color-coding each grid point with a selected property value. As illustrated by screenshots taken from the web application webDrugCS (details discussed below), striking features include for example parallel stripes grouping compounds of increasing ring count in the MQN 3D-space (Fig. 2a), the separation of molecules according to their number of aromatic carbon atoms in the SMIfp 3D-space (Fig. 2b) and according to their rotatable bond count in the APfp 3D-space (Fig. 2c), and the global separation of the Sfp 3D-space according to the fraction of aromatic atoms (Fig. 2d).
WebDrugCS (www.gdb.unibe.ch) is an online application for interactive visualization and exploration of DrugBank in color-coded 3D property spaces. The application works on computers, tablets and phones. The starting page of webDrugCS (Fig. 3a) provides two options: (1) Selection of molecular fingerprint: choose between the MQN, SMIfp, APfp, Xfp and Sfp fingerprint 3D-spaces by clicking the corresponding field, which opens a new browser tab. (2) External chemical library: the user can input up to 1000 additional molecules in SMILES format, which will be displayed together with DrugBank in any of the selected 3D-spaces. Each line in the text box must represent an individual molecule as a SMILES string followed by a space and its name or tag. External molecules are shown by default as dark violet grid points.
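The input format for the external library (one molecule per line, SMILES followed by a space and a name or tag, at most 1000 molecules) can be sketched as a small parser. The function name and the handling of blank lines, missing names and oversized lists are assumptions made for illustration, not documented webDrugCS behaviour.

```python
MAX_MOLECULES = 1000  # upload limit stated in the text


def parse_smiles_list(text, limit=MAX_MOLECULES):
    """Parse 'SMILES name' lines into (smiles, name) tuples.

    Blank lines are skipped and a line without a name gets an empty tag;
    both behaviours are assumptions, not documented webDrugCS rules.
    """
    molecules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the first space: everything after it is the name or tag
        smiles, _, name = line.partition(" ")
        molecules.append((smiles, name.strip()))
    if len(molecules) > limit:
        raise ValueError(f"{len(molecules)} molecules given, limit is {limit}")
    return molecules


# Example input with two beta-blockers
example = """CC(C)NCC(O)COc1cccc2ccccc12 propranolol
CC(C)NCC(O)COc1ccc(CC(N)=O)cc1 atenolol
"""
parsed = parse_smiles_list(example)
```

Splitting at the first space only keeps multi-word names intact, since SMILES strings themselves contain no spaces.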
The graphical user interface (GUI) of the interactive visualization window is exemplified here with the MQN 3D-space. The GUI consists of a main panel, a molecule view panel, and a control panel. The main panel occupies the entire screen area and displays the 3D-space (Fig. 3b). Each point in the 3D-space is represented as a sphere whose size depends on its distance to the camera. Dragging the mouse with the left button pressed rotates the view, and the mouse wheel zooms in and out.
The view panel is positioned at upper left and shows the structural formula and DrugBank ID of the molecule at the current mouse-over 3D-grid point. Upon selecting a grid point by double click, one can then link to the molecule page at the DrugBank webpage by clicking on the DrugBank ID displayed below the structural formula (Fig. 3c), or access a similarity browser to search for nearest neighbours in the original high-dimensional fingerprint space via the control panel (Fig. 3d/e).
The control panel at top right lists options to change the 3D-space view. Lines 1–3: select a color code according to a descriptor, or a single color code for DrugBank and the uploaded molecule list. Line 4: display the reference 3D-axes. Line 5: hide the DrugBank grid points, leaving only the molecules uploaded by the user as visible points. Line 6: change the 3D-grid point sphere size. Line 7: set the currently selected 3D-grid point as reference pivot point for the 3D-space (after selecting a grid point by double click). Line 8: reset the view to the default entry view. Line 9: link to the fingerprint similarity browser, which opens as an additional tab. This browser allows one to perform nearest neighbour searches in DrugBank in any of the five original high-dimensional fingerprint spaces. The browser is built in the same manner as our recently reported ChEMBL similarity browser. Line 10: help function listing the different options.
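A nearest neighbour search in a fingerprint space, as offered by the similarity browser, can be sketched as below. The city-block distance is an assumption made for this example (the metric used by the browser is not stated here), and the molecule IDs and 4D fingerprints are hypothetical; real MQN vectors would have 42 entries.

```python
import heapq


def city_block(a, b):
    """City-block (Manhattan) distance between two fingerprint vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbours(query, database, k=3):
    """Return the k (molecule ID, fingerprint) entries closest to the query fingerprint.

    database maps a molecule ID to its fingerprint vector.
    """
    return heapq.nsmallest(
        k, database.items(), key=lambda item: city_block(query, item[1])
    )


# Hypothetical toy database of 4D fingerprints
database = {
    "mol_A": (6, 1, 0, 2),
    "mol_B": (6, 1, 1, 2),
    "mol_C": (12, 3, 0, 4),
    "mol_D": (4, 1, 0, 2),
}
hits = nearest_neighbours((6, 1, 0, 2), database, k=2)
```

Because the search runs in the original high-dimensional space, the neighbours it returns are not affected by any information lost in the projection to 3D.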
The external chemical library option in the entry panel (Fig. 3a) is illustrated here by mapping 24 drugs from DrugBank annotated in ChEMBL as β1-adrenergic receptor antagonists. These typical drugs contain a short aliphatic amine or aminoalcohol connected to a mono- or bicyclic aromatic nucleus. Due to their comparable overall composition, molecular shape, pharmacophore and substructural elements, these 24 drugs form a relatively tight group in each of the five property spaces in webDrugCS (Fig. 4). In general, series of structurally related molecules appear grouped in the various 3D-spaces available with webDrugCS. Note that the option “hideDB” in the control panel allows one to remove the DrugBank compounds, which leaves only the external library as visible points.
webDrugCS represents the first online application for visualizing DrugBank in five different 3D property spaces on computers, tablets or phones. In contrast to other database exploration tools, webDrugCS can be used for curiosity-driven exploration independently of specific queries, and is particularly suitable for rapidly gaining an overview of the structures of drug molecules. While the present web-based application is currently limited to displaying a few thousand points, the method might be applicable to larger databases of millions of molecules if significant coding progress can be made.
Databases
The DrugBank database was downloaded in SDF format from http://www.drugbank.ca/. Molecules were processed by checking for valency errors, removing counter ions and adjusting their ionization state to pH 7.4, using an in-house Java program built on the Java chemistry library (JChem) from ChemAxon, Pvt. Ltd. Duplicates and molecules larger than 50 heavy atoms were removed from the database.
Fingerprints
The calculation of the MQN, SMIfp, APfp and Xfp fingerprints is discussed in detail in the respective publications from our group. Fingerprints were calculated as described previously using plugins provided in the JChem chemistry library.
Principal component analysis
The PCA for each database was performed using an in-house Java program utilizing some of the mathematical functions available in JSci (a science API for Java: http://jsci.sourceforge.net/). The Java source code is based on the tutorial of Lindsay I. Smith (http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf).
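The general PCA recipe (mean-centering, covariance matrix, leading eigenvectors, projection) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' Java program: the eigenvectors are obtained here by power iteration with deflation, a substitute chosen to keep the example dependency-free, and all function names and the toy data are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divisor n - 1) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpairs(matrix, k, iterations=500):
    """Leading k eigenpairs of a symmetric matrix by power iteration with deflation.

    The uniform start vector is an assumption; it fails only if it happens
    to be orthogonal to the eigenvector being sought.
    """
    d = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector
        value = sum(v[i] * sum(work[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((value, v))
        # Deflate: remove the component just found from the working matrix
        for i in range(d):
            for j in range(d):
                work[i][j] -= value * v[i] * v[j]
    return pairs


def pca_project(rows, k=3):
    """Return the first k PC coordinates per row and the variance fraction they cover."""
    centered = mean_center(rows)
    cov = covariance(centered)
    pairs = top_eigenpairs(cov, k)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    coverage = sum(value for value, _ in pairs) / total_variance
    coords = [
        [sum(r[j] * vec[j] for j in range(len(r))) for _, vec in pairs]
        for r in centered
    ]
    return coords, coverage


# Toy 4D descriptor table: the fourth column is constant and carries no variance
rows = [
    [0, 0, 0, 1],
    [4, 0, 0, 1],
    [0, 2, 0, 1],
    [4, 2, 0, 1],
    [2, 1, 1, 1],
    [2, 1, -1, 1],
]
coords, coverage = pca_project(rows, k=3)
```

The returned coverage corresponds to the cumulative variance fraction discussed in the Results; for real fingerprint data a tested linear algebra library would normally replace the power iteration used here.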
3D-space and color coding
The PC-1, PC-2 and PC-3 values were calculated for each molecule in the database. The largest (PCmax) and smallest (PCmin) values found among all PC-1, PC-2 and PC-3 values were used to define the value range ΔPC = PCmax − PCmin and to set the binning scale as ΔPC/300. The PC-1, PC-2 and PC-3 values were binned onto a 300 × 300 × 300 3D-grid using the same absolute bin size on the PC-1, PC-2 and PC-3 axes. Each molecule was assigned to a point on this 3D-grid. The Hue–Saturation–Lightness (HSL) color space was used for color coding, setting the hue value according to the average value of the selected molecular property across all molecules residing at that grid point, and the saturation according to the standard deviation of that value across all molecules within ±5 grid points in each direction. As a result, the color change blue–cyan–green–yellow–red–magenta shows an increasing average value of the property in a grid point, and desaturation to grey indicates a strong gradient of the value in the vicinity.
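The color coding can be sketched with Python's standard colorsys module. The hue sweep from blue to magenta follows the description above; the fixed lightness of 0.5, the linear mapping of standard deviation to saturation, the max_std normalization and all function names are assumptions made for illustration.

```python
import colorsys
import statistics


def hue_for_value(value, vmin, vmax):
    """Map a property value to a hue in degrees: blue (240) for the minimum, then
    cyan, green, yellow and red (0), ending at magenta (300) for the maximum."""
    fraction = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    return (240.0 - 300.0 * fraction) % 360.0


def saturation_for_spread(std_dev, max_std):
    """Full saturation for a uniform neighbourhood, fading to grey as the spread grows."""
    if max_std <= 0:
        return 1.0
    return max(0.0, 1.0 - std_dev / max_std)


def grid_point_color(cell_values, neighbourhood_values, vmin, vmax, max_std):
    """RGB color (0-255 per channel) for one grid point.

    cell_values: property values of the molecules at the grid point.
    neighbourhood_values: property values of all molecules within the
    surrounding window (±5 grid points in the text), gathered by the caller.
    """
    hue = hue_for_value(statistics.fmean(cell_values), vmin, vmax)
    spread = statistics.pstdev(neighbourhood_values)
    saturation = saturation_for_spread(spread, max_std)
    # colorsys expects (hue, lightness, saturation), each in the range 0..1
    r, g, b = colorsys.hls_to_rgb(hue / 360.0, 0.5, saturation)
    return tuple(round(255 * c) for c in (r, g, b))


low = grid_point_color([0.0], [0.0, 0.0], vmin=0.0, vmax=10.0, max_std=2.0)
high = grid_point_color([10.0], [10.0], vmin=0.0, vmax=10.0, max_std=2.0)
mixed = grid_point_color([5.0], [0.0, 10.0], vmin=0.0, vmax=10.0, max_std=2.0)
```

Here low is pure blue, high is magenta, and mixed is grey because the neighbourhood spread exceeds max_std.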
Pearlman RS, Smith KM (1998) Novel software tools for chemical diversity. Persp Drug Discov Des 9–11:339–353
Oprea TI, Gottfries J (2001) Chemography: the art of navigating in chemical space. J Comb Chem 3:157–166
Takahashi Y, Konji M, Fujishima S (2003) MolSpace: a computer desktop tool for visualization of massive molecular data. J Mol Graph Model 21:333–339
Haggarty SJ, Clemons PF, Wong JC, Wong JF, Schreiber SL (2004) Mapping chemical space using molecular descriptors and chemical genetics: deacetylase inhibitors. Comb Chem High Throughput Screen 7:669–676
Eckert H, Bajorath J (2007) Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today 12:225–233
Medina-Franco JL, Maggiora GM, Giulianotti MA, Pinilla C, Houghten RA (2007) A Similarity-based data-fusion approach to the visual characterization and comparison of compound databases. Chem Biol Drug Des 70:393–412
Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla C (2008) Visualization of the chemical space in drug discovery. Curr Comput-Aided Drug Des 4:322–333
Medina-Franco JL, Martinez-Mayorga K, Bender A, Marin RM, Giulianotti MA, Pinilla C, Houghten RA (2009) Characterization of activity landscapes using 2D and 3D similarity methods: consensus activity cliffs. J Chem Inf Model 49:477–491
Rosen J, Gottfries J, Muresan S, Backlund A, Oprea TI (2009) Novel chemical space exploration via natural products. J Med Chem 52:1953–1962
Singh N, Guha R, Giulianotti MA, Pinilla C, Houghten RA, Medina-Franco JL (2009) Chemoinformatic analysis of combinatorial libraries, drugs, natural products, and molecular libraries small molecule repository. J Chem Inf Model 49:1010–1024
Ivanenkov YA, Savchuk NP, Ekins S, Balakin KV (2009) Computational mapping tools for drug discovery. Drug Discov Today 14:767–775
Akella LB, DeCaprio D (2010) Cheminformatics approaches to analyze diversity in compound screening libraries. Curr Opin Chem Biol 14:325–330
Geppert H, Vogt M, Bajorath J (2010) Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inf Model 50:205–216
Reymond JL, Van Deursen R, Blum LC, Ruddigkeit L (2010) Chemical space as a source for new drugs. MedChemComm 1:30–38
Le Guilloux V, Colliandre L, Bourg S, Guénegou G, Dubois-Chevalier J, Morin-Allory L (2011) Visual characterization and diversity quantification of chemical libraries: 1. Creation of delimited reference chemical subspaces. J Chem Inf Model 51:1762–1774
Reutlinger M, Guba W, Martin RE, Alanine AI, Hoffmann T, Klenner A, Hiss JA, Schneider P, Schneider G (2011) Neighborhood-preserving visualization of adaptive structure-activity landscapes: application to drug discovery. Angew Chem Int Ed Engl 50:11633–11636
Owen JR, Nabney IT, Medina-Franco JL, López-Vallejo F (2011) Visualization of molecular fingerprints. J Chem Inf Model 51:1552–1563
Medina-Franco JL, Yongye AB, Pérez-Villanueva J, Houghten RA, Martínez-Mayorga K (2011) Multitarget structure–activity relationships characterized by activity–difference maps and consensus similarity measure. J Chem Inf Model 51:2427–2439
Maggiora GM, Shanmugasundaram V (2011) Molecular similarity measures. Methods Mol Biol (Clifton NJ) 672:39–100
Yoo J, Medina-Franco J (2011) Chemoinformatic approaches for inhibitors of DNA methyltransferases: comprehensive characterization of screening libraries. Comput Mol Biosci 1:7–16
Gutlein M, Karwath A, Kramer S: CheS-Mapper—chemical space mapping and visualization in 3D. J Cheminform. 2012; 4:Article 7. http://www.jcheminf.com/content/4/1/7. Accessed 14 July 2015
Ertl P, Rohde B: The molecule cloud—compact visualization of large collections of molecules. J Cheminform. 2012; 4:Article 12. http://www.jcheminf.com/content/14/11/12. Accessed 16 Dec 2012
Lachance H, Wetzel S, Kumar K, Waldmann H (2012) Charting, navigating, and populating natural product chemical space for drug discovery. J Med Chem 55:5989–6001
Medina-Franco JL, Aguayo-Ortiz R (2013) Progress in the visualization and mining of chemical and target spaces. Mol Inf 32:942–953
Hoksza D, Skoda P, Vorsilak M, Svozil D: Molpher: a software framework for systematic chemical space exploration. J Cheminform. 2014; 6:Article 7. http://www.jcheminf.com/content/6/1/7. Accessed 14 July 2015
Miyao T, Reker D, Schneider P, Funatsu K, Schneider G (2015) Chemography of natural product space. Planta Med. doi:10.1055/s-0034-1396322
Rodrigues T, Hauser N, Reker D, Reutlinger M, Wunderlin T, Hamon J, Koch G, Schneider G (2015) Multidimensional de novo design reveals 5-HT2B receptor-selective ligands. Angew Chem Int Ed Engl 54:1551–1555
Sander T, Freyss J, von Korff M, Rufener C (2015) Datawarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inf Model 55:460–473
Digles D, Ecker GF (2011) Self-organizing maps for in silico screening and data visualization. Mol Inf 30:838–846
Gaspar HA, Baskin II, Marcou G, Horvath D, Varnek A (2014) Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. J Chem Inf Model 55:84–94
Deng Z-L, Du C-X, Li X, Hu B, Kuang Z-K, Wang R, Feng S-Y, Zhang H-Y, Kong D-X (2013) Exploring the biologically relevant chemical space for drug discovery. J Chem Inf Model 53:2820–2828
Awale M, Reymond JL (2015) Similarity mapplet: interactive visualization of the directory of useful decoys and ChEMBL in high dimensional chemical spaces. J Chem Inf Model 55:1509–1516
Awale M, Reymond JL (2012) Cluster analysis of the DrugBank chemical space using molecular quantum numbers. Bioorg Med Chem 20:5372–5378
Awale M, van Deursen R, Reymond JL (2013) MQN-Mapplet: visualization of chemical space with interactive maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13. J Chem Inf Model 53:509–518
Schwartz J, Awale M, Reymond JL (2013) SMIfp (SMILES fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules. J Chem Inf Model 53:1979–1989
Ruddigkeit L, Awale M, Reymond JL (2014) Expanding the fragrance chemical space for virtual screening. J Cheminform 6:27–39
Reymond JL (2015) The chemical space project. Acc Chem Res 48:722–730
Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V et al (2011) DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res 39:D1035–D1041
Jin X, Awale M, Zasso M, Kostro D, Patiny L, Reymond JL (2015) PDB-Explorer: a web-based interactive map of the protein data bank in shape space. BMC Bioinformatics 16:339
Gutlein M, Karwath A, Kramer S (2012) CheS-Mapper—chemical space mapping and visualization in 3D. J Cheminf 4:7
Wetzel S, Klein K, Renner S, Rauh D, Oprea TI, Mutzel P, Waldmann H (2009) Interactive exploration of chemical space with Scaffold Hunter. Nat Chem Biol 5:581–583
Ertl P, Rohde B (2012) The molecule cloud—compact visualization of large collections of molecules. J Cheminf 4:12
Hoksza D, Skoda P, Vorsilak M, Svozil D (2014) Molpher: a software framework for systematic chemical space exploration. J Cheminf 6:7
Hilbig M, Rarey M (2015) MONA 2: a light cheminformatics platform for interactive compound library processing. J Chem Inf Model 55:2071–2078
Lewis R, Guha R, Korcsmaros T, Bender A (2015) Synergy Maps: exploring compound combinations using network-based visualization. J Cheminf 7:36
Korb O, Kuhn B, Hert J, Taylor N, Cole J, Groom C, Stahl M (2016) Interactive and versatile navigation of structural databases. J Med Chem. doi:10.1021/acs.jmedchem.5b01756
Lewell XQ, Jones AC, Bruce CL, Harper G, Jones MM, McLay IM, Bradshaw J (2003) Drug rings database with web interface. A tool for identifying alternative chemical rings in lead discovery programs. J Med Chem 46:3257–3274
Goede A, Dunkel M, Mester N, Frommel C, Preissner R (2005) SuperDrug: a conformational drug database. Bioinformatics 21:1751–1753
Nickel J, Gohlke B-O, Erehman J, Banerjee P, Rong WW, Goede A, Dunkel M, Preissner R (2014) SuperPred: update on drug classification and target prediction. Nucleic Acids Res 42:W26–W31
Cobanoglu MC, Oltvai ZN, Taylor DL, Bahar I (2015) BalestraWeb: efficient online evaluation of drug–target interactions. Bioinformatics 31:131–133
Nguyen KT, Blum LC, van Deursen R, Reymond J-L (2009) Classification of organic molecules by molecular quantum numbers. ChemMedChem 4:1803–1805
van Deursen R, Blum LC, Reymond JL (2010) A searchable map of PubChem. J Chem Inf Model 50:1924–1934
Awale M, Reymond JL (2014) Atom pair 2D-fingerprints perceive 3D-molecular shape and pharmacophores for very fast virtual screening of ZINC and GDB-17. J Chem Inf Model 54:1892–1897
Hagadone TR (1992) Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases. J Chem Inf Comput Sci 32:515–521
MA designed and realized webDrugCS and wrote the paper. J-LR co-designed and supervised the project and wrote the paper. Both authors read and approved the final manuscript.
This work was supported financially by the University of Berne, the Swiss National Science Foundation and the NCCR TransCure.
The authors declare that they have no competing interests.