
qHTSWaterfall: 3-dimensional visualization software for quantitative high-throughput screening (qHTS) data

Abstract

High throughput screening (HTS) is widely used in drug discovery and chemical biology to identify and characterize agents having pharmacologic properties often by evaluation of large chemical libraries. Standard HTS data can be simply plotted as an x–y graph usually represented as % activity of a compound tested at a single concentration vs compound ID, whereas quantitative HTS (qHTS) data incorporates a third axis represented by concentration. By virtue of the additional data points arising from the compound titration and the incorporation of logistic fit parameters that define the concentration–response curve, such as EC50 and Hill slope, qHTS data has been challenging to display on a single graph. Here we provide a flexible solution to the rapid plotting of complete qHTS data sets to produce a 3-axis plot we call qHTS Waterfall Plots. The software described here can be generally applied to any 3-axis dataset and is available as both an R package and an R shiny application.


Introduction

Quantitative high-throughput screening (qHTS) was established over a decade ago as an approach to enable large-scale pharmacological analysis of chemical libraries [1]. The method, an advance over the long-standing practice of testing compound libraries at a single concentration, was made possible by developments in assay technology, instrumentation, microtiter plate designs, and both commoditization and academic interest in chemical library generation [2,3,4,5]. qHTS has been applied to enzymes, receptors, and biological processes using diverse libraries [6,7,8]. For example, the National Center for Advancing Translational Sciences (NCATS), within the National Institutes of Health (NIH), has used qHTS in various aspects of drug and chemical probe discovery, including the evaluation of natural product extracts, drug repurposing, and drug combination testing [9,10,11].

The large-scale acquisition of concentration–response curve (CRC) profiles has allowed a detailed study of the extent and structure–activity relationships (SAR) of chemotypes responsible for several confounding artifacts encountered in drug development [12,13,14]. The technique has also formed the basis of library toxicological profiling used in programs developing toxicity assessment methods [15, 16]. Furthermore, because qHTS tests a chemical library across 4–5 orders of magnitude in concentration (e.g., nM to μM), relatively low-potency starting points can be identified by including test concentrations far higher than previously considered [6].

In addition to establishing a nascent library-wide SAR among the chemotypes in each library for the enzyme or phenotype under study, qHTS can provide insights related to a compound’s pharmacology. For example, in the work of Kinder et al., the CRC-derived Hill slopes from the qHTS of 4500 drugs and investigational agents could be correlated with graded hyperbolic vs. ultrasensitive “switch-like” responses revealing a mechanistic basis for activity such as cooperativity or signal amplification (Fig. 1A–C) [8].

Fig. 1

Various outputs for 3D visualization algorithm. A–C Multiple graph options obtained using a single readout dataset covering 5191 compounds. A Active compounds are displayed using data points and the corresponding concentration–response curve (CRC) fit, while inactive compound data responses are plotted as gray dots only. Compounds are randomly ordered in this representation. B Data and CRCs are grouped according to qHTS curve classification (CC) criteria, which take into consideration the nature of the pharmacological response as described in ref. [1]. Inactive responses are not shown. For A and B, colors correspond to CC criteria ranging from a fully efficacious sigmoidal response (red curve) to partial or incomplete responses (yellow, green, and blue) described in detail in ref. [1]. C Illustration of data demonstrating the ability to rotate the view to better appreciate differences in potency. Here, white curves are a combination of the yellow, green, and blue curves represented in A and B. D Gain-of-signal (blue), loss-of-signal (red) and inactive compound (gray dots) outputs plotted from a 51,441 compound qHTS assessing the library effect on the enzymatic activity of pyruvate kinase. E Chemotypes a, c and e are associated with loss-of-signal response output, while chemotypes d and e display a gain-of-signal response as discussed in Martinez et al. [27]. Data for graphs were obtained from the following PubChem AIDs: for plots in A–C, 1347405, 1347407 and 1347411; for plot D, 361; for plot E, 1508643

Nevertheless, despite the increased use of this technique, delineating qHTS data remains challenging compared to the pairwise or two-axis graph types representing standard HTS data usually plotted as % activity vs compound ID [17]. While large-scale and efficient two-dimensional analysis for qHTS screening data has been developed, there remains a lack of 3-dimensional visualization tools for such libraries [18,19,20]. In addition to providing a high-level overview of a qHTS experiment, three-dimensional graphs can allow the observation of patterns from thousands of CRCs not visible in two dimensions. For example, the output can be arranged and coded to highlight specific chemical and pharmacological properties embodied by the data, such as overall response efficacy (Fig. 1D) as depicted in waterfall plot formats [21,22,23,24] or related by structural chemotypes within the library (Fig. 1E).

While the usage of qHTS has been increasing, few software packages can straightforwardly process the data to create three-dimensional graphs for chemical libraries on the order of tens to hundreds of thousands of members. With this in mind, we have developed an R package and associated application that creates three-dimensional graphs more efficiently than currently available software.

Implementation

Development

The 3D qHTS Waterfall Plot has been implemented in the R statistical programming language, using RStudio, and is developed as an R package to ease installation and use within developed R scripts and data analysis pipelines. The qHTSWaterfall package is also implemented as an R Shiny application so that in addition to R command line use, the application can be run through a user interface. The implementation can be installed on a user’s machine or hosted on a central Shiny Server instance as shown in Fig. 2.

Fig. 2

qHTSWaterfall code repository and operating environments

Results/discussion

Installation and modes of operation

qHTSWaterfall is available both as an R package and as an R Shiny application with a user interface. Instructions for installation of the package can be found at our GitHub site in the readme section [25] and are included in Additional file 1: Fig. S1.

Calling runQHTSWaterfallApp() in R opens the application interface in the default application window. A button at the top of the interface allows users to move the application into an internet browser window, if desired. Clicking the button labeled Plot Our Sample Data loads an included sample data set and plots the results. The mouse scroll wheel or the zoom buttons in the upper right zoom the view in and out. Other controls in the upper-right context menu of the view, supported by the plotly package in R, allow one to pan and rotate the waterfall plot and to capture the plot as a PNG image file [26]. Figure 3 shows the qHTSWaterfall user interface with the included sample data plotted, in this case having coincident reporter readouts of firefly luciferase (FLuc) and NanoLuc luciferase (NLuc) [26]. The plot controls include options to hide or show the various readouts and to set colors for readout points and curve fit data, axis formatting, line weight, point sizes, plot aspect ratio, and background colors.

Fig. 3

qHTSWaterfall interface showing a plot of sample data, hiding inactive results. The green and blue curves are individual coincidence reporter responses

Input file format

Standard input file formats have been developed, and sample files are available to plot and view within the application. These sample files can serve as templates for users’ input data. The software accepts comma-separated text files (.csv) or Microsoft Excel (.xlsx) files. The data within the files can be formatted in one of two ways. One format is specific to the NCATS qHTS export format; most users, however, will use the more generic format, which is described in some detail here. A link in the upper left of the application delivers an xlsx-format sample input file with a color-coded header and notes on specific fields to include in the data. The header of this file is shown in Fig. 4, which illustrates the left (A) and right (B) columns of data.

Fig. 4

File format overview. A Top row format tags which include compound annotations in column 5 (e.g., SMILES), and concentration–response curve parameters (Log AC50, S0, Sinf, and Hill slope) in columns 6–9. B Example data columns, here an example of an 11-point titration with log base 10 transformed molar concentrations in the upper row, aligned with normalized data below

The upper-left cell of the file (Fig. 4A) holds the keyword Format, and the next cell to the right holds the file format value. The value should typically be ‘generic_qhts’; when working with NCATS-format qHTS data, this field is ‘ncats_qhts’ and the layout is specific to the NCATS qHTS export format.

The left-most column, Fit_Output, has values of 1 or 0, indicating whether that compound response should be represented as a dose–response curve fit or only by the data points that define that curve. Users often render full curves only for active responses, those passing some level of curation, or responses that are of particular importance to show, such as results associated with a particular chemotype or readout type. One example is a coincidence reporter response (Fig. 3), where the two orthogonal reporter responses (FLuc and NLuc) are shown in green and blue, respectively [27]. Another is gain-of-signal vs loss-of-signal, as shown in Fig. 1D. Note that the order of compounds and associated response data is preserved in the generated plot. This means that users can group compounds and responses based on activity criteria, readout type, chemical structure, or any other user-defined criteria (e.g., Fig. 1A vs B).

The Comp_ID column holds a user-supplied compound ID. These IDs need not be unique: each compound can have multiple responses, one for each Readout reported on a particular row. The Readout column contains a descriptive name indicating the kind of response being reported. In some assays, as shown, each compound may have different kinds of readout or even assay types, and the Readout column allows a compound to be represented more than once to report on other measures of compound activity. In the sample file, a coincidence reporter assay reports FLuc and NLuc outputs for each compound. Within the application or R package, different Readout types can be shown or hidden, and point and line colors can be customized.
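
To illustrate how the Fit_Output, Comp_ID, and Readout columns interact, the following Python sketch groups rows by Readout and separates responses flagged for curve rendering (Fit_Output of 1) from those drawn as points only. It is not part of qHTSWaterfall: the function name and the example records are hypothetical, and only the three column names come from the format described above.

```python
from collections import defaultdict


def group_by_readout(rows):
    """Group compound IDs by Readout, split by the Fit_Output flag.

    Each row is a dict keyed by the column names of the generic format.
    Row order is kept within each group, mirroring the note above that
    input order is preserved in the plot.
    """
    groups = defaultdict(lambda: {"curves": [], "points_only": []})
    for row in rows:
        bucket = "curves" if int(row["Fit_Output"]) == 1 else "points_only"
        groups[row["Readout"]][bucket].append(row["Comp_ID"])
    return dict(groups)


# Hypothetical records: Comp_ID values repeat because each compound
# has one row per readout.
example_rows = [
    {"Fit_Output": 1, "Comp_ID": "CPD-001", "Readout": "FLuc"},
    {"Fit_Output": 0, "Comp_ID": "CPD-001", "Readout": "NLuc"},
    {"Fit_Output": 1, "Comp_ID": "CPD-002", "Readout": "FLuc"},
    {"Fit_Output": 1, "Comp_ID": "CPD-002", "Readout": "NLuc"},
]

grouped = group_by_readout(example_rows)
```

In this made-up example, CPD-001 would be drawn with a fitted curve for the FLuc readout but as data points only for NLuc.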

The curve fit parameter columns consist of those shown in light blue on the second row of Fig. 4A, labeled Log_AC50_M, S_0, S_Inf, and Hill_Slope. These are the standard curve fit parameters associated with a four-parameter concentration–response curve fit against the Hill Equation. Please see Additional file 2: Fig. S2 for the Hill Equation and explanation of the 4 associated parameters shown here.
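
For readers who want to see how the four parameters combine, here is a minimal Python sketch of a four-parameter Hill function. The exact equation is given in Additional file 2: Fig. S2, which is not reproduced here, so the form below is an assumption: it is a commonly used log-concentration parameterization in which the response moves from S_0 at low concentration to S_Inf at high concentration. The function name is hypothetical.

```python
def hill_response(log_conc_m, log_ac50_m, s_0, s_inf, hill_slope):
    """Response at a given log10 molar concentration.

    Assumed form (may differ in detail from Fig. S2):
        S = S_0 + (S_Inf - S_0) / (1 + 10 ** ((Log_AC50 - log_conc) * Hill_Slope))
    """
    exponent = (log_ac50_m - log_conc_m) * hill_slope
    return s_0 + (s_inf - s_0) / (1.0 + 10.0 ** exponent)


# Hypothetical inhibitor: baseline 0, full effect -100, AC50 of 1 uM.
half_max = hill_response(-6.0, log_ac50_m=-6.0, s_0=0.0, s_inf=-100.0, hill_slope=1.0)
low_dose = hill_response(-12.0, log_ac50_m=-6.0, s_0=0.0, s_inf=-100.0, hill_slope=1.0)
high_dose = hill_response(-2.0, log_ac50_m=-6.0, s_0=0.0, s_inf=-100.0, hill_slope=1.0)
```

Under this form, the response at the AC50 is halfway between S_0 and S_Inf, and a larger Hill slope makes the transition between the two plateaus steeper.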

The titration concentrations are captured in the input file, in the first row, just above the response values. Note that in Fig. 4A upper right, we have a data tag, Log_Conc_M, to indicate the first column prior to the set of concentrations to read for data display. In Fig. 4B, we show the primary data columns of the file. Each column has a specific log base 10 transformed molar concentration value and below that a data header labeled from Data0 to Data10, in this case. The input file can have any number of data columns but should use this naming convention to label starting at Data0.
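
As a sketch of the naming convention, the Python snippet below builds the two header rows for the data block of an 11-point titration: log base 10 molar concentrations on top and Data0 to Data10 labels beneath. The top concentration (10 μM), the 3-fold dilution factor, the lowest-to-highest column order, and the function name are all assumptions chosen for illustration; they are not taken from the sample file.

```python
import math


def titration_headers(top_conc_m, dilution_factor, n_points):
    """Return (log10 concentrations, DataN labels) for a dilution series."""
    # Serial dilution starting from the highest concentration.
    concs = [top_conc_m / dilution_factor ** i for i in range(n_points)]
    concs.reverse()  # assumed column order: lowest concentration first
    log_concs = [round(math.log10(c), 3) for c in concs]
    labels = [f"Data{i}" for i in range(n_points)]
    return log_concs, labels


log_concs, labels = titration_headers(top_conc_m=1e-5, dilution_factor=3, n_points=11)
```

Each entry of log_concs would sit in the first row directly above the matching label in labels, with the normalized response values in the rows below.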

Note that the 3D waterfall plot is constructed based on the order of compounds and their responses in the input file. This permits users to sort compounds based on a variety of criteria prior to plotting. Compound ordering can reflect structure-based clusters, response metrics such as potency and efficacy, readout type, or any combination of compound or compound response attributes. As an illustration of compound pre-sorting, Fig. 1B features compound responses ordered and colored by NCATS curve class, a criteria-based response-curve classification system, and then ordered within each curve class by decreasing AC50.
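
The pre-sorting used for Fig. 1B can be sketched in Python as a two-key sort: first by curve class, then by decreasing AC50 within each class. The Curve_Class key, the ascending class order, the example values, and the function name are assumptions for illustration; only Comp_ID and Log_AC50_M are column names from the generic format.

```python
def presort_rows(rows):
    """Order rows by curve class, then by decreasing Log_AC50_M within a class."""
    # Negating the log AC50 turns Python's ascending sort into a
    # descending sort on the second key.
    return sorted(rows, key=lambda row: (row["Curve_Class"], -row["Log_AC50_M"]))


# Hypothetical responses to be written to the input file in sorted order.
unsorted_rows = [
    {"Comp_ID": "C1", "Curve_Class": 2.1, "Log_AC50_M": -5.2},
    {"Comp_ID": "C2", "Curve_Class": 1.1, "Log_AC50_M": -7.0},
    {"Comp_ID": "C3", "Curve_Class": 1.1, "Log_AC50_M": -5.5},
    {"Comp_ID": "C4", "Curve_Class": 2.1, "Log_AC50_M": -6.1},
]

sorted_ids = [row["Comp_ID"] for row in presort_rows(unsorted_rows)]
```

Because the plot follows file order, writing the rows out in this sorted order is all that is needed to obtain the grouped layout; any other sort key, such as a structure cluster ID, could be substituted.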

Extra columns may be present in the file. In this example, we include a compound name and smiles structure string. Extra columns can be appended. The current restrictions are that the first two cells in the upper left should include the Format tag and the format value, and the data columns (Data0-DataN) and associated concentrations should be in a block of consecutive/contiguous columns as shown in Fig. 4B.
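
These restrictions lend themselves to a quick pre-flight check before loading a file. The Python sketch below tests the Format tag, the format value, and the contiguity of the Data0 to DataN columns. It is not part of qHTSWaterfall: the function name is hypothetical, and it assumes the file has already been read into a tag row (first row) and a column-header row (second row), which is one reading of Fig. 4.

```python
import re

KNOWN_FORMATS = {"generic_qhts", "ncats_qhts"}


def check_layout(tag_row, header_row):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(tag_row) < 2 or tag_row[0] != "Format":
        problems.append("first cell should be the keyword 'Format'")
    elif tag_row[1] not in KNOWN_FORMATS:
        problems.append(f"unrecognized format value: {tag_row[1]!r}")

    # Record (column index, N) for every header of the form DataN.
    data_positions = []
    for index, name in enumerate(header_row):
        match = re.fullmatch(r"Data(\d+)", name.strip())
        if match:
            data_positions.append((index, int(match.group(1))))

    if not data_positions:
        problems.append("no Data0..DataN columns found")
        return problems

    first_index = data_positions[0][0]
    for offset, (index, number) in enumerate(data_positions):
        if index != first_index + offset:
            problems.append("Data columns are not contiguous")
            break
        if number != offset:
            problems.append("Data columns should be numbered from Data0 upward")
            break
    return problems


# Hypothetical header rows; the column positions are illustrative only.
tags = ["Format", "generic_qhts", "", "", "", "", "", "", "Log_Conc_M", "-9.0", "-8.5"]
good = ["Fit_Output", "Comp_ID", "Readout", "Name", "SMILES",
        "Log_AC50_M", "S_0", "S_Inf", "Hill_Slope", "Data0", "Data1"]
bad = good[:10] + ["Notes", "Data1"]  # an extra column splits the data block

good_result = check_layout(tags, good)
bad_result = check_layout(tags, bad)
```

In this example the second header row fails because an extra column has been placed between Data0 and Data1; extra columns belong outside the data block.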

Conclusions

Obtaining a comprehensive view of bioactivity from a qHTS is highly informative from several perspectives. 3D data visualization can provide a high-level pharmacological assessment of overall library-assay activity, allowing, for example, comparative analysis of assay activity vs library or vice versa [1, 7, 8, 10, 12, 13, 27,28,29]. Further, by sorting compounds according to a specified criterion (compound similarity, AC50, Hill slope, maximum response, etc.), information such as pharmacologic mechanism or chemical tractability can be highlighted to reveal actionable insights. For example, Fig. 1E shows a plot from a qHTS follow-up where five firefly enzyme ligand chemotypes (a–e) are shown to have varying cellular effects on firefly luciferase reporter output (PubChem AID = 1508643) [27].

Producing overview plots for large screening campaigns had previously been a laborious process that relied on commercial software not designed to handle this specific data and visualization type. The qHTSWaterfall application presented in this paper has allowed our lab to graph 3-dimensional qHTS data for various assays in a simple and time-efficient manner. Generating overview presentations of qHTS data is roughly analogous to omics heatmaps in showing activity patterns over large data sets. To our knowledge, free, open-source qHTS Waterfall plot software has not previously been available.

In addition, this program offers a facile means to generate a high-level analysis of the ever-increasing qHTS data appearing in repositories such as PubChem for anyone interested in studying a large and varied chemical biology data set. At the time this paper was written, there were over 15,000 HTS data sets in PubChem [30]. While qHTS data can be represented by a 3-axis plot, the information content includes more than 3 parameters. For example, in addition to structural relationships among active compounds, each CRC contains pharmacologic parameters including an EC50 equivalent (a measure of potency), the Hill slope (a mechanistic indicator), and the efficacy or magnitude of the response.

Our program allows biologists, chemists, informaticians, and the public to create 3-dimensional qHTS graphs clustered according to their preference as well as color aesthetics.

The user interface featured in the Shiny application helps users who are not proficient in R to produce plots, while those who wish to integrate the qHTSWaterfall plot into an existing R analysis workflow can easily do so. 3-dimensional qHTS graphing gives researchers a general sense of trends, difficult to observe in a two-dimensional graphing format, relating to the interaction of chemical libraries with biological assays. Furthermore, this visualization can illustrate the scale of noise and artifacts between reporters and assays. In addition, our program allows scientists, regardless of previous programming experience, to create 3-dimensional qHTS data plots in an effective and timely manner. Researchers have the option to present data in clusters by mechanism of action, activity, inhibition, or compound ID when organizing data from repositories such as PubChem [30, 31].

Availability of data and materials

The software and sample data files are free and open source, licensed under Apache v2.0. Project name: qHTS Waterfall (qHTSWaterfall R Package), Project home page: https://github.com/ncats/qHTSWaterfall, Installation Instructions: https://github.com/ncats/qHTSWaterfall#readme; Operating systems: Platform independent; Programming language: R; Other Requirements: None if running locally on a user’s machine. R Shiny Server is needed if users intend to host the software on a shared server machine. License: Apache v2.0. Example data files: Included in the qHTSWaterfall R package or can be found in the source repository in this location: https://github.com/ncats/qHTSWaterfall/tree/main/inst/extdata, https://github.com/ncats/qHTSWaterfall/raw/main/inst/extdata/Generic_qHTS_Format_Example.xlsx

References

  1. Inglese J et al (2006) Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries. Proc Natl Acad Sci USA 103:11473–11478

  2. Inglese J, Auld DS (2008) High throughput screening (HTS) techniques: applications in chemical biology. In Wiley encyclopedia of chemical biology, Wiley, pp 1–15

  3. Brown LE et al (2011) Discovery of new antimalarial chemotypes through chemical methodology and library development. Proc Natl Acad Sci USA 108:6775–6780

  4. Inglese J et al (2007) High-throughput screening assays for the identification of chemical probes. Nat Chem Biol 3:466–479

  5. Macarron R et al (2011) Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov 10:188–195

  6. Rai G et al (2017) Discovery and optimization of potent, cell-active pyrazole-based inhibitors of lactate dehydrogenase (LDH). J Med Chem 60:9184–9204

  7. Solinski HJ et al (2019) Inhibition of natriuretic peptide receptor 1 reduces itch in mice. 11: eaav5464

  8. Kinder TB, Dranchak PK, Inglese J (2020) High-throughput screening to identify inhibitors of the type I interferon-major histocompatibility complex class I pathway in skeletal muscle. ACS Chem Biol 15:1974–1986

  9. Inglese J et al (2014) Genome editing-enabled HTS assays expand drug target pathways for Charcot-Marie-tooth disease. ACS Chem Biol 9:2594–2602

  10. Cheng KCC et al (2015) Actinoramide a identified as a potent antimalarial from titration-based screening of marine natural product extracts. J Nat Prod 78:2411–2422

  11. Mott BT et al (2015) High-throughput matrix screening identifies synergistic and antagonistic antimalarial drug combinations. Sci Rep 5:13891

  12. Simeonov A et al (2008) Fluorescence spectroscopic profiling of compound libraries. J Med Chem 51:2363–2371

  13. Thorne N et al (2012) Firefly luciferase in chemical biology: a compendium of inhibitors, mechanistic evaluation of chemotypes, and suggested use as a reporter. Chem Biol 19:1060–1072

  14. Feng BY et al (2007) A high-throughput screen for aggregation-based inhibition in a large compound library. J Med Chem 50:2385–2390

  15. Xia M et al (2008) Compound cytotoxicity profiling using quantitative high-throughput screening. Environ Health Persp 116:284–291

  16. Huang R et al (2019) The NCATS BioPlanet—an integrated platform for exploring the universe of cellular signaling pathways for toxicology, systems biology, and chemical genomics. Front Pharmacol 10:445

  17. Zhang XD, Zhang ZZ (2013) displayHTS: a R package for displaying data and results from high-throughput screening experiments. Bioinformatics 29:794–796

  18. Motulsky H, Christopoulos A (2004) Fitting models to biological data using linear and nonlinear regression: a practical guide to curve fitting. Oxford University Press, Oxford; New York

  19. Seethala R, Zhang L (2009) Handbook of drug screening. Informa Healthcare, New York

  20. Wang Y, Jadhav A, Southal N, Huang R, Nguyen DT (2010) A grid algorithm for high throughput fitting of dose-response curve data. Curr Chem Genomics 4:57–66

  21. Castanon Alvarez E, Aspeslagh S, Soria JC (2017) 3D waterfall plots: a better graphical representation of tumor response in oncology. Ann Oncol 28:454–456

  22. Gillespie TW (2012) Understanding waterfall plots. J Adv Pract Oncol 3:106–111

  23. Kim MS, Prasad V (2019) Assessment of accuracy of waterfall plot representations of response rates in cancer treatment published in medical journals. JAMA Netw Open 2:e193981

  24. Kwak EL et al (2010) Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer. N Engl J Med 363:1693–1703

  25. The 3D qHTS Waterfall Plot package. https://github.com/ncats/qHTSWaterfall#readme.

  26. Cheng KC, Inglese J (2012) A coincidence reporter-gene system for high-throughput screening. Nat Methods 9:937

  27. Martinez NJ et al (2021) Genome-edited coincidence and PMP22-HiBiT fusion reporter cell lines enable an artifact-suppressive quantitative high-throughput screening strategy for PMP22 gene-dosage disorder drug discovery. ACS Pharmacol Transl Sci 4:1422–1436

  28. Dranchak P et al (2013) Profile of the GSK published protein kinase inhibitor set across ATP-dependent and-independent luciferases: implications for reporter-gene assays. PLoS ONE 8:e57888

  29. Miller TW et al (2019) Quantitative high-throughput screening assays for the discovery and development of SIRPα-CD47 interaction inhibitors. PLoS ONE 14:e0218897–e0218897

  30. Kim S et al (2016) PubChem substance and compound databases. Nucleic Acids Res 44:D1202–1213

  31. Fu G et al (2015) PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform 7:34

Funding

Open Access funding provided by the National Institutes of Health (NIH). This research was supported (in part) by the Intramural Research Program of the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH) under projects 1ZIATR000053 and 1ZIATR000052 (J.I.)

Author information

Contributions

All authors contributed to the drafting of the manuscript. JI and PD conceived the plot concept. BQ conceived the tool and did core code implementation. JB refined the code, formed the R package, and developed the Shiny UI. PD worked on UI/UX improvements and user testing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to John C. Braisted.

Ethics declarations

Competing interests

The authors declare that they do not have any competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Fig. S1.

Instructions for installation and starting the qHTSWaterfall Application. The package devtools is required for installation from github.com and can be installed if needed.

Additional file 2: Fig. S2.

A sigmoidal concentration–response curve. The 4 parameters contained in the input file (denoted in the file as S_0, S_Inf, Hill_Slope and Log_AC50_M) are explained here. Note that some software programs that generate these fit parameters may use different nomenclature to refer to these parameters.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Queme, B., Braisted, J.C., Dranchak, P. et al. qHTSWaterfall: 3-dimensional visualization software for quantitative high-throughput screening (qHTS) data. J Cheminform 15, 39 (2023). https://doi.org/10.1186/s13321-023-00717-9
