

  • Poster presentation
  • Open Access

Hit series selection in noisy HTS data: clustering techniques, statistical tests and data visualisations

Journal of Cheminformatics 2014, 6(Suppl 1):P27



  • Vortex
  • High Throughput Screening
  • Data Visualisation
  • Cluster Scheme
  • Chemical Descriptor

High throughput screening (HTS) is one of the most prominent techniques used in the early stages of a drug discovery programme to identify the few hit compounds that can serve as starting points for subsequent studies [1, 2]. However, an HTS experiment often entails a data-intensive and challenging hit prioritization process before those hit compounds emerge. The workflow described in this study aims to make this decision-making process easier by combining the structural and biological information of the compounds screened in an HTS. In particular, the workflow combines various clustering and nearest-neighbour schemes with a non-parametric statistical test in order to prioritize those groups of compounds that are likely to be relevant to the biological target of interest [3].
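
The idea of scoring compound groupings with a non-parametric test can be illustrated with a minimal sketch. The abstract does not name the test or the clustering scheme, so everything here is an assumption for illustration only: cluster labels are taken as given, activity is a single number per compound, and a simple one-sided permutation test stands in for whatever statistic the workflow actually uses. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def cluster_enrichment_p(activities, cluster_ids, cluster, n_perm=2000, seed=0):
    """One-sided permutation p-value for a single cluster.

    Estimates how often a random compound set of the same size has a mean
    activity at least as high as the cluster's. This is an assumed stand-in
    test, not the statistic used in the study.
    """
    rng = random.Random(seed)
    members = [a for a, c in zip(activities, cluster_ids) if c == cluster]
    observed = mean(members)
    size = len(members)
    at_least_as_high = 0
    for _ in range(n_perm):
        # Draw a random set of compounds of the same size, without replacement.
        if mean(rng.sample(activities, size)) >= observed:
            at_least_as_high += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_high + 1) / (n_perm + 1)


def rank_clusters(activities, cluster_ids, **kwargs):
    """Return (cluster, p_value) pairs, most enriched cluster first."""
    scored = [
        (cluster_enrichment_p(activities, cluster_ids, c, **kwargs), c)
        for c in sorted(set(cluster_ids))
    ]
    return [(c, p) for p, c in sorted(scored)]


# Toy data (hypothetical): percent inhibition for 12 compounds in 3 clusters.
activities = [92.0, 88.0, 95.0, 90.0, 5.0, 12.0, 8.0, 3.0, 10.0, 7.0, 15.0, 9.0]
cluster_ids = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

ranked = rank_clusters(activities, cluster_ids)
for cluster, p_value in ranked:
    print(f"cluster {cluster}: p = {p_value:.4f}")
```

In this toy set, cluster A holds all the strongly active compounds and therefore ranks first. With real screening data, many clusters are tested at once, so the p-values would need a multiple-testing correction before any cluster is prioritized.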

The novel workflow was evaluated from several angles in a retrospective study using publicly available quantitative HTS (qHTS) datasets [4]. One of the main benchmarking criteria was the ability to correctly identify as many true active compounds as possible. Different chemical descriptors and clustering schemes were therefore tested in combination with the statistical test to measure their classification performance.
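
As a rough illustration of this kind of benchmark, the sketch below compares the compounds flagged by a given descriptor and clustering combination against a set of known actives. The abstract does not state which performance measures were used, so precision and recall are assumptions here, and the function name and compound identifiers are hypothetical.

```python
def precision_recall(flagged, true_actives):
    """Precision and recall of a flagged compound set against known actives."""
    flagged, true_actives = set(flagged), set(true_actives)
    true_positives = len(flagged & true_actives)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(true_actives) if true_actives else 0.0
    return precision, recall


# Hypothetical example: four compounds flagged, three confirmed actives.
flagged = {"cpd1", "cpd2", "cpd3", "cpd4"}
true_actives = {"cpd1", "cpd2", "cpd5"}

precision, recall = precision_recall(flagged, true_actives)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Running the same comparison for each descriptor and clustering combination gives a simple table from which the combinations can be ranked.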

The workflow was integrated into Dotmatics’ Vortex, a platform for analysing chemical information using chemoinformatics methods and data visualisation tools [5]. This integration enables researchers to easily extend their current HTS workflow in order to discover new hit series and reveal hidden relationships between compounds, scaffolds and clusters.

Authors’ Affiliations

Dotmatics Ltd, Windhill, Bishop’s Stortford, CM23 2ND, UK
AstraZeneca AB, Peppardsleden 1, Mölndal, 43183, Sweden


  1. Rocke D: Design and analysis of experiments with high throughput biological assay data. Cell and Developmental Biology. 2004, 15 (6): 703-713.
  2. Keseru GM, Makara GM: Hit discovery and hit-to-lead approaches. Drug Discovery Today. 2006, 11 (15-16): 741-748. 10.1016/j.drudis.2006.06.016.
  3. Varin T, Gubler H, Parker CN, Zhang JH, Raman P, Ertl P, Schuffenhauer A: Compound set enrichment: A novel approach to analysis of primary HTS data. J Chem Inf Model. 2010, 50 (12): 2067-2078. 10.1021/ci100203e.
  4. []
  5. Dotmatics Ltd: []


© Müller et al; licensee Chemistry Central Ltd. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.