  • Poster presentation
  • Open access

Kernel density estimation of CSD distributions - an application to knowledge based molecular optimisation

The Cambridge Structural Database (CSD) contains a large amount of molecular structure data (bond lengths, bond angles and torsion angles). Much of this data has previously been extracted in histogram form and made available through the Mogul program. Histograms, however, have several disadvantages: they are not smooth, and their shape depends on the choice of bin width and bin end points.
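The dependence on bin end points can be illustrated with a minimal Python sketch. The bond lengths below are made-up illustrative values, not CSD data, and `histogram_counts` is a hypothetical helper; only the bin origin changes between the two calls.

```python
import numpy as np

# Hypothetical bond lengths in angstroms (illustrative values, not CSD data).
lengths = np.array([1.385, 1.392, 1.398, 1.403, 1.407,
                    1.412, 1.421, 1.452, 1.461, 1.468])

def histogram_counts(data, origin, width, n_bins):
    """Count observations in n_bins equal-width bins starting at origin."""
    edges = origin + width * np.arange(n_bins + 1)
    counts, _ = np.histogram(data, bins=edges)
    return counts

# Same data, same bin width, different bin origin.
counts_a = histogram_counts(lengths, origin=1.36, width=0.04, n_bins=3)
counts_b = histogram_counts(lengths, origin=1.34, width=0.04, n_bins=4)

print(counts_a)  # counts per bin with origin 1.36
print(counts_b)  # counts per bin with origin 1.34
```

With the first origin the counts look roughly flat, while the second origin concentrates most observations in a single bin, although the underlying data are identical.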

Kernel density estimators do not bin the data and so have no bin end points. Instead, a kernel function is centred at each data point and the estimate is the normalised sum of these kernels, so smooth kernel functions generate smooth density estimates [1]. The main difficulty of the approach is choosing how wide to make the kernel functions (the bandwidth).
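A minimal sketch of a Gaussian kernel density estimate follows. The abstract does not state how the kernel width was chosen, so the rule-of-thumb bandwidth used here (described in Silverman's monograph [1]) is an assumption for illustration only; the function names and the sample data are likewise hypothetical.

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every observation.
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    # Average of the kernels centred on the observations.
    return kernels.mean(axis=1)

def rule_of_thumb_bandwidth(data):
    """Assumed bandwidth choice: 0.9 * min(sd, IQR / 1.34) * n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    sd = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    spread = min(sd, (q75 - q25) / 1.34)
    return 0.9 * spread * data.size ** (-0.2)

# Hypothetical bond lengths in angstroms (illustrative values, not CSD data).
lengths = np.array([1.385, 1.392, 1.398, 1.403, 1.407,
                    1.412, 1.421, 1.452, 1.461, 1.468])

h = rule_of_thumb_bandwidth(lengths)
grid = np.linspace(1.2, 1.7, 2001)
density = gaussian_kde(lengths, grid, h)

# Numerical check that the estimate integrates to approximately one.
total = density.sum() * (grid[1] - grid[0])
```

A smaller bandwidth follows the individual observations more closely, and a larger one smooths over genuine structure, which is the trade-off behind the width problem noted above.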

In this work, kernel density estimation is used to generate probability density functions (pdfs) for bond length, bond angle and torsion angle histograms derived from the CSD. Gaussian kernels are used for the bond length and bond angle data, and a von Mises kernel, which is periodic, is used for the torsion angle data [2]. The resulting pdfs are smooth and are suitable for application to molecular geometry optimisation.
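For torsion angles, the von Mises kernel can be sketched as below. This is an illustration under stated assumptions and not the authors' implementation: the concentration parameter `kappa` (which plays the role of an inverse bandwidth), the function name and the torsion values are all hypothetical.

```python
import numpy as np

def von_mises_kde(angles_deg, grid_deg, kappa):
    """Evaluate a von Mises kernel density estimate (density per radian)."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    grid = np.deg2rad(np.asarray(grid_deg, dtype=float))
    diff = grid[:, None] - theta[None, :]
    # The kernel depends only on cos(diff), so it wraps correctly at +/-180.
    # Note: exp(kappa) overflows for very large kappa; moderate values only.
    kernels = np.exp(kappa * np.cos(diff)) / (2.0 * np.pi * np.i0(kappa))
    return kernels.mean(axis=1)

# Hypothetical torsion angles in degrees (illustrative values, not CSD data).
torsions = [-178.0, 179.0, 175.0, -62.0, -58.0, 60.0, 65.0]
kappa = 20.0  # assumed concentration; larger values give narrower kernels

grid = np.linspace(-180.0, 180.0, 720, endpoint=False)
density = von_mises_kde(torsions, grid, kappa)

# Numerical check that the estimate integrates to approximately one
# over the full circle (grid spacing converted to radians).
total = density.sum() * (2.0 * np.pi / grid.size)
```

Because the kernel is periodic, observations near -180 and +180 degrees contribute to the same peak, which a Gaussian kernel on a linear axis would split in two.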

References

  1. Silverman BW: Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. 1986, Chapman and Hall/CRC


  2. Evans M, Hastings NAJ, Peacock B: Statistical Distributions. 2000, Wiley


Author information


Corresponding author

Correspondence to Patrick McCabe.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

McCabe, P., Korb, O., Cole, J. et al. Kernel density estimation of CSD distributions - an application to knowledge based molecular optimisation. J Cheminform 6 (Suppl 1), P10 (2014). https://doi.org/10.1186/1758-2946-6-S1-P10


  • DOI: https://doi.org/10.1186/1758-2946-6-S1-P10
