
Integration of chemical information with protein sequences and 3D structures

The Protein Data Bank (PDB) contains a wealth of complexes between small molecules and macromolecules, the study of which contributes enormously to our understanding of their interactions. However, exploiting and mining this treasure trove of data requires advanced analysis and retrieval methods that take both types of molecule into account. One such method is PDBeMotif, which has been developed by the Protein Data Bank in Europe (PDBe) at EMBL-EBI. Built on a relational database back end, its data structure represents a network of molecule, residue and motif interactions, as well as their relative positions in the sequence and in 3D. The loader applies a number of algorithms to analyse PDB entries and derive the necessary information, such as the planarity and aromaticity of chemical compounds, hydrogen-bond networks, coordination geometry, bond types (including pi-electron interactions), 3D structural motifs, and sequence domains and families. It also collects information about sequence features, motifs and catalytic sites from available Distributed Annotation System (DAS) resources. The web application supports a wide variety of searches and analyses, including the association of protein motifs with chemical fragments, the characterisation of protein sites, the correlation of properties, and multiple sequence and 3D alignments of hits. The whole system is released under the GPL and is available online together with its source code.
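To make the idea of a relational model for molecule, residue and interaction data more concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. It is not PDBeMotif's actual schema: the table names (`molecule`, `residue`, `interaction`), the column names, the helper functions and the entry identifier `toy1` are all hypothetical, and the rows are toy values invented purely for illustration.

```python
import sqlite3

# Hypothetical schema: names are illustrative, NOT PDBeMotif's real tables.
SCHEMA = """
CREATE TABLE molecule (
    id    INTEGER PRIMARY KEY,
    entry TEXT NOT NULL,      -- entry identifier (toy value here)
    name  TEXT NOT NULL,      -- chain name or ligand code
    kind  TEXT NOT NULL CHECK (kind IN ('protein', 'ligand'))
);
CREATE TABLE residue (
    id          INTEGER PRIMARY KEY,
    molecule_id INTEGER NOT NULL REFERENCES molecule(id),
    seq_pos     INTEGER NOT NULL,  -- position in the sequence
    code        TEXT NOT NULL      -- three-letter residue code
);
CREATE TABLE interaction (
    residue_id INTEGER NOT NULL REFERENCES residue(id),
    ligand_id  INTEGER NOT NULL REFERENCES molecule(id),
    bond_type  TEXT NOT NULL       -- e.g. 'hydrogen', 'pi', 'coordination'
);
"""


def build_toy_db():
    """Create an in-memory database filled with invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO molecule VALUES (?, ?, ?, ?)", [
        (1, "toy1", "A", "protein"),
        (2, "toy1", "ATP", "ligand"),
    ])
    conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?)", [
        (1, 1, 10, "LYS"),
        (2, 1, 11, "GLY"),
        (3, 1, 42, "PHE"),
    ])
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", [
        (1, 2, "hydrogen"),
        (3, 2, "pi"),
    ])
    return conn


def residues_contacting(conn, ligand_name, bond_type=None):
    """Return (residue code, sequence position, bond type) rows for a ligand,
    optionally restricted to one bond type, ordered by sequence position."""
    sql = """
        SELECT r.code, r.seq_pos, i.bond_type
        FROM interaction AS i
        JOIN residue  AS r ON r.id = i.residue_id
        JOIN molecule AS m ON m.id = i.ligand_id
        WHERE m.name = ? AND m.kind = 'ligand'
    """
    params = [ligand_name]
    if bond_type is not None:
        sql += " AND i.bond_type = ?"
        params.append(bond_type)
    sql += " ORDER BY r.seq_pos"
    return conn.execute(sql, params).fetchall()
```

With the toy rows above, `residues_contacting(build_toy_db(), "ATP", "pi")` returns `[("PHE", 42, "pi")]`. Storing interactions as rows that link residues to ligands is one plausible way to let a single SQL join answer questions that span both sequence position and chemistry; a real system would add further tables for motifs, 3D coordinates and annotations.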




Correspondence to Adel Golovin.

Rights and permissions

Open Access: This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


About this article

Cite this article

Golovin, A., Henrick, K. & Kleywegt, G. Integration of chemical information with protein sequences and 3D structures. J Cheminform 2 (Suppl 1), O17 (2010).
