Integration of chemical information with protein sequences and 3D structures
Journal of Cheminformatics volume 2, Article number: O17 (2010)
The Protein Data Bank (PDB) contains a wealth of small molecule-macromolecule complexes, the study of which contributes enormously to our understanding of their interactions. However, exploiting and mining this treasure trove of data requires advanced analysis and retrieval methods that take both types of molecule into account. One such method is PDBeMotif, which has been developed by the Protein Data Bank in Europe (PDBe) at EMBL-EBI. Built on a relational database at the back end, its data structure represents a network of molecule, residue and motif interactions, together with their relative positions in the sequence and in 3D. The loader applies a number of algorithms to analyse PDB entries and derive the necessary information, such as the planarity and aromaticity of chemical compounds, hydrogen-bond networks, coordination geometry, bond types (including pi-electron interactions), 3D structural motifs, and sequence domains and families. It also collects information about sequence features, motifs and catalytic sites from available Distributed Annotation System (DAS) resources. The web application supports a wide variety of searches and analyses, including the association of protein motifs with chemical fragments, the characterisation of protein sites, the correlation of properties, and multiple sequence and 3D alignments of hits. The whole system is released under the GPL; the source code is available from http://sourceforge.net/projects/pdbsam and the service is online at http://www.ebi.ac.uk/pdbe-site/PDBeMotif/
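To make the idea of storing molecule-residue interactions in a relational model more concrete, the following is a minimal sketch using Python's standard sqlite3 module. It is not the PDBeMotif schema: the table names (ligand, residue, interaction), the column names, the bond-type labels and the demo rows (entries "demo1" and "demo2") are all hypothetical placeholders chosen for illustration. It only shows how a question such as "which residues form hydrogen bonds with a given ligand?" can reduce to a join over interaction records.

```python
import sqlite3

# Hypothetical, heavily simplified schema. These are NOT the actual
# PDBeMotif tables; all names are assumptions made for illustration.
SCHEMA = """
CREATE TABLE ligand (
    ligand_id INTEGER PRIMARY KEY,
    pdb_code  TEXT NOT NULL,
    het_code  TEXT NOT NULL
);
CREATE TABLE residue (
    residue_id INTEGER PRIMARY KEY,
    pdb_code   TEXT NOT NULL,
    chain      TEXT NOT NULL,
    seq_number INTEGER NOT NULL,
    res_name   TEXT NOT NULL
);
CREATE TABLE interaction (
    ligand_id  INTEGER NOT NULL REFERENCES ligand(ligand_id),
    residue_id INTEGER NOT NULL REFERENCES residue(residue_id),
    bond_type  TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ligand VALUES (?, ?, ?)",
        [(1, "demo1", "ATP"), (2, "demo2", "HEM")],
    )
    conn.executemany(
        "INSERT INTO residue VALUES (?, ?, ?, ?, ?)",
        [
            (1, "demo1", "A", 15, "LYS"),
            (2, "demo1", "A", 42, "THR"),
            (3, "demo1", "A", 80, "PHE"),
            (4, "demo2", "B", 93, "HIS"),
        ],
    )
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?)",
        [
            (1, 1, "hydrogen_bond"),
            (1, 2, "hydrogen_bond"),
            (1, 3, "pi_stacking"),
            (2, 4, "metal_coordination"),
        ],
    )
    return conn


def residues_contacting(conn, het_code, bond_type):
    """Return residues that interact with the given ligand via bond_type."""
    query = """
        SELECT r.pdb_code, r.chain, r.seq_number, r.res_name
        FROM interaction AS i
        JOIN ligand  AS l ON l.ligand_id  = i.ligand_id
        JOIN residue AS r ON r.residue_id = i.residue_id
        WHERE l.het_code = ? AND i.bond_type = ?
        ORDER BY r.pdb_code, r.chain, r.seq_number
    """
    return conn.execute(query, (het_code, bond_type)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in residues_contacting(db, "ATP", "hydrogen_bond"):
        print(row)
```

Because each interaction is a row linking one ligand to one residue, adding further search criteria (for example a sequence position range or a motif membership table) would amount to extra joins or WHERE clauses rather than a new data structure.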
Golovin, A., Henrick, K. & Kleywegt, G. Integration of chemical information with protein sequences and 3D structures. J Cheminform 2 (Suppl 1), O17 (2010). https://doi.org/10.1186/1758-2946-2-S1-O17
- Protein Data Bank
- Relational Database
- Coordination Geometry
- Retrieval Method
- Sequence Domain