Prediction of highly-connected 'hub'-proteins in protein interaction networks using QSAR

Proteins that are most essential for the functioning and viability of a bacterial cell have been shown to exhibit a larger number of interactions with other cell components. Thus, by identifying the most connected proteins (hubs) in protein interaction networks (PINs), one may discover prospective drug targets for combating emergent and drug-resistant pathogens such as methicillin-resistant Staphylococcus aureus 252 (MRSA). The advantage of using hub proteins as drug targets lies in their essentiality, their non-replaceable position in the PIN, and their lower mutation rate, which can help counter bacterial resistance.

However, finding or predicting hub proteins remains challenging: the corresponding experiments are very costly, and traditional bioinformatics approaches generally fail to forecast PIN data because the existing datasets largely disagree with one another [1].

We therefore decided to use various structural and physicochemical features of proteins, related to traditional QSAR properties, to predict highly connected proteins. Using our in-house PIN for the MRSA cell, we trained a boosted tree-based classifier on 75 physical and chemical QSAR descriptors computed for all proteins in the interaction network [2]. The descriptors included molecular weight, net charge, isoelectric point, hydrophobicity, surface area, solvent accessibility, electronegativity, secondary structure composition, surface coils and flexibility, among others.
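To make the idea of a per-protein descriptor vector concrete, the following is a minimal sketch that derives a few sequence-based features of the kind listed above. It is not the 75-descriptor set used in [2]: the function name `sequence_descriptors`, the choice of features, and the crude charge estimate (counting K/R as positive and D/E as negative, ignoring histidine, termini and pH) are all assumptions made for illustration. The hydrophobicity values are the standard Kyte-Doolittle hydropathy scale.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_descriptors(sequence):
    """Return a small, illustrative descriptor dictionary for one protein.

    Hypothetical helper: the features below only stand in for the much
    larger descriptor set described in the text.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    length = len(seq)
    # Mean hydropathy over the whole sequence (often called GRAVY).
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / length
    # Crude charge estimate: basic minus acidic side chains (assumption).
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")

    return {
        "length": length,
        "mean_hydropathy": mean_hydropathy,
        "approx_net_charge": positive - negative,
        "charged_fraction": (positive + negative) / length,
    }


if __name__ == "__main__":
    # Toy sequence, not taken from the MRSA proteome.
    print(sequence_descriptors("MKDELLAV"))
```

Each protein in the network would be mapped to one such vector, and the vectors, together with hub or non-hub labels derived from the PIN, would form the training table for a classifier.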

The developed QSAR model yielded a prediction accuracy of 80% on the validation set and was used to predict additional hubs in the rest of the MRSA proteome. The predicted hubs were then evaluated experimentally, and 55% of them were confirmed as high interactors, which corresponds to a >5-fold enrichment of the dataset for potential hub proteins by the developed QSAR model.
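The relationship between the two reported figures can be sketched as simple arithmetic. The snippet below assumes that fold enrichment is defined as the confirmed hit rate among predicted hubs divided by the background rate of hubs in an unselected set; the abstract does not state this definition or the background rate, so the function names and the derived bound are illustrative only.

```python
def fold_enrichment(hit_rate, background_rate):
    """Ratio of the hit rate among model-selected proteins to the background rate."""
    if background_rate <= 0:
        raise ValueError("background_rate must be positive")
    return hit_rate / background_rate


def implied_background_rate(hit_rate, fold):
    """Background rate that would give exactly `fold` enrichment at `hit_rate`."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return hit_rate / fold


if __name__ == "__main__":
    confirmed = 0.55   # fraction of predicted hubs confirmed experimentally
    reported_fold = 5  # the text reports an enrichment greater than 5-fold
    bound = implied_background_rate(confirmed, reported_fold)
    # Under the assumed definition, an enrichment above 5-fold means the
    # background hub rate lies below this value (roughly 11%).
    print(f"implied upper bound on background hub rate: {bound:.2f}")
```

Under this reading, a 55% confirmation rate with more than 5-fold enrichment implies that fewer than about 11% of proteins would be expected to be hubs without the model's selection.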

Thus, the successful development of accurate hub classifiers demonstrates that highly connected proteins tend to share certain structural and physicochemical features that can be characterized and quantified by conventional QSAR descriptors.

It is anticipated that the developed hub classifiers will be a useful tool for predicting highly interacting proteins and will find broad application in planning and executing large-scale proteomic experiments and in identifying novel, prospective antibacterial drug targets, even in organisms that currently lack protein interaction data.


  1. Huang H, Jedynak BM, Bader JS: PLoS Comput Biol. 2007, 3: e214. 10.1371/journal.pcbi.0030214.

  2. Byler K, Hsing M, Cherkasov A: QSAR & Combinatorial Science. 2009, 28: 509-519. 10.1002/qsar.200860108.

Author information

Correspondence to M Hsing.

About this article


  • Protein Interaction Network
  • Protein Interaction Data
  • Connected Protein
  • Secondary Structure Composition
  • Antibacterial Drug Target