

Classification of CYP450 1A2 inhibitors using PubChem data


Cytochromes P450 (CYP450) are a superfamily of enzymes involved in the metabolism of a large number of xenobiotic compounds, including many drugs currently on the market. Their promiscuity with respect to substrates makes the CYP450 enzymes prone to inhibition by a large number of drugs, which gives rise to clinically significant drug-drug interactions.

In this work, different machine learning methods were applied to classify inhibitors and noninhibitors of human CYP450 1A2. The structures and the active/inactive labels for CYP1A2 inhibition were taken from the PubChem BioAssay database. The underlying assay uses human CYP1A2 to measure the demethylation of luciferin 6' methyl ether (Luciferin-ME; Promega-Glo) to luciferin.

The tested methods include k-nearest neighbors (kNN), decision tree, random forest, support vector machine (SVM) and associative neural networks (ASNN). The descriptors used were those from the Dragon software, fragment descriptors and E-state indices.
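To make the simplest of these methods concrete, the following is a minimal kNN sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: each molecule is represented as a set of fragment identifiers (the identifiers shown are made up), similarity is the Tanimoto coefficient, and the class is decided by majority vote among the k most similar training molecules. The Dragon descriptors and E-state indices are not reproduced here.

```python
from collections import Counter


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_classify(query: set, training: list, k: int = 3) -> str:
    """Majority vote among the k training molecules most similar to `query`.

    `training` is a list of (fragment_set, label) pairs. The fragment
    identifiers are hypothetical placeholders for real fragment descriptors.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set with invented fragment identifiers.
toy_training = [
    ({"c1", "n1", "o1"}, "inhibitor"),
    ({"c1", "n1"}, "inhibitor"),
    ({"s1", "o2"}, "noninhibitor"),
    ({"s1"}, "noninhibitor"),
    ({"c1", "o1"}, "inhibitor"),
]

if __name__ == "__main__":
    print(knn_classify({"c1", "n1", "o1", "f1"}, toy_training, k=3))
```

An odd k avoids tied votes in the two-class case. Real fragment descriptors are usually counts rather than presence flags, so a count-based similarity may be a better fit in practice.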

The training and test sets were handled separately to avoid several possible sources of overfitting, including overfitting by descriptor selection. Different applicability domain (AD) approaches were used to estimate the confidence of classification.
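The point about descriptor selection can be sketched as follows. This is a minimal illustration, assuming a simple variance filter as a stand-in for whatever selection procedure was actually used (the abstract does not specify it); the function names are hypothetical. The key property is that the columns are chosen from the training rows only and then applied unchanged to the test rows, so the test set never influences which descriptors survive.

```python
import random
from statistics import pvariance


def split_train_test(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle indices once and split them into disjoint train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def select_descriptors(train_rows, min_variance=1e-8):
    """Pick descriptor columns using the training rows only.

    A variance filter is used here purely as a placeholder selection rule.
    """
    n_columns = len(train_rows[0])
    return [
        j for j in range(n_columns)
        if pvariance([row[j] for row in train_rows]) > min_variance
    ]


def apply_selection(rows, columns):
    """Project rows onto the previously selected columns."""
    return [[row[j] for j in columns] for row in rows]
```

A typical use would call `select_descriptors` on the training rows, then pass the returned column list to `apply_selection` for both the training and the test rows before fitting and evaluating a model.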

As a result, the models correctly classified 80% of the test set instances. Accuracy rose to as much as 95% when only the 30% most confident predictions were considered. The model was also applied to an external test set of 187 molecules, collected from the literature and measured using a different reference reaction. For this set, an accuracy of 78% was achieved on the 30% most confident predictions.
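The "most confident predictions" figures correspond to an accuracy computed at reduced coverage. A minimal sketch of that calculation is given below, assuming each prediction comes with a numeric confidence score where higher means more confident; how the applicability domain approaches in this work derive such a score is not described in the abstract, and the function name and toy data are illustrative only.

```python
def accuracy_at_coverage(y_true, y_pred, confidence, fraction):
    """Accuracy over the `fraction` of predictions with the highest confidence."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = len(y_true)
    if n == 0 or len(y_pred) != n or len(confidence) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    keep = max(1, round(n * fraction))
    # Rank predictions from most to least confident and keep the top slice.
    order = sorted(range(n), key=lambda i: confidence[i], reverse=True)
    kept = order[:keep]
    correct = sum(1 for i in kept if y_true[i] == y_pred[i])
    return correct / keep


if __name__ == "__main__":
    # Invented toy data: 1 = inhibitor, 0 = noninhibitor.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
    conf = [0.95, 0.90, 0.85, 0.55, 0.80, 0.52, 0.70, 0.75, 0.51, 0.60]
    print(accuracy_at_coverage(y_true, y_pred, conf, 1.0))
    print(accuracy_at_coverage(y_true, y_pred, conf, 0.3))
```

Evaluating the function over a range of fractions gives an accuracy-versus-coverage curve, which shows how much accuracy is gained by restricting predictions to the most confident part of the applicability domain.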

All the developed models are fast enough to be used for virtual screening of CYP1A2 inhibitors and noninhibitors. The models are publicly available online at the web site.

Author information

Correspondence to Sergii Novotarskyi.


About this article


  • Support Vector Machine
  • Random Forest
  • Virtual Screening
  • Methyl Ether
  • Luciferin