- Open Access
XMetDB: an open access database for xenobiotic metabolism
Journal of Cheminformaticsvolume 8, Article number: 47 (2016)
Xenobiotic metabolism is an active research topic but the limited amount of openly available high-quality biotransformation data constrains predictive modeling. Current database often default to commonly available information: which enzyme metabolizes a compound, but neither experimental conditions nor the atoms that undergo metabolization are captured. We present XMetDB, an open access database for drugs and other xenobiotics and their respective metabolites. The database contains chemical structures of xenobiotic biotransformations with substrate atoms annotated as reaction centra, the resulting product formed, and the catalyzing enzyme, type of experiment, and literature references. Associated with the database is a web interface for the submission and retrieval of experimental metabolite data for drugs and other xenobiotics in various formats, and a web API for programmatic access is also available. The database is open for data deposition, and a curation scheme is in place for quality control. An extensive guide on how to enter experimental data into is available from the XMetDB wiki. XMetDB formalizes how biotransformation data should be reported, and the openly available systematically labeled data is a big step forward towards better models for predictive metabolism.
Xenobiotic metabolism covers the biochemical modification of drugs and xenobiotics by living organisms. These biotransformations are usually carried out by specialized enzymatic systems such as the cytochrome P450s and the UDP-glucuronosyltransferases , with the aim to make compounds more soluble and more easily excreted from the body.
Understanding how xenobiotic metabolism occurs in the human body is important in two fields particularly: drug discovery and toxicology. In drug discovery, one needs to understand what metabolites of a potential drug are formed to be able to study what effects they have . Occasionally, some metabolites are more potent than the drug itself. For example, cyclophosphamide is a prodrug that is implicated to be activated by P450s . Furthermore, metabolism can cause unwanted side effects, for example by interfering with the potential use of other drugs by inhibiting certain enzymes and, for example, change blood concentrations of drugs . In the field of toxicology, understanding the metabolism of all types of chemicals is important, even if the compounds themselves are not toxic, and it is their metabolites that cause the toxic effects. This also makes it important to understand metabolism when building in silico models that predict toxicity, as the molecular properties of the original compound and its metabolites may differ significantly. The evaluation of the metabolic fate and metabolism similarity of target and analog compounds in the context of read-across is an essential part of the framework for toxicological assessment proposed by Wu et al.  and relevant methods and tools are emphasized in a recent review on in-silico approaches for predicting toxicity .
The creation and validation of in silico models for predicting the metabolism of xenobiotics is an active field of research [7–17]. However, the available of suitable data is limited in two ways. First, relevant data used to construct these models are either locked away in proprietary and commercial databases, such as the Accelrys Metabolite database (http://accelrys.com/products/collaborative-science/databases/bioactivity-databases/biovia-metabolite.html), or is put together specifically for each new model . The availability of data in a public database accessible in an open format and curated by the scientific community would be a major step forward in decreasing the work required to create new models, and to enable comparisons of different models.
Second, the experiments to accurately link the metabolic conversion to a specific enzyme are rare. Often microsomes and hepatocytes of animal and human origin are used , in which it cannot always fully be certain which enzyme really does the conversion. Currently, the most detailed experimental procedures used to verify which metabolites are formed use cDNA-expressed drug-metabolizing enzymes, followed by LC/MS analysis of the formed metabolites . A further complication is that even when reference compounds of the metabolites are available in such studies, it is still not always possible to accurately identify at which atomic position the conversion happens, for example, of an aromatic ring hydroxylation reaction. Other types of experiments give information about the metabolism, but few studies give the full picture. Nevertheless, it is this precise recording of experimental detail is critical, whatever the used method is.
Published data on xenobiotic metabolism, which may serve as an information source of a new database, are currently fragmented over many journals and publications, using different experiment types, and provided in different formats, conforming to different standards. There are several existing databases that contain information relevant for xenobiotic metabolism, including DrugBank , SuperCYP , hDBMdb , Metrabase , Human Metabolome Database (HMDB)  and Transformer ; Drugbank reports substrate—product—enzyme, and while it includes references, these are typically to other database rather than primary literature; Transformer reports substrate—enzyme—reference, but not what the product is; Metrabase focuses on transporters and xenobiotic metabolism; HMDB lists the enzyme and literature but not the species the transformation was observed in, and, SuperCYP reports substrate—product—enzyme. Importantly, these databases do not include atom–atom mapping which is crucial for building predictive models for site-of-metabolism. hDBMdb does not seem to be operational anymore. Further, none of the previously mentioned databases has an open application programming interface (API) for direct consumption in scripts and third party applications, making them less useful for large scale modeling [26–29]
Further, the commercially available databases are very expensive and sometimes contain unpublished data, leading to further literature studies to find validation data that can be used in publications. Also, since different formats and standards are used across different labs, there is no consistent evaluation system for these kinds of models, resulting in published statistics that are not directly comparable between publications, e.g. mapping from mechanism to atoms that have not been performed in an identical fashion.
This lack of publicly available data has led to numerous repetitive literature searches for many academic research groups interested in metabolite prediction modeling. However, finding appropriate literature with enough detail is a challenge in itself. The knowledge about biotransformations is limited and reports about it are scattered. Peer-reviewed literature is not the only source of information, nor does literature report all the details we want to capture. Anecdotal examples even shows literature citing conference posters and package inserts as primary sources . Other literature includes FDA submissions, but these too can provide only limited detail and may also lack information about products formed.
We here present XMetDB—an open access database for xenobiotic metabolism implemented as an online system for deposition and sharing of experimental data. It is the first database that contain atom reaction center mapping and also includes a new reporting standard for this, with data and software available under Open licenses.
XMetDB consists of a database, an application layer for interacting with the database, and a user interface. The core data stored in the database are Observations, which consists of an experimentally detected enzyme-catalyzed reaction of a substrate that yields a product. XMetDB contains the chemical structure of substrates and products, and also includes annotations of atoms which are affected by the metabolizing reaction, indicating experimentally derived site-of-metabolism annotations. The experimental conditions are limited to the type of experiment (one of enzymes, hepatocytes and microsomes) and enzymes involved. The species assumed is human. The initial design does not allow specifying the species, as the intention was to built a database for human metabolism. Associated structured information includes uncertainty of the atom mappings (either certain or uncertain), amount of product formed in the reaction (one of major, minor, or unknown), a literature reference, and comments as free text. Enzymes, such as members of the Cytochrome P450 family, can be added separately and are linked out to UniProt via UniProt ID.
In addition, each observation can be tagged as curated or not, and assigned free text comment.
All Observations have unique identifiers, for example the arbitrarily chosen entry XMETDB153 for the observation of the P450-mediated oxidation of quinoline into 3-hydroxyquinoline by the CYP2E1 isoform, and comprises a metabolic transformation (substrate, product), the enzyme, a literature reference, and a categorization of the experiment with the enzyme identified by its name and UniProt identifier. The reference in the example is provided as free text, while increasingly the DOI is given instead of a textual reference.
Version 1.0 of XMetDB contains 162 observations from 21 scientific papers from 14 journals, covering 117 chemical structures and 95 enzymes.
XMetDB provides two interfaces: an HTML-based interface for access via a web browser and aimed at humans, and an API set up for programmatic interaction. The key features of the interfaces are: (1) browse and search for observations, (2) submit new observations, and (3) add and browse enzymes. The API is described later, and we first explain the graphical user interface.
The principal way to browse the data in XMetDB is by listing all observations (see Fig. 1 for an example). The list shows observations paginated, and the number of observations shown can be changed from 10 to 20, 50, 100, and all observations. Alternatively, it is possible to browse the observations by enzyme, presenting a list of observations for a particular enzyme (see Fig. 2 for an example). Both lists can be sorted by any of the available columns and lead to individual observations, each with a unique identifier.
The user interfaces in XMetDB supports searching for observations by status, experiment type, enzyme, reference, XMETID, and allele. It is also possible to search for substrates and products by chemical similarity and substructure using a structure diagram editor or by chemical identifier (CAS, Chemical Name, SMILES, InChI or SMARTS for a substructure search). The resulting list of structures is ordered by similarity, clicking the folder icon returns a list of observations involving this structure.
Submission of a new observation works as a two-step procedure: (1) Enter substrate and product structures and meta-data describing the experiment; (2) Select atom mappings for the reaction, which is included to enable more accurate site-of-metabolism modeling. All atom mappings used in XMetDB should be annotated as either Certain or Uncertain, where certain means that as far as possible it is known at which atom (or atoms) the reaction is initiated. To help the users, a large collection of metabolic reactions and their corresponding sites of metabolism have been collected, and are available in the XMetDB wiki: http://www.xmetdb.org/wiki/, together with an extensive guide for how to enter experimental data. An example of hydroxylation of aromatic carbon atoms by cytochromes P450 is shown in Fig. 3.
XMetDB supports a curator role, with the purpose to ensure high quality data in the database. Data that has been verified by a curator besides the original submitter has been validated by at least one additional person, and the curator has been verified by the administrators as an expert on biotransformations for at least one enzyme family. The curators can edit all observations but not essential info such as experiment and enzymes. The curators can change atom highlighting and comments and typos in references. The curators can set the flag curated to yes for any observations. The users may indicate their availability to act as a curator, but the role must be assigned by an administrator.
Lists of observations, such as the results of a query, can be exported as column separated values (CSV) or structure data file (SDF) formats (see Table 1 for details). CSV files include the observations along with chemical structures represented as SMILES and InChI, with column names as listed in Table 2. SDF files includes chemical structures with the observations data and metadata stored as a set of SDF tags (an example file is given in Additional file 1). Single reaction entries can be exported in the GPML format, which is the XML language used by WikiPathways for storage of pathways.
The XMetDB database exposes an open Application Program Interface (API) that allows the data to be accessed programmatically by software applications (e.g. workflow engines, scripts) and other web services .
The main XMetDB API components are observations, enzymes, and chemical structures. Every component is assigned an unique URI, e.g. http://xmetdb.org/protocol/XMETDB1 for an observation. The API follows RESTful principles, where each component (the term in the REST vocabulary is a “resource”) allows only four operations:
GET—to retrieve the content of the resource
e.g. GET http://xmetdb.org/protocol/XMETDB1
PUT—update the content of the resource
e.g. PUT (some data to) http://xmetdb.org/protocol/XMETDB1
POST—-create a new resource
DELETE—remove the resource
e.g. after DELETE http://xmetdb.org/protocol/XMETDB1 this resource will no more exist
The Web API is described extensively in XMetDB Wiki at: http://www.xmetdb.org/wiki/API.
From a deployment perspective, XMetDB consists of two web containers: xmetdb.war and ambit2.war, available both as pre-deployed online web services at http://xmetdb.org, and as a downloadable web application which can be deployed in a compatible servlet container. The chemical structures and datasets are represented as described in , enhanced with JSON representation to facilitate the user interface implementation (http://ambit.sourceforge.net/api.html). The supported data formats are JSON and RDF (RDF/XML and RDF/N3) and asynchronous jobs are handled according to OpenTox API version 1.2 specifications .
Users and roles
The XMetDB application supports the following user roles: (1) regular user, (2) curator and (3) administrator. A regular user is any registered user who is logged in; and can add observations to the database, save searches as alerts, edit the user profile, and flag availability as curators. A curator is a regular user who has been approved as a curator. Beyond regular user rights, a curator can approve that data submitted by other users has been added in a consistent manner, and verify that the atom mappings and references are correct. As part of the curation process he/she can also edit atom mappings, and metadata of an observation. An administrator is a curator with additional rights (to grant the curator role to other users, to modify the enzyme list, delete observations, modify user information etc.). Searching and browsing the observations as well as exporting data through the web interface or API does not require the user to be logged in.
Discussion and conclusions
Extracting biotransformation data from literature is a cumbersome process, and crowdsourcing initiatives are needed in order to propel scientific discoveries and enable computational model building. The aim of XMetDB is to provide scientists with a public repository where xenobiotic metabolism data can be uploaded, shared, and integrated with other observations. Other databases related to metabolism do not include atom–atom mapping, and in this manuscript we propose a formalization of how xenobiotic metabolism data should be reported in order to improve computational model building. We acknowledge here, however, that current experimental methods may not always provide all the detail we ask to be reported. The open access philosophy of XMetDB allows for any content to be uploaded, and a curation system allows for curators to ensure that the database contents are of high quality. XMetDB hence provides the means for the community to collectively build up a knowledge base over time, and it is our hope that the community will adopt the system and deposit annotated metabolism data upon publication in scientific journals. For the future we also envision to import reaction data from other databases or possibly via text mining, which could encourage reporting and investigations on atom–atom mapping.
The publicly available systematically labeled data in XMetDB will be a big step forward towards improved models for predictive metabolism. Future plans include to integrate XMetDB in Bioclipse  and expand the content with more data.
Testa B, Pedretti A, Vistoli G (2012) Reactions and enzymes in the metabolism of drugs and other xenobiotics. Drug Discov Today 17(11–12):549–60
Claesson A, Spjuth O (2013) On mechanisms of reactive metabolite formation from drugs. Mini Rev Med Chem 13(5):9–720
Huttunen KM, Raunio H, Rautio J (2011) Prodrugs—from serendipity to rational design. Pharmacol Rev 63(3):750–771
Ito K, Iwatsubo T, Kanamitsu S, Ueda K, Suzuki H, Sugiyama Y (1998) Prediction of pharmacokinetic alterations caused by drug–drug interactions: metabolic interaction in the liver. Pharmacol Rev 50(3):387–412
Wu S, Blackburn K, Amburgey J, Jaworska J, Federle T (2010) A framework for using structural, reactivity, metabolic and physicochemical similarity to evaluate the suitability of analogs for SAR-based toxicological assessments. Regul Toxicol Pharmacol 56(1):67–81
Patlewicz GY, Fitzpatrick J (2015) Current and future perspectives on the development, evaluation and application of in silico approaches for predicting toxicity. Chem Res Toxicol 5:00388
Rostkowski M, Spjuth O, Rydberg P (2013) WhichCyp: prediction of cytochromes P450 inhibition. Bioinformatics 29(16):2051–2052
Kirchmair J, Williamson MJ, Tyzack JD, Tan L, Bond PJ, Bender A, Glen RC (2012) Computational prediction of metabolism: sites, products, SAR, P450 enzyme dynamics, and mechanisms. J Chem Inf Model 52(3):48–617
Kirchmair J, Bender A, Kubinyi H, Folkers G (2014) Drug metabolism prediction. Methods and principles in medicinal chemistry, vol 63. Wiley, Weinheim
Kirchmair J, Göller AH, Lang D, Kunze J, Testa B, Wilson ID, Glen RC, Schneider G (2015) Predicting drug metabolism: experiment and/or computation? Nat Rev Drug Discov 14(6):387–404. doi:10.1038/nrd4581
Carlsson L, Spjuth O, Adams S, Glen RC, Boyer S (2010) Use of historic metabolic biotransformation data as a means of anticipating metabolic sites using MetaPrint2D and Bioclipse. BMC Bioinform 11:362
Piechota P, Cronin MTD, Hewitt M, Madden JC (2013) Pragmatic approaches to using computational methods to predict xenobiotic metabolism. J Chem Inf Model 53(6):92–1282
Rudik AV, Dmitriev AV, Lagunin AA, Filimonov DA, Poroikov VV (2014) Metabolism site prediction based on xenobiotic structural formulas and pass prediction algorithm. J Chem Inf Model 54(2):498–507
Rydberg P, Olsen L (2012) Ligand-based site of metabolism prediction for cytochrome P450 2D6. ACS Med Chem Lett 3(1):69–73
Kolanczyk RC, Schmieder P, Jones WJ, Mekenyan OG, Chapkanov A, Temelkov S, Kotov S, Velikova M, Kamenska V, Vasilev K, Veith GD (2012) MetaPath: an electronic knowledge base for collating, exchanging and analyzing case studies of xenobiotic metabolism. Regul Toxicol Pharmacol RTP 63(1):84–96
Rudik A, Dmitriev A, Lagunin A, Filimonov D, Poroikov V (2015) SOMP: web server for in silico prediction of sites of metabolism for drug-like compounds. Bioinformatics 31(12):2046–2048
Lapins M, Worachartcheewan A, Spjuth O, Georgiev V, Prachayasittikul V, Nantasenamat C, Wikberg JES (2013) A unified proteochemometric model for prediction of inhibition of cytochrome P450 isoforms. PLoS ONE 8(6):66566
Iwatsubo T, Hirota N, Ooie T, Suzuki H, Shimada N, Chiba K, Ishizaki T, Green CE, Tyson CA, Sugiyama Y (1997) Prediction of in vivo drug metabolism in the human liver from in vitro metabolism data. Pharmacol Ther 73(2):147–171
Ekins S, Ring BJ, Grace J, McRobie-Belle DJ, Wrighton SA (2000) Present and future in vitro approaches for drug metabolism. J Pharmacol Toxicol Methods 44(1):313–324
Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34:72–668 (Database issue)
Preissner S, Kroll K, Dunkel M, Senger C, Goldsobel G, Kuzman D, Guenther S, Winnenburg R, Schroeder M, Preissner R (2010) SuperCYP: a comprehensive database on cytochrome P450 enzymes including a tool for analysis of CYP–drug interactions. Nucleic Acids Res 38:43–237 (Database issue)
Erhardt PW (2003) A human drug metabolism database: potential roles in the quantitative predictions of drug metabolism and metabolism-related drug–drug interactions. Curr Drug Metab 4(5):22–411
Mak L, Marcus D, Howlett A, Yarova G, Duchateau G, Klaffke W, Bender A, Glen RC (2015) Metrabase: a cheminformatics and bioinformatics database for small molecule transporter data analysis and (Q)SAR modeling. J Cheminform 7:31. doi:10.1186/s13321-015-0083-5
Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y, Djoumbou Y, Mandal R, Aziat F, Dong E, Bouatra S, Sinelnikov I, Arndt D, Xia J, Liu P, Yallou F, Bjorndahl T, Perez-Pineiro R, Eisner R, Allen F, Neveu V, Greiner R, Scalbert A (2013) HMDB 3.0—the human metabolome database in 2013. Nucleic Acids Res 41:7–801. doi:10.1093/nar/gks1065 (Database issue)
Hoffmann MF, Preissner SC, Nickel J, Dunkel M, Preissner R, Preissner S (2014) The Transformer database: biotransformation of xenobiotics. Nucleic Acids Res 42:7–1113 (Database issue)
Jeliazkova N (2012) Web tools for predictive toxicology model building. Expert Opin Dug Metab Toxicol 8(5):1–11
Frey JG, Bird CL (2013) Cheminformatics and the semantic web: adding value with linked data and enhanced provenance. Wiley Interdiscip Rev Comput Mol Sci 3(5):465–481
Goldmann D, Montanari F, Richter L, Zdrazil B, Ecker GF (2014) Exploiting open data: a new era in pharmacoinformatics. Future Med Chem 6(5):503–14
Williams AJ, Harland L, Groth P, Pettifer S, Chichester C, Willighagen EL, Evelo CT, Blomberg N, Ecker G, Goble C, Mons B (2012) Open PHACTS: semantic interoperability for drug discovery. Drug Discov Today 17(21–22):1188–98
DeLeon A, Patel NC, Lynn Crismon M, Aripiprazole A (2004) A comprehensive review of its pharmacology, clinical efficacy, and tolerability. Clin Ther 666(5):26–649
Willighagen EL, Jeliazkova N, Hardy B, Grafström RC, Spjuth O (2011) Computational toxicology using the OpenTox application programming interface and Bioclipse. BMC Res Notes 4:487
Jeliazkova N, Jeliazkov V (2011) AMBIT RESTful web services: an implementation of the OpenTox application programming interface. J Cheminform 3:18
Jeliazkova N, Kochev N (2011) AMBIT-SMARTS: efficient searching of chemical structures and fragments. Mol Inform 30(8):707–720
Burger M (2015) ChemDoodle web components: HTML5 toolkit for chemical graphics, interfaces, and informatics. J Cheminform 7(1):1–7
Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR, Evelo C (2008) WikiPathways: pathway editing for the people. PLoS Biol 6(7):184
Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, Bohler A, Mélius J, Waagmeester A, Sinha SR, Miller R, Coort SL, Cirillo E, Smeets B, Evelo CT, Pico AR (2016) WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res 44(D1):488–494
Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, Wagener J, Murray-Rust P, Steinbeck C, Wikberg JES (2007) Bioclipse: an open source workbench for chemo- and bioinformatics. BMC Bioinform 8:59
OS, PR and NJ: Original idea, database and GUI design. EW, CE: Experimental data and interpretations. All authors read and approved the final manuscript.
Funding was received from Stiftelsen Olle Engkvist Byggmästare, and the Swedish strategic research program eSSENCE. During the XMetDB project, we were shocked by the tragic death of Patrik Rydberg. We would here like to acknowledge his scientific contributions in the field of xenobiotic metabolism and will continue the XMetDB project in his memory.
The authors declare that they have no competing interests.
XMetDB is available at http://www.xmetdb.org. The data is available under the CC-BY license while the software is licensed under the LGPL.