SmartGraph: a network pharmacology investigation platform

Motivation: Drug discovery investigations need to incorporate network pharmacology concepts while navigating the complex landscape of drug-target and target-target interactions. This task requires solutions that integrate high-quality biomedical data with analytic and predictive workflows as well as efficient visualization. SmartGraph is a platform that uses a Neo4j graph database, the Angular web framework, the RxJS asynchronous event library and D3 visualization to accomplish these goals.
Results: The SmartGraph framework integrates high-quality bioactivity data and biological pathway information, resulting in a knowledge base comprising 420,526 unique compound-target interactions defined between 271,098 unique compounds and 2018 targets. SmartGraph then performs bioactivity predictions based on the 63,783 Bemis-Murcko scaffolds extracted from these compounds. Through several use-cases, we illustrate the use of SmartGraph to generate hypotheses for elucidating mechanism-of-action, drug-repurposing and off-target prediction.
Availability: https://smartgraph.ncats.io/.


SmartGraph Relationships
Tested-on. This relation is defined between a compound node C and a target node T and represents experimentally determined bioactivity, i.e. a drug-target interaction (DTI). The relation carries the aggregated AC50/IC50/EC50 bioactivity value between the compound and the target, in µM. If multiple values are reported for the same C-T relation, they are aggregated as their median. The edge representing this relation is directed: the start node is C and the end node is T.
An important distinction is made between compounds in the context of this relation: an activity threshold was defined to identify the so-called potent compounds of a target. The activity cutoff attribute reflects the 80th percentile of the potency values of all compounds tested on the target at hand. Of note, determining the activity cutoff involved all interactions regardless of the type of bioactivity, i.e. whether it is an AC50, IC50 or EC50 value. The unit of the activity cutoff is -logM. If a target is associated with five or fewer interactions (<=5), the activity cutoff was set to 7.
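The cutoff rule can be sketched as follows. This is only an illustration, not the pipeline used to build SmartGraph: the function names are hypothetical, the percentile is computed with linear interpolation between ranks (the text does not say which percentile method was used), and the 80th percentile is assumed to be taken on the -logM scale, so that higher values mean more potent compounds.

```python
import math

DEFAULT_CUTOFF = 7.0   # -logM, applied when a target has too few interactions
MIN_INTERACTIONS = 5   # targets with <= 5 interactions get the default cutoff


def um_to_neg_log_molar(value_um):
    """Convert a potency in µM to -log10(molar); 1 µM corresponds to 6.0."""
    return 6.0 - math.log10(value_um)


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks (assumed method)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def activity_cutoff(potencies_um):
    """Activity cutoff (-logM) of one target from its aggregated potencies in µM."""
    if len(potencies_um) <= MIN_INTERACTIONS:
        return DEFAULT_CUTOFF
    p_values = [um_to_neg_log_molar(v) for v in potencies_um]
    return percentile(p_values, 80)
```

Under these assumptions, a compound would count as potent for a target when its potency in -logM reaches or exceeds the value returned by `activity_cutoff`.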
Of note, an additional attribute, the type of activity, is also associated with DTI relations. However, SmartGraph currently includes both inhibitory and stimulatory relations, hence this attribute is uniformly set to "activity". If other activity types are incorporated into the knowledge base, this attribute can accommodate them, e.g. inhibition, activation, dissociation constant.
Regulates. This relation is defined between two target nodes T1 and T2. The edge representing the relation is directed: an edge with start node T1 and end node T2 represents the regulatory relation in which T1 regulates T2. These protein-protein interaction (PPI) relations were extracted from the SIGNOR database [8].
Pattern-of. This relation is defined between a compound C and a pattern P. In the current release of SmartGraph, P is a Bemis-Murcko scaffold [6] of C. The edge representing the relation is directed: the start node is P and the end node is C. The overlap ratio is an attribute of the edge: it gives the ratio between the number of heavy atoms in P and in C, i.e. the overlap between the scaffold and the compound expressed as the fraction of overlapping heavy atoms.
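Written as a formula, the overlap ratio described above reads as follows. The symbol N_heavy is notation introduced here for the heavy-atom count and does not appear in the original text.

```latex
\mathrm{overlap}(P, C) = \frac{N_{\mathrm{heavy}}(P)}{N_{\mathrm{heavy}}(C)},
\qquad 0 \le \mathrm{overlap}(P, C) \le 1
```

For example, toluene has 7 heavy atoms and its Bemis-Murcko scaffold, benzene, has 6, which gives an overlap ratio of 6/7 ≈ 0.86. The ratio cannot exceed 1 because the scaffold is a substructure of the compound.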
Potent-pattern-of. The potent compounds of a target T are collected. Given a compound C in this collection, any pattern P associated with C is a potent pattern of T. Of note, inactive compounds of the target at hand may also contain this pattern. While this is a known phenomenon in structure-activity relationship (SAR) series, SmartGraph uses a permissive strategy to maximize the likelihood of identifying active (potent) chemotypes. DTI relations extracted from the ChEMBL database are attributed with the internal identifiers of the compound and target nodes involved; activity is expressed in µM.
In contrast to the direction of "pattern-of" and "potent-pattern-of" relations, the direction of PPI and DTI relations is taken into account when analyzing the network.
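A minimal sketch of this permissive strategy is given below, assuming potencies are already expressed in -logM and that a compound is potent when its potency is greater than or equal to the target's activity cutoff. The function name, the dictionary-based inputs and the inclusive comparison are assumptions made for illustration.

```python
def potent_patterns(potency_by_compound, cutoff, scaffold_of):
    """Collect the potent patterns of one target.

    potency_by_compound: dict compound id -> potency in -logM for this target
    cutoff: activity cutoff of the target in -logM
    scaffold_of: dict compound id -> scaffold key (hypothetical lookup table)

    Returns (potent, shared): the patterns found in at least one potent
    compound, and the subset that also occurs in non-potent compounds.
    """
    potent, non_potent = set(), set()
    for compound, potency in potency_by_compound.items():
        scaffold = scaffold_of.get(compound)
        if scaffold is None:
            continue  # compound without a recorded scaffold
        if potency >= cutoff:
            potent.add(scaffold)
        else:
            non_potent.add(scaffold)
    # Permissive: a pattern stays potent even if inactive compounds share it.
    return potent, potent & non_potent
```

The second return value makes the caveat from the text visible: these are potent patterns that also appear in compounds below the cutoff.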
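The handling of edge direction can be illustrated with a small traversal sketch. It is not SmartGraph's query logic (the platform runs on a Neo4j graph database); the relation labels and function names below are hypothetical, and the sketch only shows one way to respect direction for DTI and PPI edges while treating pattern edges as undirected.

```python
from collections import deque

# Hypothetical relation labels; only the pattern relations ignore direction.
UNDIRECTED = {"pattern_of", "potent_pattern_of"}


def build_adjacency(edges):
    """edges: iterable of (start, relation, end) triples."""
    adjacency = {}
    for start, relation, end in edges:
        adjacency.setdefault(start, set()).add(end)
        adjacency.setdefault(end, set())
        if relation in UNDIRECTED:
            # Pattern edges may be followed in both directions.
            adjacency[end].add(start)
    return adjacency


def reachable(adjacency, source):
    """Breadth-first search returning all nodes reachable from source."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {source}
```

With edges P1 -pattern_of-> C1, C1 -tested_on-> T1 and T1 -regulates-> T2, a search starting at C1 reaches P1, T1 and T2, whereas a search starting at T2 reaches nothing, because the DTI and PPI edges cannot be followed backwards.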

Bioactivity Data Aggregation
Substances of the ChEMBL database were converted to compounds by keeping only the largest component (CDK KNIME node [9], [10]). Next, InChI keys and Morgan fingerprints [11], [12] of radius 3 and length 2048 bits were generated for those compounds (RDKit KNIME node [13], [14]). Compounds associated with multiple fingerprints were removed from the data set (157 such compounds were found). The ChEMBL IDs of targets were resolved to UniProt IDs, leading to many-to-many relations. Accordingly, the original DTIs (compound-target ChEMBL ID) were expanded to all possible compound-UniProt ID tuples. In creating the knowledge base, only targets annotated as "SINGLE TARGET" according to the "chembl2uniprot" file, which is part of the ChEMBL distribution, were used. Unique interactions and their associated potency values were obtained by aggregating bioactivities according to their median values. In this process, we used the InChI keys and UniProt IDs to identify and aggregate bioactivity data. Of note, only those bioactivities were aggregated that were associated with only one type of activity: AC50, IC50 or EC50. Otherwise, the respective DTIs were ignored in deriving the knowledge base of SmartGraph.
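The expansion and aggregation steps can be sketched in plain Python as shown below. The original workflow was built from KNIME nodes, so this is only an illustration: the record layout, the function name and the mapping dictionary are hypothetical, and the sketch assumes that a compound-target pair is kept only when all of its records share a single activity type from AC50, IC50 and EC50.

```python
from collections import defaultdict
from statistics import median

ALLOWED_TYPES = {"AC50", "IC50", "EC50"}


def aggregate_bioactivities(records, chembl_to_uniprot):
    """Aggregate bioactivity records into unique compound-target interactions.

    records: iterable of (inchi_key, target_chembl_id, activity_type, value_um)
    chembl_to_uniprot: dict target ChEMBL ID -> list of UniProt IDs
    Returns dict (inchi_key, uniprot_id) -> (activity_type, median value in µM).
    """
    grouped = defaultdict(list)
    for inchi_key, target_id, activity_type, value_um in records:
        # Expand each DTI to every UniProt ID mapped to the ChEMBL target.
        for uniprot_id in chembl_to_uniprot.get(target_id, ()):
            grouped[(inchi_key, uniprot_id)].append((activity_type, value_um))

    aggregated = {}
    for key, measurements in grouped.items():
        types = {activity_type for activity_type, _ in measurements}
        # Assumed rule: ignore pairs with mixed or unsupported activity types.
        if len(types) != 1 or not types <= ALLOWED_TYPES:
            continue
        values = [value for _, value in measurements]
        aggregated[key] = (types.pop(), median(values))
    return aggregated
```

Keying the groups on the InChI key and the UniProt ID mirrors the identifiers named in the text, so that records of the same structure reported under different substance entries fall into the same group.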
Bemis-Murcko (BM) scaffolds of the compounds were detected in KNIME with the RDKit node. Scaffolds were deduplicated using the 'Group By' KNIME node with the scaffold structure as key. InChI keys for the scaffolds were generated with the RDKit KNIME node.

Table S1. Compounds involved in the use-cases.