MORTAR aims to support workflows for molecular in silico fragmentation and substructure analysis as well as the development of new fragmentation methods. The Java rich client application provides a graphical user interface for visualising the fragmentation results of individual molecular compounds or whole sets of molecules. It supports performing fragmentation with a single fragmentation algorithm or with a pipeline that can be any combination of the integrated fragmentation algorithms. Furthermore, MORTAR allows straightforward integration of additional fragmentation and substructure analysis methods.
Import and molecules tab
Single molecular compounds or sets of molecules can be imported into MORTAR from various file formats. Text files containing SMILES codes and structure-data files (SDF) [45] can be used to import molecule sets. Individual molecular compounds can be read as Molfiles, where V2000 and V3000 are supported [45].
Figure 3 depicts the MORTAR Molecules tab displaying the structures of the COCONUT natural products database imported from a SMILES file. They can be browsed with a pagination tool and sorted by name in the table. Moreover, the molecules can be selected or deselected for fragmentation.
A single fragmentation process with a selected fragmentation algorithm can be started with the button in the bottom left corner. It is labelled with the selected fragmentation algorithm, which can be changed in a respective menu. The Ertl algorithm for functional group detection is selected for the fragmentation process in Fig. 3. In its MORTAR implementation, it returns identified functional groups and resulting alkane remnants as fragments by default.
The settings of the individual fragmentation algorithms and the MORTAR application in general can be adjusted via separate dialogs (see Figs. 4 and 2). The view for the settings of the algorithms is automatically extended for a new algorithm when it is integrated. All settings are made persistent by line-based text files.
Fragmentation results
Two new tabs open when a fragmentation job is completed: Fragments tab and Items tab. If multiple jobs are executed consecutively, the two result tabs are generated for each of them and remain open. The content of result views can be exported as a PDF file or as a text file in CSV format. The PDF file contains the 2D structures in addition to the information displayed as text. The resulting fragments can be exported as SD, Mol, or PDB files. The librepdf library version 1.3.26 [46] is used for PDF export.
Fragments tab
The Fragments tab visualises the resulting fragments. Similar to the Molecules tab, the resulting fragments are displayed on multiple pages. In addition to the SMILES representation and the 2D structure of each fragment, one column shows a randomly selected parent molecule of the corresponding fragment (see Fig. 5). The Frequency column indicates how often the corresponding fragment appears in the fragmented set of molecules. The column Molecule Frequency contains the number of molecules in which this fragment appears. In Fig. 5, the five most frequent fragments (functional groups and alkane remnants) resulting from the Ertl algorithm analysis of the COCONUT natural product database are displayed. The most frequent fragment is a hydroxy group connected to an aliphatic carbon atom. Almost as frequently, a single aliphatic carbon atom or methyl group was detected, followed by an ether group that is only half as frequent. In fourth place, there is a hydroxy group connected to an aromatic carbon atom, as it results, i.a., from a phenol (the aromatic character of the attached carbon atom is indicated in the SMILES representation of the group, “[H]Oc”, by lower-case letter notation; this corresponds to the fragment depiction where the carbon atom is not fully saturated with hydrogen atoms). The fifth most frequent FG is an alkene group occurring 165,396 times. The fragments are sorted according to their absolute frequency in Fig. 5. If they would be sorted according to their molecule frequency, the ranking would be different with the methyl group being at the top. The comparatively high number of hydroxy groups detected in absolute numbers as opposed to the molecule frequency means that natural products usually have multiple substituents of this type (3.3 on average, dividing the frequency by the molecule frequency). One type of structure that may be responsible for this trend is glycosidic moieties that occur frequently in natural products [30]. The results of this proof-of-concept analysis using MORTAR are in general agreement with an analogous systematic study of functional group frequencies in natural products conducted by Ertl et al. [28].
Items tab
The Items tab also visualises the results of a fragmentation process. Here, however, the fragments are assigned to their individual molecules from the originally imported set. Each molecule is shown with its name, its 2D structure, and the 2D structures of its fragments, including the frequencies of how often the respective fragment appears in this molecule (see Fig. 6).
Pipelining
In addition to executing a single method of fragmentation, MORTAR offers the option of executing fragmentation pipelines, which can be defined and executed with any combination of the integrated fragmentation algorithms where each selected algorithm may have its own individual settings. The Pipeline Settings view (see Fig. 7) provides a straightforward way to create a pipeline by adding new methods via buttons and choice boxes. A simple application example of the pipelining functionality is to apply a Sugar Removal Utility (SRU) processing step to remove terminal glycosidic moieties from the studied molecules, as it is usually done in chemical space analyses prior to another fragmentation step to avoid redundancies.
A more sophisticated example study for the MORTAR pipelining functionality is inspired by the recent work of Peter Ertl to identify the most common substituents in natural products [47]. Here, NP structures were first deglycosylated, and (ring) substituents were recursively extracted. To set up a similar pipeline in MORTAR, the first step has to be a Sugar Removal Utility processing configured to only return aglycones of input structures. The second step would be a Scaffold Generator fragmentation that only returns side chains. A recursive fragmentation of the side chains is currently not included in MORTAR but for demonstration purposes, an Ertl algorithm processing can be chosen to extract functional groups from them. This pipeline is shown in Fig. 7 and described in more detail in the MORTAR tutorial that can be found on GitHub [48]. Figure 8 depicts the five most frequent functional groups resulting when this pipeline is applied to the natural product structures taken from COCONUT. The most frequent functional group identified in ring side chains is an ether or a hydroxy group (both belong to the same group because the bonds to ring atoms are cut without preserving any information). Following is a hydroxy group connected to an aliphatic carbon atom. In the depicted Sample Parent structure in row 2, this group results from the ether group that connects the side chain with the ring. When it is cleaved and saturated with hydrogen, a hydroxy group results. The third functional group is a carboxylic acid functionality. The alkene functionality is identified as the fourth-frequent substituent, followed by the ester functionality.
Histogram view
To get an overview of the fragment frequencies, a histogram can be created with both types of frequency—the Frequency, which indicates how often the respective fragment occurs in the fragmented set of molecules, or the Molecule Frequency, which indicates the number of molecules in which a fragment occurs. Figure 9 shows a MORTAR histogram view of the ten most frequent fragments of the COCONUT database fragmented with ErtlFunctionalGroupsFinder in default settings (compare Fig. 5). The fragments are sorted according to their absolute frequencies and can also be displayed as a 2D structure image by hovering over the bar of the desired fragment. The fragment structure image is displayed in the lower right corner of the histogram.
Performance
Performance snapshots of MORTAR were performed for different fragmentation processes on two different hardware systems. For this purpose, the complete COCONUT database was used first. The fragmentation algorithm employed for the first snapshot was ErtlFunctionalGroupsFinder with default settings. On a standard notebook using eight cores of an Intel(R) Core(TM) i7-8750H CPU [49] and allocating 20 GB of memory to the JVM, MORTAR took 100 s to decompose the 406,747 molecules of the COCONUT database imported as SMILES codes into 35,791 fragments and post-process them. On the same machine with the same configuration, MORTAR needed 154 s to decompose the COCONUT database using the Sugar Removal Utility with default settings. Using the Scaffold Generator (Fragmentation type setting set to SCAFFOLD_ONLY, i.e., generate the molecular scaffold of a structure but do not dissect it further; default settings for the rest) with eight cores on the machine described above, MORTAR took 64 s to process COCONUT.
Using 12 cores of an Intel(R) Xeon(R) Gold Processor 6254 workstation CPU [50] and 250 GB memory for the JVM resulted in a computation time of 229 s to fragment the 2,136,187 molecules of the ChEMBL30 database [51, 52] into 62,722 distinct fragments using ErtlFunctionalGroupsFinder with default settings. With more than 24 parallelised threads, no further performance increase could be achieved.