Fig. 3 | Journal of Cheminformatics

Fig. 3

From: MAW: the reproducible Metabolome Annotation Workflow for untargeted tandem mass spectrometry

Overview of the candidate selection performed within MAW-Py. The workflow takes the top-scoring candidates from all sources and merges their SMILES into a single CSV file. These SMILES are then compared for structural similarity. In the pairwise “Candidate Similarity” calculation, candidate SMILES with Tanimoto similarity scores < 0.85 when compared with the other SMILES in the list of top-scoring candidates are discarded. For the remaining SMILES, a Maximum Common Substructure (MCSS) is calculated for each pair in the list. Based on this MCSS calculation, a TSV file is generated that can be used in Cytoscape to visualize the chemical similarity among the candidates. Candidates that belong to the cluster containing the largest number of top-ranked candidates should be considered the most probable structures and substructures for the particular feature. For the “Candidate Identity” part, the Tanimoto threshold is ≥ 0.99. Candidate identity among SMILES leads to a single structure as the top-ranking candidate for each feature. If there is no identity, or the sources provide different top-ranking structures, a prioritization is performed. In total, there are four sources (SIRIUS, GNPS, HMDB, and MassBank). The prioritization scheme is: three sources with the same candidate > two sources with the same candidate > single source (GNPS) > single source (SIRIUS) > single source (MassBank or HMDB). The scheme was defined from the results obtained with known compounds from the standards dataset from diatoms.
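The selection logic described in the caption can be illustrated with a minimal sketch. This is not the MAW-Py implementation: the function names (`tanimoto`, `keep_similar`, `prioritize`) are hypothetical, and fingerprints are represented as plain Python sets of on-bit indices rather than computed from SMILES. The sketch also assumes that a candidate is kept when it reaches the 0.85 threshold against at least one other candidate, that identical candidates (Tanimoto ≥ 0.99) have already been mapped to the same string, and that MassBank is tried before HMDB, although the caption ranks the two equally.

```python
from collections import Counter

# Assumed single-source order, read off the caption: GNPS > SIRIUS > MassBank or HMDB.
# The caption ranks MassBank and HMDB equally; trying MassBank first is an arbitrary choice.
SINGLE_SOURCE_PRIORITY = ["GNPS", "SIRIUS", "MassBank", "HMDB"]


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def keep_similar(fingerprints, threshold=0.85):
    """Return the names of candidates that reach `threshold` against at least
    one other candidate (an assumed reading of the "Candidate Similarity" step)."""
    kept = []
    for name, bits in fingerprints.items():
        others = (b for n, b in fingerprints.items() if n != name)
        if any(tanimoto(bits, b) >= threshold for b in others):
            kept.append(name)
    return kept


def prioritize(top_by_source):
    """Pick one candidate from a {source: top candidate} mapping.

    Agreement between two or more sources wins; otherwise fall back to the
    single-source order above. Ties between equally sized agreeing groups are
    not covered by the caption and are resolved here by insertion order.
    """
    if not top_by_source:
        raise ValueError("no candidates supplied")
    candidate, n_sources = Counter(top_by_source.values()).most_common(1)[0]
    if n_sources >= 2:
        return candidate, f"{n_sources} sources"
    for source in SINGLE_SOURCE_PRIORITY:
        if source in top_by_source:
            return top_by_source[source], source
    raise ValueError("no recognised source in input")
```

For example, `prioritize({"SIRIUS": "CCO", "GNPS": "CCO", "HMDB": "CCN"})` selects the candidate shared by two sources, whereas `prioritize({"SIRIUS": "CCO", "GNPS": "CCN"})` falls back to the GNPS candidate. In practice the fingerprints and the Tanimoto comparison would come from a cheminformatics toolkit rather than hand-built sets.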
