Fig. 4 | Journal of Cheminformatics

From: Sachem: a chemical cartridge for high-performance substructure search

Noteworthy examples from the three main classes of performance outliers identified during development.

a A PubChem compound, CID20652954 (along with many similar compounds), is likely not a real molecule. Nevertheless, because there is no general method for identifying non-existent compounds, such entries cannot be reliably filtered out of the database. Searching CID20652954 for an odd-length carbon cycle substructure causes a complexity explosion; for example, matching cycloheptadecane against it takes tens of minutes in all available cartridges before failing.

b Matching a query that contains n benzene rings (above) against a compound that contains n or more benzene rings but can accommodate only \(n-1\) non-overlapping ones (below) backtracks 12 times for each possible position of an individual benzene ring in the target molecule. In total, \(\mathcal{O}((12n)^n)\) different atom permutations must be examined before the query fails.

c A multi-fragment query that is too simple to produce fingerprint information with enough filtering power for efficient screening. The performance of such queries depends mainly on the efficiency of data serialization and deserialization at the software interfaces between the back-end database and the user.
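The combinatorics behind panels a and b can be made concrete with a deliberately naive backtracking matcher. This is a minimal sketch, not Sachem's algorithm: molecules are modelled as hypothetical adjacency dictionaries with hydrogens, element types and bond orders ignored, and the helpers `cycle`, `honeycomb` and `match` are illustrative names of our own. Matching one benzene ring (a six-membered cycle) onto another yields 12 embeddings, one per symmetry of the hexagon (6 rotations times 2 reflections), which is presumably the factor 12 quoted in panel b. For panel a we do not know the actual structure of CID20652954; purely as an assumption, the target is modelled as a patch of fused six-membered rings. Such a graph is bipartite, so an odd cycle can never close in it, yet a naive search still walks every open path before giving up.

```python
def cycle(k):
    """Adjacency sets for a k-membered ring (heavy atoms only, bond orders ignored)."""
    return {i: {(i - 1) % k, (i + 1) % k} for i in range(k)}


def honeycomb(rows, cols):
    """Adjacency sets for a rows x cols 'brick-wall' patch of fused hexagons.

    Hypothetical stand-in for a target made only of six-membered rings.
    Colouring atoms by (r + c) % 2 shows the graph is bipartite, so it
    contains no odd-length cycle at all.
    """
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = {r * cols + cc for cc in (c - 1, c + 1) if 0 <= cc < cols}
            # Vertical bonds alternate, turning square cells into hexagons.
            rr = r + 1 if (r + c) % 2 == 0 else r - 1
            if 0 <= rr < rows:
                nbrs.add(rr * cols + c)
            adj[r * cols + c] = nbrs
    return adj


def match(query, target):
    """Naive backtracking subgraph matcher (monomorphism, no pruning).

    Returns (embeddings, states): the number of complete query-to-target
    mappings found and the number of partial mappings the search visited.
    """
    order = list(query)  # for cycle() queries this is ring order
    mapping, used = {}, set()
    counts = {"embeddings": 0, "states": 0}

    def extend(depth):
        counts["states"] += 1
        if depth == len(order):
            counts["embeddings"] += 1
            return
        q = order[depth]
        for t in target:
            if t in used:
                continue
            # Every already-mapped neighbour of q must land on a neighbour of t.
            if all(mapping[n] in target[t] for n in query[q] if n in mapping):
                mapping[q] = t
                used.add(t)
                extend(depth + 1)  # go one atom deeper
                used.remove(t)     # backtrack
                del mapping[q]

    extend(0)
    return counts["embeddings"], counts["states"]


if __name__ == "__main__":
    # One benzene ring against one benzene ring: 6 rotations x 2 reflections.
    print(match(cycle(6), cycle(6)))
    # A 7-membered ring cannot close in a bipartite patch: zero embeddings,
    # but every open path is explored first.
    print(match(cycle(7), honeycomb(6, 6)))
    # Growth of the (12n)^n bound from panel b for n = 1..4.
    print([(12 * n) ** n for n in range(1, 5)])  # [12, 576, 46656, 5308416]
```

Real cartridges add atom typing, bond orders, screening and smarter atom ordering; the sketch omits all of that on purpose so that the raw growth of the search space stays visible.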

Back to article page