
Table 4 Summary statistics of detected homologous series with CH2 repeating units in the three datasets. The algorithm’s default settings were used, as listed in Table 2. Full details and results are available in Additional file 1: Sect. 2

From: An algorithm to classify homologous series within compound datasets

 

| | NORMAN-SLE (n = 98,116) | PubChemLite (n = 392,465) | COCONUT (n = 407,270) |
|---|---|---|---|
| No. of homologous series detected | 2098 | 12,105 | 5329 |
| No. of molecules classified as members of homologous series | 8775 | 82,476 | 18,528 |
| No. of molecules consisting purely of CH2 repeating units | 0 | 0 | 0 |
| No. of molecules containing CH2 repeating units but not forming homologous series (unique cores) | 10,778 | 35,111 | 36,864 |
| No. of molecules not containing CH2 repeating units | 78,559 | 274,861 | 351,527 |
| No. of molecules discarded from analysis (failed sanitization) | 4 | 17 | 351 |
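
The five molecule-count rows in Table 4 appear to partition each dataset, since they add up to the stated n in every column. The short Python sketch below checks that arithmetic and derives the share of molecules assigned to a homologous series. The dictionary keys (`in_series`, `pure_ch2`, and so on) and the helper `category_total` are shorthand labels introduced here for illustration and do not come from the paper or its code; the number of series detected is left out because it counts series rather than molecules.

```python
# Counts transcribed from Table 4. Keys are illustrative shorthand, not names from the paper.
table4 = {
    "NORMAN-SLE": {
        "n": 98_116, "in_series": 8_775, "pure_ch2": 0,
        "ch2_unique_core": 10_778, "no_ch2": 78_559, "discarded": 4,
    },
    "PubChemLite": {
        "n": 392_465, "in_series": 82_476, "pure_ch2": 0,
        "ch2_unique_core": 35_111, "no_ch2": 274_861, "discarded": 17,
    },
    "COCONUT": {
        "n": 407_270, "in_series": 18_528, "pure_ch2": 0,
        "ch2_unique_core": 36_864, "no_ch2": 351_527, "discarded": 351,
    },
}


def category_total(row):
    """Sum every molecule-count category, excluding the dataset size n."""
    return sum(count for key, count in row.items() if key != "n")


for name, row in table4.items():
    total = category_total(row)
    share = 100 * row["in_series"] / row["n"]
    status = "matches" if total == row["n"] else "does not match"
    print(f"{name}: categories sum to {total}, which {status} n = {row['n']}; "
          f"{share:.1f}% of molecules fall in a homologous series")
```

Reading the rows as a partition is an inference from the numbers alone; the check only confirms that the reported counts are internally consistent.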