Table 2 Analyzed sets of descriptors

From: The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

| Package name | Type of descriptors^a | Number of descriptors | Matrix size, billions | Number of descriptors after filtering | Non-zero values, millions | Sparseness^b |
|---|---|---|---|---|---|---|
| EFG | Binary | 595 | 0.18 | 347 | 3.1 | 33 |
| QNPR | Integer | 1502 | 0.45 | 1040 | 6.3 | 49 |
| MolPrint | Binary | 688,634 | 205 | 197,367 | 8.1 | 7200 |
| E-state count | Float | 631 | 0.19 | 487 | 10 | 14 |
| Inductive | Float | 54 | 0.02 | 39 | 11 | 1 |
| ECFP4 | Binary | 1024 | 0.31 | 1021 | 12 | 25 |
| ISIDA | Integer | 5886 | 1.75 | 2275 | 18 | 37 |
| ChemAxon | Float | 498 | 0.15 | 114 | 23 | 1.5 |
| GSFrag | Integer | 1138 | 0.34 | 469 | 24 | 5.7 |
| CDK | Float | 239 | 0.07 | 182 | 27 | 2 |
| Adriana | Float | 200 | 0.06 | 139 | 32 | 1.3 |
| Mera, Mersy | Float | 571 | 0.17 | 235 | 61 | 1.1 |
| Dragon | Float | 1647 | 0.49 | 911 | 183 | 1.5 |

^a The dominant type of descriptors within the corresponding package
^b Average number of zero entries per non-zero value of the descriptor matrix
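
Footnote b defines sparseness as the number of zero entries per non-zero entry. A minimal Python sketch of that ratio follows, assuming the descriptor matrix is available either as a list of rows or as three counts. The function names `sparseness` and `sparseness_from_counts` and the toy matrix are illustrative and do not come from the article.

```python
def sparseness(matrix):
    """Zero entries per non-zero entry, following the footnote b definition.

    `matrix` is assumed to be a list of equal-length rows
    (one row per compound, one column per descriptor).
    """
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    if nonzero == 0:
        raise ValueError("matrix has no non-zero entries")
    return (total - nonzero) / nonzero


def sparseness_from_counts(n_rows, n_columns, n_nonzero):
    """Same ratio computed from counts, for matrices too large to hold densely."""
    if n_nonzero <= 0:
        raise ValueError("n_nonzero must be positive")
    total = n_rows * n_columns
    return (total - n_nonzero) / n_nonzero


# Toy example: 3 compounds x 4 descriptors, 2 non-zero values, 10 zeros.
toy = [
    [1, 0, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
]
print(sparseness(toy))                  # 5.0
print(sparseness_from_counts(3, 4, 2))  # 5.0
```

The count-based variant is the practical one for a matrix like MolPrint's, where only the non-zero values would normally be stored.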