Table 1 The number of compounds and average properties of molecules of the analyzed datasets and their drug-like subsets

From: The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

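| Dataset | Type | Whole set: N | Whole set: average T (°C) | Whole set: average MW | Whole set: average NA | Drug-like set, % of the total set |
| --- | --- | --- | --- | --- | --- | --- |
| PATENTS | Training | 241,958 | 159 | 357 | 25 | 89 |
| PATENTS (decomposing) | Training | 13,785 | 209 | 358 | 25 | 76 |
| PATENTS (non-decomposing) | Training | 228,173 | 155 | 357 | 25 | 93 |
| Bergström | Validation | 277 | 151 | 295 | 20.8 | 92 |
| Bradley | Validation | 2,878 | 59 | 174 | 11.4 | 53 |
| OCHEM | Validation | 21,832 | 117 | 249 | 16.7 | 73 |
| Enamine | Validation | 22,449 | 143 | 223 | 14.9 | 91 |
| COMBINED | Validation, merge of four sets | 47,436 | 126 | 233 | 15.6 | 81 |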

MW: molecular weight; NA: number of non-hydrogen atoms.
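
The table's internal consistency can be checked with a little arithmetic: the two PATENTS subsets should add up to the full PATENTS count, and the COMBINED row should roughly match a size-weighted pooling of the four validation sets. The following is a minimal sketch of that check, not part of the original article. The names `validation_sets`, `pooled_row`, and `combined_reported` are illustrative, and it assumes the COMBINED averages are plain size-weighted means. Because the per-set averages in the table are already rounded, the pooled values can differ slightly from the reported row.

```python
# Values copied from Table 1: (N, average T in °C, average MW, average NA).
validation_sets = {
    "Bergström": (277, 151, 295, 20.8),
    "Bradley": (2878, 59, 174, 11.4),
    "OCHEM": (21832, 117, 249, 16.7),
    "Enamine": (22449, 143, 223, 14.9),
}
combined_reported = (47436, 126, 233, 15.6)

patents_total = 241958
patents_subsets = {"Decomposing": 13785, "Non-decomposing": 228173}


def pooled_row(rows):
    """Return (total N, size-weighted mean T, MW, NA) for the given rows.

    Assumption: the merged set's averages are simple N-weighted means of
    the per-set averages; the article does not state how they were computed.
    """
    rows = list(rows)
    total_n = sum(r[0] for r in rows)
    means = tuple(sum(r[0] * r[i] for r in rows) / total_n for i in (1, 2, 3))
    return (total_n, *means)


pooled = pooled_row(validation_sets.values())
subsets_sum = sum(patents_subsets.values())

if __name__ == "__main__":
    print("PATENTS subsets sum:", subsets_sum, "reported:", patents_total)
    print("Pooled validation row:", pooled[0], [round(x, 1) for x in pooled[1:]])
    print("Reported COMBINED row:", combined_reported)
```

Comparing the printed pooled row with the reported COMBINED row shows how closely the merged statistics follow from the four individual sets.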