Table 1 QSPR ready datasets

From: The influence of solid state information and descriptor selection on statistical models of temperature dependent aqueous solubility

| Dataset name | Instance identifier^a | Extra filtering? | No. instances | Total no. descriptors^b | Endpoint | Source^c | Integrated with crystal structures? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Avdeef_ExDPs_CS_False | [name]_[CAS no.]_[polymorph description] | Low quality data points removed [17] | 364 | 4776 | Enthalpy of solution | Avdeef [17] derived dataset | No |
| Avdeef_ExDPs_Cal_CS_False | [name]_[CAS no.]_[polymorph description] | Only calorimetry data points retained [17] | 50 | 4776 | Enthalpy of solution | Avdeef [17] derived dataset | No |
| Klimenko_CS_False | [name]_[CAS no.]_[polymorph description]_[temperature value] | No | 882 | 3764 | Solubility at some defined temperature | Klimenko et al. [14] derived dataset | No |
| Avdeef_ExDPs_CS_True | [name]_[CAS no.]_[refcode] | Low quality data points removed [17] | 169 | 4820 | Enthalpy of solution | Avdeef [17] derived dataset | Yes |
| Avdeef_ExDPs_Cal_CS_True | [name]_[CAS no.]_[refcode] | Only calorimetry data points retained [17] | 30 | 4820 | Enthalpy of solution | Avdeef [17] derived dataset | Yes |
| Klimenko_CS_True | [name]_[CAS no.]_[refcode]_[temperature value] | No | 530 | 3808 | Solubility at some defined temperature | Klimenko et al. [14] derived dataset | Yes |

  a. The [name] (or [CAS no.]) and/or [polymorph description] could be “none”, denoting the absence of the relevant information.
  b. This denotes the complete set of all 2D molecular, temperature (for the solubility datasets), melting point and, for crystal structure integrated datasets, lattice energy and 3D descriptors calculated for instances in this dataset. Different subsets were considered for different models, as described under “Descriptor combinations investigated”.
  c. See Fig. 1.
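
The instance identifier patterns listed in the table can be illustrated with a minimal sketch. It assumes that the fields are joined with underscores and that a missing field is written as the literal string "none", as the first footnote suggests. The function name `make_instance_id`, its parameter names, and the example values are hypothetical and are not taken from the datasets. The temperature is passed as an already formatted string because the table does not state its units or formatting.

```python
from typing import Optional


def make_instance_id(
    name: Optional[str],
    cas_no: Optional[str],
    solid_form: Optional[str],
    temperature: Optional[str] = None,
) -> str:
    """Compose an identifier such as [name]_[CAS no.]_[solid form](_[temperature]).

    `solid_form` stands for either the polymorph description or the refcode,
    depending on the dataset. Missing fields are written as "none" (assumption
    based on the table footnote).
    """
    parts = [field if field else "none" for field in (name, cas_no, solid_form)]
    # Only the solubility datasets carry a trailing temperature field.
    if temperature is not None:
        parts.append(temperature)
    return "_".join(parts)


# Illustrative placeholder values only, not real dataset entries.
print(make_instance_id("examplename", None, "form I"))
print(make_instance_id(None, "000-00-0", None, "298.15"))
```

Identifiers built this way cannot be split back into fields reliably when a name or polymorph description itself contains an underscore, so the sketch only covers composition.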