Table 5 RMSE of models developed with filtering of outliers

From: The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

 

| Descriptor set | No filtering | 0.001 (N = 1414)^a | 0.01 (N = 4727) | 0.1 (N = 21,928) |
|---|---|---|---|---|
| PATENTS | | | | |
| CDK | 38.9 (36.2) | 38.9 (36.1) | 38.8 (36.1) | 38.9 (36.1) |
| Isida Fragmentor | 38.5 (35.5) | 38.4 (35.4) | 38.3 (35.2) | 38.2 (35.2) |
| ChemAxon | 40.1 (37.1) | 40 (37.1) | 40.1 (37.1) | 40.1 (37.2) |
| QNPR | 39.7 (36.6) | 39.7 (36.3) | 39.4 (36) | 39.2 (35.9) |
| E-state | 38.3 (35.6) | 38.1 (35.6) | 38.1 (35.5) | 38.0 (35.5) |
| Consensus | 36.3 (33.3) | 36.2 (33.3) | 36.3 (33.2) | 36.4 (33.5) |
| COMBINED | | | | |
| CDK | 51.6 (35.6) | 51.3 (35.5) | 50.8 (35.5) | 49.9 (35.4) |
| Isida Fragmentor | 47.6 (35.9) | 47.5 (35.6) | 47.2 (35.3) | 47.3 (35.4) |
| ChemAxon | 49.7 (36.5) | 49.6 (36.5) | 49.5 (36.4) | 49 (36.5) |
| QNPR | 50.2 (38.1) | 50.5 (37.8) | 49.9 (37.7) | 49.5 (37.6) |
| E-state | 45.9 (35.4) | 46.1 (35.4) | 45.8 (35.3) | 45.8 (35.2) |
| Consensus | 46.5 (33.4) | 46.3 (33.4) | 46.2 (33.3) | 46.1 (33.4) |

^a The numbers in parentheses indicate the number of molecules detected as outliers and filtered from the PATENTS set. The RMSE values for the PATENTS set are calculated for all molecules in this set (including the outliers).
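
The footnote's point that RMSE is calculated over all molecules, outliers included, can be illustrated with a minimal sketch. The values, the outlier flags, and the helper name `rmse` below are hypothetical and are not taken from the paper; the sketch only shows the arithmetic of the metric, not the authors' modelling or filtering procedure.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy values in degrees Celsius (assumption, not paper data).
observed = [120.0, 85.0, 210.0, 150.0]
predicted = [118.0, 90.0, 170.0, 149.0]
is_outlier = [False, False, True, False]  # hypothetical outlier flags

# RMSE over every molecule, flagged outliers included,
# which is the evaluation the footnote describes.
rmse_all = rmse(observed, predicted)

# For comparison only: RMSE with the flagged molecules left out.
kept = [(o, p) for o, p, flag in zip(observed, predicted, is_outlier) if not flag]
rmse_kept = rmse([o for o, _ in kept], [p for _, p in kept])

print(f"all molecules: {rmse_all:.2f}, outliers excluded: {rmse_kept:.2f}")
```

In this toy case a single poorly predicted molecule dominates the error, so evaluating on all molecules gives a much larger RMSE than evaluating on the retained ones.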