Table 2 Dataset

From: Transformer-based molecular optimization beyond matched molecular pairs

| Datasets | Training (2000-2017) | Validation (2018) | Test (2019-2020) |
| --- | --- | --- | --- |
| MMPs | 2,287,588 | 143,978 | 166,582 |
| Similarity (\(\ge\)0.5) | 6,543,684 | 418,180 | 475,070 |
| Similarity ([0.5,0.7)) | 4,543,472 | 286,682 | 327,606 |
| Similarity (\(\ge\)0.7) | 2,000,212 | 131,498 | 147,464 |
| Scaffold | 2,850,180 | 171,914 | 199,786 |
| Scaffold generic | 4,127,058 | 255,580 | 289,034 |
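
The counts in Table 2 can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and not part of the original article: the names `counts` and `split_fractions` are assumptions introduced here, and the numbers are copied from the table. It checks that the two similarity bands, [0.5,0.7) and \(\ge\)0.7, add up to the \(\ge\)0.5 row in every split, and prints the share of pairs that falls into each split.

```python
# Pair counts copied from Table 2, ordered as (training, validation, test).
# The dictionary layout and names are illustrative, not from the article.
counts = {
    "MMPs": (2_287_588, 143_978, 166_582),
    "Similarity (>=0.5)": (6_543_684, 418_180, 475_070),
    "Similarity ([0.5,0.7))": (4_543_472, 286_682, 327_606),
    "Similarity (>=0.7)": (2_000_212, 131_498, 147_464),
    "Scaffold": (2_850_180, 171_914, 199_786),
    "Scaffold generic": (4_127_058, 255_580, 289_034),
}


def split_fractions(row):
    """Return each split's share of the row total as a tuple of floats."""
    total = sum(row)
    return tuple(n / total for n in row)


# Add the two similarity bands column by column and compare with the >=0.5 row.
combined = tuple(
    low + high
    for low, high in zip(counts["Similarity ([0.5,0.7))"], counts["Similarity (>=0.7)"])
)
assert combined == counts["Similarity (>=0.5)"]

# Report how each dataset is divided across the three time-based splits.
for name, row in counts.items():
    train, val, test = split_fractions(row)
    print(f"{name}: train {train:.1%}, validation {val:.1%}, test {test:.1%}")
```

The assertion passing is consistent with the two bands being disjoint subsets that together make up the \(\ge\)0.5 set, although the table alone does not state how the pairs were constructed.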