Fig. 7

Precision-Recall curves for the Evaluation with Test Sets experiments. Because the data in the TRAIN-SOIL package are more representative of the evaluated TEST-SOIL package, in terms of chemical and biological properties, than the data in the EAWAG-BBD package, the relative reasoning model trained without the EAWAG-BBD package is more compatible with the evaluation data set. The Multi-Generation evaluation approach better reflects the compatibility between the compound structures and the transformation rules used to train the model.