Fig. 8From: An exploration strategy improves the diversity of de novo ligands using deep reinforcement learning: a case for the adenosine A2A receptorThe average score of generated SMILES sequences during the training processes of deep reinforcement learning with different ε, β and Gφ. The pre-trained model on the ZINC set (a) and the fine-tuned model on the A2AR set (b) were used as Gφ. After 200 epochs, the average scores for all training processes converged and whole of these models were well trainedBack to article page