Fig. 1 | Journal of Cheminformatics

Fig. 1

From: An exploration strategy improves the diversity of de novo ligands using deep reinforcement learning: a case for the adenosine A2A receptor

Fig. 1

Architecture of recurrent neural networks (RNNs) for the training and sampling processes, with the A2AR antagonist ZM241385 as an example. a In the training process, each molecule is decomposed into a series of tokens, which are taken as input. The input sequence is then prefixed with a start token, and the output sequence is terminated with an end token. b Beginning with the start token “GO”, the model calculates the probability distribution over every token in the vocabulary. At each step, one token is randomly chosen according to this distribution and fed back into the RNN as input to calculate the probability distribution for the next step. The maximum number of steps was set to 100; the process ends when the end token “EOS” is sampled or the maximum number of steps is reached.
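The two procedures in panels a and b can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Only the start token “GO”, the end token “EOS”, and the 100-step limit come from the caption. The regex tokenizer, the function names `tokenize`, `training_pair`, and `sample`, and the `next_token_probs` callable standing in for the trained RNN are hypothetical. The example SMILES strings are simple placeholders, not ZM241385.

```python
import random
import re

# Assumed regex tokenizer (hypothetical): bracket atoms such as [nH] and the
# two-letter halogens Br/Cl become single tokens; every other character is its
# own token. Not necessarily the tokenizer used in the paper.
TOKEN_PATTERN = re.compile(r"(\[[^\]]+\]|Br|Cl|.)")

START, END = "GO", "EOS"  # start/end tokens named in the caption
MAX_STEPS = 100           # maximum number of sampling steps from the caption


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def training_pair(smiles: str) -> tuple[list[str], list[str]]:
    """Panel a: build the input and target sequences for one molecule.

    The input is prefixed with the start token and the target is terminated
    with the end token, so the target at each position is the next token.
    """
    tokens = tokenize(smiles)
    return [START] + tokens, tokens + [END]


def sample(next_token_probs, rng: random.Random) -> list[str]:
    """Panel b: autoregressively sample one token sequence.

    `next_token_probs(prefix)` is a hypothetical stand-in for the trained RNN:
    given the tokens generated so far (beginning with GO), it returns a dict
    mapping vocabulary tokens to their probabilities for the next step.
    """
    sequence = [START]
    for _ in range(MAX_STEPS):
        probs = next_token_probs(sequence)
        tokens = list(probs)
        # Draw one token at random, weighted by the predicted distribution.
        token = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
        if token == END:
            break
        sequence.append(token)  # feed the sampled token back in as input
    return sequence[1:]  # drop the start token


def toy_model(prefix: list[str]) -> dict[str, float]:
    """Hypothetical toy 'model': emit "C" three times, then EOS."""
    return {"C": 1.0} if len(prefix) < 4 else {END: 1.0}


if __name__ == "__main__":
    print(training_pair("CCO"))
    print("".join(sample(toy_model, random.Random(0))))
```

Running the script prints `(['GO', 'C', 'C', 'O'], ['C', 'C', 'O', 'EOS'])` and then `CCC`. A model that never assigns probability to “EOS” would instead produce exactly 100 tokens, because the loop stops at the step limit.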

Back to article page