Fig. 2From: Diversity oriented Deep Reinforcement Learning for targeted molecule generationThe outline of the training procedure for the SMILES string ‘GCNC(C)=OE’. For each step t, the input of the model is the vectorized token yt, and the output is the probability of the token predicted by the model being the correct token. The goal of the training process is to maximize this probabilityBack to article page