Fig. 1 | Journal of Cheminformatics

Fig. 1

From: DrugEx v2: de novo design of drug molecules by Pareto-based multi-objective reinforcement learning in polypharmacology

Workflow for training our deep learning-based molecule generator, DrugEx2, with reinforcement learning. After the generator has been pre-trained/fine-tuned, (1) a batch of SMILES is generated by sampling tokens step by step according to the probabilities calculated by the generator; (2) the valid SMILES are parsed into molecules and encoded into descriptors, from which the predictors compute the predicted pXs; (3) the predicted pXs are transformed into a single reward value for each molecule based on Pareto optimization; and (4) the SMILES sequences and their rewards are sent back to the generator for training with policy gradient methods. These four steps constitute the reinforcement learning training loop.
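
Step (3) of the caption, collapsing several predicted pXs into one reward per molecule, can be illustrated with a minimal sketch. It assumes that every objective is to be maximized and that the reward is a simple linear function of the Pareto front index; this scaling and the helper names `dominates`, `pareto_fronts`, and `pareto_rewards` are hypothetical choices made for illustration, not the scheme or API used by DrugEx. The example scores are invented.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """Split molecule indices into successive non-dominated fronts.

    Front 0 holds the molecules that no other molecule dominates; front 1
    holds those that become non-dominated once front 0 is removed, and so on.
    """
    remaining = set(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def pareto_rewards(scores: Sequence[Sequence[float]]) -> List[float]:
    """Map each molecule to a single reward in (0, 1] from its front index.

    Assumption for illustration only: the best front gets 1.0 and later
    fronts get linearly smaller rewards.
    """
    fronts = pareto_fronts(scores)
    n_fronts = len(fronts)
    rewards = [0.0] * len(scores)
    for rank, front in enumerate(fronts):
        for i in front:
            rewards[i] = (n_fronts - rank) / n_fronts
    return rewards


# Invented predicted pX values for four molecules against two targets.
example_scores = [(7.0, 6.5), (6.0, 6.0), (7.5, 5.0), (5.0, 4.0)]
example_rewards = pareto_rewards(example_scores)
```

Here the first and third molecules share the best front because neither dominates the other, so both receive the highest reward. In a training loop such as the one in the caption, these per-molecule rewards would then weight the policy gradient update in step (4).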
