Fig. 3 | Journal of Cheminformatics

Fig. 3

From: DrugEx v2: de novo design of drug molecules by Pareto-based multi-objective reinforcement learning in polypharmacology


The mechanism of the updated exploration strategy. Shown are the agent net GA, the mutation net GM (red) and the crossover net GC (blue). In the training loop, GM is fixed, GC is updated iteratively, and GA is trained at each epoch. For each position, a random number between 0 and 1 is generated. If it is larger than the mutation rate (ε), the token-sampling probability is controlled by the combination of GA and GC; otherwise, it is determined by GM.
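
The per-position rule described in the caption can be sketched as follows. This is a minimal illustration, not the DrugEx implementation: the function names, the plain-list representation of the token distributions, and in particular the element-wise mean used to combine GA and GC are assumptions, since the caption does not state how the two nets are combined.

```python
import random


def combine(p_agent, p_crossover):
    # ASSUMPTION: the caption only says "combination of GA and GC";
    # an element-wise mean is used here purely for illustration.
    return [(a + c) / 2 for a, c in zip(p_agent, p_crossover)]


def sample_token(p_agent, p_crossover, p_mutation, epsilon, rng):
    """Sample one token index for a single sequence position.

    p_agent, p_crossover, p_mutation: token probabilities from GA, GC, GM
    (hypothetical inputs; in practice these would come from the networks).
    epsilon: mutation rate in [0, 1].
    rng: a random.Random instance.
    """
    r = rng.random()  # random number in [0, 1)
    if r > epsilon:
        # Exploit: sample from the combined GA / GC distribution.
        probs = combine(p_agent, p_crossover)
    else:
        # Explore: sample from the fixed mutation net GM.
        probs = p_mutation
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
token = sample_token([0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8], 0.01, rng)
```

Under this sketch, ε = 1 means every position is sampled from GM, while a small ε leaves sampling almost entirely to GA and GC.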
