Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation

Table 3 CPU hours required for RL strategies to optimize the DRD2 docking score benchmark task to different thresholds

	CPU hours required for optimization beyond prior at a given threshold					CPU hours required for optimization beyond external thresholds
Threshold	120%	140%	160%	180%	200%	Inactive mean	Active mean	80% precision
REINFORCE	74 (0)	173 (0)	– (20)	– (34)	– (96)	2 (0)	103 (0)	177 (0)
REINFORCE + KL regularization	183 (0)	– (0)	– (33)	– (74)	– (216)	22 (0)	204 (0)	– (0)
REINVENT	79 (0)	– (0)	– (8)	– (164)	– (–)	4 (0)	93 (0)	– (0)
REINVENT 2.0	38 (0)	202 (0)	– (16)	– (53)	– (92)	12 (0)	51 (0)	198 (0)
BAR	– (0)	– (0)	– (32)	– (32)	– (–)	4 (0)	0 (0)	– (0)
Hill-Climb	44 (0)	114 (0)	177 (0)	218 (24)	– (85)	16 (0)	57 (0)	99 (0)
Hill-Climb + KL regularization	45 (0)	106 (0)	157 (0)	– (45)	– (45)	8 (0)	58 (0)	99 (0)
Hill-Climb*	11 (0)	31 (1)	52 (6)	– (15)	– (31)	2 (0)	11 (0)	24 (0)
Hill-Climb* + KL regularization	14 (0)	28 (0)	74 (1)	– (17)	– (17)	6 (0)	17 (0)	31 (0)
Augmented Hill-Climb	9 (0)	16 (0)	72 (0)	151 (14)	216 (15)	2 (0)	13 (0)	27 (0)

Time is measured as the point at which the batch mean first exceeds the respective internal or external threshold (the time at which the earliest sample exceeds the threshold is shown in brackets). Runs used an AMD Threadripper 1920X CPU and an Nvidia GeForce RTX 2060 SUPER GPU. Failure to reach a threshold is marked by “–”.
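The two times reported in each cell can be illustrated with a minimal sketch. It is not the authors' code: the function name `time_to_threshold`, its arguments, and the convention that a higher score is better (for example, a negated docking score) are all assumptions made for illustration. For each threshold it returns the elapsed CPU hours at which the batch mean first exceeds the threshold and at which any single sample first exceeds it, with `None` playing the role of “–”.

```python
from typing import Optional, Sequence, Tuple


def time_to_threshold(
    batches: Sequence[Sequence[float]],
    cpu_hours: Sequence[float],
    threshold: float,
) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical helper: return (mean_time, sample_time).

    batches[i] holds the scores of the molecules sampled at RL step i and
    cpu_hours[i] is the cumulative CPU time at that step. Scores are
    assumed to be oriented so that higher is better (an assumption).
    A value of None means the threshold was never exceeded.
    """
    mean_time: Optional[float] = None
    sample_time: Optional[float] = None
    for scores, elapsed in zip(batches, cpu_hours):
        if not scores:
            continue  # skip empty batches rather than divide by zero
        # Earliest single sample above the threshold (bracketed value).
        if sample_time is None and max(scores) > threshold:
            sample_time = elapsed
        # Earliest batch whose mean is above the threshold (main value).
        if mean_time is None and sum(scores) / len(scores) > threshold:
            mean_time = elapsed
        if mean_time is not None and sample_time is not None:
            break
    return mean_time, sample_time


if __name__ == "__main__":
    # Toy data: three RL steps logged at 0, 1 and 2 CPU hours.
    toy_batches = [[1.0, 2.0], [2.0, 5.0], [5.0, 6.0]]
    toy_hours = [0.0, 1.0, 2.0]
    print(time_to_threshold(toy_batches, toy_hours, threshold=4.0))
```

Because a batch mean above the threshold implies at least one sample above it, the bracketed sample time can never be later than the batch-mean time, which is why entries such as “– (20)” occur but the reverse does not.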

ISSN: 1758-2946