Table 1 Model architectures considered for latent space evaluation

From: Small molecule autoencoders: architecture engineering to optimize latent space utility and sustainability

| Name | Variational autoencoder | Architecture | Molecule representation | Enumerated |
|---|---|---|---|---|
| SMILES-AE-can2can | No | GRU, 128 hidden and latent size, 3 layers, attention | SMILES | No |
| SMILES-AE-enum2can | No | GRU, 128 hidden and latent size, 3 layers, attention | SMILES | Yes (input) |
| SMILES-AE-can2enum | No | GRU, 128 hidden and latent size, 3 layers, attention | SMILES | Yes (output) |
| SMILES-VAE-can2can | Yes | GRU, 128 hidden and latent size, 3 layers, attention | SMILES | No |
| SELFIES-can2can | No | GRU, 128 hidden and latent size, 2 layers, attention | SELFIES | No |
| SELFIES-enum2can | No | GRU, 128 hidden and latent size, 2 layers, attention | SELFIES | Yes (input) |
| SELFIES-can2enum | No | GRU, 128 hidden and latent size, 2 layers, attention | SELFIES | Yes (output) |
| SELFIES-VAE | Yes | GRU, 128 hidden and latent size, 2 layers, attention | SELFIES | No |
  1. Architectures were chosen based on the results of the additive optimisation for SMILES and SELFIES