EC-Conf: An ultra-fast diffusion model for molecular conformation generation with equivariant consistency

Abstract

Despite recent advances in 3D molecular conformation generation driven by diffusion models, the high computational cost of the iterative diffusion/denoising process limits their application. Here, an equivariant consistency model (EC-Conf) was proposed as a fast diffusion method for low-energy conformation generation. In EC-Conf, a modified SE(3)-equivariant transformer model was used to encode Cartesian molecular conformations directly, and a highly efficient consistency diffusion process was carried out to generate molecular conformations. We demonstrate that, with only one sampling step, EC-Conf already achieves quality comparable to that of other diffusion-based models running with thousands of denoising steps, and its performance can be further improved with a few more sampling iterations. The performance of EC-Conf is evaluated on both the GEOM-QM9 and GEOM-Drugs sets. Our results demonstrate that EC-Conf learns the distribution of low-energy molecular conformations at least two orders of magnitude more efficiently than current SOTA diffusion models and could become a useful tool for conformation generation and sampling.

Scientific Contributions

In this work, we propose an equivariant consistency model that significantly improves the efficiency of conformation generation in diffusion-based models while maintaining high structural quality. This method serves as a general framework and can be further extended to more complex structure generation and prediction tasks, including those involving proteins, in future work.

Key points

  • A novel ultra-fast equivariant diffusion model, EC-Conf, was proposed for low-energy conformation generation by constructing a consistency process.

  • Compared with other SOTA diffusion models running with thousands of denoising steps, EC-Conf achieves comparable quality with only one sampling step and keeps improving with a few more sampling iterations.

  • The efficiency of EC-Conf is at least two orders of magnitude higher than that of current SOTA diffusion models.

  • EC-Conf is general and can be easily extended to other conformation generation tasks such as protein–ligand docking pose prediction.

Introduction

Three-dimensional conformations of a molecule can largely influence its biological and physical properties, and biologically active conformations are usually low-energy conformers [1]. Thus, many drug design strategies, including structure-based [2] or ligand-based virtual screening [3,4,5], three-dimensional quantitative structure–activity relationships (QSARs) [6], and pharmacophore modeling [7], require a fast protocol for generating 3D conformations and elaborating them to sample biologically relevant conformations. For the generation task, programs such as CONCORD [8], CORINA [9], and OMEGA [10] are the most popular applications. These rule-based approaches make use of known optimal geometries of molecular fragments as templates for constructing reasonable, low-energy 3D models of small molecules, and usually fall back on generic construction strategies when novel structures appear. However, because producing a single 3D structure of a flexible molecule is almost invariably followed by conformational elaboration, it is the elaboration step that is both the time and quality bottleneck.

With the advancement of deep learning technologies, deep learning methods have been used to learn the distribution of 3D bioactive conformations and to generate 3D conformers directly. For a given molecule, one of the most valuable applications of a conformation generative model is to generate conformation ensembles satisfying the Boltzmann distribution, which is critical for fast estimation of free energies. The biggest difference between the above-mentioned traditional 3D conformation generators and deep learning-based conformation generative models is that generative models do not rely on explicit rules to construct 3D conformations but instead learn the distribution of conformation data implicitly. Mansimov and coworkers reported an early attempt, CVGAE, to generate 3D conformations in Cartesian coordinates using the variational autoencoder (VAE) architecture in one shot [11]. However, its performance is not comparable with that of traditional rule-based methods. Simm et al. proposed a conformation generative model based on distance geometry instead of directly modeling distributions over Cartesian coordinates, named GraphDG [12], while ConfVAE takes the distribution of distances as intermediate variables to generate conformations [13]. Ganea et al. proposed GeoMol [14], which first constructs the local structure (LS) by predicting the coordinates of non-terminal atoms and then refines the LS with predicted distances and dihedral angles. The quality of conformations generated by these one-shot methods has reached the level of traditional methods on fragments, while there is still room for improvement on drug-like molecules.

Another line of work explores constructing conformations via a set of sequential sampling steps instead of one-shot generation. Xu et al. proposed the CGCF method, combining the advantages of normalizing flows and energy-based approaches to improve the prediction of multimodal conformation distributions [15]. Xu et al. then proposed another score-based method, ConfGF, which learns the pseudo-force on each atom via force matching and obtains new conformations via Langevin Markov chain Monte Carlo (MCMC) sampling on the distance geometry [16]. Its performance on the GEOM-Drugs dataset is comparable to that of a rule-based method called experimental-torsion-knowledge distance geometry (ETKDG), a conformation generation method implemented in RDKit [17, 18]. Xia and Liu et al. proposed DMCG, trained with a dedicated loss function [19]. GeoDiff [20] and SDEGen [21] employed diffusion models for conformation generation and can directly predict coordinates without using intermediate distances. Similarly, torsional diffusion models were proposed to generate conformations in torsion space instead of Cartesian coordinates [22]. Although these diffusion methods, based on DDPM [23], SGMs [24, 25] or stochastic differential equations (Score SDEs) [26], improve the quality of generated conformations of drug-like molecules, their slow sampling speed, caused by the large number of diffusion/denoising iterations, still limits their application.

More recently, Song et al. proposed the consistency model, a new class of diffusion model that can generate high-quality samples by directly mapping noise to data instead of integrating the reverse-time SDE [27]. Their results show that consistency models only require a few refinement steps (2~5) for high-quality image generation, which inspired us to incorporate the consistency diffusion process into molecular conformation generation. Here, we propose an equivariant consistency model (EC-Conf) for fast diffusion-based generation of molecular conformations, largely achieving a balance between conformation quality and computational efficiency. EC-Conf is inspired by the consistency model built on the probability flow ordinary differential equation (ODE) and can smoothly transform the conformation distribution into a noise distribution along a tractable trajectory satisfying SE(3)-equivariance. Instead of solving the reverse-time stochastic differential equation (SDE) as in diffusion models, EC-Conf directly maps either a noise vector from the prior Gaussian distribution or any solution on the same diffusion trajectory to a low-energy conformation in Cartesian coordinates. These characteristics are critical for decreasing the number of iterations while still keeping a high capacity to approximate the complex distribution of conformations. The performance of EC-Conf is evaluated on both the GEOM-QM9 and GEOM-Drugs datasets. Our model outperforms non-diffusion models and is comparable with SOTA diffusion models, but with at least two orders of magnitude higher denoising efficiency.

Theory and methods

Problem definition

Molecular conformer generation is defined as a conditional generative problem: generating low-energy conformations \(C\) for a given 2D molecular graph \(G\). Given multiple molecular graphs \(\mathcal{G}\), we aim to learn a transferable generative model \({p}_{\theta }\left(C|G\right)\) mapping the Boltzmann distribution of \(C\) conditioned on \(G\in \mathcal{G}\) to a prior Gaussian distribution.

Diffusion process for conformation generations

As mentioned above, high-quality conformation generation remains challenging for drug-like molecules and large biomolecules. Among commonly used generative models, GANs suffer from unstable training and limited diversity in generation due to their adversarial training nature [28]. VAEs rely on a surrogate loss, which limits their performance on high-quality generation [29]. Similarly, flow-based models have to employ specialized architectures to construct reversible transforms [30]. Compared with GAN, VAE and flow-based models, diffusion-based methods are generally more stable and robust to train, can capture the underlying data distribution more comprehensively, and produce high-quality, high-fidelity conformations, but they suffer from low sampling efficiency in conformation generation [31].

In a DDPM-based diffusion process such as GeoDiff, noise from fixed posterior distributions \(q({C}^{t}|{C}^{t-1})\) is gradually added over \(T\) steps until the ground truth conformation \({C}^{0}\) is completely destroyed. During the generation process, an initial state \({C}^{T}\) is sampled from a standard Gaussian distribution, and the conformation is progressively refined via the learned Markov kernels \({p}_{\theta }({C}^{t-1}|G,{C}^{t})\) for a given molecular graph \(G\). Song et al. have demonstrated that this diffusion process can be described as a discretization in time and noise of the SDE defined in Eq. (1) [32]:

$$dC=f\left(C,t\right)dt+g(t)dw$$
(1)

This process can be reversed by solving the following reverse-time SDE:

$$dC=\left [f\left(C,t\right)-g{\left(t\right)}^{2}{\nabla }_{C}{\text{log}}\,{p}_{t}\left(C\right)\right]dt+g(t)d\overline{w }$$
(2)

where \(f(C,t)\) and \(g\left(t\right)\) are the drift and diffusion coefficients of the SDE, \(w\) and \(\overline{w }\) are the standard Brownian motions when time flows forwards and backwards, respectively, and \({\nabla }_{C}{\text{log}}\,{p}_{t}\left(C\right)\) is the score, i.e., the gradient of the log probability density. The rough propagation path between the target distribution and the prior Gaussian distribution largely limits the sampling efficiency.

The denoising process can also be described in the form of an ordinary differential equation (ODE) whose solution trajectories share the same marginal distributions as the reverse-time SDE, i.e., the probability flow (PF) ODE:

$$dC= [f\left(C,t\right)-1/2g{\left(t\right)}^{2}{\nabla }_{C}{\text{log}}\,{p}_{t}(C)]dt$$
(3)

This allows a smoother propagation between the distributions and accelerates the sampling phase. However, the ODE trajectory still has to be traversed as a chain of steps, which limits the efficiency.

Karras et al. further simplified Eq. (3) by setting \(f\left(C,t\right)=0\) and \(g\left(t\right)=\sqrt{2t}\). In this case, \({p}_{t}\left(C\right)={p}_{data}\left(C\right)\otimes \mathcal{N}\left(0,{t}^{2}I\right)\), and an empirical PF ODE is obtained, as shown in Eq. (4) [33]:

$$\frac{d{C}_{t}}{dt}=-t{s}_{\phi }({C}_{t},t)$$
(4)

which allows us to sample \({\widehat{C}}_{T}\) from \(\pi =\mathcal{N}\left(0,{T}^{2}I\right)\) to initialize the empirical PF ODE and solve it backwards in time with any numerical ODE solver, including the Euler and Heun solvers. This yields a solution trajectory \({\left\{{C}_{t}\right\}}_{t\in [0,T]}\), and \({\widehat{C}}_{0}\) is an approximate sample from the conformation distribution.
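
To make this concrete, below is a minimal numerical sketch of integrating the empirical PF ODE in Eq. (4) backwards from \(T\) to \(\epsilon\) with a plain Euler scheme; `score_fn` is a hypothetical placeholder standing in for the learned score \({s}_{\phi }\), not the network used in this work.

```python
import numpy as np

def sample_pf_ode_euler(score_fn, shape, T=80.0, eps=1e-8, n_steps=40, rng=None):
    """Integrate dC/dt = -t * s_phi(C_t, t) (Eq. 4) backwards in time with an
    Euler solver; score_fn is a placeholder for the learned score."""
    rng = np.random.default_rng() if rng is None else rng
    C = rng.standard_normal(shape) * T            # C_T ~ N(0, T^2 I)
    ts = np.linspace(T, eps, n_steps + 1)         # decreasing time grid
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dC_dt = -t_cur * score_fn(C, t_cur)       # right-hand side of Eq. (4)
        C = C + (t_next - t_cur) * dC_dt          # Euler step (t_next < t_cur)
    return C                                      # approximate sample of C_0
```

Replacing the Euler update with a second-order Heun correction gives the solver used by Karras et al., at the cost of one extra score evaluation per step.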

Based on the ODE trajectory \({\left\{{C}_{t}\right\}}_{t\in [0,T]}\), Song et al. proposed the consistency model by directly mapping the points on the same trajectory to the same initial point in diffusions, greatly improving the sample efficiency [27].

Equivariant consistency models for conformation generations

Inspired by the consistency model, we propose a novel equivariant diffusion model, named EC-Conf, for ultra-fast generation of molecular conformations in Cartesian coordinates, largely achieving a balance between conformation quality and computational efficiency.

Basic Concepts. Given a solution trajectory \({\left\{{C}_{t}\right\}}_{t\in [\epsilon ,T]}\) of the ODE in Eq. (4), where \(\epsilon \to 0\), we desire an equivariant consistency function \(f:\left({C}_{t},t|G\right)\to {C}_{\epsilon }\) which satisfies both the boundary condition and the self-consistency as shown in Eqs. (5) and (6), respectively.

$$f\left({C}_{\epsilon },\epsilon \right)={C}_{\epsilon }$$
(5)
$$f\left({C}_{t},t|G\right)=f\left({C}_{{{t}^{\prime}}},{{t}^{\prime}}|G\right) \forall t,{{t}^{\prime}} \in [\epsilon ,T]$$
(6)

As proved by Song et al., all solutions in \({\left\{{C}_{t}\right\}}_{t\in [\epsilon ,T]}\) on the ODE trajectory can be directly mapped to the original ground truth \({C}_{0}\) if both of these conditions are satisfied. This allows a smoother transformation from \({p}_{data}(C|G)\) to \(\pi \sim \mathcal{N}(0,{T}^{2}I)\) than in DDPM and Score-SDE models, accelerating the generation process.

Additionally, the consistency function should be SE(3)-equivariant, which means the molecular coordinates \(C\) are allowed to change via translation and rotation in 3D space, while scalar properties remain invariant. Formally, a function \(f:\mathcal{X}\to \mathcal{Y}\) is equivariant to a group of transformations \({\mathbb{G}}\) if:

$$f\left({D}_{\mathcal{X}}\left(g\right)C|G\right)={D}_{\mathcal{Y}}\left(g\right)f\left(C|G\right), g\in {\mathbb{G}}$$
(7)

where \({D}_{\mathcal{X}}\left(g\right)\) and \({D}_{\mathcal{Y}}\left(g\right)\) are transformation matrices parameterized by \(g\) in the spaces \(\mathcal{X}\) and \(\mathcal{Y}\), respectively.
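
Equation (7) can be verified numerically for the rotation part of SE(3): rotating the input coordinates should rotate the predicted coordinates in the same way. The sketch below assumes a hypothetical `model_fn` mapping an (N, 3) coordinate array to an (N, 3) array; translation invariance can be checked analogously by shifting the coordinates.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def check_rotation_equivariance(model_fn, coords, atol=1e-5, seed=0):
    """Numerically test f(R C) ≈ R f(C), i.e. Eq. (7) for a rotation g."""
    R = Rotation.random(random_state=seed).as_matrix()
    rotate_then_predict = model_fn(coords @ R.T)    # f(D_X(g) C)
    predict_then_rotate = model_fn(coords) @ R.T    # D_Y(g) f(C)
    return np.allclose(rotate_then_predict, predict_then_rotate, atol=atol)
```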

These characteristics not only allow the EC-Conf model to perform one-step generation from the prior distribution, but also improve the quality of generation by chaining the outputs of the consistency model at multiple time steps, as shown in Fig. 1b.

Fig. 1

a The diffusion process and the generative phase in normal diffusion models. b The equivariant consistency model smoothly transforms the conformation data to the noise along a tractable probability flow (equivariant consistency process), and maps any solution on this trajectory to its origin for fast generation

Here, we parameterize the equivariant consistency model with a learnable function \({f}_{\theta }\) using skip connections, as shown in Eqs. (8–10):

$${f}_{\theta }\left(C,t|G\right)={c}_{skip}\left(t\right)C+{c}_{out}\left(t\right){F}_{\theta }(C,t|G)$$
(8)
$${c}_{skip}\left(t\right)=\frac{{\sigma }_{data}^{2}}{{\left(t-\epsilon \right)}^{2}+{\sigma }_{data}^{2}}$$
(9)
$${c}_{out}\left(t\right)=\frac{{\sigma }_{data}\left(t-\epsilon \right)}{\sqrt{{\sigma }_{data}^{2}+{t}^{2}}}$$
(10)

where \({F}_{\theta }\) is a \((G,t)\)-conditioned noise model fulfilling SE(3) equivariance, which ensures the SE(3) equivariance of \({f}_{\theta }.\) It is evident that as \(t\to \epsilon\), \({c}_{skip}\left(t\right)\to 1\) and \({c}_{out}\left(t\right)\to 0\), so that \({f}_{\theta }(C,\epsilon |G)={C}_{\epsilon }\), satisfying the boundary condition. Since \({c}_{skip}, {c}_{out}\) and \({F}_{\theta }\) are all differentiable functions, we can train \({f}_{\theta }\) by minimizing the prediction difference between time steps \(t\) and \({{t}^{\prime}}\) to satisfy the self-consistency condition, as illustrated in Eq. (11).

$$L=MSE({f}_{\theta }\left({C}_{t},t|G\right),{f}_{\theta }({C}_{{t}^{\prime}},{{t}^{\prime}}|G))$$
(11)
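
A minimal sketch of this parameterization is given below, using the hyperparameter values \({\sigma }_{data}=0.5\) and \(\epsilon ={10}^{-8}\) reported in the Computational details section; `F_theta` is a placeholder for the SE(3)-equivariant network and `graph` for the molecular graph \(G\).

```python
import torch

SIGMA_DATA, EPS = 0.5, 1e-8   # sigma_data and epsilon from the Computational details section

def c_skip(t: torch.Tensor) -> torch.Tensor:
    # Eq. (9): equals 1 at t = eps, enforcing the boundary condition f_theta(C, eps | G) = C_eps.
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t: torch.Tensor) -> torch.Tensor:
    # Eq. (10): equals 0 at t = eps, so the network output is switched off at the boundary.
    return SIGMA_DATA * (t - EPS) / torch.sqrt(SIGMA_DATA**2 + t**2)

def f_theta(F_theta, C: torch.Tensor, t: torch.Tensor, graph) -> torch.Tensor:
    """Eq. (8): skip-connected consistency function built on an
    SE(3)-equivariant noise model F_theta (placeholder callable)."""
    return c_skip(t) * C + c_out(t) * F_theta(C, t, graph)
```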

Model Architecture In principle, EC-Conf could utilize any type of graph-conditioned equivariant neural network for modelling \({F}_{\theta }\). Here, we employed a customized Equiformer [34] neural network, a transformer model based on irreducible representations and an equivariant attention mechanism built on the depth-wise tensor product (DTP). This model incorporates an embedding of the time step factor \({t}_{n}\). The overall architecture of EC-Conf is shown in Fig. 2a. The time factor \({t}_{n}\) and the atom type \({Z}_{i}\) are each embedded with a linear layer and added together as the input of Equiformer. By repeatedly stacking equivariant graph attention (EGA) modules and feed-forward modules, Equiformer integrates content and geometric information to predict the coordinate changes \(\delta {C}_{{t}_{n}}\) at time step \({t}_{n}\). The coordinates \({C}_{{t}_{n+1}}\) are then computed by combining \({C}_{{t}_{n}}\) and \(\delta {C}_{{t}_{n}}\) through a skip connection.
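
The following sketch illustrates the input embedding described above, assuming the atom type is embedded with a lookup table (equivalent to a linear layer on a one-hot encoding) and the scalar time factor with a linear layer; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TimeAndAtomEmbedding(nn.Module):
    """Embed the time factor t_n and the atom type Z_i and sum them,
    producing the invariant input features of the equivariant backbone."""
    def __init__(self, n_atom_types: int = 100, dim: int = 128):
        super().__init__()
        self.atom_embed = nn.Embedding(n_atom_types, dim)  # atom type Z_i
        self.time_embed = nn.Linear(1, dim)                # time factor t_n

    def forward(self, atom_types: torch.Tensor, t_n: torch.Tensor) -> torch.Tensor:
        # atom_types: (N,) long tensor; t_n: scalar tensor broadcast to all N atoms.
        t_feat = self.time_embed(t_n.view(1, 1).expand(atom_types.shape[0], 1))
        return self.atom_embed(atom_types) + t_feat
```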

Fig. 2

a The model architecture in EC-Conf. b The mechanism of equivariant graph attention

The architecture of the EGA module is illustrated in Fig. 2b. For a pair of neighboring nodes \(\{i,j\}\) in graph \(G\), their embeddings \({x}_{i}\) and \({x}_{j}\) are added after a linear transformation and updated through an equivariant DTP incorporating their distance \({r}_{ij}\). The updated embeddings are reshaped into scalar features \({f}_{ij}^{0}\) and irreps features \({f}_{ij}^{L}\), which are sent to the scalar and irreps feature blocks, respectively. Finally, the updated scalar features \({a}_{ij}\) and irreps features \({v}_{ij}\) are multiplied to update \({x}_{i}.\) By summing over all neighbors of node \(i\), as in a message passing neural network, both content and geometric information are incorporated into its embedding \({x}_{i}\). A more detailed description of EGA can be found in reference [34].

As the time factor \({t}_{n}\) and the atom type are independent of the coordinates, this modification does not affect the SE(3)-equivariance of Equiformer.

Model Training and Conformer Generation For a conformation \(C\) sampled from the dataset and noise \(z\sim \mathcal{N}(0,I)\), we use \(C+{t}_{n+1} \cdot z\) and \(C+{t}_{n}\cdot z\) to replace \({C}_{{t}_{n+1}}\) and \({C}_{{t}_{n}}\) on the ODE trajectory \({\left\{{C}_{t}\right\}}_{t\in [\epsilon ,T]}\); thus \({f}_{\theta }\) can be trained with the following loss function:

$$\mathcal{L}\left(\theta \right)=MSE({f}_{\theta }\left(C+{t}_{n+1}\cdot z,{t}_{n+1}|G\right),{f}_{\theta }(C+{t}_{n}\cdot z,{t}_{n}|G))$$
(12)

To make the training process more stable and improve the final performance of \({f}_{\theta }\), the exponential moving average (EMA) technique is adopted, as mentioned in Ref. [27]. Here, we create another function \({f}_{{\theta }^{-}}\) with the EMA of the original parameters \(\theta\) during the training process, and minimize the difference between \({f}_{\theta }\left(C+{t}_{n+1}\cdot z,{t}_{n+1}|G\right)\) and \({f}_{{\theta }^{-}}\left(C+{t}_{n}\cdot z,{t}_{n}|G\right)\), as in Song et al.'s work [27]. We refer to \({f}_{\theta }\) as the “online network” and \({f}_{{\theta }^{-}}\) as the “target network” for clarity. The loss function is reformulated as follows:

$$\mathcal{L}\left(\theta ,{\theta }^{-}\right)=MSE({f}_{\theta }\left(C+{t}_{n+1}\cdot z,{t}_{n+1}|G\right),{f}_{{\theta }^{-}}\left(C+{t}_{n}\cdot z,{t}_{n}|G\right))$$
(13)

Parameter \(\theta\) is updated with stochastic gradient descent, while \({\theta }^{-}\) is updated with the exponential moving average as shown in Eq. (14), where \(\mu\) is the decay rate predefined by the EMA schedule.

$${\theta }^{-}=\mu {\theta }^{-}+\left(1-\mu \right)\theta$$
(14)

In this way, we can perform equivariant consistency training to obtain \({f}_{\theta }\) as an approximation of \(f\), as shown in Fig. 3a and Algorithm S1.
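
A condensed sketch of one training step in the spirit of Algorithm S1 is shown below, reusing the `f_theta` helper from the parameterization sketch above; the time grid `t_grid`, the decay rate `mu`, and the network interface are hypothetical placeholders rather than the exact settings of EC-Conf.

```python
import torch

def consistency_training_step(online_net, target_net, optimizer, C, graph,
                              t_grid, mu=0.95):
    """One equivariant consistency training step following Eqs. (12)-(14).
    t_grid is a 1-D tensor of increasing time points in [eps, T]."""
    n = torch.randint(0, len(t_grid) - 1, (1,)).item()   # pick adjacent time points
    t_n, t_np1 = t_grid[n], t_grid[n + 1]
    z = torch.randn_like(C)                               # shared Gaussian noise

    pred_online = f_theta(online_net, C + t_np1 * z, t_np1, graph)   # f_theta
    with torch.no_grad():                                            # f_theta^- (no gradient)
        pred_target = f_theta(target_net, C + t_n * z, t_n, graph)

    loss = torch.mean((pred_online - pred_target) ** 2)   # Eq. (13)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with torch.no_grad():                                  # Eq. (14): EMA update of theta^-
        for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
            p_t.mul_(mu).add_((1.0 - mu) * p_o)
    return loss.item()
```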

Fig. 3

The training (a) and sampling (b) workflow of EC-Conf

In the conformer generation phase, samples are drawn from the initial distribution \({\widehat{C}}_{T}\sim \mathcal{N}\left(0,{T}^{2}I\right)\) and the consistency model is used to generate conformers: \({\widehat{C}}_{\epsilon }={f}_{\theta }({\widehat{C}}_{T},T|G)\). The conformers can then be refined by greedily alternating noise injection and denoising steps over a set of time points \(\{{\tau }_{1},{\tau }_{2}\cdots {\tau }_{N-1}\}\), as shown in Fig. 3b and Algorithm S2.
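
The sketch below outlines this sampling procedure in the style of Algorithm S2: one-shot generation from the prior followed by optional refinement through alternating noise injection and denoising at the time points \({\tau }_{n}\); `f_theta_fn` stands for the trained consistency function and the argument names are illustrative.

```python
import torch

def multistep_consistency_sampling(f_theta_fn, graph, shape, taus,
                                   T=80.0, eps=1e-8, device="cpu"):
    """Generate a conformer with one network call, then refine it by
    re-noising to tau and mapping back to C_eps (taus in decreasing order)."""
    C_T = torch.randn(shape, device=device) * T                     # C_T ~ N(0, T^2 I)
    C = f_theta_fn(C_T, torch.tensor(T, device=device), graph)      # one-shot estimate
    for tau in taus:                                                # optional refinement
        z = torch.randn(shape, device=device)
        C_tau = C + (tau**2 - eps**2) ** 0.5 * z                    # inject noise at level tau
        C = f_theta_fn(C_tau, torch.tensor(tau, device=device), graph)
    return C
```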

Computational details

Following previous work, we also used the GEOM-QM9 [35] and GEOM-Drugs datasets [36] for evaluation. To make a fair benchmark study, we used the same training, validation and test sets produced by Shi et al. for both datasets [13]. Specifically, GEOM-QM9 is split into training, validation and test sets of 39,860, 4979 and 200 unique molecules, corresponding to 199,300, 24,895 and 797 conformations; the GEOM-Drugs set contains 39,852, 4983 and 200 molecules in the training, validation and test sets, corresponding to 199,260, 24,915 and 15,864 conformations, respectively. We examined model performance on the same test set used by the GeoDiff model [20]. In the current study, the following hyperparameters were used: \({\sigma }_{data}=0.5, \epsilon ={10}^{-8},T=80\). For a given molecule with K ground truth conformations in the test set, 2K conformations are sampled for evaluation.

Our benchmark includes 8 recent or established SOTA models: GraphDG [12], ConfVAE [13], GeoMol [14], CGCF [15], ConfGF [16], GeoDiff [20], SDEGen [21], and ETKDG [18] in RDKit. The results for GraphDG, CGCF, ConfVAE and ConfGF were taken from a previous study [20], while the performance of ETKDG, GeoMol, SDEGen and GeoDiff was evaluated by ourselves on the same test set using the provided models with various settings.

Model performance was evaluated by measuring the quality and diversity of conformations generated by different models. Here, we mainly compared four metrics built on the root mean square deviation (RMSD) proposed by Ganea et al., defined as the normalized Frobenius norm of two atomic coordinate matrices after Kabsch alignment [14]. Formally, let \({S}_{g}\) and \({S}_{r}\) denote the set of generated conformations and the set of reference conformations, respectively. Then, the coverage and matching measures following the traditional Recall measure can be defined as follows:

$$COV{\text{-}}R\left({S}_{g},{S}_{r}\right)=\frac{1}{\left|{S}_{r}\right|}\left|\left\{C\in {S}_{r}\,\middle|\,RMSD\left(C,\widehat{C}\right)\le \delta ,\ \exists \widehat{C}\in {S}_{g}\right\}\right|$$
(15)
$$MAT-R\left({S}_{g},{S}_{r}\right)=\frac{1}{\left|{S}_{r}\right|}{\sum }_{C\in {S}_{r}}\underset{\widehat{C}\in {S}_{g}}{\text{min}}(RMSD(C,\widehat{C}))$$
(16)

where \(\delta\) is a predefined threshold. The other two precision-based metrics, COV-P and MAT-P, are defined similarly, but with the roles of the generated and reference sets exchanged, as shown in Eqs. (17, 18).

$$COV{\text{-}}P\left({S}_{g},{S}_{r}\right)=\frac{1}{\left|{S}_{g}\right|}\left|\left\{C\in {S}_{g}\,\middle|\,RMSD\left(C,\widehat{C}\right)\le \delta ,\ \exists \widehat{C}\in {S}_{r}\right\}\right|$$
(17)
$$MAT-P\left({S}_{g},{S}_{r}\right)=\frac{1}{\left|{S}_{g}\right|}{\sum }_{C\in {S}_{g}}\underset{\widehat{C}\in {S}_{r}}{\text{min}}(RMSD(C,\widehat{C}))$$
(18)

In practice, \({S}_{g}\) of each molecule is set to twice the size of \({S}_{r}\). Intuitively, the COV score measures the percentage of structures in one set covered by the other set, where covering means that the RMSD between two conformations is within a certain threshold \(\delta\). In contrast, the MAT score measures the average RMSD between conformations in one set and their closest neighbors in the other set. In general, a higher COV score or a lower MAT score indicates that more realistic conformations are generated. Moreover, the precision metrics measure the proportion of true low-energy conformations among all conformations generated by the model; high precision means that most generated conformations come from the target low-energy conformation distribution within the error threshold \(\delta\). The recall metrics measure the proportion of true low-energy conformations in the dataset that are recovered by the model; high recall means the model can generate a wide variety of low-energy conformations that cover the entire target distribution. Briefly, precision reflects quality while recall reflects diversity. Following previous works, \(\delta\) is set to 0.5 Å and 1.25 Å for the GEOM-QM9 and GEOM-Drugs datasets, respectively.
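
As an illustration, all four metrics can be computed from a precomputed RMSD matrix between reference and generated conformers; the sketch below assumes such a matrix is available (e.g., from RDKit's `GetBestRMS`) and is not part of the original evaluation code.

```python
import numpy as np

def coverage_and_matching(rmsd_matrix, delta=1.25):
    """COV-R and MAT-R (Eqs. 15-16) from an RMSD matrix of shape (|S_r|, |S_g|):
    rows are reference conformers, columns are generated conformers.
    Transposing the matrix gives COV-P and MAT-P (Eqs. 17-18)."""
    best_match = rmsd_matrix.min(axis=1)          # closest generated conformer per reference
    cov = float(np.mean(best_match <= delta))     # fraction of references covered within delta
    mat = float(np.mean(best_match))              # average closest-neighbor RMSD
    return cov, mat
```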

Results and discussions

The performance of EC-Conf with different diffusion steps

The first thing to investigate is how the number of diffusion steps influences model performance. Various models were trained with different numbers of diffusion time steps and evaluated on a random test set. For EC-Conf, users can set the maximal number of steps in both the forward diffusion process and the reverse iteration process, corresponding to the training and generation phases. We first evaluated the performance of EC-Conf trained with maximal time steps of 2, 5, 10, 15, 20, 25, 50 and 150 in the forward ODE process, with the number of iteration steps during generation set equal to that used in training; the results are shown in Table 1. Both the COV-R and MAT-R metrics reached their best level when the number of diffusion steps was set to 25. The performance on COV-R and MAT-R improved as the number of diffusion steps increased from 2 to 25, while it got worse for 50 and 150. There are two possible reasons for the performance decrease with a larger number of diffusion steps. First, EC-Conf is trained by minimizing the error between adjacent steps in the forward diffusion process. During sampling, the error between adjacent iteration steps accumulates with the number of sampling iterations, causing the final results to deviate from the ground truth structure. Second, EC-Conf learns the evolution relationship between diffusion time and structure, instead of only learning the relation between a structure and its noise as in DDPM; thus, an excessive number of diffusion steps increases the difficulty of learning and decreases the overall effectiveness.

Table 1 Results of EC-Conf with different iteration steps on the GEOM-QM9 dataset

For COV-P and MAT-P, the performance of EC-Conf with 5 diffusion steps already reached 0.857 and 0.356 Å, indicating that EC-Conf had learned the main features of the dominant low-energy conformations, although its distribution learning ability still needed improvement. COV-P and MAT-P reached their best values when the number of time steps was set to 15, and got worse for numbers of iterations larger than 15. COV-R increased by 6% as the number of diffusion steps increased from 15 to 25, while COV-P only decreased by 3%. Taking both the recall and precision measurements into consideration, the model with 25 diffusion steps in the forward ODE trajectory gave the best results in general, and 25 was therefore set as the optimal number of diffusion steps in training for the following experiments.

Performance metrics of EC-Conf under various sampling iterations

Once EC-Conf is trained, it allows either one-step generation from the prior Gaussian distribution or iterative refinement via chaining the outputs of multiple time steps, as shown in Algorithm S2. Here, we evaluate the performance of EC-Conf under various numbers of sampling iterations. In addition, the rule-based ETKDG method and 7 ML-based baselines were also compared, including the one-shot methods GraphDG, ConfVAE and GeoMol, and the iterative refinement methods CGCF, ConfGF, SDEGen and GeoDiff. Additionally, the RDKit structure fast correction (FC) option can be introduced to correct abnormal bond lengths and angles in ML-based methods by using the bond lengths and angles of the MMFF force field [37, 38]. Here, we focus on the performance of three corrected diffusion models, i.e., SDEGen-FC, GeoDiff-FC and EC-Conf-FC.

For the QM9 dataset, the recall measurements COV-R and MAT-R are shown in Table 2 and Fig. 4a, b, representing the diversity of generated conformations. It is clear that the diffusion-based models outperformed most one-shot models except GeoMol, suggesting their superior capability in reproducing ground truth conformations. Interestingly, one-step generation with EC-Conf is already comparable with the one-shot models, and the result for two iterations is almost the same as that of the SDEGen model running 1500 iterations, i.e., the sampling efficiency is improved by almost 750 times. The recall performance of EC-Conf gradually converged beyond 5 iterations. Although the recall performance is somewhat lower than that of the GeoDiff model, the roughly 1000-fold improvement in sampling efficiency could make EC-Conf a better choice when dealing with large numbers of molecules. The precision-based metrics COV-P and MAT-P are also shown in Table 2, where EC-Conf outperformed all ML baselines, indicating its superior quality of conformation generation under all sampling iterations. The optimal performance of EC-Conf was obtained at around 25 iteration steps on the GEOM-QM9 dataset.

Table 2 The average benchmark results for EC-Conf under different sampling iterations with fixed (25) diffusion steps on the GEOM-QM9 test set
Fig. 4

The benchmarked recall and precision results of COV (a, c) and MAT (b, d) on GEOM-QM9 and GEOM-Drugs test sets, respectively

We also evaluated our EC-Conf models on drug-like molecules with a maximum of 50 heavy atoms in the GEOM-Drugs set, which is more challenging for one-shot baselines. The recall-based metrics COV-R and MAT-R are shown in Table 3 and Fig. 4c, d. Both GeoDiff and EC-Conf outperform the best one-shot generative model, GeoMol. Again, even one-step generation with EC-Conf greatly outperformed the one-shot baselines and some of the iterative methods such as CGCF, ConfGF and SDEGen. EC-Conf with 5 iterations already performed better than GeoDiff with 1000 iterations, representing 200 times better efficiency. In terms of the precision-based metrics on the GEOM-Drugs set, although GeoMol generally outperformed the rule-based RDKit ETKDG method, the performance of the EC-Conf model is quite close. Among all deep learning-based models, EC-Conf with a single iteration outperformed the other models and achieved the best results with 15 sampling iterations. To intuitively demonstrate the conformation generation efficiency of EC-Conf, the conformation evolution during the sampling phase is shown in Fig. 5. It is evident that for EC-Conf, a single iteration already roughly shapes the structure, with the quality of the structure continuously improving over the next 25 steps. In contrast, for SDEGen and GeoDiff, the molecular structure only starts to take shape after 3500 iterations. These results further demonstrate the efficiency of EC-Conf in conformation generation.

Table 3 The average benchmark results for EC-Conf under different sampling iterations with fixed (25) diffusion steps on the GEOM-Drugs test set
Fig. 5

The conformation evolution in the diffusion process of a EC-Conf and b GeoDiff, and in the distance-geometry-to-coordinates phase of c SDEGen.

Because the FC option slightly adjusts the ML-generated structures, it decreases the model performance on both the GEOM-QM9 and GEOM-Drugs test sets, but it can improve the force-field-calculated molecular energy, as discussed later. Among the three corrected diffusion models, EC-Conf-FC outperforms the other methods in general, with only a slightly worse COV-R median value, as shown in Table S3.

Some sample conformations generated by selected models are shown in Fig. 6 to provide a qualitative comparison, where EC-Conf is shown to nicely capture both local and global structures in 3D space. Additionally, SDEGen fails to generate reasonable structures for some molecules.

Fig. 6

Examples of generated conformations for 7 randomly selected molecules and their aligned RMSDs to the closest reference structures in the GEOM-Drugs test set with different methods

The quality of EC-Conf generated structures on GEOM-Drugs

As discussed above, machine learning methods exhibit advantages in terms of generating diverse conformations and reproducing ground truth conformations with high precision. Here, we evaluated the quality of EC-Conf generated conformations by examining their deviation from the conformations optimized with the MMFF94 force field and from the ground truth structures with the lowest MMFF94 energy, considering both structural deviation and conformational energy. We compared them with structures generated by other methods, including ETKDG, SDEGen with 6000 iterations and the GeoDiff model with 5000 denoising iterations. Additionally, the deviations of the FC-corrected conformations from the optimized structures and the ground truth were also compared.
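
A minimal RDKit-based sketch of this quality check is shown below; it assumes the generated conformer is supplied as an RDKit molecule with explicit hydrogens and a single embedded conformer, and it reports the heavy-atom RMSD to the MMFF94-optimized geometry together with the per-atom MMFF94 energy difference. This is an illustration of the protocol, not the exact evaluation script used here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def mmff_quality_check(mol_with_conf):
    """Optimize a copy of the conformer with MMFF94 and report (heavy-atom RMSD,
    per-atom MMFF94 energy difference in kcal/mol/atom) to the optimized geometry."""
    opt = Chem.Mol(mol_with_conf)                           # copy; keep the original conformer
    AllChem.MMFFOptimizeMolecule(opt, mmffVariant="MMFF94")
    rmsd = AllChem.GetBestRMS(Chem.RemoveHs(mol_with_conf), Chem.RemoveHs(opt))

    def mmff_energy(mol):
        props = AllChem.MMFFGetMoleculeProperties(mol, mmffVariant="MMFF94")
        return AllChem.MMFFGetMoleculeForceField(mol, props).CalcEnergy()

    delta_e = (mmff_energy(mol_with_conf) - mmff_energy(opt)) / mol_with_conf.GetNumAtoms()
    return rmsd, delta_e
```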

The average and minimum root mean square deviations (RMSD) of the generated structures from the optimized structures on the GEOM-Drugs dataset are depicted in Fig. 7a, b, respectively. ETKDG structures have the smallest average deviation from their optimized structures (with a median value of 0.905 Å), and the order of increasing average deviation is: ETKDG < EC-Conf < GeoDiff < EC-Conf-FC < GeoDiff-FC < SDEGen-FC < SDEGen. The structural deviation from the ground truth structures follows the same trend, as shown in Fig. 7c, d.

Fig. 7

Evaluation of conformation quality for various models. Average (a) and minimum (b) RMSD between generated conformations and their optimized ones. Average (c) and minimum (d) RMSD of generated structures to the ground truth structures with lowest MMFF94 energy. Minimum energy difference of generated structure to its optimized structures (e) and the ground truth structure with lowest MMFF94 energy (f). *SDEGen-FC/50 means scaling the result by 50 times

Conformational energy can also serve as an indicator of structure quality, as structures with abnormal bond lengths or ring configurations can significantly increase the internal energy. For methods not using the FC option, GeoDiff has the smallest energy difference between the generated structures and their optimized counterparts (with a median value of 0.68 kcal/mol/atom), and the order is: GeoDiff < ETKDG < EC-Conf < SDEGen, as depicted in Fig. 7e. We note that SDEGen generates many unreasonable structures that are difficult to optimize, making the energy of the optimized structures even higher than that of the FC-corrected structures; thus, the analysis of the energy difference between SDEGen-FC and its optimized structures is not meaningful. The energy difference to the ground truth follows the same trend. These findings highlight that the GeoDiff model generates the energetically most favorable conformations. However, when the FC option was used, the energy differences between the generated conformations and their optimized conformations for the EC-Conf-FC and GeoDiff-FC models decreased dramatically, to 0.25 and 0.29 kcal/mol/atom, respectively. This suggests that users can combine EC-Conf with the structure correction functionality in RDKit to directly generate reasonable conformations without further optimization, which can save a lot of processing time considering the low number of diffusion steps in EC-Conf. The results for the GEOM-QM9 set are similar to those for GEOM-Drugs, as shown in Figure S1. All these results demonstrate that our EC-Conf model can achieve a good balance between conformation quality and sampling efficiency.

Conclusions

Here, an equivariant consistency generative model (EC-Conf) was proposed as an ultra-fast diffusion method requiring only a few iterations for low-energy conformation generation. A time-factor-controlled SE(3)-equivariant transformer was used to encode Cartesian molecular conformations, and a highly efficient consistency diffusion process was carried out to generate molecular conformations, largely achieving a balance between conformation quality and computational efficiency. Our results demonstrate that EC-Conf can learn the distribution of low-energy molecular conformations with at least two orders of magnitude higher efficiency than conventional diffusion models and could become a useful tool for conformation generation and sampling.

Data availability

The source code is available from https://github.com/DeepLearningPS/EcConf.

References

  1. Perola E, Charifson PS (2004) Conformational analysis of drug-like molecules bound to proteins: an extensive study of ligand reorganization upon binding. J Med Chem 47(10):2499–2510

  2. Lyne PD (2002) Structure-based virtual screening: an overview. Drug Discov Today 7(20):1047–1055

  3. Jain AN (2004) Ligand-based structural hypotheses for virtual screening. J Med Chem 47(4):947–961

  4. Bhunia SS, Saxena M, Saxena AK. Ligand-and structure-based virtual screening in drug discovery. In: Biophysical and Computational Tools in Drug Discovery. Springer; 2021. p. 281–339.

  5. Broccatelli F, Brown N (2014) Best of both worlds: on the complementarity of ligand-based and structure-based virtual screening. J Chem Inf Model 54(6):1634–1641

  6. Cruciani G, Carosati E, Clementi S. Three-dimensional quantitative structure-property relationships; 2003.

  7. Schwab CH (2010) Conformations and 3D pharmacophore searching. Drug Discov Today Technol 7(4):e245–e253

  8. Hendrickson MA, Nicklaus MC, Milne GW, Zaharevitz D (1993) CONCORD and CAMBRIDGE: comparison of computer generated chemical structures with x-ray crystallographic data. J Chem Inf Comput 33(1):155–163

  9. Gasteiger J, Rudolph C, Sadowski J (1990) Automatic generation of 3D-atomic coordinates for organic molecules. Tetrahedron Comput Methodol 3(6):537–547

  10. Hawkins PC, Skillman AG, Warren GL, Ellingson BA, Stahl MT (2010) Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. J Chem Inf Model 50(4):572–584

  11. Mansimov E, Mahmood O, Kang S, Cho K (2019) Molecular geometry prediction using a deep generative graph neural network. Sci Rep 9(1):20381

  12. Simm GN, Hernández-Lobato JM. A generative model for molecular distance geometry. In: ICML 2020. PMLR 2020.

  13. Xu M, Wang W, Luo S, Shi C, Bengio Y, Gomez-Bombarelli R, Tang J. An end-to-end framework for molecular conformation generation via bilevel programming. In: ICML 2021: 2021. PMLR: 11537–11547.

  14. Ganea O, Pattanaik L, Coley C, Barzilay R, Jensen K, Green W, Jaakkola T. Geomol: Torsional geometric generation of molecular 3d conformer ensembles. In: NIPS 2021. 2021: 13757–13769.

  15. Xu M, Luo S, Bengio Y, Peng J, Tang J. Learning neural generative dynamics for molecular conformation generation. In: ICLR 2021. 2021.

  16. Shi C, Luo S, Xu M, Tang J. Learning gradient fields for molecular conformation generation. In: ICML 2021: 2021. PMLR: 9558–9568.

  17. Schärfer C, Schulz-Gasch T, Ehrlich H-C, Guba W, Rarey M, Stahl M (2013) Torsion angle preferences in druglike chemical space: a comprehensive guide. J Med Chem 56(5):2016–2028

  18. Wang S, Witek J, Landrum GA, Riniker S (2020) Improving conformer generation for small rings and macrocycles based on distance geometry and experimental torsional-angle preferences. J Chem Inf Model 60(4):2044–2058

  19. Zhu J, Xia Y, Liu C, Wu L, Xie S, Wang Y, Wang T, Qin T, Zhou W, Li H: Direct molecular conformation generation. In: Transactions on Machine Learning Research. 2022.

  20. Xu M, Yu L, Song Y, Shi C, Ermon S, Tang J. Geodiff: A geometric diffusion model for molecular conformation generation. In: ICLR 2022. 2022.

  21. Zhang H, Li S, Zhang J, Wang Z, Wang J, Jiang D, Bian Z, Zhang Y, Deng Y, Song J et al (2023) SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem Sci 14(6):1557–1568

  22. Jing B, Corso G, Chang J, Barzilay R, Jaakkola T. Torsional diffusion for molecular conformer generation. In: NIPS 2022. 2022: 24240–24253.

  23. Ho J, Jain A, Abbeel P (2020) Denoising diffusion probabilistic models. NIPS 2020:6840–6851

  24. Song Y, Ermon S. Generative modeling by estimating gradients of the data distribution. In: NIPS 2019. 2019.

  25. Song Y, Ermon S. Improved techniques for training score-based generative models. In: NIPS 2020. 2020: 12438–12448.

  26. Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, Poole B. Score-based generative modeling through stochastic differential equations. In: ICLR 2021. 2021.

  27. Song Y, Dhariwal P, Chen M, Sutskever I. Consistency models. In: ICML 2023. 2023.

  28. Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: An overview. IEEE Signal Process Mag 35(1):53–65

  29. Kingma DP, Welling M. Auto-Encoding Variational Bayes. 2013: arXiv:1312.6114.

  30. Jimenez Rezende D, Mohamed S. Variational Inference with Normalizing Flows. In: ICML 2015: May 01, 2015. 2015: arXiv:1505.05770.

  31. Yang L, Zhang Z, Song Y, Hong S, Xu R, Zhao Y, Shao Y, Zhang W, Cui B, Yang M-H. Diffusion models: A comprehensive survey of methods and applications. arXiv preprint arXiv:220900796 2022.

  32. Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, Poole B. Score-based generative modeling through stochastic differential equations. In: ICLR 2021. 2020.

  33. Karras T, Aittala M, Aila T, Laine S. Elucidating the design space of diffusion-based generative models. In: NIPS 2022. 2022: 26565–26577.

  34. Liao Y-L, Smidt T. Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. In: ICLR 2023. 2022.

  35. Ramakrishnan R, Dral PO, Rupp M, Von Lilienfeld OA (2014) Quantum chemistry structures and properties of 134 kilo molecules. Sci Data 1(1):1–7

  36. Axelrod S, Gomez-Bombarelli R (2022) GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci Data 9(1):185

  37. Zhang X, Zhang O, Shen C, Qu W, Chen S, Cao H, Kang Y, Wang Z, Wang E, Zhang J et al (2023) Efficient and accurate large library ligand docking with KarmaDock. Nat Comput Sci 3(9):789–804

  38. Stärk H, Ganea O, Pattanaik L, Barzilay R, Jaakkola T: Equibind. Geometric deep learning for drug binding structure prediction. In: ICML 2022: 2022. PMLR: 20503–20521.

Funding

H.C. thanks the financial support of Pearl River Recruitment Program of Talents No. 2021CX020227. This study was supported by the R&D Program of Guangzhou National Laboratory (YW-YWYM0205, GZNL2023A01008, GZNL2023A01005) and Overseas Experts Supporting Programs under National Research Platform (WGZJ22-001).

Author information

Contributions

Zhiguang Fan completed the main experiments in the article, while Mingyuan Xu designed and implemented the algorithms and code presented in the paper, and drafted the initial version of the manuscript. Hongming Chen led the overall project, providing guidance throughout, with Yuedong Yang offering input on the manuscript. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Mingyuan Xu or Hongming Chen.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

13321_2024_893_MOESM1_ESM.docx

Supplementary file1 (DOCX 530 KB) 1. The algorithm of equivariant consistency training and sampling is as shown in Algorithm S1 and S2. 2. The median of benchmark results and 7 conformation examples on GEOM-QM9 are as shown in Table S1 and Figure S1. The median of benchmark result on GEOM-Drugs is as shown in Table S2. 3. Quality of generations on GEOM-QM9 Datasets is as shown in Figure S2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Fan, Z., Yang, Y., Xu, M. et al. EC-Conf: A ultra-fast diffusion model for molecular conformation generation with equivariant consistency. J Cheminform 16, 107 (2024). https://doi.org/10.1186/s13321-024-00893-2
