Chaos-embedded particle swarm optimization approach for protein-ligand docking and virtual screening

Tai, Hio Kuan; Jusoh, Siti Azma; Siu, Shirley W. I.

doi:10.1186/s13321-018-0320-9

Methodology
Open access
Published: 14 December 2018

Chaos-embedded particle swarm optimization approach for protein-ligand docking and virtual screening

Hio Kuan Tai¹,
Siti Azma Jusoh² &
Shirley W. I. Siu¹

Journal of Cheminformatics volume 10, Article number: 62 (2018) Cite this article

3918 Accesses
19 Citations
4 Altmetric
Metrics details

Abstract

Background

Protein-ligand docking programs are routinely used in structure-based drug design to find the optimal binding pose of a ligand in the protein’s active site. These programs are also used to identify potential drug candidates by ranking large sets of compounds. As more accurate and efficient docking programs are always desirable, constant efforts focus on developing better docking algorithms or improving the scoring function. Recently, chaotic maps have emerged as a promising approach to improve the search behavior of optimization algorithms in terms of search diversity and convergence speed. However, their effectiveness on docking applications has not been explored. Herein, we integrated five popular chaotic maps—logistic, Singer, sinusoidal, tent, and Zaslavskii maps—into PSOVina$^{{\mathrm{2LS}}}$, a recent variant of the popular AutoDock Vina program with enhanced global and local search capabilities, and evaluated their performances in ligand pose prediction and virtual screening using four docking benchmark datasets and two virtual screening datasets.

Results

Pose prediction experiments indicate that chaos-embedded algorithms outperform AutoDock Vina and PSOVina in ligand pose RMSD, success rate, and run time. In virtual screening experiments, Singer map-embedded PSOVina$^{{\mathrm{2LS}}}$ achieved a very significant five- to sixfold speedup with comparable screening performances to AutoDock Vina in terms of area under the receiver operating characteristic curve and enrichment factor. Therefore, our results suggest that chaos-embedded PSOVina methods might be a better option than AutoDock Vina for docking and virtual screening tasks. The success of chaotic maps in protein-ligand docking reveals their potential for improving optimization algorithms in other search problems, such as protein structure prediction and folding. The Singer map-embedded PSOVina$^{{\mathrm{2LS}}}$ which is named PSOVina-2.0 and all testing datasets are publicly available on https://cbbio.cis.umac.mo/software/psovina.

Introduction

Small-molecule drugs exert their pharmacological effects through binding to their biological targets and subsequently modulating the activities that are associated with diseases to be treated. To rationally design new drugs for a target protein, specific interactions of the binding partners must be correctly predicted. This prediction can be achieved with a computational approach called protein-ligand docking [1]. Given the three-dimensional structures of a target protein and a ligand, the main goal of protein-ligand docking is to dock the ligand at the active site of the protein and to score the different binding poses of the ligand. Then, through a virtual screening process, a large library of ligands can be docked, ranked and filtered according to their docking scores, enabling the rapid identification of lead candidates. Therefore, accurate and efficient docking tools are indispensable for reducing the cost and time in the drug discovery process.

From an algorithmic point of view, the docking problem is a conformational search problem, which is to find the combination of parameters that yields the optimal ligand binding pose. Assuming fixed topologies of the protein and ligand, then the conformational parameters of the complex include the position and orientation of the ligand with respect to the protein and the angles of all rotatable bonds of the ligand (and even the protein if flexibility of the protein is considered). Assessment of a ligand binding pose is done by a scoring function of interaction types and distances of the atoms between the protein and ligand, which can be force field-based, empirical-based or knowledge-based [2].

Various optimization strategies have been proposed to solve the protein-ligand docking problem. For example, a Monte Carlo (MC)-based approach was implemented in AutoDock Vina [3], and a genetic algorithm (GA)-based approach was implemented in GOLD [4] and AutoDock [5]. Recently, swarm-intelligence-based approaches using particle swarm optimization (PSO) and other nature-inspired methods, such as artificial bee colony (ABC) and ant colony optimization (ACO), have become very popular for solving nonlinear and complex optimization problems. The advantages of these metaheuristic algorithms are that they tend to find good solutions quickly, they are easy to implement, and there are many variants to allow easy customization of the algorithm fitting the domain of interest. Some metaheuristic docking methods have been implemented, such as SODOCK [6], PSO@AutoDock [7], FIPSDock [8], PSOVina [9] based on the PSO algorithm and variants, PLANTS [10] based on ACO and FlABCps [11] based on ABC. All of these docking methods have been shown to improve the pose prediction accuracy and docking efficiency compared to traditional optimization methods. In these implementations, the metaheuristic algorithms were utilized as the global optimizer to quickly locate promising regions in the conformational search space. Some of these methods included a local search algorithm to refine the solution from the global search to the closest local minimum.

In this paper, we present an improvement of the PSOVina docking method that was previously developed in our group [9]. The first version of PSOVina implemented the canonical PSO algorithm with a convergence detection strategy to effectively reduce the execution time of AutoDock Vina docking by 51–60% [9]. The second version of PSOVina, named PSOVina$^{{\mathrm{2LS}}}$ [12], further enhanced the docking performance by incorporating a novel two-stage local search (2LS) algorithm to quickly examine the potential of the global search solutions. Only promising solutions will be refined by the expensive full-length local search. Our experimental results showed that the 2LS achieved an approximate threefold acceleration in finding optimal docking solutions relative to the conventional one-stage local search. In this work, we investigate the use of chaotic maps in PSOVina in an attempt to further enhance the search capability of the algorithm. A chaotic map is a function to mimic the dynamics of some nonlinear systems. Previous studies [13,14,15] indicated that using chaotic variables rather than the conventional random number generators might improve the search behavior of evolutionary algorithms in terms of search diversity and convergence speed. To evaluate the effectiveness of chaotic functions in docking applications, we implemented five chaotic functions in PSOVina$^{{\mathrm{2LS}}}$ and analyzed their performances on four benchmark docking datasets and two virtual screening datasets. In our experiments, chaos-embedded methods outperformed AutoDock Vina and our previous PSOVina in both the success rate of ligand pose prediction and the speed of virtual screening.

Methods

PSOVina and PSOVina$^{{\mathrm{2LS}}}$

PSOVina is a metaheuristic molecular docking program based on the AutoDock Vina software [3]. In PSOVina, the fast converging PSO algorithm was used as the global optimizer integrated with the Broyden–Fletcher–Goldfarb–Shanno (BFGS) local search algorithm and the scoring function of Vina. PSO is a population-based search method that is inspired by the social learning behaviors of bird flocking and fish schooling when searching for food [16]. The population, called swarm, consists of N members, called particles. Each individual particle represents a potential solution and moves in a D-dimensional search space based on its current position and velocity. During the search process, each particle adjusts its position according to its own experience and the swarm’s experience. The former is the best position that the particle has ever visited, called pbest, and the latter is the best position that the swarm has ever visited, called gbest. The velocity $V_i$ and position $X_i$ of the particle i are updated iteratively over time t according to the following equations:

$$\begin{aligned} V_i(t+1)&= w V_i(t)+c_1r_1(\text {pbest}_i(t)-X_i(t))\nonumber \\&\quad+ c_2r_2(\text {gbest}(t)-X_i(t)){,} \end{aligned}$$

(1)

$$\begin{aligned} X_i(t+1)&= X_i(t)+V_i(t+1){,} \end{aligned}$$

(2)

where $V_i = [v_{i1}, \ldots , v_{iD}]$ and $X_i = [x_{i1}, \ldots , x_{iD}]$. w is a constant parameter called the inertia weight, and it determines the contribution of the current velocity of the particle to its new velocity. A large w encourages exploration of the entire search space, and a small w facilitates local exploitation and convergence. Therefore, a suitable value of w (typically between 0.8 and 1.2) will help maintain a proper balance between the global and local search capabilities of the swarm. Rather than using a predefined constant value, many studies have proposed strategies to dynamically adjust w during the search process [17]. Two other coefficients $c_1$ and $c_2$ are the cognitive and social parameters, respectively. The former controls the particle’s movement toward the region where the best solution has been encountered by itself before, and the latter controls its movement toward the best region that the swarm has collectively found thus far. A similar treatment to adaptively adjust $c_1$ and $c_2$ was also proposed [18]. Finally, $r_1$ and $r_2$ are two uniform random variables between 0 and 1.

Although PSOVina has been improved in terms of global search efficiency, we found that each update step was still computationally expensive. The reason was that each particle update will undergo a local search step based on the BFGS algorithm to refine the solution to a local minimum. Since not all new solutions after the position update step of Eq. 2 are good solutions, we introduced the 2LS method into PSOVina to focus the computing resources on optimizing only promising solutions [12]. The first stage is to perform a short local search and decide the potential of a solution by comparing it to the current pbest solution obtained from a short local search. Only if the solution has improved energy will it enter the second stage to perform a complete local search for full optimization. This improved method, called PSOVina$^{{\mathrm{2LS}}}$, yielded greater prediction accuracy and achieved at least a threefold acceleration in run time compared to AutoDock Vina.

Chaotic maps

Chaos is a bounded unstable dynamic behavior, and it has characteristics of randomness, ergodicity, initial value sensitivity, and regularity [19]. A chaotic sequence $\{x_n : n=0,1,2,\ldots \}$ is generated deterministically from a dynamical system of the form:

$$\begin{aligned} x_{n+1} = f(x_n), \quad n=0, 1, 2, \ldots \end{aligned}$$

(3)

where f is a smooth nonlinear function. Generation of the sequence is fast and depends on only one ($x_0$) or a few initial parameters, making it easy to use and store. Apart from regularity, an important difference between random sequences and random-like chaotic sequences is their assurance of ergodicity. Under proper conditions, they can cover all values without repeat within a certain range. The sequence generation is also very sensitive to the initial parameters; thus, only a slight modification of the parameters will produce an entirely different sequence.

Chaotic maps have been found to be promising for enhancing search performance in various optimization algorithms [13, 20, 21]. Generally, the selected chaotic function is used as a replacement of the conventional random number generator (mostly with a uniform distribution) or to adaptively modify parameters in the metaheuristic optimization algorithm while the search evolves. In this way, it is to prevent the search from becoming trapped in local optima and intends to improve the balance between exploring the search space and exploiting the found solutions. Successes have been reported in applying chaotic optimization in machine learning tasks such as feature selection and clustering [22, 23] and in real-world applications in the fields of engineering [24, 25], image processing [26], and recently in bioinformatics [27,28,29]. Well-known chaotic maps for these applications include logistic map [30], Singer map [31], sinusoidal map [31], tent map [32], Zaslavskii map [33], Gauss map [31], circle map [34], Arnold’s cat map [35], Sinai map [36], and piecewise map. Each of these maps are different with respect to the density of the periodic orbits and ways of mixing topologies. In the following, we introduce the characteristics of the five most used chaotic maps.

The logistic map is one of the simplest and most popular maps describing the nonlinear dynamics of a biological population [30]. This map is defined by the following equation:

$$\begin{aligned} x_{n+1}=\alpha x_n(1-x_n), \end{aligned}$$

(4)

where $\alpha$ is the control parameter. As shown in Additional file 1: Fig. S1, the dynamical behaviors in the logistic map systems will be either in the periodic regimes or in the chaotic regimes depending on the value of $\alpha$. In the former, only a finite set of different values will be visited, whereas in the latter, the system evolves in a disordered way and never repeats itself exactly. When $\alpha = 4$, $x_0\in (0,1)$ and $x_0\notin \{0.25, 0.5, 0.75\}$, this function generates $x_n$ covering the entire range of (0, 1).

The Singer map [31] is defined by the following equation:

$$\begin{aligned} x_{n+1} & = \mu (7.86 x_n - 23.31 x_n^2 + 28.75 x_n^3\nonumber \\&\quad- 13.302875 x_n^4), \end{aligned}$$

(5)

with $x_n\in (0,1)$ under the condition that $x_0\in (0,1)$ and $\mu \in [0.9, 1.08]$. Similar to the logistic map, systems with larger $\mu$ values evolve in the chaotic regimes, as shown in Additional file 1: Fig. S2.

The sinusoidal map [31] is defined by the following equation:

$$\begin{aligned} x_{n+1}=\alpha x_n^2 sin(\pi x_n){.} \end{aligned}$$

(6)

The systems clearly evolve in the chaotic regimes when $\alpha = 2.3$ and $0.45 \le x_0 \le 0.92$, as shown in Additional file 1: Fig. S3. Note that under these conditions, the chaotic values are $> 0.4$; thus, the generated states do not cover the entire range of (0, 1).

The tent map [32] and logistic map are topologically conjugate, and they have similar dynamical behaviors. The tent map is defined by the following equation:

$$\begin{aligned} x_{n+1}={\left\{ \begin{array}{ll} \mu x_n & x_n < 0.5{,} \\ \mu (1-x_n) & x_n \ge 0.5{,} \\ \end{array}\right. } \end{aligned}$$

(7)

where $\mu$ is a positive real constant. For optimization tasks, the following equation is mostly used [13, 21, 23]:

$$\begin{aligned} x_{n+1}={\left\{ \begin{array}{ll} \frac{1}{0.7}x_n & x_n < 0.7{,} \\ \frac{10}{3}(1-x_n) & x_n \ge 0.7{.} \\ \end{array}\right. } \end{aligned}$$

(8)

The dynamical behavior of this system is shown in Additional file 1: Fig. S4.

The Zaslavskii map [33] is defined by the following equation:

$$\begin{aligned} x_{n+1} & = \left( x_n+v+ay_{k+1}\right) mod(1){,} \end{aligned}$$

(9)

$$\begin{aligned} y_{n+1} & = cos(2\pi x_n) + e^{-r}y_{n}{.} \end{aligned}$$

(10)

When $v=400$, $r=3$, $a=12$, and $y_{n+1}\in [-1.0512, 1.0512 ]$, the dynamical behaviors of Zaslavskii map systems are in a wide spectrum and very unpredictable, as shown in Additional file 1: Fig. S5.

Chaos-embedded PSOVina

One of the main disadvantages of PSO is that it often suffers from becoming trapped in local optima, particularly when dealing with functions that have multiple local extrema. The consequence is premature convergence leading to suboptimal solutions. Previous studies on improving the global convergence of the PSO algorithm were largely focused on modifying the inertia weight w and acceleration coefficients $c_1$ and $c_2$ to prevent the swarm from becoming trapped in local optima. For example, Chuanwen and Bompard [24] used a logistic map in PSO to decide w iteratively based on the evolution number t. Their result showed that a chaos-embedded algorithm can achieve better performance in terms of efficiency and convergence rate. Recently, Alatas et al. [13] tested twelve chaos-embedded PSO methods and eight chaotic maps with different combinations of chaos-adapted sequences for coefficients w, $c_1$ and $c_2$ and chaos-adapted sequences for random variables $r_1$ and $r_2$. They concluded that chaos-adapted w, $r_1$ and $r_2$ performed the best in experiments on three benchmark mathematical functions. Both studies suggested that the use of chaotic maps as a replacement for the static parameters or the normal random sequences can improve the global search capability by more easily escaping the local minimum. In addition, it was hypothesized that chaotic maps add the ergodicity property in the search, which is lacking in random sequences [13].

To investigate the use of chaotic maps in protein-ligand docking, we embedded a chaotic map in PSOVina$^{{\mathrm{2LS}}}$ as a means to generate random numbers. The inertia weight constant (w) and random variables ($r_1$ and $r_2$) in the velocity update equation of Eq. 1 were replaced by chaotic variables:

$$\begin{aligned} V_i(t+1) & = w_{cm} V_i(t)+c_1r_{cm}(\text {pbest}_i(t)-X_i(t))\nonumber \\&\quad +c_2(1-r_{cm})(\text {gbest}(t)-X_i(t)){,} \end{aligned}$$

(11)

where $w_{cm}$ and $r_{cm}$ are variables of two independent chaotic sequences; both of these variables were initialized with different random values and updated before the velocity update step was performed. A few alternatives of when to iterate the chaotic function and the number of chaotic variables used were tested. It was found that when the chaotic variables were updated once for each particle, the search could yield better solutions. In total, we implemented five chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ methods, including logistic, Singer, sinusoidal, tent, and Zaslavskii maps. Parameters of the chaotic maps used in the methods are listed in Table 1.

Table 1 The parameters of chaotic maps used in this study

Full size table

Datasets for pose prediction test

Four datasets were used to evaluate the ligand pose predictions of the docking methods: PDBbind, Astex, GOLD, and SB2010. PDBbind [37] is a manually curated database of 3D protein-ligand structures with experimentally obtained ligand binding affinities. We used the core-set complexes from PDBbind version 2014, which contains 195 protein-ligand complexes. These complexes were representatives of protein clusters generated from the PDBbind 3446 refined-set complexes. The selection was performed carefully to ensure that strong, medium, and weak binding cases were included in the core set. Both Astex [38] and GOLD [39] are widely used benchmark datasets for comparing docking methods. Astex contains 85 diverse protein-ligand structures with a resolution better than 2.5 Å. The set was derived from different drug discovery studies, and therefore, all ligands are drug-like samples. Among them, 23 of the ligands are approved drugs, and some are in clinical trials [38]. The GOLD dataset contains 77 protein-ligand complexes. This dataset was used for assessing state-of-the-art docking programs such as Surflex, Glide, MolDock and the more recent FIPSDock [38]. The fourth dataset is SB2012 from the Rizzo Lab [40]. It is an updated release of the SB2010 docking validation database containing a large dataset of 1043 crystallographic protein-ligand complexes. As a summary, information of the four datasets used for pose prediction tests are listed in Table 2.

All datasets were prepared by converting structure files into PDBQT format using the Python scripts prepare_receptor4.py and prepare_ligand4.py provided in the MGLTools package with the parameters ‘-A hydrogens’ and -U ‘nphs_lps_waters’. With these options, missing hydrogens were added, but nonpolar hydrogens were merged to the neighboring carbon based on the united-atom model scheme. As the ligand is treated as flexible in docking, this preprocessing step establishes torsion tree of the ligand that contains a fixed set of atoms (the root) and rotatable groups of atoms (the branches). All non-ring torsions are considered rotatable except bonds that only rotate hydrogens. For both protein and ligand, the default AD4 atom type and Gasteiger partial charges were used. Atoms such as Au and Ce that cannot be recognized by the conversion tools were removed. For each PDBbind receptor, the docking box was calculated based on the pocket residues given in the dataset for each complex, i.e., the geometrical center of all pocket atoms as the center of the box and the largest distances between pocket atoms in the X-, Y-, Z-dimensions as the box lengths. For the Astex diverse set, the prepared PDBQT files were kindly provided by the author of rDock [41]. For the GOLD benchmark set, the coordinates of the protein and ligand atoms were extracted from the PDB files, which were downloaded from RCSB PDB, whereas for the SB2012 docking validation set, structure files in MOL2 format were obtained from the Rizzo Lab page. Following the procedure in Ref. [8, 11], a default docking box size of 22.5 Å in all three dimensions was created for receptors in the Astex, GOLD, and SB2012 datasets. The center of the docking box was defined as the geometric center of the bound ligand in the crystal structure.

Table 2 Four datasets for the pose prediction test

Full size table

Datasets for virtual screening test

The Database of Useful Decoys-Enhanced (DUD-E) [42] was used in the virtual screening experiments. The entire dataset consists of 102 protein targets with known active ligands and computationally generated inactive ligands. The inactive ligands, called decoys, were made to have similar physicochemical properties such as molecular weight, number of rotatable bonds, calculated log P, and hydrogen bond acceptors and donors, but dissimilar 2D topologies from the active ligands such that it is challenging for docking programs to identify real positives from the positive-like ligands. Smaller subsets of four protein classes (G protein-coupled receptors, kinases, nuclear receptors, and proteases) are also available in DUD-E for family-specific virtual screening sets. In this study, due to the limited computing power, a diverse subset that contains 8 representative targets from different protein families (herein named DIV8) was employed to assess the screening performance of different docking methods. In addition, the nuclear receptor subset (herein named NR11) was also evaluated as a comprehensive test of one of the major drug target classes.

The sets of actives and decoys were preprocessed using the LigPrep module of Schrödinger 2017-1. In LigPrep, each ligand was first generated from the given isomeric SMILES string, and it was subsequently subject to 6 steps of preprocessing: (1) add hydrogen atoms to make all hydrogen explicit; (2) remove unwanted atoms such as counter ions; (3) neutralize functional groups, if possible, by adding or removing protons; (4) find low-energy conformations of flexible ring systems in the ligand; (5) filter distorted conformations by performing energy minimization; and (6) generate energy-minimized structure by performing a series of Monte Carlo multiple minimum (MCMM) searches. Finally, only one structure coordinate was retained. For the target receptor structures, they were preprocessed using the Protein Preparation Wizard of Schrödinger. It includes checking the structure for correct bond orders and correct protonation states (at pH 7.0), deleting far waters, optimizing the hydrogen bonding network, and performing energy minimization using the OPLS2005 force field. The optimized structures were then converted into PDBQT format using the prepare_ligand4.py and prepare_receptor4.py programs without any additional parameters; this ensures the programs that no repairs on the structures were required. For each receptor, size of the docking box was determined based on the co-crystallized ligand using the eBoxSize script. As shown in Ref. [43], eBoxSize can improve ranking accuracy in virtual screening experiments for about two-third of target proteins. The final virtual screening datasets with the numbers of generated actives and decoys are listed in Table 3.

Table 3 Number of actives and decoys in the DUD-E datasets for virtual screening test after preprocessing

Full size table

Performance analysis

When the structure of the co-crystallized ligand is given, the standard root-mean-square deviation (RMSD) can be used to evaluate the accuracy of the predicted ligand binding pose. RMSD is a measure of the difference between the predicted position of each ligand atom and its actual position in the experimental structure with respect to the target protein. In this work, a predicted ligand pose with an RMSD of 2 Å or less was considered successful.

Being a stochastic algorithm, PSOVina (also AutoDock Vina) can provide different solutions in repeated runs. For pose prediction experiments, we performed docking of each complex 10 times and reported the performance measured from the best-scoring pose over all repeated runs (i.e., the docking pose found with the lowest binding affinity in 10 runs). We also measured the average performance of each run. For fairness, the same repetitive experiment was executed for all methods to be compared.

The performance of a docking method in virtual screening was evaluated based on the list of the screened compounds ranked by the predicted binding affinity. The more actives that are ranked high in the list, the more effective is the docking method for virtual screening. Two metrics were used in this study: the area under the receiver operating characteristic curve (AUC-ROC) and the enrichment factor (EF). The former is the global performance measure of a method from the ratios of true positive fraction over the false positive fraction at different classification thresholds. Ligands in the list above the threshold are classified as actives, whereas those below the threshold are classified as decoys. An AUC-ROC value of 1.0 indicates perfect classification, whereas a value of 0.5 indicates random prediction. Because drug discovery research will mainly consider the top-ranked ligands from the virtual screening result for further investigation, a measure of how good is the predicted top-$x\%$ ranked ligands is more indicative about the effectiveness of the docking method for virtual screening. The value of $\text {EF}_{x\%}$ is computed as:

$$\begin{aligned} \text {EF}_{x\%} = \frac{\text {actives at~} x\%}{\text {total actives}} \Bigg / \frac{\text {ligands at~} x\%}{\text {total ligands}}{.} \end{aligned}$$

(12)

Program efficiency, i.e., the run time, was measured as the elapsed time (or referred to as the real time) used by the docking program with the Linux command time.

Pose prediction tests were performed on a Dell XPS 8700 desktop with an Intel i7 quad-core 3.6 GHz processor and 24 GB of memory running Ubuntu 15. Virtual screening tests were run on a high-performance computing (HPC) cluster, where each node was equipped with a 24-core Intel Xeon E5-2690 GHz CPU and 256 GB of memory.

Results and discussion

To evaluate the effectiveness of chaotic maps in protein-ligand docking, two types of experiments were performed: ligand pose prediction and virtual screening.

Comparison of ligand pose prediction accuracy and docking speed

We conducted experiments using four independent datasets, namely, PDBbind, Astex, GOLD, and SB2012, to evaluate the docking performances of chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ methods and compare them to AutoDock Vina and our previous versions of PSOVina. For PSOVina and PSOVina$^{{\mathrm{2LS}}}$, the following PSO parameters were used: $N=8$, $w=0.36$, and $c_1=c_2=0.99$. For PSOVina$^{{\mathrm{2LS}}}$, two additional parameters, $R=0.1$ and $C_r=18$, for the 2LS were used. For each complex in the dataset, 10 docking repetitions were performed, and the binding pose with the lowest binding affinity among the predicted poses was taken as the final docking solution.

The docking performances of five chaotic maps are compared in Table 4. Using the best-scoring pose among ten repeats as the final solution for each complex, the RMSDs and success rates of PSOVina and PSOVina$^{{\mathrm{2LS}}}$ are consistently better than those of AutoDock Vina. When a chaotic map was employed as a random number generator, variants of chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ show a further improvement in success rate and in most cases also in RMSD. Regarding the average RMSD and success rate, there are more variations, presumably due to the stochastic nature of the docking algorithms. In all cases, PSOVina$^{{\mathrm{2LS}}}$ has the shortest run time. Replacing random numbers with chaotic sequences only introduced minor additional computing cost.

We summarize the overall pose prediction performances of the docking methods in Table 5. PSOVina$^{{\mathrm{2LS}}}$ with sinusoidal map yielded the highest best-scoring pose success rate of 74.62%, followed by Singer map with a rate of 73.21% and logistic map with a rate of 72.72%. AutoDock Vina only achieved a 65.68% success rate, and PSOVina$^{{\mathrm{2LS}}}$ achieved 70.89%. The fastest method is PSOVina$^{{\mathrm{2LS}}}$, which gained an almost sixfold acceleration with respect to AutoDock Vina, while chaos-embedded methods in general achieved a fivefold acceleration in docking.

Therefore, the experimental results presented in this section are strong evidence that chaotic maps can improve the global exploration capability of the PSO algorithm in protein-ligand docking and predict higher-quality docking poses than nonchaotic methods in a shorter amount of time. Specifically, sinusoidal map and Singer map appear to be the best options for the pose prediction considering the tradeoff between accuracy and run time.

Table 4 Docking performance comparison of AutoDock Vina, PSOVina, PSOVina$^{{\mathrm{2LS}}}$, and chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ methods on four pose prediction datasets

Full size table

Table 5 Overall pose prediction performance of AutoDock Vina, PSOVina, PSOVina$^{{\mathrm{2LS}}}$, chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ methods

Full size table

Comparison of virtual screening accuracy and screening speed

To assess the screening performances of the chaos-embedded docking methods, we performed virtual screening experiments using the DUD-E diverse target subset (DIV8) and the nuclear receptor target subset (NR11). Four docking methods, namely, AutoDock Vina, PSOVina, Singer map-embedded PSOVina$^{{\mathrm{2LS}}}$, and sinusoidal map-embedded PSOVina$^{{\mathrm{2LS}}}$, were compared with respect to the values of AUC-ROC, EF, and run time. Only one docking was performed per complex. PSO parameters were the same as in the pose prediction experiments except that more particles were used ($N=16$).

The DIV8 results are presented in Figs. 1 and 2 and Table 6, while the NR11 results are presented in Table 7 and Additional file 1: Figs. S6 and S7. The ROC curves show that all docking methods generated very similar ranking lists except for a few targets (DIV8’s akt1 and NR11’s ppara, ppard, and pparg). For DIV8, the AUC-ROCs averaged across targets are 0.62, 0.62, 0.61 and 0.62 for Vina, PSOVina, Singer, and sinusoidal, respectively. Singer has the largest mean EF_1% of 6.11, followed by PSOVina (5.92), sinusoidal (5.59) and finally Vina (5.38), whereas Autodock Vina has the largest mean EF_20% of 1.93, followed by PSOVina (1.91), sinusoidal (1.81) and Singer (1.77). However, these differences are statistically indistinguishable as suggested by paired Student’s t-test between pairs of the docking methods at the significance level of $\alpha =0.05$ (see Additional file 1: Table S1). Similarly, for NR11 the screening performance of Singer is comparable to AutoDock Vina and PSOVina in terms of AUC-ROC and EF_1%. It performs only slightly worse than AutoDock Vina in EF_20% with a p-value of 0.04199. In contrast, sinusoidal performs slightly worse than AutoDock Vina in both AUC-ROC and EF_20% with p-values of 0.04996 and 0.04755. Therefore, in terms of screening accuracy Singer map-embedded PSOVina$^{{\mathrm{2LS}}}$ is preferable to sinusoidal-map embedded method.

After confirming the screening accuracies of choas-embedded methods, we evaluated their screening speed. Figure 2 and Additional file 1: Fig. S7 show the violin plots of run time used by different methods in screening all compounds in the DIV8 and NR11 datasets, respectively. Notably, while the median run times varied in a wide range of approximately 6–23 s for AutoDock Vina and 6–17 s for PSOVina, the chaos-embedded methods varied in a small range of only approximately 1–3 s in screening the DIV8 dataset. The same observation can be obtained from the run time analysis of the NR11 virtual screening experiments. Taken together two data sets, sinusoidal has the shortest average run time of 2.51 s, followed by Singer of 2.81 s, PSOVina of 12.06 s and AutoDock Vina of 14.93 s. As indicated by the paired Student’s t-test (see Additional file 1: Table S1), the speed improvements of the chaos-embedded methods over AutoDock Vina and PSOVina are very significant at $\alpha =0.01$, achieving an average of five- to sixfold acceleration, where sinusoidal seems slightly faster than Singer at $\alpha =0.05$. As the docking run time is proportional to the number of rotatable bonds of the ligand and the size of the receptor pocket, we further analyzed the median run time with respect to the number of ligand rotatable bonds using the DIV8 and NR11 datasets. As shown in Fig. 3 docking run time of chaos-embedded methods is only minimally affected by the increase of the number of torsions.

Table 6 Results of area under ROC curves (AUC-ROC) and enrichment factor (EF) of virtual screening the DUD-E diverse targets (DIV8) using AutoDock Vina, PSOVina, and chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ with Singer and sinusoidal maps

Full size table

Table 7 Results of area under ROC curves (AUC-ROC) and enrichment factor (EF) of virtual screening the DUD-E nuclear receptor targets (NR11) using AutoDock Vina, PSOVina, and chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ with Singer and sinusoidal maps

Full size table

Conclusion

In this work, we explored the use of chaotic maps to enhance the search capability and speed in docking applications. Based on our previous version of PSOVina$^{{\mathrm{2LS}}}$, chaos-embedded docking algorithms of five popular chaotic maps were implemented. These algorithms were tested using four docking benchmark datasets for ligand pose prediction performance and two DUD-E subsets for virtual screening performance. The results of our analysis showed that chaos-embedded methods are superior in terms of ligand pose RMSD and docking success rate. In particular, Singer-embedded PSOVina$^{{\mathrm{2LS}}}$ gained a significant five- to sixfold acceleration in virtual screening experiments with similar screening accuracies to AutoDock Vina in terms of AUC-ROC and EF. Taken together, our results suggest that chaos-embedded PSOVina$^{{\mathrm{2LS}}}$ methods might be better alternatives than AutoDock Vina in virtual screening. The success of chaotic maps in protein-ligand docking reveals their potential for improving optimization algorithms in other molecular conformational search problems, such as protein structure prediction and folding.

References

Meng X-Y, Zhang H-X et al (2011) Molecular docking: a powerful approach for structure-based drug discovery. Curr Comput Aided Drug Des 7:146–157
Article CAS PubMed PubMed Central Google Scholar
Liu J, Wang R (2015) Classification of current scoring functions. J Chem Inf Model 55:475–482
Article CAS PubMed Google Scholar
Trott O, Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31:455–461
CAS PubMed PubMed Central Google Scholar
Jones G, Willett P et al (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748
Article CAS PubMed Google Scholar
Morris GM, Goodsell DS et al (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19:1639–1662
Article CAS Google Scholar
Chen H-M, Liu B-F et al (2007) SODOCK: swarm optimization for highly flexible protein-ligand docking. J Comput Chem 28:612–623
Article CAS PubMed Google Scholar
Namasivayam V, Günther R (2007) pso@autodock: a fast flexible molecular docking program based on swarm intelligence. Chem Biol Drug Des 70:475–484
Article CAS PubMed Google Scholar
Liu Y, Zhao L et al (2013) FIPSDock: a new molecular docking technique driven by fully informed swarm optimization algorithm. J Comput Chem 34:67–75
Article PubMed Google Scholar
Ng MC, Fong S et al (2015) PSOVina: the hybrid particle swarm optimization algorithm for protein-ligand docking. J Bioinform Comput Biol 13:1541007
Article CAS PubMed Google Scholar
Korb O, Stützle T et al (2009) Empirical scoring functions for advanced protein-ligand docking with PLANTS. J Chem Inf Model 49:84–96
Article CAS PubMed Google Scholar
Uehara S, Fujimoto KJ et al (2015) Protein-ligand docking using fitness learning-based artificial bee colony with proximity stimuli. Phys Chem Chem Phys 17:16412–16417
Article CAS PubMed Google Scholar
Tai HK, Lin H et al (2016) Improving the efficiency of PSOVina for protein-ligand docking by two-stage local search. In: CEC, pp 770–777
Alatas B, Akin E et al (2009) Chaos embedded particle swarm optimization algorithms. Chaos Solitons Fractals 40:1715–1734
Article Google Scholar
Fister I Jr, Perc M et al (2015) A review of chaos-based firefly algorithms: perspectives and research challenges. Appl Math Comput 252:155–165
Google Scholar
Huang L, Ding S et al (2016) Chaos-enhanced Cuckoo search optimization algorithms for global optimization. Appl Math Model 40:3860–3875
Article Google Scholar
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: ICNN, pp 1942–1948
Bansal JC, Singh PK et al (2011) Inertia weight strategies in particle swarm optimization. In: NaBIC, pp 633–640
Ratnaweera A, Halgamuge SK et al (2002) Particle swarm optimization with self-adaptive acceleration coefficients. In: FSKD, pp 264–268
Schuster H (1988) Deterministic chaos: an introduction. Wiley, Hoboken
Google Scholar
Iztok FJ, Perc M et al (2015) A review of chaos-based firefly algorithms: perspectives and research challenges. Appl Math Comput 252:155–165
Google Scholar
Wang L, Zhong Y (2015) Cuckoo search algorithm with chaotic maps. Math Probl Eng 2015:715635
Google Scholar
Chuang L, Hsiao C et al (2011) Chaotic particle swarm optimization for data clustering. Expert Syst Appl 38:14555–14563
Article Google Scholar
Zawbaa HM, Emary E et al (2016) Feature selection via chaotic antlion optimization. PLoS ONE 11:e0150652
Article PubMed PubMed Central Google Scholar
Chuanwen J, Bompard E (2005) A hybrid method of chaotic particle swarm optimization and linear interior for reactive power optimisation. Math Comput Simul 68:57–65
Article Google Scholar
Li P, Xu D et al (2016) Stochastic optimal operation of microgrid based on chaotic binary particle swarm optimization. IEEE Trans Smart Grid 7:66–73
Article Google Scholar
Liu H, Wang X et al (2012) Image encryption using DNA complementary rule and chaotic maps. Appl Soft Comput 12:1457–1466
Article Google Scholar
Chuang L-Y, Yang C-H et al (2013) Operon prediction using chaos embedded particle swarm optimization. IEEE/ACM Trans Comput Biol Bioinform 10:1299–1309
Article PubMed Google Scholar
Chuang L-Y, Moi S-H et al (2016) A comparative analysis of chaotic particle swarm optimizations for detecting single nucleotide polymorphism barcodes. Artif Intell Med 73:23–33
Article PubMed Google Scholar
Gao C, Wang B et al (2016) Multiple sequence alignment based on combining genetic algorithm with chaotic sequences. Genet Mol Res 15:1–10
Google Scholar
May RM (1976) Simple mathematical models with very complicated dynamics. Nature 261:459–467
Article CAS PubMed Google Scholar
Peitgen H-O, Jürgens H et al (1992) Chaos and fractals, vol 199. Springer, Berlin, p 5
Book Google Scholar
Ott E (2002) Chaos in dynamical systems. Cambridge University Press, Cambridge
Book Google Scholar
Zaslavsky G (1978) The simplest case of a strange attractor. Phys Lett A 69:145–147
Article Google Scholar
Zheng W-M (1994) Kneading plane of the circle map. Chaos Solitons Fractals 4:1221–1233
Article Google Scholar
Peterson G (1997) Arnold’s cat map. Math Linear Algebra 45:1–7
Google Scholar
Sinai YG (1972) Gibbs measures in ergodic theory. Russ Math Surv 27:21
Article Google Scholar
Li Y, Liu Z et al (2014) Comparative assessment of scoring functions on an updated benchmark: 1. Compilation of the test set. J Chem Inf Model 54:1700–1716
Article CAS PubMed Google Scholar
Hartshorn MJ, Verdonk ML et al (2007) Diverse, high-quality test set for the validation of protein-ligand docking performance. J Med Chem 50:726–741
Article CAS PubMed Google Scholar
Nissink JW M, Murray C et al (2002) A new test set for validating predictions of protein-ligand interaction. Proteins Struct Funct Bioinf 49:457–471
Article CAS Google Scholar
Mukherjee S, Balius TE et al (2010) Docking validation resources: protein family and ligand flexibility experiments. J Chem Inf Model 50:1986–2000
Article CAS PubMed PubMed Central Google Scholar
Ruiz-Carmona S, Alvarez-Garcia D et al (2014) rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids. PLoS Comput Biol 10:e1003571
Article PubMed PubMed Central Google Scholar
Mysinger MM, Carchia M et al (2012) Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem 55:6582–6594
Article CAS PubMed PubMed Central Google Scholar
Feinstein WP, Brylinski M (2015) Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets. J Cheminf 7:18
Article Google Scholar

Download references

Authors' contributions

SWIS and SAJ designed research. HKT performed research. HKT and SWIS wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The authors thank the support of the Faculty of Science and Technology and the Information and Communication Technology Office of University of Macau for the high performance computing facilities.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Program of PSOVina, its variants and the processed datasets used in this work are publicly available at http://cbbio.cis.umac.mo/software/psovina. Websites where the original datasets were downloaded are listed here for users’ reference: PDBbind-CN http://www.pdbbind.org.cn; Astex http://www.ccdc.cam.ac.uk/support-and-resources/downloads; SB2012 http://ringo.ams.sunysb.edu/index.php/SB2012; DUD-E subsets http://dude.docking.org/subsets. For GOLD dataset, PDB structures were obtained from the RCSB PDB database.

Funding

This work was supported by the University of Macau (Grant Nos. MYRG2015-00212-FST and MYRG2017-00146-FST).

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, Macau, China
Hio Kuan Tai & Shirley W. I. Siu
Bioinformatics Lab, Faculty of Pharmacy, Level 8, FF2 Building, Universiti Teknologi MARA (UiTM), 42300, Bandar Puncak Alam, Selangor, Malaysia
Siti Azma Jusoh

Authors

Hio Kuan Tai
View author publications
You can also search for this author in PubMed Google Scholar
Siti Azma Jusoh
View author publications
You can also search for this author in PubMed Google Scholar
Shirley W. I. Siu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shirley W. I. Siu.

Additional file

Additional file 1.

Dynamical behaviors of chaotic maps, virtual screening results of the DUD-E NR11 subset, and statistical test results of virtual screening performances.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Tai, H.K., Jusoh, S.A. & Siu, S.W.I. Chaos-embedded particle swarm optimization approach for protein-ligand docking and virtual screening. J Cheminform 10, 62 (2018). https://doi.org/10.1186/s13321-018-0320-9

Download citation

Received: 09 June 2018
Accepted: 10 December 2018
Published: 14 December 2018
DOI: https://doi.org/10.1186/s13321-018-0320-9

Chaos-embedded particle swarm optimization approach for protein-ligand docking and virtual screening