AtomicChargeCalculator: interactive web-based calculation of atomic charges in large biomolecular complexes and drug-like molecules

Background
Partial atomic charges are a well-established concept, useful in understanding and modeling the chemical behavior of molecules, from simple compounds to large biomolecular complexes with many reactive sites.

Results
This paper introduces AtomicChargeCalculator (ACC), a web-based application for the calculation and analysis of atomic charges that respond to changes in molecular conformation and chemical environment. ACC relies on an empirical method to rapidly compute atomic charges with accuracy comparable to quantum mechanical approaches. Thanks to its efficient implementation, ACC can handle any type of molecular system, regardless of size and chemical complexity, from drug-like molecules to biomacromolecular complexes with hundreds of thousands of atoms. ACC writes atomic charges into common molecular structure files and offers interactive facilities for statistical analysis and comparison of the results, in both tabular and graphical form.

Conclusions
Thanks to its high customizability and speed, easy streamlining, and unified platform for calculation and analysis, ACC caters to all fields of life sciences, from drug design to nanocarriers. ACC is freely available via the Internet at http://ncbr.muni.cz/ACC.

Electronic supplementary material
The online version of this article (doi:10.1186/s13321-015-0099-x) contains supplementary material, which is available to authorized users.


Computational details

EEM
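Given the 3D structure of a molecule with N atoms and total charge Q, EEM estimates the partial atomic charges $q_1, \ldots, q_N$ and the average molecular electronegativity $\chi$ by solving a set of coupled linear equations:

$$\chi = A_i + B_i q_i + k \sum_{j \neq i} \frac{q_j}{r_{i,j}}, \quad i = 1, \ldots, N, \qquad \sum_{i=1}^{N} q_i = Q,$$

or, in matrix form,

$$
\begin{pmatrix}
B_1 & \frac{k}{r_{1,2}} & \cdots & \frac{k}{r_{1,N}} & -1 \\
\frac{k}{r_{2,1}} & B_2 & \cdots & \frac{k}{r_{2,N}} & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\frac{k}{r_{N,1}} & \frac{k}{r_{N,2}} & \cdots & B_N & -1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_N \\ \chi \end{pmatrix}
=
\begin{pmatrix} -A_1 \\ -A_2 \\ \vdots \\ -A_N \\ Q \end{pmatrix}
$$

where $r_{i,j}$ is the distance between atoms i and j, and $A_i$ and $B_i$ are the EEM parameters of atom i. The factor k, although originally a unit conversion factor [1], has been used as an adjustable parameter in some EEM models (e.g., [2,3]). ACC calculates the interatomic distances $r_{i,j}$ from the atomic positions in the molecular structure file. The user may provide the total charge; otherwise, ACC assumes the molecule is neutral (total molecular charge 0). EEM parameters for each atom type (e.g., carbon, oxygen) present in the molecule are read from a set of EEM parameters suitable for the molecule in question. Many parameter sets published in the literature are available in ACC. Alternatively, the user may provide a custom set using ACC's dedicated XML template.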
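The EEM system can be sketched in plain Python to show how the matrix is assembled and solved. This is a minimal illustration under stated assumptions, not ACC's implementation: the function names are hypothetical, parameters are passed per atom rather than looked up by atom type, k defaults to 1 (the distance units and k must match whatever parameter set is used), and a textbook Gaussian elimination with partial pivoting stands in for whatever linear solver ACC actually uses.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def eem_charges(coords: Sequence[Coord], a_par: Sequence[float],
                b_par: Sequence[float], total_charge: float = 0.0,
                k: float = 1.0) -> Tuple[List[float], float]:
    """Return (charges q_1..q_N, molecular electronegativity chi)."""
    n = len(coords)
    mat = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        for j in range(n):
            mat[i][j] = b_par[i] if i == j else k / math.dist(coords[i], coords[j])
        mat[i][n] = -1.0   # coefficient of chi in row i
        rhs[i] = -a_par[i]
        mat[n][i] = 1.0    # charge-conservation row: sum of q_i = Q
    rhs[n] = total_charge
    sol = solve_linear(mat, rhs)
    return sol[:n], sol[n]
```

For two atoms 2 distance units apart with A = (2, 1), B = (1, 1) and k = 1, `eem_charges` returns charges of -1 and +1 and χ = 1.5 (up to rounding), which can be checked by substituting back into the scalar equations.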

EEM Cutoff
For each atom in the molecule, ACC generates a fragment made up of all atoms within a cutoff radius R of that atom. The interatomic distances and EEM parameters are obtained in the same way as when solving the full EEM matrix. The total fragment charge $Q_F$ is a share of the total molecular charge Q proportional to the number of atoms in the fragment, $N_F$, irrespective of the nature of these atoms:

$$Q_F = Q \, \frac{N_F}{N}$$

ACC then solves the EEM matrix equation for this fragment and keeps only the charge of the atom that generated the fragment. Applying the same procedure to all fragments yields a charge for every atom in the molecule. Finally, each atomic charge $q_i$ is corrected by the addition of

$$\Delta q = \frac{1}{N}\Big(Q - \sum_{j=1}^{N} q_j\Big)$$

so that the sum of all atomic charges equals the total molecular charge Q.
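The EEM Cutoff procedure can be sketched as follows, again as an illustration rather than ACC's code. Assumptions: hypothetical function names, per-atom parameter lists, a brute-force O(N) neighbor search per atom in place of ACC's spatial lookup structure, and a uniform correction of (Q - Σq_j)/N added to every atom. The sketch follows the three steps described: assign the fragment charge Q·N_F/N, keep only the generating atom's charge, then correct all charges so they sum to Q.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _solve(m: List[List[float]]) -> List[float]:
    """Gaussian elimination with partial pivoting on an augmented matrix (modified in place)."""
    n = len(m)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _eem(coords, a_par, b_par, total_charge, k):
    """Full EEM for one (sub)system; returns the charges only."""
    n = len(coords)
    m = [[0.0] * (n + 2) for _ in range(n + 1)]  # last column is the right-hand side
    for i in range(n):
        for j in range(n):
            m[i][j] = b_par[i] if i == j else k / math.dist(coords[i], coords[j])
        m[i][n], m[i][n + 1] = -1.0, -a_par[i]
        m[n][i] = 1.0
    m[n][n + 1] = total_charge
    return _solve(m)[:n]


def eem_cutoff_charges(coords: Sequence[Coord], a_par: Sequence[float],
                       b_par: Sequence[float], total_charge: float = 0.0,
                       radius: float = 8.0, k: float = 1.0) -> List[float]:
    n = len(coords)
    charges = []
    for i in range(n):
        # Fragment: every atom within `radius` of atom i (brute-force search).
        frag = [j for j in range(n) if math.dist(coords[i], coords[j]) <= radius]
        q_frag = total_charge * len(frag) / n  # Q_F = Q * N_F / N
        q = _eem([coords[j] for j in frag], [a_par[j] for j in frag],
                 [b_par[j] for j in frag], q_frag, k)
        charges.append(q[frag.index(i)])  # keep only the generating atom's charge
    delta = (total_charge - sum(charges)) / n  # uniform correction so charges sum to Q
    return [q + delta for q in charges]
```

When the radius encloses the whole molecule, every fragment is the full molecule and the result coincides with Full EEM; smaller radii trade accuracy for smaller matrices.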
In total, for a molecule with N atoms, the EEM Cutoff approach solves N smaller EEM matrices, corresponding to a set of N overlapping fragments of the original molecule. Whereas solving the full EEM matrix equation with a dense direct solver takes $O(N^3)$ steps, EEM Cutoff reduces the time complexity to $O\big((R^3)^3 N + R^3 N \log N\big)$, where N is the number of atoms and R is the cutoff radius. The $R^3$ factor arises because the number of atoms in a spherical fragment of radius R scales at most with its volume, i.e., as $O(R^3)$; solving each fragment's EEM matrix therefore requires $O(R^9)$ steps. The $O(R^3 N \log N)$ term represents the complexity of finding the atoms that belong to each fragment. However, for most molecules each fragment contains on average only $O(R^2)$ atoms, which effectively reduces the complexity to $O\big((R^2)^3 N + R^2 N \log N\big)$. The space complexity is $O\big((R^3)^2 + N \log N\big)$: the $(R^3)^2$ term, where $R^3$ again counts the atoms in a fragment (effectively $R^2$ in most practical cases), is the memory required to store the reduced EEM matrix, and the $N \log N$ term is the memory required by the spatial lookup data structure. EEM Cutoff is efficient for molecules containing at least several thousand atoms.
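The neighbor search behind the $O(R^3 N \log N)$ term relies on a spatial lookup structure whose exact form is not described here. As a hedged illustration of the same task (finding all atoms within R of a center), the sketch below swaps in a different, common technique: a uniform grid, or cell list. Atoms are bucketed into cubic cells, and a query inspects only the cells overlapping the bounding cube of the search sphere. The function names and the cell size are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def cell_of(p: Coord, size: float) -> Cell:
    """Integer index of the cubic cell of edge `size` containing point p."""
    return (math.floor(p[0] / size), math.floor(p[1] / size), math.floor(p[2] / size))


def build_grid(coords: Sequence[Coord], size: float) -> Dict[Cell, List[int]]:
    """Bucket atom indices by the cell that contains them."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(coords):
        grid[cell_of(p, size)].append(idx)
    return grid


def atoms_within(coords: Sequence[Coord], grid: Dict[Cell, List[int]], size: float,
                 center: Coord, radius: float) -> List[int]:
    """Sorted indices of atoms within `radius` of `center`, scanning only nearby cells."""
    cx, cy, cz = cell_of(center, size)
    reach = math.ceil(radius / size)  # cells to scan in each direction
    found = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for dz in range(-reach, reach + 1):
                # .get() does not insert empty cells into the defaultdict
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if math.dist(coords[j], center) <= radius:
                        found.append(j)
    return sorted(found)
```

With a cell size comparable to R, each query touches a constant number of cells, so building all N fragments costs time proportional to N times the fragment size in expectation; a tree-based index gives the logarithmic factor quoted above instead.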

EEM Cover
The EEM Cover approach builds on the principles of EEM Cutoff, splitting the EEM matrix into smaller matrices. However, EEM Cover generates fragments only for a subset of atoms in the molecule. The fragment-generating atoms are selected so that (i) no two of them are bonded to each other, and (ii) each atom in the molecule has at least one selected atom within two bonds. This ensures that each atom in the molecule contributes to at least one fragment, so that the entire volume of the molecule is covered. ACC solves the EEM matrix equation for each fragment and returns a list of charge contributions for all atoms encountered in the calculations. The charge on each atom is then computed as the sum of its charge contributions from all fragments in which the atom is present. Finally, each atomic charge $q_i$ is corrected by the addition of

$$\Delta q = \frac{1}{N}\Big(Q - \sum_{j=1}^{N} q_j\Big)$$

so that the sum of all atomic charges equals the total molecular charge Q. For the same cutoff radius R, EEM Cover has the same asymptotic complexity as EEM Cutoff. However, thanks to the selection of fragment-generating atoms (which requires $O(N \log N)$ steps), EEM Cover solves at least 50% fewer EEM matrices than EEM Cutoff (since, for each atom, at least one of its neighbors will not be selected).
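The two selection conditions can be satisfied by a simple greedy pass over the bond graph. The sketch below is a hypothetical procedure, not necessarily the one ACC uses: an atom is selected only if no already-selected atom lies within two bonds of it, and a selected atom is treated as covering itself. It illustrates the selection step only, not the aggregation of fragment charges.

```python
from typing import Iterable, List, Set, Tuple


def select_fragment_centers(n_atoms: int, bonds: Iterable[Tuple[int, int]]) -> List[int]:
    """Greedily pick atoms so that no two picks are bonded and every atom
    has a pick within two bonds (a picked atom covers itself)."""
    adj: List[Set[int]] = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    covered = [False] * n_atoms
    centers: List[int] = []
    for i in range(n_atoms):
        if covered[i]:
            continue  # some selected atom is already within two bonds
        # No selected atom lies within two bonds of i, so i has no selected
        # bonded neighbor and selecting it keeps rule (i) intact.
        centers.append(i)
        covered[i] = True
        for j in adj[i]:          # atoms one bond away
            covered[j] = True
            for m in adj[j]:      # atoms two bonds away
                covered[m] = True
    return centers
```

For a seven-atom chain bonded 0-1-2-3-4-5-6, this pass selects atoms 0, 3 and 6, so three fragments are solved instead of seven.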

Default settings

Total charge
ACC assumes that all uploaded molecules are neutral unless specified otherwise.

EEM parameter set
ACC chooses the default EEM parameter set based on the number of atoms in the largest uploaded molecule and on the chemical elements present in all uploaded molecules. Specifically, if the largest uploaded molecule contains 255 atoms or fewer, a set targeting organic molecules is chosen; if it contains more than 255 atoms, a set targeting biomolecules is chosen. The chosen set must cover all chemical elements present in the uploaded molecules, and if several such sets are available, the one with the lowest priority is chosen. If no set covers all chemical elements in the uploaded molecules, no default set is chosen.
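The selection rule can be written as a short function. This is a sketch with a hypothetical `ParameterSet` record (name, target, covered elements, priority); the set names used in practice and ACC's internal representation are not given here. Two interpretations are assumptions: "lowest priority" is read as the smallest priority value, and there is no fallback to sets with the other target.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence


@dataclass(frozen=True)
class ParameterSet:
    name: str
    target: str                 # "organic" or "biomolecules" (hypothetical labels)
    elements: FrozenSet[str]    # elements the set has parameters for
    priority: int


def default_parameter_set(sets: Sequence[ParameterSet], largest_molecule_atoms: int,
                          elements: Iterable[str]) -> Optional[ParameterSet]:
    target = "organic" if largest_molecule_atoms <= 255 else "biomolecules"
    needed = set(elements)
    # Keep only sets with the right target that cover every uploaded element.
    candidates = [s for s in sets if s.target == target and needed <= s.elements]
    if not candidates:
        return None  # no set covers every element: no default
    return min(candidates, key=lambda s: s.priority)  # smallest priority value wins
```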

Computation method
ACC chooses the default computation method based on the number of atoms in the largest uploaded molecule. Specifically, if the largest uploaded molecule contains 30,000 atoms or fewer, the default method will be Full EEM, in double precision (64-bit numbers) and with inclusion of potential water molecules. If the largest uploaded molecule contains more than 30,000 atoms, the default method will be EEM Cover with a cutoff radius of 12 Å, in double precision and with inclusion of potential water molecules.
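The same defaults expressed as a small settings record, as a sketch only; the `MethodSettings` type and its field names are hypothetical and do not reflect ACC's configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MethodSettings:
    method: str
    cutoff_radius: Optional[float]  # in Å; None means no cutoff (Full EEM)
    double_precision: bool          # 64-bit floating-point numbers
    include_water: bool             # keep potential water molecules


def default_method(largest_molecule_atoms: int) -> MethodSettings:
    """Default computation method chosen from the size of the largest molecule."""
    if largest_molecule_atoms <= 30_000:
        return MethodSettings("Full EEM", None, True, True)
    return MethodSettings("EEM Cover", 12.0, True, True)
```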

Benchmark
The reliability of the EEM Cutoff and EEM Cover approaches was evaluated in a benchmark against the classical procedure in which the entire matrix equation is solved (Full EEM). The purpose of the benchmark was twofold: to evaluate the accuracy of the results produced by the newly developed approaches, and to quantify the reduction in computational resources required for these calculations. The structure of each molecule used in the benchmark was downloaded from the PDB without further modification (e.g., no hydrogens were added and no geometry optimization was performed). For all jobs, the total molecular charge was 0, and the EEM parameter set employed was EX-NPA_6-31Gdd_gas [3]. The results in Table S1 show that the accuracy of both EEM Cutoff and EEM Cover increases with the cutoff radius, as expected. For typical applications of EEM Cutoff and EEM Cover, the maximum expected deviation per atom is within 0.01 e of the corresponding Full EEM calculation. Another observation is that, for EEM Cutoff and EEM Cover, the numerical precision used to represent the EEM matrix does not affect the accuracy.
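One plausible way to express the per-atom accuracy criterion is sketched below: compare each approximate charge with its Full EEM counterpart and take the largest absolute difference. The helper names are hypothetical, and the statistics reported in Table S1 may include other measures.

```python
from typing import Sequence


def max_abs_deviation(approx: Sequence[float], reference: Sequence[float]) -> float:
    """Largest per-atom |q_approx - q_full|, in units of e."""
    if len(approx) != len(reference):
        raise ValueError("charge lists must cover the same atoms")
    return max(abs(a - r) for a, r in zip(approx, reference))


def within_tolerance(approx: Sequence[float], reference: Sequence[float],
                     tol: float = 0.01) -> bool:
    """True if every atom deviates from the reference by at most `tol` e."""
    return max_abs_deviation(approx, reference) <= tol
```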
The results in Table S2 suggest that EEM Cover offers no advantage for molecules with fewer than 10,000 atoms. However, as molecular size increases, the reduction in run time and memory usage becomes significant. EEM Cover can decrease the computational resources required by an order of magnitude, allowing calculations for much larger molecules to run on conventional desktop machines. For example, for the molecules with PDB IDs 3unb and 4v99, Full EEM would require 73 GB and 2.5 TB of memory, respectively, just to store the EEM matrix in 64-bit precision.
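The memory figures follow from the size of the dense matrix: with N atoms there are N + 1 unknowns (the charges plus χ), so the matrix has (N + 1)² entries of 8 bytes each in 64-bit precision. The hypothetical helpers below make this arithmetic explicit; they ignore solver workspace and other overheads, so they give a lower bound.

```python
import math


def full_eem_matrix_bytes(n_atoms: int, bytes_per_entry: int = 8) -> int:
    """Memory for a dense (N+1) x (N+1) EEM matrix: N charges plus chi."""
    return (n_atoms + 1) ** 2 * bytes_per_entry


def max_atoms_for_memory(total_bytes: int, bytes_per_entry: int = 8) -> int:
    """Largest N whose dense EEM matrix fits in `total_bytes`."""
    return math.isqrt(total_bytes // bytes_per_entry) - 1
```

For example, the 64-bit matrix for 999 atoms occupies exactly 8,000,000 bytes; conversely, 73 × 10⁹ bytes holds the matrix for at most 95,523 atoms (taking 1 GB = 10⁹ bytes).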

Limitations
The most important limitation of ACC is inherent to the concept of atomic partial charges. A single number can indicate whether there is more electron density around some atoms than around others, but it cannot characterize the actual distribution of electron density in the space between the atomic nuclei. Thus, properties that derive from this distribution (e.g., multipole moments) are generally not well described by atomic partial charges of any sort [4]. Further, each atomic charge definition has its own specific limitations (e.g., AIM, NPA and Mulliken charges give poor estimates of electrostatic potentials [5]). The second limitation relates to the empirical nature of the EEM approach. EEM relies on empirical parameters fitted to reference QM data. When employing a particular set of EEM parameters, it is therefore important to consider the nature of the reference QM data, as well as the fitting strategy used to develop that set. Moreover, EEM incorrectly predicts superlinear scaling of the polarizability with increasing molecule size, leading to overestimation of multipole moments for extended systems such as biomacromolecules. In general, EEM charges cannot be expected to outperform QM charges. Similarly, EEM charges cannot be expected to outperform other empirical charges developed for a specific purpose, such as atomic partial charges optimized for simulations with specific force fields (e.g., RESP charges used with Amber force fields, or Gromos-like charges used with Gromos force fields). While extensions to EEM have been proposed [6-14], ACC currently implements the classical EEM formalism, because EEM parameters are more widely available for this formalism.
A limitation specific to the EEM Cover approximation arises when no EEM parameters are available for some atom types present in the molecule and the connectivity network in that region of the structure is sparse (typically when hydrogens are missing and some atoms appear to be bonded only to an atom without EEM parameters). In this case, no charges are calculated for these sparsely connected atoms, even if EEM parameters are available for them.
The third limitation concerns each particular molecular structure under investigation. Responsibility for suitable input to ACC, i.e., ensuring that the structure is complete and in a relevant conformation and that the total molecular charge is reasonable, currently lies with the user. Inappropriate input may produce charges that are not representative of the phenomenon under study. Further, the size of the input file is limited to 50 MB.

Case Study III
The 26S proteasome structure used in this study represents half of the biological assembly and is made up of 33 subunits. Of these, 14 subunits form the core particle (subunits beta1-7 and alpha1-7), while 19 subunits form the regulatory particle. Table S5 summarizes these subunits and their charge-related properties in states 1 to 3. For each subunit, the total charge was calculated as the algebraic sum of the atomic charges of all atoms in that subunit. These data are available online at http://ncbr.muni.cz/ACC/CaseStudy/Proteasome.
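The per-subunit total is a plain grouped sum. The sketch below assumes the atomic charges have already been parsed into (subunit identifier, charge) pairs, for example keyed by chain ID; the parsing step and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def subunit_total_charges(atoms: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Algebraic sum of atomic charges per subunit; `atoms` yields (subunit_id, charge)."""
    totals: Dict[str, float] = defaultdict(float)
    for subunit, charge in atoms:
        totals[subunit] += charge
    return dict(totals)
```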