  • Software
  • Open Access

SPICES: a particle-based molecular structure line notation and support library for mesoscopic simulation

Journal of Cheminformatics 2018, 10:35

https://doi.org/10.1186/s13321-018-0294-7

  • Received: 5 January 2018
  • Accepted: 3 August 2018
  • Published:

Abstract

Simplified Particle Input ConnEction Specification (SPICES) is a particle-based molecular structure representation derived from straightforward simplifications of the atom-based SMILES line notation. It aims at supporting tedious and error-prone molecular structure definitions for particle-based mesoscopic simulation techniques like Dissipative Particle Dynamics by allowing for an interplay of different molecular encoding levels that range from topological line notations and corresponding particle-graph visualizations to 3D structures with support of their spatial mapping into a simulation box. An open Java library for SPICES structure handling and mesoscopic simulation support in combination with an open Java Graphical User Interface viewer application for visual topological inspection of SPICES definitions are provided.

Keywords

  • Molecular structure representation
  • Line notation
  • Mesoscopic simulation
  • Dissipative Particle Dynamics
  • DPD

Background

A molecular simulation task comprises three successive steps: The definition of a simulation job with all necessary input information (preparation step), the actual loop over discrete integration time steps to numerically solve the equations of motion (the actual simulation step) and the analysis of the simulation record with all calculated results (evaluation step). The first (preparation) step of this triad has to provide data structures that can be leveraged by the algorithms of the second (simulation) step in an optimized manner to allow for a maximum performance of their interplay. This is commonly achieved by definition of adequate sets of arrays that encode all necessary molecular information like spatial positions or bonds of the interacting entities. The content of these arrays is usually provided by large tabular ASCII files that are often (at least partly) edited by hand. An example of these ASCII files may be found at [1] for 1,2-Dimyristoyl-sn-glycero-3-phosphocholine (DMPC) phospholipid molecules of a bilayer-membrane simulation task where each line contains an interacting entity, its spatial x,y and z coordinates, line offsets to bonded entities and specific indices for additional force assignments. The manual creation of these machine-oriented contents is not only a tedious but an error-prone type of work: For all but the simplest molecular ensembles errors are likely to be generated that may spoil the whole simulation process. Thus there is a valid necessity to prevent mistakes by safeguarded operations and to reduce manual preparation overhead by adequate automation.

Cheminformatics aims at supporting efficient and errorless human–machine interfaces where adequate molecular structure representations (line notations, connection tables, XYZ tables or Z-matrices, fragment codes or fingerprints, file formats like MOL file or PDB file) are at the heart of the discipline [2]. The majority of existing structure representations are atom-based descriptions that comprise characteristic properties and topological or spatial aspects concerning a molecule’s atomic composition [2, 3] with additional approaches towards fragment-based molecular representations especially for polymers [4–8]. In order to support the preparation step of a molecular simulation task, cheminformatics methods allow for an effective interplay of different levels of molecular encoding that are constitutive for a comfortable and safe human–machine interface (see Fig. 1): The topological structural formula is a common way used by molecular scientists to represent a chemical compound (e.g. drawn by hand with a structure editor or manually selected from structure repositories). Alternatively the compound may be represented by a textual line notation, where the interplay between structural formula and line notation may be realized by mutual conversion methods like an adequate structure diagram layout. The following transition from topological representations to 3D structures allows for the final mapping to their spatial positions within a simulation box, which completes the preparation step. All prepared information may then be stored in the form of the tabular ASCII files sketched above as an input for the actual simulation step.
Fig. 1

Interplay between different encoding levels of molecular structures for a preparation step of a molecular simulation task (with examples of this work, compare Figs. 2, 4 and 5). a Structural formula of a DMPC phospholipid. b SPICES line notation of the particle-based topological DMPC structure with its corresponding structure diagram layout/particle graph and illustration of the particle bonds. c Conversion of the topological particle structure to a compressed 3D tube geometry plus spatial mapping into an oriented bilayer compartment of the simulation box

In order to contribute to the realization of a molecular fragment cheminformatics roadmap [9] this work tries to alleviate molecular structure handling and encoding for particle-based mesoscopic simulation techniques like Dissipative Particle Dynamics (DPD) [10–14]: These techniques aim at describing supramolecular phenomena at the nanometer (length) and microsecond (time) scale for large interacting physical ensembles representing millions of atoms. DPD particles in particular may be identified with distinct small molecules of molar mass on the order of 100 Da where larger molecules are composed of adequate “molecular fragment” particles that are bonded by harmonic springs to mimic covalent connectivities and spatial 3D conformations [9, 14–20]. Since no unique molecular fragmentation scheme exists for the various mesoscopic simulation approaches there is nothing like a universal particle set. An adequate decomposition of a chemical compound into appropriate “molecular fragment” particles is a kind of artisan craftwork which is guided by experience, empirical rules and field of application. Figure 2 demonstrates a possible fragmentation for a DMPC phospholipid that successfully preserves its amphiphilic characteristics [20].
Fig. 2

Decomposition of the DMPC phospholipid into “molecular fragment” particles [20] and illustration of the resulting bonded particles (upper left) with corresponding SPICES line notation (upper right): The SpicesViewer GUI generated visual particle graph surrounds the “molecular fragment” particle identification

A key part of this work is a set of methods operating on an intuitive line notation for particle-decomposed molecular structures denoted SPICES (Simplified Particle Input ConnEction Specification). The SPICES design is derived from straightforward simplifications of the well-established SMILES representation for atom-based molecular connectivity [21–23]. The set of SPICES related methods supports the interplay of structural encoding levels (compare Fig. 1) as well as structure-based calculations for mesoscopic simulations (length and time scales, simulation box size, compound concentrations etc.): It allows for parsing and (graphically) analyzing the line notations, topological calculations (e.g. particle frequencies, particle neighbors or particle paths) as well as the generation of corresponding 3D particle structures with support of their spatial mapping into the simulation box and the final output of tabular ASCII files with molecular information for the following simulation step (the construction of the tabular ASCII file at [1] was in fact supported by the SPICES related code of this work).

Concept, feature overview and implementation details

The SPICES implementation extends the fragment structure representation proposal in [9]. The syntax rules for a correct SPICES line notation together with some helpful comments are outlined in the appendix. These rules allow arbitrary topological particle connections with branches and ring closures but do not comprise attributes like electric charges or chiral centers since these are intrinsic particle properties (i.e. differently charged states or different enantiomers of a “molecular fragment” particle have to be coded with different particles where each particle has a specific charge and a specific stereochemistry). Particles may possess a “backbone” label which may be utilized to assign specific particle pair forces e.g. for spatial 3D structure constraints of ring structures (see Fig. 3), the tail stiffness of surfactants and lipids or the backbone conformation of macromolecules like proteins. This kind of labeling could be performed in an automated manner by attaching a tagging label to every particle (which in fact was our first approach) but according to our findings the user control of the “backbone” label distribution within a molecule alleviated possible manual force assignments as well as the interplay between the textual line notation and the corresponding visual particle graph. In addition the concrete force assignments are chosen to be not a part of the line notation itself due to their intrinsic differences (from simple springs to e.g. complicated polygonal force chains) and possible automated conditional assignments according to various criteria. Thus the manual “backbone” labels allow for a flexible post-processing for different purposes in the aftermath of molecular definitions.
Fig. 3

Cholesterol fragmentation scheme with SPICES line notation (at the bottom). The specified backbone labels ‘1’ to ‘17’ allow for an assignment of specific inter-particle forces (e.g. the exemplarily shown harmonic springs between particles Me’12’ and Me’15’, Me’10’ and Me’13’ and Me’4’ and Me’7’) in order to control the stiffness of molecular structure elements like the cholesterol ring structure
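
The post-processing of backbone labels into inter-particle forces described above can be illustrated with a minimal Java sketch. It is not part of Spices.jar: the class and method names are hypothetical, the energy convention E = 0.5 · k · (r − r0)² is an assumption (simulation kernels may use a different prefactor), and the spring constant, rest length and coordinates are placeholders. Only the label pairs (12, 15), (10, 13) and (4, 7) are taken from the Fig. 3 caption.

```java
class BackboneSpringSketch {

    /** Euclidean distance between two particle positions. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < 3; k++) {
            sum += (b[k] - a[k]) * (b[k] - a[k]);
        }
        return Math.sqrt(sum);
    }

    /** Harmonic spring energy E = 0.5 * k * (r - r0)^2 (assumed convention). */
    static double energy(double k, double r0, double[] a, double[] b) {
        double r = distance(a, b);
        return 0.5 * k * (r - r0) * (r - r0);
    }

    /** Force on particle a; it points towards b when the spring is stretched. */
    static double[] forceOnA(double k, double r0, double[] a, double[] b) {
        double r = distance(a, b);
        double[] force = new double[3];
        if (r == 0.0) {
            return force; // coincident soft particles: direction undefined, return zero
        }
        double scale = k * (r - r0) / r;
        for (int i = 0; i < 3; i++) {
            force[i] = scale * (b[i] - a[i]);
        }
        return force;
    }

    public static void main(String[] args) {
        // Backbone label pairs from the Fig. 3 caption; k and r0 are placeholder values.
        int[][] labelPairs = {{12, 15}, {10, 13}, {4, 7}};
        double k = 4.0;
        double r0 = 1.0;
        double[] a = {0.0, 0.0, 0.0}; // hypothetical positions of one labeled pair
        double[] b = {2.0, 0.0, 0.0};
        for (int[] pair : labelPairs) {
            System.out.println("spring " + pair[0] + "-" + pair[1]
                + " energy at r = 2: " + energy(k, r0, a, b));
        }
    }
}
```

Keeping the label pairs separate from the spring parameters mirrors the design choice stated above: the line notation only carries the labels, while the concrete force assignment happens afterwards.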

A SPICES representation may contain multiple independent parts (with each part being a valid molecule), e.g. to represent aggregated molecular structures like the quaternary structure of proteins. Finally a [START] and an [END] tag may be attributed for spatial orientation in the simulation box, see Figs. 2, 4 and 5.
Fig. 4

Top: Phospholipid DMPC fragmentation scheme [20] with 16 particles connected by harmonic springs (compare Fig. 2). Bottom: For spatial mapping into the simulation box the topological DMPC particle structure is converted to a linear 3D tube along the [START]/[END] tagged main chain where side-chain particles are collapsed onto the spatial positions of their neighboring main-chain particles, i.e. the second spatial position to the right contains 8 particles with exactly the same position: The main-chain particle DMPN and the side-chain particles MeAc and 6 Et

Fig. 5

Simulation box start geometry with random distribution (left) or bilayer orientation (right) of phospholipid DMPC molecules as linear 3D tubes (see Figs. 1, 2 and 4). Color code of particles: Et (olive), MeAc (orange), DMPN (red), TriMeNP (blue)

The Spices.jar library supports all aspects of SPICES definition and handling. A Spices object may be created with at least an input structure string or in combination with additional information like a map of available particles. A syntax parser analyzes the provided line notation and returns detailed syntax error information if necessary by the methods isValid and getErrorMessage. SPICES properties like the frequency of particles or complete lists of particle neighbors are evaluated upon user request by the methods getParticleFrequencies or getNextNeighbors.
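
The kind of topological evaluation mentioned above (particle frequencies and next neighbors) can be sketched without the library. The following Java example is an illustration only: the class ParticleTopologySketch and its methods countParticles and neighborLists are hypothetical names, they do not reproduce the signatures of getParticleFrequencies or getNextNeighbors, and the input is assumed to be an already parsed molecule given as an array of particle names plus index-based bonds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ParticleTopologySketch {

    /** Counts how often each particle name occurs in a molecule. */
    static Map<String, Integer> countParticles(String[] particles) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : particles) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    /** Lists, for every particle index, the indices of its directly bonded neighbors. */
    static List<List<Integer>> neighborLists(int particleCount, int[][] bonds) {
        List<List<Integer>> neighbors = new ArrayList<>();
        for (int i = 0; i < particleCount; i++) {
            neighbors.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            // Bonds are undirected, so register both directions.
            neighbors.get(bond[0]).add(bond[1]);
            neighbors.get(bond[1]).add(bond[0]);
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // A small hypothetical branched molecule built from particle names used in this work.
        String[] particles = {"TriMeNP", "DMPN", "MeAc", "Et", "Et", "MeAc", "Et"};
        int[][] bonds = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {1, 5}, {5, 6}};
        System.out.println(countParticles(particles));
        System.out.println(neighborLists(particles.length, bonds));
    }
}
```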

A function of specific importance is the spatial projection of topological SPICES into a simulation box to set up adequate start geometries. Since a mesoscopic simulation is driven by soft particle potentials (in contrast to atomic hard core repulsions for e.g. molecular dynamics), different particles may occupy the same exact spatial position (which would lead to infinite forces for hard atomic potentials) as well as penetrate each other. Thus the possibly severe problems of particle entanglements or caging effects due to inadequate start geometries are considerably attenuated [24]. Nonetheless, a more favorable initial configuration may considerably reduce the necessary simulation period. A straightforward approach is a spatial linear tube representation [9] as shown in Figs. 4 and 5: The longest linear particle chain in the molecule is determined and its particles are consecutively lined up along a straight line according to the specified bond length (which may be squeezed to fit into specific compartments like simulation box layers, see below and Fig. 5). Then all branched side particles are collapsed onto their nearest-neighbor particle on this line. For a fast determination of a sufficiently long linear particle chain, the Depth-First Search (DFS) algorithm is used [25]. Starting from the first particle of the SPICES line notation the maximum-distant particle A is evaluated by a first DFS run. With a second DFS run, the maximum-distant particle B from particle A is determined. Finally the particle chain between A and B is chosen for the spatial tube representation. If a [START]/[END] tag pair is defined the longest (oriented) linear chain between the tagged particles is evaluated. The sketched algorithm leads to true longest chains for acyclic SPICES but not necessarily for cyclic particle structures. 
For a distinct fragmentation scheme of a molecule there may be several different but equally valid SPICES line notations since the proposed line notation is not canonically unique. For acyclic SPICES with a defined [START]/[END] tag pair the sketched 3D tube construction process will lead to a single distinct spatial 3D tube representation for all these possible different line notations (without a defined [START]/[END] tag pair there may be two possible orientations). For cyclic particle structures this may not be the case, i.e. different but equally valid SPICES line notations may lead to different spatial 3D tube representations and corresponding different start geometries of a simulation. According to our experience this shortcoming is of minor practical relevance since the possibly different 3D tube representations for small molecules seem to be sufficiently similar for convergent mesoscopic simulation results. On the other hand, for large complex molecules like cross-linked (bio)polymers the simple linear 3D tube representation is questionable in principle, so that specific conversion tools like a PDB-to-SPICES parser for peptides and proteins would be advisable, which would take the known molecular 3D structure into account.
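
The two-run depth-first search for a longest linear chain described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Spices.jar implementation: the class LongestChainSketch and all method names are hypothetical, particles are plain indices, and the DMPC-like bond list is inferred from the Fig. 4 caption (TriMeNP, DMPN, then two tails of MeAc plus 6 Et each). The traversal is stack-based and yields true longest chains for acyclic structures only, in line with the limitation stated in the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

class LongestChainSketch {

    /** Builds an undirected adjacency list from index-based bonds. */
    static List<List<Integer>> adjacency(int particleCount, int[][] bonds) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < particleCount; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            adj.get(bond[0]).add(bond[1]);
            adj.get(bond[1]).add(bond[0]);
        }
        return adj;
    }

    /** One search run: path from 'start' to a particle at maximum traversal depth. */
    static List<Integer> pathToFarthest(List<List<Integer>> adj, int start) {
        int[] parent = new int[adj.size()];
        int[] depth = new int[adj.size()];
        Arrays.fill(parent, -1);
        Arrays.fill(depth, -1); // -1 marks "not visited yet"
        Deque<Integer> stack = new ArrayDeque<>();
        depth[start] = 0;
        stack.push(start);
        int farthest = start;
        while (!stack.isEmpty()) {
            int u = stack.pop();
            if (depth[u] > depth[farthest]) {
                farthest = u;
            }
            for (int v : adj.get(u)) {
                if (depth[v] < 0) {
                    depth[v] = depth[u] + 1;
                    parent[v] = u;
                    stack.push(v);
                }
            }
        }
        // Walk back from the farthest particle to the start particle.
        LinkedList<Integer> path = new LinkedList<>();
        for (int v = farthest; v != -1; v = parent[v]) {
            path.addFirst(v);
        }
        return path;
    }

    /** Two runs: first particle -> A, then A -> B; the chain A..B is returned. */
    static List<Integer> longestChain(List<List<Integer>> adj, int firstParticle) {
        List<Integer> toA = pathToFarthest(adj, firstParticle);
        int particleA = toA.get(toA.size() - 1);
        return pathToFarthest(adj, particleA);
    }

    /** DMPC-like topology (assumption): 0 TriMeNP, 1 DMPN, 2 and 9 MeAc, all others Et. */
    static int[][] dmpcLikeBonds() {
        return new int[][] {
            {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {6, 7}, {7, 8},
            {1, 9}, {9, 10}, {10, 11}, {11, 12}, {12, 13}, {13, 14}, {14, 15}
        };
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = adjacency(16, dmpcLikeBonds());
        System.out.println(longestChain(adj, 0));
    }
}
```

For this untagged topology the longest chain runs from one tail end to the other and bypasses the head particle, which illustrates why a [START]/[END] tag pair is useful when a specific orientation (head to tail) is wanted.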

The sketched spatial projection (see Fig. 5) is accomplished by interplay of the methods setCoordinates and getParticlePositionsAndConnections: After creation of a Spices object from a SPICES line notation string (which takes a fraction of a second for small molecules like DMPC), arrays for the first (start) and the last (end) particle positions of all spatial linear 3D tubes as well as the bond length may be provided via the setCoordinates method. The first (start) particles of the linear chains always have the defined start positions, whereas the last (end) particles do not reach the defined end positions if the defined start/end straight line is longer than the accumulated bond lengths of the particles on the longest linear chain, so that a 3D tube may be shorter than defined. Conversely, a 3D tube may be squeezed (with equally reduced bond lengths) if the defined start/end straight line is shorter than the accumulated bond lengths. Thus the calling code (e.g. a compartment editor that allows for flexible compartment definitions within the simulation box like the bilayer compartment shown right in Fig. 5) must only define correctly-oriented and valid lines within an arbitrary compartment (which is comparatively simple to realize) without the necessity to calculate and pre-check every individual length (which could be more difficult). Method getParticlePositionsAndConnections then provides all corresponding particle positions within the simulation box where in addition all particle–particle bonds are coded with specific offsets which are commonly used by simulation kernels (compare to the tabular ASCII file at [1]). The sketched interplay of methods setCoordinates and getParticlePositionsAndConnections performs sufficiently fast for true on-the-fly calculations, e.g. a spatial projection of 50,000 DMPC molecules (with 800,000 particles) into the simulation box completes in less than a second using an ordinary scientific workstation or even a standard notebook computer.
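
The placement rule described above (fixed start position, bond length kept if the line is long enough, equally squeezed bonds otherwise, side particles collapsed onto the main chain) can be sketched in a few lines of Java. This is an illustration under assumptions, not the code behind setCoordinates or getParticlePositionsAndConnections: the class TubePlacementSketch and its methods are hypothetical, the main chain is passed in as an index array, and side particles are assigned to the main-chain particle that is closest by bond count.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class TubePlacementSketch {

    /** Lines up n particles from 'start' towards 'end', squeezing bonds if necessary. */
    static double[][] placeOnLine(int n, double[] start, double[] end, double bondLength) {
        double[] direction = new double[3];
        double lineLength = 0.0;
        for (int k = 0; k < 3; k++) {
            direction[k] = end[k] - start[k];
            lineLength += direction[k] * direction[k];
        }
        lineLength = Math.sqrt(lineLength);
        double step = bondLength;
        if (n > 1 && lineLength < (n - 1) * bondLength) {
            step = lineLength / (n - 1); // squeeze: all bonds are reduced equally
        }
        double[][] positions = new double[n][3];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 3; k++) {
                double unit = lineLength > 0.0 ? direction[k] / lineLength : 0.0;
                positions[i][k] = start[k] + unit * step * i;
            }
        }
        return positions;
    }

    /** Places the main chain on the line and collapses all other particles onto it. */
    static double[][] placeMolecule(int particleCount, int[][] bonds, int[] mainChain,
                                    double[] start, double[] end, double bondLength) {
        double[][] chainPositions = placeOnLine(mainChain.length, start, end, bondLength);
        double[][] positions = new double[particleCount][];
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < mainChain.length; i++) {
            positions[mainChain[i]] = chainPositions[i];
            queue.add(mainChain[i]);
        }
        // Breadth-first spreading from the main chain: every side particle inherits
        // the position of the main-chain particle it is connected to. The bond scan
        // per particle is quadratic and only acceptable for a small sketch.
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int[] bond : bonds) {
                int v = (bond[0] == u) ? bond[1] : (bond[1] == u) ? bond[0] : -1;
                if (v >= 0 && positions[v] == null) {
                    positions[v] = positions[u].clone();
                    queue.add(v);
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical 4-particle molecule: main chain 0-1-2 with side particle 3 on particle 1.
        int[][] bonds = {{0, 1}, {1, 2}, {1, 3}};
        double[][] positions = placeMolecule(4, bonds, new int[] {0, 1, 2},
            new double[] {0.0, 0.0, 0.0}, new double[] {0.0, 0.0, 1.0}, 1.0);
        for (double[] position : positions) {
            System.out.println(Arrays.toString(position));
        }
    }
}
```

Because the squeeze is handled inside the placement routine, calling code only has to supply an oriented start/end line per molecule, as noted above for compartment editors.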

Whereas line notations may be regarded as a reasonable compromise for a human–machine interface (readable by human beings, decomposable by machine) their definitions are error-prone for complex branched or ring structures. A visual display of the topological particle graph with all its particle–particle connections may considerably alleviate a correct SPICES definition, see Fig. 6.
Fig. 6

SpicesViewer graph display (right) of the cyclotide Kalata B1 (upper left) with 29 amino acids according to the fragmentation scheme in [20] (lower left)

A graphical visualization may be achieved by adequate application of open-source projects that provide chemical structure drawing capabilities. For instance the structure-diagram layout of the Chemistry Development Kit (CDK) [26–28] can be customized to display SPICES instead of atom-based connection topologies [9]. A principal problem of this (mis)use of atom-based layouts is the inappropriateness of its layout elements and templates: Particle graphs do not follow common patterns of atomic connections (see Fig. 6) so that topological visualizations may result in incomprehensible graphs. Thus a more general graph visualization approach with e.g. the GraphStream library [29] is necessary. In addition this library allows individually tailored changes of the produced graph by manual displacement of node positions to remove unwanted node or edge overlaps. SpicesViewer.jar is a GUI application (on top of Spices.jar and connection library SpicesToGraphStream.jar) for a topological SPICES display with the GraphStream library to analyze the influence of different graph settings and to demonstrate computational functions like zooming or graph image generation. Figure 6 shows the SpicesViewer.jar GUI with a manually tailored SPICES graph visualization of the cyclic peptide Kalata B1 with 29 amino acids.

Conclusions

This work provides a Java library for SPICES handling and mesoscopic simulation support (Spices.jar) in combination with a connection library (SpicesToGraphStream.jar) and a Java Graphical User Interface (GUI) viewer application (SpicesViewer.jar) for visual topological inspection and manipulation of SPICES molecule definitions. All libraries/applications are publicly available as open source published under the GNU General Public License version 3 [30]. The SPICES GitHub repository contains the Java bytecode libraries, a Windows OS installer for the SpicesViewer GUI application, all Javadoc HTML documentation [31] and the NetBeans [32] source code packages including unit tests.

The presented set of methods may alleviate molecular structure definitions for mesoscopic simulation tasks. The SpicesViewer GUI application demonstrates relevant use cases in detail with corresponding sample code. The new libraries may be utilized within scripting environments or become part of integrated mesoscopic simulation systems.

Future developments may address SPICES parsers that especially support the more difficult preparation of polymer systems, e.g. a PDB-to-SPICES parser for peptides and proteins provided in form of PDB files (actually, the SPICES string of the Kalata B1 peptide in Fig. 6 was generated from its PDB file with a prototype parser that uses the amino acid fragmentation schemes and connection rules outlined in [20]). Another promising challenge would be a conversion between particle and all-atom representations for an interplay of atomistic and mesoscopic simulation.

Declarations

Authors’ contributions

KvdB, MD, JS and AZ designed, implemented and tested the SPICES related code. ME, HK and AZ conceived the SPICES approach and led the project development. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank the GraphStream dynamic graph library development and project team, the Apache Commons contributors as well as the reviewers for helpful suggestions (especially the catchy SPICES acronym) and Noel O’Boyle for stimulating discussions. The support of GNWI - Gesellschaft für naturwissenschaftliche Informatik mbH, Oer-Erkenschwick, Germany, is gratefully acknowledged.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

SPICES repository at https://github.com/zielesny/SPICES.

Availability and requirements

Project name: SPICES. Project home page: SPICES repository at https://github.com/zielesny/SPICES. Operating system(s): Platform independent. Programming language: Java. Other requirements: Java 1.8 or higher. License: GNU General Public License version 3.

Ethics approval and consent to participate

Not applicable.

Funding

Not applicable.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Authors’ Affiliations

(1) Inorganic Chemistry and Center for Nanointegration Duisburg-Essen (CeNIDE), University of Duisburg-Essen, Essen, Germany
(2) Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665 Recklinghausen, Germany
(3) CAM-D Technologies, Solingen, Germany

References

  1. Text file (2018) PositionsBonds1.txt. https://github.com/zielesny/Jdpd/tree/master/src/de/gnwi/jdpd/tests/test_DMPC. Accessed 16 June 2018
  2. Engel T, Gasteiger J (eds) (2018) Chemoinformatics: basic concepts and methods. Wiley, Weinheim
  3. Engel T, Gasteiger J (eds) (2018) Applied chemoinformatics: achievements and future opportunities. Wiley, Weinheim
  4. Siani MA, Weininger D, Blaney JM (1994) CHUCKLES: a method for representing and searching peptide and peptoid sequences on both monomer and atomic levels. J Chem Inf Comput Sci 34(3):588–593
  5. Siani MA, Weininger D, James CA, Blaney JM (1995) CHORTLES: a method for representing oligomeric and template-based mixtures. J Chem Inf Comput Sci 35(6):1026–1033
  6. Drefahl A (2011) CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J Cheminf 3:1
  7. Zhang T, Li H, Xi H, Stanton RV, Rotstein SH (2012) HELM: a hierarchical notation language for complex biomolecule structure representation. J Chem Inf Model 52(10):2796–2806
  8. Dufresne Y, Noé L, Leclère V, Pupin M (2015) Smiles2Monomers: a link between chemical and biological structures for polymers. J Cheminf 7:62
  9. Truszkowski A, Daniel M, Kuhn H, Neumann S, Steinbeck C, Zielesny A, Epple M (2014) A molecular fragment cheminformatics roadmap for mesoscopic simulation. J Cheminf 6:45
  10. Hoogerbrugge PJ, Koelman JMVA (1992) Simulating microscopic hydrodynamic phenomena with dissipative particle dynamics. Europhys Lett 19(3):155–160
  11. Koelman JMVA, Hoogerbrugge PJ (1993) Dynamic simulations of hard-sphere suspensions under steady shear. Europhys Lett 21(3):363–368
  12. Espanol P, Warren P (1995) Statistical mechanics of dissipative particle dynamics. Europhys Lett 30(4):191–196
  13. Espanol P (1995) Hydrodynamics from dissipative particle dynamics. Phys Rev E 52(2):1734–1742
  14. Groot RD, Warren P (1997) Dissipative particle dynamics: bridging the gap between atomistic and mesoscopic simulation. J Chem Phys 107(11):4423–4435
  15. Groot RD, Madden TJ (1998) Dynamic simulation of diblock copolymer microphase separation. J Chem Phys 108(20):8713–8724
  16. Ryjkina E, Kuhn H, Rehage H, Müller F, Peggau J (2002) Molecular dynamic computer simulations of phase behavior of non-ionic surfactants. Angew Chem Int Ed 41(6):983–986
  17. Schulz SG, Kuhn H, Schmid G, Mund C, Venzmer J (2004) Phase behavior of amphiphilic polymers: a dissipative particles dynamics study. Colloid Polym Sci 283:284–290
  18. Truszkowski A, Epple M, Fiethen A, Zielesny A, Kuhn H (2013) Molecular fragment dynamics study on the water–air interface behavior of non-ionic polyoxyethylene alkyl ether surfactants. J Colloid Interface Sci 410:140–145
  19. Vishnyakov A, Lee M-T, Neimark AV (2013) Prediction of the critical micelle concentration of nonionic surfactants by dissipative particle dynamics simulations. J Phys Chem Lett 4:797–802
  20. Truszkowski A, van den Broek K, Kuhn H, Zielesny A, Epple M (2015) Mesoscopic simulation of phospholipid membranes, peptides, and proteins with molecular fragment dynamics. J Chem Inf Model 55:983–997
  21. Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36
  22. Weininger D, Weininger A, Weininger JL (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comput Sci 29(2):97–101
  23. Weininger D (1990) SMILES. 3. DEPICT. Graphical depiction of chemical structures. J Chem Inf Comput Sci 30(3):237–243
  24. Groot RD (2003) Electrostatic interactions in dissipative particle dynamics: simulation of polyelectrolytes and anionic surfactants. J Chem Phys 118(24):11265–11277
  25. Sedgewick R, Wayne K (2011) Algorithms. Chapter 4: Graphs, 4th edn. Addison-Wesley, Boston
  26. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen EL (2003) The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 43(2):493–500
  27. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL (2006) Recent developments of the Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. Curr Pharm Des 12(17):2111–2120
  28. Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluska T, Rojas-Chertó M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C (2017) The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminf 9:33
  29. GraphStream: A dynamic graph library. http://graphstream-project.org. Accessed 16 June 2018
  30. GNU General Public License. http://www.gnu.org/licenses. Accessed 16 June 2018
  31. Javadoc documentation. http://www.oracle.com/technetwork/java/javase/documentation. Accessed 16 June 2018
  32. NetBeans IDE Version 8.2. https://netbeans.org. Successor: https://netbeans.apache.org. Accessed 16 June 2018

Copyright

© The Author(s) 2018
