Jmol SMILES and Jmol SMARTS: specifications and applications

Background: SMILES and SMARTS are two well-defined structure matching languages that have gained wide use in cheminformatics. Jmol is a widely used open-source molecular visualization and analysis tool written in Java and implemented in both Java and JavaScript. Over the past 10 years, from 2007 to 2016, work on Jmol has included the development of dialects of SMILES and SMARTS that incorporate novel aspects that allow new and powerful applications.
Results: The specifications of “Jmol SMILES” and “Jmol SMARTS” are described. The dialects most closely resemble OpenSMILES and OpenSMARTS. Jmol SMILES is a superset of OpenSMILES, allowing a freer format, including whitespace and comments, the addition of “processing directives” that modify the meaning of certain aspects of SMILES processing such as aromaticity and stereochemistry, a more extensive treatment of stereochemistry, and several minor additions. Jmol SMARTS similarly adds these same modifications to OpenSMARTS, but also adds a number of additional “primitives” and elements of syntax tuned to matching 3D molecular structures and selecting their atoms. The result is an expansion of the capabilities of SMILES and SMARTS primarily for use in 3D molecular analysis, allowing a broader range of matching involving any combination of 3D molecular structures, SMILES strings, and SMARTS patterns. While developed specifically for Jmol, these dialects of SMILES and SMARTS are independent of the Jmol application itself.
Conclusions: Jmol SMILES and Jmol SMARTS add value to standard SMILES and SMARTS. Together they have proven exceptionally capable in extracting valuable information from 3D structural models, as demonstrated in Jmol. Capabilities in Jmol enabled by Jmol SMILES and Jmol SMARTS include efficient MMFF94 atom typing, conformational identification, SMILES comparisons without canonicalization, identification of stereochemical relationships, quantitative comparison of 3D structures from different sources (including differences in Kekulization), conformational flexible fitting, and atom mapping used to synchronize interactive displays of 2D structures, 3D structures, and spectral correlations, where data are being drawn from multiple sources.
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-016-0160-4) contains supplementary material, which is available to authorized users.


Background
The Simplified Molecular-Input Line-Entry System (SMILES) [1][2][3] and SMILES Arbitrary Target Specification (SMARTS) [4,5] have been of tremendous value in the area of cheminformatics. Developed in the late 1980s, these languages have found wide application, particularly in relation to small primarily organic molecules. In addition, SMILES has been extended in the form of CHUCKLES [6] and CHORTLES [7] (an extension of CHUCKLES), both for biopolymers, and CurlySMILES (an annotated version of SMILES) [8]. Alternatives to SMARTS-based molecular querying include Sybyl Line Notation (SLN) [9,10], which itself is an adaptation of SMILES, the relatively underdeveloped Molecular Query Language (MQL) [11], and the XML-based Chemical Subgraphs and Reactions Markup Language (CSRML) [12]. Programs such as Jmol [13], PyMOL [14], VMD [15], and Chimera [16] also all have some sort of native selection language. Some of these languages have very powerful methods of matching molecular structures or substructures with query criteria. This article focuses on the development of SMILES and SMARTS dialects that can be used specifically in the context of a 3D molecular visualization environment to answer not only the typical questions, such as whether two structures and/or SMILES strings match or whether a particular 3D structure and/or SMILES string contains some particular 3D substructure (practical examples 1 and 2, below), but also more challenging questions (practical examples 3-8, below) such as:
• Given two 3D structures, what is their isomeric relationship?
• Given two 3D structures from two different sources, how quantitatively similar are they?
• How can I align two 3D models in order to visualize their similarity?
• What would I need to do to the given conformation of Structure A to match it conformationally with Structure B, or with some substructure within B?
• Given a 3D structure, what is its conformation? For example, if it is a cyclohexane, is it in the chair or boat form? Are substituents axial or equatorial?
• How can I correlate 2D and 3D chemical structures from different sources? For example, how can I correlate a given 2D or 3D structure with a simulated NMR spectrum?
In this article I introduce adaptations to SMILES and SMARTS that address all of these questions, allowing them to be answered immediately and definitively. In the case of on-line browser-based applications, these answers can be obtained completely within the standard browser client, without access to external dedicated cheminformatics services. While the impetus for the development of Jmol SMILES and Jmol SMARTS was, not surprisingly, Jmol, it is important to emphasize that nothing that is presented here is limited to use in Jmol. All of the additions to SMILES and SMARTS presented are simple and straightforward. The success of implementing Jmol SMILES and Jmol SMARTS within Jmol simply provides an example of the continued power of SMILES and SMARTS in the cheminformatics open-source community.

Implementation
The context for this work is Jmol, a widely used open-source community-driven program for the visualization and analysis of molecular structure [13]. Jmol has been used in a broad range of contexts, including small organic and inorganic molecules, biomolecules, and crystallographic structures crossing the boundaries of biology, chemistry, physics, and materials science. The Jmol application is written in Java and implemented (in parallel) in both Java and JavaScript. It is available in three formats: as a stand-alone desktop or batch-driven Java program, a Java applet, and an HTML5 JavaScript-only equivalent (JSmol). The reference implementation for this article is Jmol 14.6.1_2016.07.11.
The dialects of SMILES and SMARTS implemented here are referred to as "Jmol SMILES" and "Jmol SMARTS" respectively, but there is nothing specific to Jmol in those descriptions. As such, Jmol SMILES and Jmol SMARTS could be implemented if desired in any 3D molecular visualization platform, such as PyMOL, VMD, or Chimera. Jmol SMILES most closely resembles OpenSMILES [3], while Jmol SMARTS is based on OpenSMARTS [5]. Jmol SMILES is a superset of OpenSMILES, allowing a freer format, with optional comments and whitespace, optional "processing directives" that specify the meaning of certain aspects of SMILES processing such as aromaticity, a more complete treatment of stereochemistry, and several other minor additions. Jmol SMARTS similarly adds these same modifications to OpenSMARTS, as well as several additional "primitives" and elements of syntax specifically tuned to the investigation of 3D structural models.
To keep this in perspective, imagine that we have before us a single molecular structure. Perhaps it is a structure loaded into JSmol on a web page, perhaps from a student drawing a 2D structure with an editor. The developer of the page may not have any a priori information about what structure is present. Did the student draw a ketone (as was requested, perhaps)? Did they properly identify the diene and dienophile in a Diels-Alder reaction? These are the sorts of questions that Jmol is capable of investigating, and for which SMILES and SMARTS matching can be extremely valuable. In addition, we will see that the real power in the use of SMILES and SMARTS in a program such as Jmol can be behind the scenes, totally hidden from the user, powering the functionality that to the user appears simple, nearly instantaneous, and possibly almost magical.
To understand the significance behind the development of Jmol SMILES and Jmol SMARTS (as opposed to just using standard versions of such), it is important to understand a little about how Jmol works. When loading chemical structures, Jmol creates a linear array of N atoms starting with index 0 and going through index N − 1. These atoms may all represent one model, where a "model" could be a single protein structure, or an organic molecule, or a crystal structure. Thus, a "model" in Jmol is a sequential set of atoms. When there are multiple models, they might be from a single source (an external database or a locally saved structure), or they may be from different sources (one from PubChem [17], the other from NCI/CADD [18]); they may be multiple models from the loading of a single file or several files; one might be drawn by a student using a web-based 2D drawing app, the other a 3D reference the student may or may not have access to. Whatever the case, we are interested in answering questions that correlate the given 3D representation of the model with one or more other representations, perhaps a SMILES string, a SMARTS pattern, a 2D structural model, or another 3D model.
While this paper is not meant to be a Jmol tutorial, some explanation of the Jmol examples is in order. Notation such as {2.1} in the tables and discussion below refers to a model, in this case "the atoms associated with the first model in the second file loaded." Notation ({0:24}) refers to the first 25 atoms in Jmol's atom array. ({0 5}) refers to two selected atoms. Words in CAPITALS, such as LOAD, SELECT, PRINT, and SHOW, are Jmol command tokens; words in lower case followed by parentheses, such as search(…), smiles(…), compare(…), and find(…), are Jmol functions. This capitalization is just a convention for this paper; capitalization in Jmol for command tokens, variable names, and function names is not significant. So SELECT {2.1} selects all atoms in the first model of the second file loaded, as does select {2.1}. Functions smiles(…) and search(…) are Jmol functions specifically requesting SMILES and SMARTS searches, respectively. For example, the command SELECT search("a") selects all aromatic atoms, and the command SELECT on search("a") highlights them. Some functions, such as search(…), smiles(…), and find(…), can be applied to atom sets in Jmol math expressions. For example, carbonyl = {1.1}.search("C=O"), after which the variable carbonyl can be used in a SELECT command: SELECT @carbonyl. The find(…) function has broad utility, but in this context we will see it used for comparing any combination of 3D model and/or string data using SMILES or SMARTS. Thus, x = {1.1}.find("SMARTS", "a") is synonymous with x = {1.1}.search("a"), and also we can have {1.1}.find("SMILES", "C(C)OCC"), "CCOCC".find("SMARTS", "COC"), and "CCOCC".find("SMILES", "C(C)OCC"). The commands SHOW SMILES and PRINT {molecule=1}.find("SMILES") display SMILES strings: the first for the current selection, the second for the first molecule (in a model with more than one molecule).

Jmol SMILES (Tables 1-3)
In terms of SMILES for small molecules, Jmol's implementation is a superset of OpenSMILES (Table 1). Thus, all valid OpenSMILES strings are also valid Jmol SMILES strings. All of the basic aspects of OpenSMILES are part of Jmol SMILES, including:
• Allowed unbracketed element symbols include B, C, N, O, P, S, F, Cl, Br, and I. Jmol SMILES adds H to this list of allowed unbracketed atoms.
• Bracketed atom notation adheres to the required ordering [<mass>symbol<stereo><hcount><charge><:class>], where <mass> is an optional atomic mass, symbol is an element symbol or "*" (unspecified atom, with unspecified mass), <stereo> is an optional stereochemical isomer descriptor given in Table 2, <hcount> is an optional implicit hydrogen atom count, <charge> is an optional formal charge in the form (−1, +1, −2, +2, etc.) or (-, +, --, ++, etc.), and <:class> is an optional non-negative integer preceded by a colon.
• Possible aromatic elements, indicated in lower case, include b, c, n, o, p, s, as, and se. Depending upon the directive, however, any element other than hydrogen may be allowed to be aromatic. This set is specific to /open/ with or without /strict/.
• Connections (indicated as a single digit 0-9 or "%" followed by a two-digit number), with their optional bond type preceding them, must follow bracketed or unbracketed atom symbols immediately. Connections may span no-bond indicators ("."). Jmol SMILES expands this to allow any positive number to be used as a connection number.
• Branches, indicated in parentheses, follow connections, with their optional bond type as the first character after the opening parenthesis.
• Bond types include -, =, # (triple), $ (quadruple), ":" (colon; aromatic, never significant), and "." (period, indicating no connection), as well as the cis/trans double-bond stereochemical indicators / and \. Single bonds between aromatic atoms indicate biaryl connections.
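The required field ordering inside a bracketed atom can be made concrete with a small parser. The following is a minimal Python sketch, not Jmol's own implementation: the function name `parse_bracket_atom` and the dictionary keys are illustrative assumptions, and the stereo field is deliberately simplified to `@` and `@@` rather than the full set of descriptors in Table 2.

```python
import re

# Fields in the required order: [<mass>symbol<stereo><hcount><charge><:class>]
# Assumption: stereo is limited to "@" or "@@" for this sketch.
BRACKET_ATOM = re.compile(
    r"""^\[
        (?P<mass>\d+)?
        (?P<symbol>[A-Z][a-z]?|[a-z]{1,2}|\*)
        (?P<stereo>@@?)?
        (?:H(?P<hcount>\d*))?
        (?P<charge>[+-]\d+|\++|-+)?
        (?::(?P<cls>\d+))?
        \]$""",
    re.VERBOSE,
)


def parse_bracket_atom(text):
    """Split one bracketed atom into its fields; raise ValueError if misordered."""
    m = BRACKET_ATOM.match(text)
    if m is None:
        raise ValueError("not a bracketed atom in the required order: " + text)

    # "H" alone means one implicit hydrogen; "H3" means three; absent means zero.
    hcount = m.group("hcount")
    hcount = 0 if hcount is None else (int(hcount) if hcount else 1)

    # Charge is either a signed number ("+2") or repeated signs ("++").
    charge_text = m.group("charge")
    if charge_text is None:
        charge = 0
    elif charge_text[1:].isdigit():
        charge = int(charge_text)
    else:
        charge = len(charge_text) * (1 if charge_text[0] == "+" else -1)

    return {
        "mass": int(m.group("mass")) if m.group("mass") else None,
        "symbol": m.group("symbol"),
        "stereo": m.group("stereo"),
        "hcount": hcount,
        "charge": charge,
        "atom_class": int(m.group("cls")) if m.group("cls") else None,
    }
```

Because the pattern is anchored and the fields are listed in a fixed sequence, a string such as `[C+H]`, with charge before hydrogen count, is rejected rather than silently reinterpreted.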
Jmol SMILES adds several more features as well, as shown in Tables 1, 2 and 3. These include more flexible formatting, processing "directives", the atomic symbol Xx (used in quantum mechanics computational programs to indicate a reference point that is not part of the chemical structure), unlimited connection numbers, and more extensive handling of stereochemistry, including stereochemical designations for odd- and even-cumulenes, imines, and carbodiimides, as well as trigonal pyramidal, T-shaped, and see-saw molecular shapes. The bond notations ^nm- and ^^nm- indicate atropisomerism.

Jmol SMILES general additions (Table 1)
In terms of formatting, the only difference is that Jmol SMILES allows for comments and whitespace. Whitespace in Jmol SMILES simply allows more flexibility and a more human-readable string; comments allow annotation of the created strings with information about the program used to generate it or whatever is relevant to the designer of the system. In addition, Jmol SMILES includes an optional prefix, set off by matching forward slash characters, that gives directives to a processor that specify how the SMILES string is to be interpreted (see below). It is simple enough to convert these annotated Jmol SMILES strings to more standard SMILES. One simply strips out the directives, comments, and white space. Jmol itself simply strips out all comments in a preprocessing step and ignores all whitespace, as there is no context in Jmol SMILES where whitespace is relevant.
The comments allow us to quickly correlate specific atoms in the structure (Fig. 1) with specific atoms in the SMILES string. We can see that the sequence N1-C2-C13-O14-C12-C7-N5-C6-C3-O4 is working its way clockwise around the six-membered ring, and N10-C11-C9-N8 are the added four atoms forming the five-membered ring, completing the structure.
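The conversion to more standard SMILES described above (strip directives, comments, and whitespace) is a purely lexical operation, as the following Python sketch suggests. Two details are assumptions here, since Table 1 is not reproduced in this text: comments are taken to be delimited as `//* ... *//`, and the directive prefix is taken to be a single leading `/.../` group containing only letters and separators. The function name `to_plain_smiles` is hypothetical.

```python
import re

# Assumed comment syntax: //* any text *//
COMMENT = re.compile(r"//\*.*?\*//", re.DOTALL)
# Assumed directive prefix: one leading /.../ group of letters, commas, semicolons
DIRECTIVES = re.compile(r"^/[A-Za-z,;]+/")


def to_plain_smiles(jmol_smiles):
    """Remove comments, then whitespace, then a leading directive prefix."""
    text = COMMENT.sub("", jmol_smiles)
    text = re.sub(r"\s+", "", text)
    return DIRECTIVES.sub("", text)


print(to_plain_smiles("/noAromatic, noStereo/ C1 //* ring start *// CCCC C1"))
# prints C1CCCCC1
```

Comments are removed first so that a `/` inside a comment is never mistaken for a directive delimiter; slashes that act as bond-stereochemistry markers later in the string are left untouched because only a prefix anchored at the start is stripped.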
The other additions shown in Table 1 simply broaden the range of applications of SMILES. Jmol SMILES allows for "dummy atoms" such as those sometimes found in quantum mechanics calculations to be introduced as [Xx]. They have atomic number 0 and match only [Xx] and [#0], not "any atom." The %(n) syntax allows connection numbers greater than 99. While having 100 open connections may seem impossible, and using large numbers is certainly not recommended in general, this feature is included at this time because it is of use in extensions of Jmol SMILES to be described in a future publication. Jmol SMILES allows for the option of more atoms being aromatic, for example when an aromaticity model does not involve bonding analysis or electron counting.
Finally, by allowing for double bonds between aromatic atoms, we can specify that double bonds in the pattern must also be present in the model or SMILES string being compared. That is, a successful match requires a specified Kekulé form of an aromatic system. It can be used to check to see if models from two different sources have the same Kekulé form. For example, 2-methylpyridine models retrieved from NCI/CADD and PubChem have different Kekulé forms. We need aromaticity models to compare them, but we still might want to distinguish them. The Jmol SMILES string [n]1ccccc1(C) will match both, but [n]1=cc=cc=c1(C) will match only the one from PubChem.
A reader with knowledge of organic chemistry R/S stereochemical nomenclature will find a familiar pattern in these explanations, namely that @ generally involves putting an atom in the back and reading the remaining atoms clockwise, in sequential order of left to right. Thus, if the first atom is the lowest priority atom (often H), and the remaining atoms are listed from highest to lowest, for example [C@H](Br)(CC)(C), then @ is "R" (H in back; read left-to-right highest to lowest), while @@ is "S". Readers more familiar with standard SMILES explanations of this notation, or who like the idea that the "at" symbol has an inherent anticlockwise sense to it, may wish to replace "front" with "back" and "clockwise" with "anticlockwise" with no change in meaning.
Table 2 Stereochemical aspects of Jmol SMILES

In the first column, absence of a mark indicates same as OpenSMILES; + indicates additions to OpenSMILES

Jmol SMILES directives (Tables 4, 5)
Jmol SMILES input and output can be configured for several different nuanced dialects of SMILES. This is done by prefixing a search with directives marked off with slash marks (Table 4). These directives are not case-sensitive. Thus, /noaromatic/ and /NoAromatic/ both mean the same thing. Multiple directives may be placed between slash marks. No separation is required, but some sort of separator is recommended, for example /noAromatic,noStereo/. Applications may add their own application-specific directives.
The Jmol SMILES directives /open/ and /strict/ relate primarily to the aromaticity model assumed in the SMILES string that is to be processed by the application's SMILES matcher. This is important, because different SMILES generators and parsers have different aromaticity models. These directives allow appropriate interpretation of SMILES using their original models. Examples of differences in these models are shown in Table 5. The first of these, /open/, uses the OpenSMILES definition of aromaticity, which involves a version of the Hückel 4n + 2 rule that allows for inclusion of ring atoms doubly bonded to acyclic atoms, provided those atoms are not more electronegative than carbon. The /strict/ directive, which is the default model for Jmol 14.6, goes one step further, applying a stricter (organic chemist's) definition of aromaticity, both requiring three-dimensional planarity (see footnote 1) and also not allowing double bonds to exocyclic atoms. Within this model, 3,6-dimethylidenecyclohexa-1,4-diene and quinone are nonaromatic because they are not cyclic pi systems, cyclobutadiene is nonaromatic because it is not 4n + 2, and 1-oxothiophene is nonaromatic because it is nonplanar. Note that /strict/ and /open,Strict/ are equivalent.
The directive /noAromatic/ indicates that no aromaticity checks of any kind should be made. Thus, C1CCCCC1 and c1ccccc1 both would match both benzene and cyclohexane. The bond type ":" would be considered simply to be "unspecified." This directive is useful when it is not desired to make any aromaticity assumptions at all or to specifically test for one Kekulé version and not do any aromaticity tests.
Footnote 1: The algorithm used by Jmol to identify flat aromatic rings involves the following steps: (1) A set of normals is generated as follows: (a) For each ring atom, construct the normal associated with the plane formed by that ring atom and its two nearest ring-atom neighbors. (b) For each ring atom with a connected atom, construct a unit normal associated with the plane formed by its connecting atom and the two nearest ring-atom neighbors. (c) If this is the first normal, assign vMean to it. (d) If this is not the first normal, check vNorm.dot.vMean. If this value is less than zero, scale vNorm by −1.
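Steps (a), (c), and (d) of footnote 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Jmol's code: the ring is supplied as coordinates in ring order, step (b) (normals involving substituents) is omitted, and because the remaining steps of the footnote are not available in this text, the final tolerance test in `looks_planar` (including the `min_dot` threshold) is a hypothetical stand-in for whatever criterion Jmol actually applies.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def unit(v):
    # Assumes the three atoms are not collinear (nonzero length).
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def ring_normals(ring):
    """ring: list of (x, y, z) coordinates in ring order."""
    normals = []
    v_mean = None
    n = len(ring)
    for i in range(n):
        prev_atom, atom, next_atom = ring[i - 1], ring[i], ring[(i + 1) % n]
        # (a) normal of the plane through a ring atom and its two ring neighbors
        v_norm = unit(cross(sub(prev_atom, atom), sub(next_atom, atom)))
        if v_mean is None:
            v_mean = v_norm                     # (c) first normal sets the reference
        elif dot(v_norm, v_mean) < 0:
            v_norm = tuple(-c for c in v_norm)  # (d) flip to a consistent sense
        normals.append(v_norm)
    return normals


def looks_planar(ring, min_dot=0.99):
    # Hypothetical final test: every normal nearly parallel to the first one.
    normals = ring_normals(ring)
    return all(dot(v, normals[0]) >= min_dot for v in normals)
```

A flat hexagon yields six identical normals, whereas a ring whose atoms alternate above and below the mean plane yields normals that tilt in alternating directions and so fails the test.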
Directives /noStereo/ and /invertStereo/ are very useful because they allow re-use of SMILES strings for different types of stereochemical matches without having to remove or switch the stereochemical designations in the strings themselves, which can be quite complicated. The directive /noStereo/ simply ignores all stereochemistry indicated in the SMILES string, including both stereochemistry at chirality centers as well as cis/trans double-bond stereochemistry. The directive /invertStereo/ inverts all chirality designations, allowing efficient checking for enantiomers. Finally, the directive /noAtomClass/ instructs the parser to disregard atom classes when creating the molecular graph for matching.
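The directive rules in this section (case-insensitive names, optional separators, and the equivalence of /strict/ and /open,Strict/) suggest a simple substring-based reader. The Python sketch below is one way to express that; the function name `split_directives`, the restriction of the prefix to letters and separators, and the choice to look for known names as substrings are assumptions made for illustration, not a description of Jmol's parser. It also assumes comments have already been removed.

```python
import re

# Directive names taken from this section; applications may add their own.
KNOWN_DIRECTIVES = ("open", "strict", "noAromatic", "noStereo",
                    "invertStereo", "noAtomClass")

# Assumed prefix shape: a leading /.../ holding only letters and separators.
PREFIX = re.compile(r"^/([A-Za-z,;\s]+)/")


def split_directives(pattern):
    """Return (set of recognized directive names, pattern without the prefix)."""
    m = PREFIX.match(pattern)
    if m is None:
        return set(), pattern
    prefix = m.group(1).lower()
    # Substring search makes separators optional and matching case-insensitive.
    found = {name for name in KNOWN_DIRECTIVES if name.lower() in prefix}
    if "strict" in found:
        found.add("open")   # /strict/ and /open,Strict/ are equivalent
    return found, pattern[m.end():]
```

With this approach `/noAromatic,noStereo/` and `/NOAROMATICnostereo/` yield the same set, which is the behavior the text describes when it says that no separation is required.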

Jmol SMARTS (Tables 6, 7)
The Jmol SMARTS dialect expands significantly on the OpenSMARTS language. Given below is a full description of Jmol SMARTS, not simply a list of additions to that language. All differences to OpenSMARTS are indicated. A discussion of compatibility issues with OpenSMARTS and Daylight SMARTS is given later in this paper.

Jmol SMARTS atom primitives (Table 6)
Jmol SMARTS is closely related to OpenSMARTS, involving 13 additional atom primitives and two modified primitives (Table 6). This table comprises the full set of atom primitives in Jmol SMARTS. Several of these added primitives in Jmol SMARTS were critical in the development of an MMFF94-based minimization that uses SMARTS for atom typing. As in OpenSMARTS, selected upper- or lower-case element symbols as well as *, a, and A do not need square brackets. Jmol SMARTS adds H to this list. Without brackets, CH is simply the same as C[H] and means "a carbon and its attached H," whereas [CH] means "a carbon with exactly one attached H" (that is, the C only, not the H atom).
In OpenSMARTS, [D2] matches any atom with two explicit connections. This does not distinguish between hydrogen and non-hydrogen atoms. Jmol SMARTS adds [d2] to mean "exactly two non-hydrogen connections," and in Jmol the command SELECT search("[C;d2]") selects for aliphatic carbons in the loaded atoms with exactly two non-hydrogen connected atoms. It should be noted that these atoms will be found regardless of whether the model actually has hydrogen atoms or not. This is an important distinction, because some models used in Jmol have hydrogen atoms (those from NCI/CADD), and some do not (some of those from RCSB). The new primitive [<n>?] selects for atoms with either an atomic mass of n or no indicated atomic mass. Like atom mass itself, this primitive must immediately precede an atom symbol. So, for example, [12?C] matches aliphatic 12C or C with no indicated isotope (a common situation), but not 13C or 14C. The ring selectors [r500] and [r600] are particularly useful, as they specify a 5- or 6-membered aromatic ring atom, respectively, which is not something that is supported in OpenSMARTS. (Note that in OpenSMARTS, [c&r5] could be an aromatic carbon in a benzene ring, as long as there is a fused 5-membered ring (as in indene), not specifically a carbon atom in an aromatic 5-membered ring.) This coopting of [r<n>] for large n technically is not compatible with OpenSMARTS, but since it is basically inconceivable that an actual ring of size 500 or 600 would ever be searched for using Jmol SMARTS, it is felt that this is not a practical problem.
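The practical difference between [D<n>] and [d<n>] can be seen by counting connections on the same molecule with and without explicit hydrogen atoms. The Python sketch below uses a hand-built propane graph; the function name `connection_counts` and the list-based data layout are hypothetical choices for illustration only.

```python
def connection_counts(atom, elements, bonds):
    """Return (explicit connections, non-hydrogen connections) for one atom.

    The first value corresponds to what [D<n>] tests and the second to [d<n>].
    """
    neighbors = [b if a == atom else a for a, b in bonds if atom in (a, b)]
    explicit = len(neighbors)
    non_hydrogen = sum(1 for n in neighbors if elements[n] != "H")
    return explicit, non_hydrogen


# Propane with all hydrogen atoms present (atoms 0-2 are carbon).
elements_h = ["C", "C", "C"] + ["H"] * 8
bonds_h = [(0, 1), (1, 2),
           (0, 3), (0, 4), (0, 5),
           (1, 6), (1, 7),
           (2, 8), (2, 9), (2, 10)]

# The same skeleton as it might appear in a model without hydrogen atoms.
elements_noh = ["C", "C", "C"]
bonds_noh = [(0, 1), (1, 2)]

print(connection_counts(1, elements_h, bonds_h))      # (4, 2)
print(connection_counts(1, elements_noh, bonds_noh))  # (2, 2)
```

The central carbon has a different explicit connection count in the two models but the same non-hydrogen count, which is why a [d2] test gives the same answer whether or not hydrogen atoms are present.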
Finally, Jmol SMARTS patterns also allow for referencing PDB "residue.atom" notation: [ala.C], [ala.*], and [*.C]. This feature is strictly a lexical match, not a substructure search, and does not allow searching for the residue or atom name "*" itself or for residue names containing a period character. No such residue or atom names exist in the PDB. The residue component may include up to three parts, including residue name, number, and insertion code as "resName#resNum^insCode". The atom component may include PDB atom name and atomic number as "atomName#atomicNum". The atomic number can be used to distinguish calcium, [.CA#20], from alpha-carbon, [.CA#6]. An example of a fully elaborated PDB primitive would be [G#129^A.P#15]. Any of the five references resName, resNum, insCode, atomName, or atomicNum may be omitted or indicated as the wild card "*". Thus, the critical distinguishing characteristic of Jmol SMARTS PDB notation is only the period itself.
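Since the PDB notation is a lexical match keyed on the period, it can be split into its five parts with a single regular expression. The following Python sketch is illustrative only; the function name `parse_pdb_primitive`, the dictionary keys, and the assumption that the insertion code is a single character are not taken from the Jmol source.

```python
import re

# [resName#resNum^insCode.atomName#atomicNum]; every part is optional,
# but the period is required. Insertion code is assumed to be one character.
PDB_PRIMITIVE = re.compile(
    r"^\[(?P<res_name>[^.#^\]]*)"
    r"(?:#(?P<res_num>-?\d+|\*))?"
    r"(?:\^(?P<ins_code>[^.\]]))?"
    r"\.(?P<atom_name>[^#\]]*)"
    r"(?:#(?P<atomic_num>\d+|\*))?\]$"
)


def parse_pdb_primitive(text):
    """Split a PDB-style primitive into parts; None means 'any' (omitted or *)."""
    m = PDB_PRIMITIVE.match(text)
    if m is None:
        raise ValueError("not a residue.atom primitive: " + text)

    def clean(value):
        return None if value in (None, "", "*") else value

    fields = {key: clean(value) for key, value in m.groupdict().items()}
    for key in ("res_num", "atomic_num"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


print(parse_pdb_primitive("[G#129^A.P#15]"))
print(parse_pdb_primitive("[.CA#20]"))
```

A bracketed expression without a period, such as [CH], does not match and is left to the ordinary atom-primitive parser, consistent with the period being the distinguishing characteristic.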
Three additional atom primitives allow for atom selection that is application specific. So, for example, [=0] selects for the atom the application assigns index 0 to. In Jmol, [=0] would refer to the first atom in the Jmol atom array, ({0}). The notation ["x"], with quotation marks, selects for atom type "x", however that has been defined in the application. In Jmol, atom types will default to the atom's name, such as "H12", but can be set by a specific file reader or by the user or by an MMFF94 minimization or partial charge calculation.
Jmol SMARTS allows for nested (aka "recursive") searches. This option allows embedding a full SMARTS string as an atom primitive, selecting the first atom only. So, for example, [$(cc[OH])] is "the aromatic carbon atom ortho to an aromatic OH," and in Jmol SELECT on search("[$(HccOH)]") highlights the two ortho hydrogens of a phenol.
The general pattern [$(select …)] allows for a hook into application-specific selection methods. For example, in Jmol SELECT atomno<10 selects all atoms with atom number less than 10. SELECT search("…") selects using a SMARTS pattern, and so SELECT search("[$(select atomno<10)]Br") does the same, but limits the result to atoms connected to bromine. The [$(select…)] notation thus allows both a potentially huge expansion of SMARTS capability as well as potentially bringing into an application's native search language all the rich capability of SMARTS, if they are not already present. Notice that, if implemented in an application, this option may require that whitespace not be unilaterally stripped from a Jmol SMARTS pattern prior to processing.
The last three of the entries in Table 6 allow for a variable number of patterns and for substitution of predefined variables. In Jmol, these variable substitutions are carried out as preprocessing steps, in a purely lexical fashion. They do not in any way improve processing time.
The Jmol SMARTS dialect includes all bond primitives of OpenSMILES as well as ~ (any bond) and @ (any aromatic bond). It does not implement the "direction or unspecified" primitives of OpenSMARTS (/? and \?) for two reasons. First, when working with a 3D model, all double bonds are specifically E or Z. Additionally, Jmol SMILES is based on OpenSMILES and thus already requires that / and \ be matched properly. So FC=C/Cl is not a valid Jmol SMILES string, and a search in it for F/?C=C/Cl therefore would not be relevant.
Jmol SMARTS implements all logical operations of OpenSMARTS, both in atom primitives and bonds. These include the standard operations "!" (NOT), "&" (AND), and "," (OR) as well as the special "low precedence" AND operator ";". The low precedence AND operator makes up for the fact that SMARTS does not implement parentheses in logical operations, allowing, for example, for [S,O;X2] to be parsed as "(aliphatic sulfur or oxygen) with two connections", in contrast to [S,O&X2], which would mean "sulfur or (oxygen and two connections)". Perhaps WITH would be a better description than AND for this low-precedence version of AND. Jmol SMARTS allows for a larger-scope "or" logic using "||". This notation is strictly a lexical convention carried
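The precedence of the operators "!", "&", ",", and ";" discussed in this section can be illustrated with a tiny evaluator that splits on the lowest-precedence operator first. This Python sketch handles only element symbols and the X<n> connection count, with atoms represented as plain dictionaries; those choices, and the function names, are assumptions made to keep the example short.

```python
def _primitive(p, atom):
    """Evaluate one primitive: '!' negation, X<n> connections, or an element."""
    if p.startswith("!"):
        return not _primitive(p[1:], atom)
    if p.startswith("X") and p[1:].isdigit():
        return atom["connections"] == int(p[1:])
    return atom["element"] == p


def matches(expr, atom):
    """Evaluate the inside of a bracketed atom expression.

    ';' (low-precedence AND) binds loosest, then ',' (OR), then '&' (AND).
    """
    return all(
        any(
            all(_primitive(p, atom) for p in conj.split("&"))
            for conj in disj.split(",")
        )
        for disj in expr.split(";")
    )


# A sulfur with four connections (for example, in a sulfone).
sulfone_s = {"element": "S", "connections": 4}
print(matches("S,O;X2", sulfone_s))   # False: (S or O) and X2
print(matches("S,O&X2", sulfone_s))   # True:  S or (O and X2)
```

The same atom gives opposite answers for the two expressions, which is exactly the distinction between the low-precedence ";" and the ordinary "&".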

Additional Jmol SMARTS features (Table 7)
Several non-primitive Jmol SMARTS options extend OpenSMARTS. They are presented in Table 7.
In general, SMARTS matching is used in a binary sense, returning TRUE if there is a match, or FALSE if not. In addition, in some contexts, it is valuable to know which subset of atoms in a model are the atoms that match. But there is another valuable possibility. Once a match is found, it could be especially valuable if some subset of those matched atoms is identified. This adds significant power to a SMARTS search, as it can answer questions such as "What atom is next to atom X in this pattern?" This more nuanced capability in Jmol SMARTS is provided using curly braces, for example, {C}C=O. The overall pattern is first matched, then only those atoms that are within braces are actually identified. Thus, CC=O matches all atoms of an aliphatic carbonyl group and its associated alpha carbons, but {C}C=O returns only the alpha carbons of carbonyl groups, and {C}[CH]=O returns only the alpha carbons of aldehydes. This allows very specific atom selection based on the identity of groupings of atoms. Any number of selection braces can be present in a Jmol SMARTS pattern. Thus, select on search("{c}1c{c}c{c}c1[OH]") in Jmol selects for the ortho-and para-carbons of phenol.
Conformational matching, involving ranges of distance, angle, and torsion measurements (including improper torsions), has also been of interest to Jmol users. Such matching is possible using Jmol SMARTS. This is done using the notation (.d:), (.a:), and (.t:), respectively. A range of values is included after the measurement type. For example, {*}(.t:-170,-180,170,180)C=C{*} selects for vinylic atoms that are trans-related. In addition, "not this range" can be indicated using "!", so that an equivalent description to the above would be {*}(.t:!-170,170)C=C{*}. Ranges should be selected to have some width appropriate to an application.
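The two torsion specifications given above, -170,-180,170,180 and !-170,170, can be compared with a short range checker. This Python sketch assumes that values are listed as consecutive pairs and that a leading "!" negates the whole specification; the function names are hypothetical and the code operates only on an already-measured value, not on a structure.

```python
def parse_ranges(spec):
    """Turn '-170,-180,170,180' or '!-170,170' into (negate, [(low, high), ...])."""
    negate = spec.startswith("!")
    numbers = [float(x) for x in spec.lstrip("!").split(",")]
    if len(numbers) % 2:
        raise ValueError("range values must come in pairs: " + spec)
    # Sort each pair so that '-170,-180' and '-180,-170' mean the same range.
    pairs = [tuple(sorted(numbers[i:i + 2])) for i in range(0, len(numbers), 2)]
    return negate, pairs


def in_ranges(value, spec):
    """True if a measured value satisfies the range specification."""
    negate, pairs = parse_ranges(spec)
    hit = any(low <= value <= high for low, high in pairs)
    return hit != negate


for torsion in (175.0, -175.0, 60.0, 0.0):
    print(torsion,
          in_ranges(torsion, "-170,-180,170,180"),
          in_ranges(torsion, "!-170,170"))
```

Away from the exact boundary values of ±170, the two specifications agree for every torsion, which is the sense in which the text calls them equivalent.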
The default in terms of specifying which atoms are involved in measurements is simply "the next N atoms in the string," where N is 1, 2, or 3, respectively. This sequencing is strictly lexical and is entirely irrespective of chains. So, for example, the highlighted atoms are measured in the order shown, from left to right, in each of the following measurements: C(.a:0,120)C(C)C, CC(.a:0,120)(C)C, and CC2(.a:0,120)(C).C2.
For more complicated patterns, one can designate the specific atoms in the measurement using a numeric identifier after the measurement type and then repeat that designation immediately after each of the target atoms. For example, the following will target a range of unusually low bond angles across the carbonyl group in the three-atom backbone of a peptide, CA-C-N:
Distances can be through-space; angles need not be bond angles; torsions can be improper angles. These numbers may be re-used, as for connection numbers.
Jmol SMARTS allows the use of any number of predefined variables. These are separated by semicolons and indicated prior to the actual SMARTS pattern (but after any directives). Variables may refer to other variables, as long as the variables referred to are defined previously. So, for example, the following construction is allowed:

Jmol SMARTS directives
Just like Jmol SMILES, Jmol SMARTS matching can be tuned to specific modes of searching in terms of different standards. This is done using the same directives described above for Jmol SMILES. For example, in Jmol, the commands LOAD :cyclobutadiene; SELECT search("/strict/c") loads a 3D structure of cyclobutadiene from PubChem and reports "no atoms selected", because cyclobutadiene is strictly not aromatic.

Jmol SMARTS compatibility issues
Jmol SMARTS does not include the OpenSMARTS unspecified designations /? or \?. In addition, Jmol SMARTS does not implement the unspecified stereochemistry notation @..?, as these have not proven relevant to 3D molecule searching. Jmol SMARTS implements "." as absolutely "not connected" rather than "might not be connected." Jmol SMARTS is not an extension of Daylight "reaction SMARTS" [4], although it does allow for matching atom classes, which are generally only relevant in a reaction context, and Jmol as an application can read reaction SMILES, but simply reads ">>" as the not-connected symbol ".".
Jmol SMARTS implements ring-membership primitives [r<n>] and [R<n>] within the OpenSMARTS framework using a simple ring membership model as "within any ring of size n" and "the number of rings containing the atom", respectively. This involves no concept of smallest set of smallest rings (SSSR). An application implementing Jmol SMARTS is free to limit ring size in ring membership determinations. In Jmol, for the sake of performance, the maximum ring size that will be checked by default is 8, but that is increased simply by having any check for any ring larger than 8. For example, for indole, which contains a five-membered ring fused to a six-membered ring, so three rings total, of size 5, 6, and 9, select on search('[R2]') will select the two atoms in the fusion, because the 9-membered ring is not checked. However, select on search('[R2&r9]') will select all the atoms not involved in the ring fusion, since now three rings will be found, and those central two atoms will be considered to be in three rings, not two.

Jmol application-specific directives (Table 8)
Table 8 lists application-specific directives for Jmol 14.6. Upon SMILES generation, /atomComments/ adds comments indicating which atoms in the structure map to which atoms in the SMILES string, and the /hydrogens/ directive indicates that all hydrogen atoms are to be given explicitly. The /topology/ directive creates a SMILES string that shows * for all atoms and indicates no bond types. It can be used for matching ring and chain patterns without regard to specific atoms or bonds. Three directives are specific to SMARTS matching. The /firstMatchOnly/ directive tells the Jmol SMARTS processor to stop after one successful match. The Jmol application-specific directives /groupByModel/ and /groupByMolecule/ (the Jmol default) govern how component-level grouping is done.
Aromatic models are important for SMILES generation and matching. The directive /aromaticPlanar/, which was the Jmol default through Jmol 14.5, is also available. This directive avoids any Hückel analysis and is based instead solely on three-dimensional ring planarity (see footnote 1), without respect to electron counting. The /aromaticPlanar/ option allows planar sp2-hybridized systems such as quinone and cyclobutadiene to be considered aromatic and allows the finding of aromatic rings in structures that may or may not indicate any multiple bonds, such as the results of many quantum mechanics calculations and structures saved in XYZ and PDB formats. In addition, the directive /aromaticDefined/ indicates that all aromatic atoms in the model to be investigated are explicitly set already, and thus no aromaticity model is necessary. This directive could be used in Jmol when a structure is loaded from a file that includes explicit bond aromaticity, such as SDF query files, where bond type 6 is "aromatic single" and bond type 7 is "aromatic double" [19]. Both /strict/ and /aromaticDouble/ are used in Jmol's MMFF94 [20,21] determination of atom types.
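To make the idea behind /aromaticPlanar/ concrete, the following Python sketch tests whether a ring is planar from 3D coordinates alone, with no electron counting. It is an illustration under stated assumptions, not the criterion Jmol applies (that criterion is given in the footnote cited above and is not reproduced here). The function name ring_is_planar, the mean-plane construction, and the 0.1 Å tolerance are all hypothetical choices made for this example.

```python
import math


def ring_is_planar(coords, tolerance=0.1):
    """Return True if every ring atom lies within `tolerance` of the ring's mean plane.

    coords: list of (x, y, z) tuples given in ring order.
    tolerance: hypothetical cutoff in the same units as coords (not Jmol's value).
    """
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    rel = [[p[i] - centroid[i] for i in range(3)] for p in coords]

    # Sum the cross products of consecutive centroid-relative vectors to get a mean normal.
    normal = [0.0, 0.0, 0.0]
    for k in range(n):
        a, b = rel[k], rel[(k + 1) % n]
        normal[0] += a[1] * b[2] - a[2] * b[1]
        normal[1] += a[2] * b[0] - a[0] * b[2]
        normal[2] += a[0] * b[1] - a[1] * b[0]
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        return False  # degenerate ring, no plane can be defined
    unit = [c / length for c in normal]

    # Planar if each atom's out-of-plane distance is within the tolerance.
    return all(abs(sum(r[i] * unit[i] for i in range(3))) <= tolerance for r in rel)


# A flat hexagon and a puckered one (alternating +/-0.25 out of plane).
flat = [(1.4 * math.cos(k * math.pi / 3), 1.4 * math.sin(k * math.pi / 3), 0.0)
        for k in range(6)]
puckered = [(x, y, 0.25 * (-1) ** k) for k, (x, y, _) in enumerate(flat)]
print(ring_is_planar(flat), ring_is_planar(puckered))
```

Because a purely geometric test of this kind never looks at bond orders, it behaves the same way for structures that carry no multiple-bond information, which is the situation the text describes for XYZ and PDB files.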

MMFF94 atom typing
One of the first applications of Jmol SMARTS was in Jmol's implementation of the molecular mechanics minimization package MMFF94. For this method, each atom must be assigned a specific atom type, with identifications such as "general 5-ring C (imidazole)" and "alpha aromatic 5-ring C". The MMFF94 program itself uses an elaborate sequence of logical steps to discover each of 82 distinct atom types for each atom in a structure, one at a time. Rather than attempting to implement this complicated algorithm in Java de novo, it was decided to have Jmol instead use SMARTS to do this task, scanning through types rather than atoms and identifying all atoms of a given type at once (and automatically skipping checks for types involving elements that are not in the structure). The key is to go through a list of SMARTS checks in a very specific order. A full list of SMARTS tests used by Jmol for MMFF94 atom typing is given at SourceForge [21]. Table 9 shows the sequence of Jmol SMILES checks specifically for sulfur. All sulfur atoms are assigned one of atom types 15, 16, 17, 18, 44, 72, 73, or
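The "scan through types rather than atoms" strategy described in this section can be sketched in a few lines of Python. This is a toy illustration, not Jmol's code and not the MMFF94 rule set. The type labels, the use of a neighbor count as a stand-in for a SMARTS pattern, and the assumption that the first matching rule in the ordered list wins are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    neighbors: int       # toy stand-in for what a SMARTS pattern would examine
    atom_type: str = ""  # empty until a rule assigns a type


# Ordered rules: (type label, element, predicate). Labels and predicates are invented
# for illustration and do not correspond to real MMFF94 atom-type definitions.
RULES = [
    ("S_three_or_more_neighbors", "S", lambda atom: atom.neighbors >= 3),
    ("S_generic", "S", lambda atom: True),
    ("P_generic", "P", lambda atom: True),
]


def assign_types(atoms, rules=RULES):
    """Assign types rule by rule and return how many rules were actually evaluated."""
    elements_present = {atom.element for atom in atoms}
    checks_run = 0
    for label, element, predicate in rules:
        if element not in elements_present:
            continue  # skip rules for elements that are not in the structure
        checks_run += 1
        # One pass per type: every still-untyped atom matching this rule is typed at once.
        for atom in atoms:
            if atom.element == element and not atom.atom_type and predicate(atom):
                atom.atom_type = label  # assumption: the earliest matching rule wins
    return checks_run


atoms = [Atom("S", 4), Atom("S", 2), Atom("C", 4)]
checks = assign_types(atoms)
print(checks, [atom.atom_type for atom in atoms])
```

The ordering matters because the generic sulfur rule would otherwise claim atoms that a more specific rule should have typed. The phosphorus rule is never evaluated because no phosphorus is present.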

Practical examples
Going back to the questions posed in the introduction to this paper, I now provide eight practical examples of Jmol SMILES matching and Jmol SMARTS searching within Jmol that are derived largely from Jmol user community requests for functionality.
1. Do these two structures and/or SMILES strings match?
SMILES strings are often used for database look-up using simple string-based algorithms. In order for that to work, the SMILES string of interest must be expressed identically to one stored in the database. Basically, this means that it must be produced by the same algorithm used to produce the database's own SMILES keys. The process of converting a generic SMILES string to a unique form is called "canonicalization." Since SMILES generator programs at different databases differ, the resultant canonical SMILES strings from different databases can be different as well. For example, for acetaminophen, database look-ups from PubChem and NCI/CADD, as well as drawing the structure using JSME [22], give the distinctly different canonical SMILES shown in Table 10. Canonicalization can be useful; it allows a program to match structures using simple string matching. Interestingly, in the context of 3D structure matching in Jmol, given a single target 3D structure and a SMILES string, a pair of 3D structures, or a SMILES string and a 3D structure, there is no particular need

Table 8 Jmol application-specific directives

Given two structures, what is their isomeric relationship?
The directives /nostereo/ and /invertstereo/ can be effectively used to compare two 3D structures, a 3D structure and a reference stereochemical SMILES string, or two stereochemical SMILES strings. The pseudo-code for a full isomeric determination is as follows:

In this case what is needed is a 1:1 atom mapping between the two structures followed by an alignment. The problem is that the two structure files likely have completely different atom order, and also there could be several suitable mappings. Jmol uses (relatively standard) Jmol SMILES matching to generate this mapping and then uses a quaternion eigenvalue algorithm [24] for the alignment, checking each possibility and looking for the best-fit RMSD. This guarantees that we end up with the very best fit of all possible mappings. If A and B are two 3D structures loaded into Jmol, then their similarity is found by compare(A, B, "SMILES", "stddev"), where the result is expressed as a standard deviation. The entire calculation is complete in a fraction of a second.
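Returning to the isomeric determination mentioned at the start of this section, the way /nostereo/ and /invertstereo/ can be combined is sketched below in Python. This is one plausible decision order, offered as an assumption rather than as the article's own pseudo-code. The function isomeric_relationship, the ToyStructure class, and the toy_matches stand-in (which compares a connectivity label and a tuple of stereocenter parities instead of doing real SMILES matching) are hypothetical and exist only to make the logic runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStructure:
    formula: str
    connectivity: str  # toy label standing in for the bonding pattern
    parities: tuple    # +1/-1 per stereocenter, in a fixed order


def toy_matches(a, b, directive=""):
    """Toy stand-in for a SMILES match honoring /nostereo/ and /invertstereo/."""
    if a.connectivity != b.connectivity:
        return False
    if directive == "/nostereo/":
        return True  # stereochemistry ignored
    if directive == "/invertstereo/":
        return a.parities == tuple(-p for p in b.parities)
    return a.parities == b.parities


def isomeric_relationship(a, b, matches=toy_matches):
    """Classify the relationship between two structures (assumed decision order)."""
    if a.formula != b.formula:
        return "none"
    if not matches(a, b, "/nostereo/"):
        return "constitutional isomers"
    if matches(a, b):
        return "identical"
    if matches(a, b, "/invertstereo/"):
        return "enantiomers"
    return "diastereomers"


x = ToyStructure("C4H8O2", "skeleton-1", (1, 1))
mirror = ToyStructure("C4H8O2", "skeleton-1", (-1, -1))
epimer = ToyStructure("C4H8O2", "skeleton-1", (1, -1))
rearranged = ToyStructure("C4H8O2", "skeleton-2", ())
print(isomeric_relationship(x, mirror), isomeric_relationship(x, epimer))
```

The toy matcher does not handle molecular symmetry (for example, meso compounds), which a real SMILES-based match would need to take into account.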

How can I align two 3D models in order to visualize their similarity?
If we remove that last parameter, the return will be the 4 × 4 rotation-translation matrix describing how to best align the atoms of A onto B. We can effect that overlay of atoms for a visual comparison using the rotate selected command, as shown in Fig. 2.
The following script generates a visual comparison of the caffeine structure found at NCI/CADD with the one at PubChem:

We can also do this alignment using a substructure. So, for example, if we wanted to align these two models specifically using the five-membered ring, we could use a SMARTS search for Cn1cncc1, substituting in the script above: VAR m = compare(A, B, "SMARTS", "Cn1cncc1").

Finally, in Jmol there is a still simpler way. The combination of SMARTS- or SMILES-based mapping and quaternion-based alignment can be done in one go using the COMPARE command:

Using a SMARTS search, it matches atoms in the two structures, identifies the associated bonds, calculates all relevant dihedral angles in tyrosine, then rotates all of those dihedrals to positions that match their counterparts in lysergamide. Quaternion-based alignment and animated overlay then transports the conformationally modified tyrosine to its best-fit location within the lysergamide molecule (Fig. 4). A bit of coloring highlights the success of the operation by assigning color in tyrosine (model 1.1) based on distance to the nearest atom in lysergamide (model 2.

Note that in all these cases we are allowing for some nonideality of structures. An anti conformation may or may not have a dihedral of exactly 180 degrees; we allow plus or minus 10 degrees.

8. How can I correlate 2D and 3D chemical structures from different sources?
For example, how can I correlate a given 2D or 3D structure with a simulated NMR spectrum? The capability of HTML5 and JavaScript to allow on a single web page a 2D drawing app (JSME), a 3D visualization app (Jmol), and an NMR spectroscopy simulation client (JSpecView [25], a component of Jmol) provides both an opportunity and a challenge. We can, in principle, correlate atoms in the 2D drawing, atoms in the 3D interactive structure, and peaks in the NMR spectrum, thus allowing the user seamless clicking with visual references updating simultaneously in all three apps (Fig. 6) [26].
The challenge is to do the atom-atom mapping necessary to make that work. This is especially challenging because the services that provide the 2D and 3D structures on the page and the 3D structure that is used in the spectral analysis all come from different sources. And to make it even more challenging, an online spectral analysis may return a correlation to a different 3D structure than was sent to it. Though "canonical" on their own, these services are anything but canonical as a suite!
The JSmol solution required two atom correlations (from 2D to 3D, and from 3D to 3D), including H atoms, which are not usually part of a SMILES match. A variation of the Jmol compare() function was developed for this purpose: atommap = compare({1.1} {2.1} 'MAP' 'H'). Here model 1.1 is the structure on the bottom left in Fig. 6; model 2.1 is the model derived from the 2D JSME drawing app above it. "MAP" indicates we want a correlation, and "H" means we want a SMILES all-atom correlation, which includes hydrogen atoms. The variable atommap is assigned an array of arrays, [[a1, b1], [a2, b2], …], indicating the exact 1:1 correlation of these two structures in terms of atom indices. The correlation between Jmol and JSpecView in the end was not done using SMILES. Instead, the JSV application matches atoms in structures returned by the server by matching individual 3D atom positions. But it would have been possible to use this same compare() function for that comparison as well. Non-canonical SMILES comparison is also being used on this page just to check that the apps are well synchronized and that all models are identical:

Conclusions
In this article I have presented a set of additions to standard SMILES and SMARTS that allow for powerful applications in 3D structure visualization, comparison, and analysis. Jmol SMILES additions are minimal. Jmol SMARTS atom primitive additions widen the scope of SMARTS searching capability, adding features that are applicable to 3D structures and useful in Jmol, such as allowing Jmol to create atom types for MMFF94 calculations efficiently. Additional atom specifications allow for application-specific atom selection based on criteria not included in any SMARTS specification as well as patterns that are specific to wwPDB-derived models, the ability to specify a variable number of repeating patterns, and the substitution of predefined variables. Nonprimitive Jmol SMARTS options include the allowance for subset selection, conformational matching, overall pattern logic, and predefined variables. The result is a rich language for 3D molecular investigation and comparison that greatly expands the usefulness of SMARTS pattern matching.

Fig. 6 A web application using SMILES to coordinate selection of atoms in 2D and 3D structures, with correlation to simulated 1H NMR spectra
Additional extensions to Jmol SMILES and Jmol SMARTS that are specific to biopolymers and also extend SMARTS searching to inorganic and periodic crystal structure and to polyhedra analysis are being implemented in Jmol and will be addressed in future communications.

Supplemental material
Jmol scripts for all examples in this article are provided as Additional file 1. All figures in this article are included as PNGJ format files in Additional file 2. These "image + data" files can be drag-dropped or otherwise loaded into Jmol or JSmol to reproduce the 3D model exactly as it appears in the image. Exact scripts used for their creation can be found in Additional file 1.