Simplified Molecular Input Line Entry Specification

from Wikipedia, the free encyclopedia

Simplified Molecular Input Line Entry Specification (SMILES) is a chemical structure notation in which the structure of a molecule is represented in a greatly simplified form as an ASCII character string. Several molecule editors can import SMILES strings and generate two-dimensional or three-dimensional models from them.

The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. In the following years, the specification was developed further and modified, in particular by Daylight Chemical Information Systems Inc. In 2007, an open standard called OpenSMILES was developed by Blue Obelisk, a chemistry-oriented open-source community.

Because the SMILES language is controlled by the Daylight company and has some problems with stereochemistry and tautomerism, IUPAC has developed its own linear molecular representation, InChI, which is freely available.

Examples

SMILES notation                     Formula        Name
C                                   CH4            Methane
CC                                  CH3-CH3        Ethane
CCC                                 CH3-CH2-CH3    Propane
Clc(c(Cl)c(Cl)c1C(=O)O)c(Cl)c1Cl    C7HCl5O2       Pentachlorobenzoic acid

Conventions

Atoms

A chemical element is represented by its element symbol enclosed in square brackets (e.g. [Au] for gold). The isotope can be specified by placing the mass number in front of the element symbol (e.g. [2H] for deuterium or [235U] for fissile uranium); without this information, the natural isotope mixture is assumed.

Ions, i.e. electrically charged atoms, are described by specifying the charge inside the square brackets (e.g. [Cl-] for the chloride ion or [Cu+2] for the copper(II) ion).

Hydrogen atoms bound directly to the atom can also be specified inside the brackets: an H is written after the element symbol, followed by the number of bound hydrogen atoms (the number may be omitted for a single hydrogen atom). Simple molecules such as hydrogen chloride ([ClH]) or methane ([CH4]) can be described in this way.
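The bracket conventions described above (isotope, element symbol, hydrogen count, charge) can be illustrated with a small regular expression. This is only a minimal sketch: the function name parse_bracket_atom is hypothetical, and features not covered in this article, such as chirality markers or repeated charge signs, are deliberately not handled.

```python
import re

# Simple bracket atom: optional isotope, element symbol,
# optional hydrogen count, optional charge.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?(?P<symbol>[A-Z][a-z]?)"
    r"(?P<hcount>H\d*)?(?P<charge>[+-]\d*)?\]"
)


def parse_bracket_atom(text):
    """Split a bracket atom such as '[Cu+2]' into its parts (hypothetical helper)."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")

    isotope = int(match["isotope"]) if match["isotope"] else None

    # 'H' alone means one hydrogen; 'H4' means four.
    hcount_text = match["hcount"]
    if hcount_text is None:
        hydrogens = 0
    else:
        hydrogens = int(hcount_text[1:]) if len(hcount_text) > 1 else 1

    # '+' or '-' alone means a single charge; '+2' means two.
    charge_text = match["charge"]
    if charge_text is None:
        charge = 0
    else:
        magnitude = int(charge_text[1:]) if len(charge_text) > 1 else 1
        charge = magnitude if charge_text[0] == "+" else -magnitude

    return {
        "isotope": isotope,
        "symbol": match["symbol"],
        "hydrogens": hydrogens,
        "charge": charge,
    }


for example in ["[Au]", "[2H]", "[235U]", "[Cl-]", "[Cu+2]", "[ClH]", "[CH4]"]:
    print(example, parse_bracket_atom(example))
```

Note that [2H] is read as isotope 2 of hydrogen rather than as an atom with a hydrogen count, because the element symbol is matched before the optional hydrogen count.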

To simplify the notation, the square brackets can be omitted for elements of the so-called "organic subset". If the brackets are omitted, the free valences of the atom are filled with hydrogen atoms up to the lowest standard valence consistent with its bonds, according to the table below. For example, a single O suffices to enter water, and a single C to enter methane.

Element         Standard valence(s)
B               3
C               4
N               3, 5
O               2
P               3, 5
S               2, 4, 6
F, Cl, Br, I    1
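The hydrogen-filling rule for the organic subset can be sketched in a few lines. The function name implicit_hydrogens and the dictionary layout are illustrative assumptions; the valences are those from the table above, and the rule is read here as "pick the lowest standard valence that is at least the sum of the atom's explicit bond orders".

```python
# Standard valences of the organic subset, taken from the table above.
STANDARD_VALENCES = {
    "B": (3,),
    "C": (4,),
    "N": (3, 5),
    "O": (2,),
    "P": (3, 5),
    "S": (2, 4, 6),
    "F": (1,),
    "Cl": (1,),
    "Br": (1,),
    "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Number of hydrogens implied for an unbracketed atom (hypothetical helper).

    bond_order_sum is the sum of the orders of the atom's explicit bonds.
    """
    for valence in STANDARD_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    # More bonds than the highest standard valence: no hydrogens are added.
    return 0


print(implicit_hydrogens("O", 0))  # the single atom in "O" (water)
print(implicit_hydrogens("C", 0))  # the single atom in "C" (methane)
print(implicit_hydrogens("C", 1))  # each carbon in "CC" (ethane)
```

With these inputs, the lone O receives two hydrogens, the lone C four, and each carbon of CC three.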

Bonds

To indicate that two atoms are linked by a chemical bond , one of the following symbols is placed between the atoms.

Bond                Symbol    Optional
Single bond         -         Yes
Double bond         =         No
Triple bond         #         No
Quadruple bond *    $         No
Aromatic bond       :         Yes

* OpenSMILES only.
Bonds in aromatic systems can be written with a colon instead of alternating double and single bonds.

In order to simplify the notation even further, the symbols for single bonds and aromatic bonds can be omitted.

Branches

Atoms with three or more bonds are the starting points for branches. After the branching atom, the side chain is written in parentheses before the remaining bonds follow. Parentheses, and thus branches, can be nested to any depth.

Examples:

SMILES string                                     Name
CC(=O)O                                           Acetic acid
CC(C)(C)O                                         tert-Butanol
C(C(CO[N+](=O)[O-])O[N+](=O)[O-])O[N+](=O)[O-]    Glycerol trinitrate
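How bond symbols and parentheses translate into a molecular graph can be sketched with a stack: an opening parenthesis remembers the branching atom, and a closing parenthesis returns to it. The following is a simplified illustration, not a full SMILES parser. The function name parse_branches is hypothetical, bracket atoms are kept as opaque tokens, and ring closures and aromatic atoms are not supported.

```python
import re

# Bond symbols and their orders, as in the bond table above.
BOND_ORDERS = {"-": 1, "=": 2, "#": 3, "$": 4}

# Bracket atoms, two-letter halogens, one-letter organic atoms,
# bond symbols and parentheses.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[-=#$()]")


def parse_branches(smiles):
    """Return (atoms, bonds) for an acyclic SMILES string (hypothetical helper).

    bonds is a list of (atom_index, atom_index, bond_order) tuples.
    """
    atoms = []
    bonds = []
    stack = []         # branching atoms to return to at ")"
    previous = None    # index of the atom the next atom bonds to
    pending_order = 1  # an omitted bond symbol means a single bond
    position = 0
    while position < len(smiles):
        match = TOKEN.match(smiles, position)
        if match is None:
            raise ValueError(f"unsupported character at position {position}")
        token = match.group()
        position = match.end()
        if token == "(":
            stack.append(previous)
        elif token == ")":
            previous = stack.pop()
        elif token in BOND_ORDERS:
            pending_order = BOND_ORDERS[token]
        else:
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, pending_order))
            previous = index
            pending_order = 1
    return atoms, bonds


atoms, bonds = parse_branches("CC(=O)O")  # acetic acid
print(atoms)  # ['C', 'C', 'O', 'O']
print(bonds)  # [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
```

In the acetic acid example, both oxygen atoms end up bonded to atom 1, the carbon immediately before the parenthesis.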

Separate structures

For structures that are not covalently bonded to each other, e.g. the ions of an ionic compound, a dot (.) is placed between the separate parts.
Example: sodium hydrogen carbonate (Na+ HCO3-) is written as [Na+].O=C([O-])O
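Because the dot only separates components, a SMILES string can be split at the dots and each part examined on its own. The sketch below does this and adds up the charges written inside bracket atoms. The function name component_charges is hypothetical, and only charges of the form +, -, +2, -2 and so on are recognised.

```python
import re

# Charge at the end of a bracket atom, e.g. "+" in [Na+] or "+2" in [Cu+2].
BRACKET_CHARGE = re.compile(r"\[[^\[\]]*?([+-])(\d*)\]")


def component_charges(smiles):
    """Split at dots and return (component, net_charge) pairs (hypothetical helper)."""
    result = []
    for part in smiles.split("."):
        net_charge = 0
        for sign, digits in BRACKET_CHARGE.findall(part):
            magnitude = int(digits) if digits else 1
            net_charge += magnitude if sign == "+" else -magnitude
        result.append((part, net_charge))
    return result


print(component_charges("[Na+].O=C([O-])O"))
# [('[Na+]', 1), ('O=C([O-])O', -1)]
```

For sodium hydrogen carbonate, the two components carry charges of +1 and -1, so the compound as a whole is neutral.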

Cyclic structures

Representing cyclic structures is one of the biggest problems for such a linear notation. In SMILES, a ring is closed by writing the same digit after each of the two atoms that are to be bonded: the digit after the first atom opens the ring bond, and the same digit after a later atom closes it. In aromatic rings, the ring atoms are written in lower case.

Examples:

SMILES string                                      Name
c1ccccc1                                           Benzene
Cc1c([N+]([O-])=O)cc([N+]([O-])=O)cc1[N+]([O-])=O  Trinitrotoluene
C1=CC=C2C=CC=CC2=C1                                Naphthalene
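The matching of ring-closure digits can be sketched as follows: the first occurrence of a digit records the current atom, and the second occurrence pairs it with the atom at which it reappears. The function name ring_closures is hypothetical, atoms are numbered from 0 in the order they appear, and only single-digit ring labels are handled.

```python
import re

# Bracket atoms (matched whole, so digits inside them are not ring labels),
# organic-subset atoms, aromatic lower-case atoms, and single digits.
ATOM_OR_DIGIT = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d")


def ring_closures(smiles):
    """Return the pairs of atom indices joined by ring-closure digits (hypothetical helper)."""
    open_rings = {}   # digit -> index of the atom that opened it
    closures = []
    atom_index = -1   # index of the most recently read atom
    for token in ATOM_OR_DIGIT.findall(smiles):
        if token.isdigit():
            if token in open_rings:
                closures.append((open_rings.pop(token), atom_index))
            else:
                open_rings[token] = atom_index
        else:
            atom_index += 1
    if open_rings:
        raise ValueError(f"unclosed ring labels: {sorted(open_rings)}")
    return closures


print(ring_closures("c1ccccc1"))             # [(0, 5)]
print(ring_closures("C1=CC=C2C=CC=CC2=C1"))  # [(3, 8), (0, 9)]
```

For naphthalene, the inner label 2 is closed before the outer label 1, which is why the pair (3, 8) is reported first.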

Reactions

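Reactions are written in SMILES with two closing angle brackets (>>) between the reactants and the products.
Example: Na+ HCO3- + HCl → Na+ Cl- + H2CO3 is written as [Na+].O=C([O-])O.Cl>>[Na+].[Cl-].O=C(O)O (hydrogen chloride is written as Cl, because its hydrogen atom is implicit).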

If a further substance takes part in a reaction, it is written between the two angle brackets.
Example: Na+ HCO3- + HCl → Na+ Cl- + H2CO3 is written as [Na+].O=C([O-])O>Cl>[Na+].[Cl-].O=C(O)O
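The three-part layout of a reaction string can be sketched by splitting at the angle brackets and then at the dots. The function name split_reaction is hypothetical. Hydrogen chloride is written as Cl in the examples, since a bare H is not part of the organic subset and the hydrogen is therefore left implicit.

```python
def split_reaction(reaction):
    """Split 'reactants>agents>products' into lists of components (hypothetical helper)."""
    parts = reaction.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction needs exactly two '>' characters")
    # Empty sections, as in 'A>>B', yield empty lists.
    reactants, agents, products = (
        [component for component in part.split(".") if component]
        for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


print(split_reaction("[Na+].O=C([O-])O.Cl>>[Na+].[Cl-].O=C(O)O"))
print(split_reaction("[Na+].O=C([O-])O>Cl>[Na+].[Cl-].O=C(O)O"))
```

In the first call the middle section is empty; in the second it contains Cl as the substance written between the angle brackets.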

Extension

SMARTS is an extension of SMILES that enables searching for molecular substructures. For this purpose, SMILES was extended with wildcards and with ways to require specific bond types (e.g. aromatic). Any valid SMILES expression can also be used as a SMARTS expression, but the reverse does not hold. SMARTS is mainly used for search applications in chemical databases.
