# Chemoinformatics

Chemoinformatics (also cheminformatics or chemical informatics) is a branch of science that combines chemistry with methods from computer science, with the aim of developing and applying methods for calculating molecular properties. Its pioneers include Paul deMain (1924–1999), Johann Gasteiger, Jure Zupan (born 1943) and Ivar Ugi.

The term "chemoinformatics" is relatively recent; the older terms computational chemistry and chemical graph theory refer to the same area (Ref.: Bonchev/Rouvray, 1990). Today, computational chemistry is regarded more as a subfield of theoretical chemistry and quantum chemistry.

## Basics

Chemoinformatics deals with calculations on digital representations of molecular structures. Molecular structures can be understood as graphs. For many applications the well-known connection table representation is sufficient: it stores the type of each bond between the individual atoms of a molecule. Two-dimensional (2-D) or three-dimensional (3-D) coordinates need to be included only for further analyses. The latter are required in particular when, for example in medicinal chemistry, interactions with biomolecules such as proteins are to be investigated.
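The graph view can be made concrete with a small sketch. It assumes a hypothetical example molecule (the heavy atoms of cyclopropanol) and hypothetical helper names; atoms are nodes, bonds are edges, and the number of rings follows from the cyclomatic number of the graph (bonds minus atoms plus connected components).

```python
from collections import deque

# Hypothetical example: heavy-atom graph of cyclopropanol.
# Each bond is (atom index, atom index, bond order).
atoms = {1: "C", 2: "C", 3: "C", 4: "O"}
bonds = [(1, 2, 1), (2, 3, 1), (3, 1, 1), (1, 4, 1)]


def adjacency(atoms, bonds):
    """Build an adjacency list: atom index -> list of bonded atom indices."""
    adj = {i: [] for i in atoms}
    for a, b, _order in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def count_components(adj):
    """Count connected components with a breadth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return components


def ring_count(atoms, bonds):
    """Cyclomatic number: bonds - atoms + connected components."""
    return len(bonds) - len(atoms) + count_components(adjacency(atoms, bonds))


print(ring_count(atoms, bonds))
```

Nothing here depends on coordinates: connectivity alone is enough to answer this kind of question, which is why the connection table suffices for many applications.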

The size of the theoretical chemical space of all pharmacologically active organic molecules is estimated at about $10^{60}$ molecules. This estimate considered only molecules containing the elements carbon, oxygen, nitrogen and sulfur and with a molar mass below 500 g/mol (Ref.: Bohacek, 1999). The space of all conceivable organic compounds is considerably larger, namely infinite. Both theoretical chemical spaces thus far exceed the number of molecules actually synthesized to date (Ref.: Lahana, 1999). With computer-based methods, however, many millions of molecules can already be analyzed theoretically (in silico) without first having to be synthesized for laboratory measurements.

### Representation of chemical structures

The representation of chemical structures is one of the fundamental problems. For the majority of applications, representation as a connection table based on valence bond theory has become established. As an example of a connection table, acesulfame is shown below in the standard Molfile format from MDL. Lines 5–14 contain the x, y and z coordinates and element symbols of the atoms; lines 15–24 contain the bond table with the start and end atom of each bond and the bond type. The columns of zeros hold further optional fields.

 Acesulfame
-ISIS-  05070815372D

10 10  0  0  0  0  0  0  0  0999 V2000
3.2283   -1.4806    0.0000 S   0  0  3  0  0  0  0  0  0  0  0  0
2.5154   -1.8944    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
3.2283   -0.6538    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
4.0544   -1.4806    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
3.6448   -2.1935    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
1.7990   -1.4806    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.5154   -0.2406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.7990   -0.6538    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.0826   -1.8944    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
2.5154    0.5855    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1  2  1  0  0  0  0
1  3  1  0  0  0  0
1  4  2  0  0  0  0
1  5  2  0  0  0  0
2  6  1  0  0  0  0
3  7  1  0  0  0  0
6  8  1  0  0  0  0
6  9  2  0  0  0  0
7 10  1  0  0  0  0
7  8  2  0  0  0  0
M  END


In addition to the connection table, 3-D coordinates of molecules that actually exist can be determined by X-ray structure analysis. Where this is not possible, or where a molecule does not physically exist, approximate 3-D coordinates can also be generated directly from the connection table by iterative energy minimization over different conformations of the molecule. 2-D coordinates are usually used only to depict a molecule and must therefore mainly satisfy aesthetic requirements. They are likewise calculated directly from the connection table according to generally accepted rules for drawing chemical structures, but only in very rare cases do they reflect the actual spatial arrangement in a molecule.
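The acesulfame connection table shown above can be read with a few lines of code. The following is a minimal sketch, not a full Molfile reader: the function name `parse_molfile` is hypothetical, and the fields are split on whitespace, which works for this small example but would fail for strictly fixed-width files in which adjacent fields run together (for instance 100 or more atoms).

```python
from collections import Counter

MOLFILE = """\
Acesulfame
-ISIS-  05070815372D

10 10  0  0  0  0  0  0  0  0999 V2000
3.2283   -1.4806    0.0000 S   0  0  3  0  0  0  0  0  0  0  0  0
2.5154   -1.8944    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
3.2283   -0.6538    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
4.0544   -1.4806    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
3.6448   -2.1935    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
1.7990   -1.4806    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
2.5154   -0.2406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.7990   -0.6538    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1.0826   -1.8944    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
2.5154    0.5855    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
1  2  1  0  0  0  0
1  3  1  0  0  0  0
1  4  2  0  0  0  0
1  5  2  0  0  0  0
2  6  1  0  0  0  0
3  7  1  0  0  0  0
6  8  1  0  0  0  0
6  9  2  0  0  0  0
7 10  1  0  0  0  0
7  8  2  0  0  0  0
M  END
"""


def parse_molfile(text):
    """Return (atoms, bonds) from a V2000-style Molfile string.

    atoms: list of (element, x, y, z); bonds: list of (start, end, bond type).
    Assumes whitespace-separated fields (a simplification, see lead-in).
    """
    lines = text.splitlines()
    counts = lines[3].split()  # line 4: atom count, bond count, ...
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, element = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        start, end, bond_type = (int(v) for v in line.split()[:3])
        bonds.append((start, end, bond_type))
    return atoms, bonds


atoms, bonds = parse_molfile(MOLFILE)
# Heavy-atom composition and the neighbours of atom 1 (the sulfur atom).
composition = Counter(element for element, *_ in atoms)
neighbours_of_1 = sorted(b if a == 1 else a for a, b, _ in bonds if 1 in (a, b))
print(dict(composition), neighbours_of_1)
```

Hydrogen atoms are implicit in this file, so the composition covers heavy atoms only.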

## Methods

Methods that require no empirical parameters are known as ab initio methods. Semi-empirical methods contain empirical quantities and other semi-empirical parameters that were determined by theoretical procedures but no longer bear any relation to measurable quantities. In principle, ab initio methods are suited to smaller molecules. Semi-empirical methods show their strengths with medium-sized molecules (about 100 atoms). Examples of semi-empirical methods are MNDO and AM1.

### Ab initio methods

The quality with which ab initio methods can calculate molecular properties depends on the basis set of the atoms, that is, on how well and with how many individual functions the atomic orbitals are represented, and on the extent to which electron correlation is taken into account. Ab initio methods that also account for electron correlation are considerably more expensive but give the best results. In practice one usually settles for a compromise and approximates the electron correlation. Examples of such methods are Møller-Plesset perturbation theory, CI (configuration interaction), CC (coupled cluster) and MCSCF (multi-configuration self-consistent field). Most ab initio methods are based on the Hartree-Fock method. One advantage of ab initio methods is that they can be improved systematically: the accuracy of the results increases as the basis set is enlarged and as more of the electron correlation is included (e.g. CISD, CISDT, ...).

### Density functional methods

Density functional theory (DFT) is a method for determining the ground state of a many-electron system on the basis of the position-dependent electron density in three dimensions. It is therefore not necessary to solve the Schrödinger equation for the high-dimensional many-electron system, which greatly reduces the computational effort and makes calculations on larger systems possible. DFT rests on the Hohenberg-Kohn theorem. However, the exact functional that links the ground-state density to the energy of the system is unknown. In practice, the choice of a suitable approximate functional is therefore crucial for accuracy. Systematic improvability is less pronounced than with ab initio methods.

### Semi-empirical procedures

In semi-empirical methods, a large part of the integrals of the Hartree-Fock formalism is neglected; others are approximated by spectroscopic values, parameters or parameterized functions. These approximations were motivated by the limited computing capacity available in the past: in order to apply theoretical knowledge to chemical problems, the existing formalism had to be simplified.

The Hückel approximation is the simplest semi-empirical approach, since it does not calculate any integrals. However, it is applicable only to $\pi$-electron systems. The theory was later extended to $\sigma$ systems (Extended Hückel Theory, EHT).
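As a worked illustration, consider ethene, the smallest π system (a standard textbook case, not taken from the text above). Assume each carbon p orbital has the same Coulomb parameter $\alpha$ and the two bonded neighbours share a resonance parameter $\beta$; both are treated as empirical constants rather than evaluated integrals, and the overlap between different atoms is set to zero. The secular equation then reads

$\begin{vmatrix} \alpha - E & \beta \\ \beta & \alpha - E \end{vmatrix} = 0 \;\Rightarrow\; (\alpha - E)^{2} = \beta^{2} \;\Rightarrow\; E_{\pm} = \alpha \pm \beta$

Because $\beta$ is negative, $E_{+} = \alpha + \beta$ is the bonding level; placing both π electrons there gives a total π energy of $2\alpha + 2\beta$.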

Established methods that are still in frequent use today belong to the class of NDDO approximations (Neglect of Diatomic Differential Overlap): MNDO (Modified Neglect of Diatomic Overlap), AM1 (Austin Model 1), PM3 (Parametrised Method 3). For demanding calculations, semi-empirical methods have been combined with CI and MCSCF. Such methods (MNDO/CI, MNDO/MCSCF) can be used, for example, to calculate reaction barriers and entire energy profiles of complex reactions, or even excited states.

The limits of semi-empirical methods lie in their parameterization: strictly speaking, a finished method can be used only for systems similar to those contained in the parameterization data set.

### Molecular mechanics methods

Force field programs use a classical mechanical approach: the bond between two atoms A and B is approximated as a spring and, in the simplest case, described by a harmonic potential (Hooke's law):

$E_{AB} = k_{AB} \left(r_{AB}^{0} - r\right)^{2}$

Since a double bond between two carbon atoms has a different strength and equilibrium length than a single bond, different parameter sets (force constant $k_{AB}$ and rest length $r_{AB}^{0}$) are required. For this reason, atoms are identified not simply by element but by atom type. Similar terms exist for bond angles and torsion angles. Electrostatic (Coulomb) and van der Waals interactions are referred to as non-bonded interactions. Force field methods have to be parameterized against empirical or quantum mechanically calculated data, so a force field is characterized by two things: its energy function and its parameter set.
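The split into energy function and parameter set can be sketched as follows. The atom-type names and all numerical values are hypothetical placeholders chosen for illustration, not parameters of any published force field; the energy expression is the harmonic bond term given above.

```python
# Parameter set: atom-type pair -> (force constant k_AB, rest length r0_AB).
# Hypothetical values; k in kcal/(mol*Angstrom^2), r0 in Angstrom.
BOND_PARAMS = {
    ("C_sp3", "C_sp3"): (300.0, 1.53),  # C-C single bond
    ("C_sp2", "C_sp2"): (500.0, 1.34),  # C=C double bond
}


def bond_energy(type_a, type_b, r):
    """Energy function: harmonic bond term E_AB = k_AB * (r0_AB - r)**2."""
    key = (type_a, type_b)
    if key not in BOND_PARAMS:
        key = (type_b, type_a)  # parameters do not depend on atom order
    k, r0 = BOND_PARAMS[key]
    return k * (r0 - r) ** 2


# The same distance costs a different energy depending on the atom types.
print(bond_energy("C_sp3", "C_sp3", 1.53))
print(bond_energy("C_sp2", "C_sp2", 1.53))
```

Two carbon atoms at the same distance thus receive different energies depending on their atom types, which is why the element symbol alone is not enough.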

Force fields make it possible to optimize the geometry of very large (bio)molecules (for example, proteins) and are mainly used for molecular dynamics or Monte Carlo simulations.
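Geometry optimization with a force field can be illustrated on the smallest possible case. The sketch below applies plain steepest descent to two atoms joined by one harmonic bond, $E = k (r^{0} - r)^{2}$; the function names, the step size and the values of `k` and `r0` are assumptions made for illustration, and production codes use more elaborate optimizers and many more energy terms.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def minimize_bond(p1, p2, k, r0, step=5e-4, max_iter=200, tol=1e-10):
    """Steepest descent on E = k * (r0 - r)**2 for two atoms at p1 and p2."""
    p1, p2 = list(p1), list(p2)
    for _ in range(max_iter):
        r = distance(p1, p2)
        dE_dr = -2.0 * k * (r0 - r)  # derivative of the energy w.r.t. r
        if abs(dE_dr) < tol:
            break
        # Unit vector pointing from atom 1 to atom 2.
        unit = [(b - a) / r for a, b in zip(p1, p2)]
        for i in range(3):
            p2[i] -= step * dE_dr * unit[i]  # gradient w.r.t. p2 is +dE_dr * unit
            p1[i] += step * dE_dr * unit[i]  # gradient w.r.t. p1 is -dE_dr * unit
    return p1, p2


# Start from a stretched bond of 2.0 and relax it (hypothetical k and r0).
a, b = minimize_bond((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), k=300.0, r0=1.53)
print(distance(a, b))
```

Both atoms move by the same amount in opposite directions, so the midpoint of the bond stays fixed while the distance relaxes towards `r0`.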

## Applications

There are several important topics within the field. A selection:

• Computer-aided representation of molecules and the quantum mechanical calculation of their properties
• Applications that can store and retrieve chemicals in a structured manner (databases)
• Methods for understanding the systematic relationship between molecular structure and the properties of substances (QSPR)
• Force field calculations for the geometry optimization of large molecules
• Molecular dynamics for calculating the binding thermodynamics of enzymes
• Computer-aided synthesis planning
• Computer-aided prediction of drug efficacy

Selected application examples are presented in more detail below.

### Quantitative structure-activity relationship

With the help of suitable algorithms, codes for molecules are developed. By induction, new hypotheses can then be formulated about molecular properties such as bioavailability or the ability of a substance to inhibit or enhance the function of a particular protein in the body (see also: QSAR).

With suitable chemical and biological hypotheses, the chemical space can be narrowed down to a few candidates, which are then synthesized in the laboratory and clinically tested. For this reason, cheminformatics plays a major role in pharmaceutical and medicinal chemistry in the optimization of lead structures.

### Thermodynamics

In industrial chemistry, group contribution methods are used to estimate material properties such as normal boiling points, critical data, surface tensions and more.
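How a group contribution estimate works can be sketched with the Joback scheme, used here as one example of such a method (the text above does not name a specific one). The molecule is split into structural groups and the property is a constant plus the sum of the group increments. The constant and the two increments below are the values commonly quoted for the Joback boiling point correlation and should be checked against the original tables before any real use; the dictionary keys and function name are hypothetical.

```python
# Boiling point increments in kelvin for two groups (Joback scheme, see lead-in).
JOBACK_TB = {
    "CH3": 23.58,          # methyl group
    "C=O_nonring": 76.75,  # ketone carbonyl outside a ring
}
JOBACK_TB_BASE = 198.2     # constant term of the correlation, in kelvin


def normal_boiling_point(groups):
    """Estimate T_b in kelvin from a mapping group name -> number of occurrences."""
    return JOBACK_TB_BASE + sum(JOBACK_TB[g] * n for g, n in groups.items())


# Acetone consists of two methyl groups and one ketone carbonyl.
acetone = {"CH3": 2, "C=O_nonring": 1}
tb_acetone = normal_boiling_point(acetone)
print(round(tb_acetone, 2))
```

The estimate needs nothing but the structural formula, which is the appeal of these methods; the price is limited accuracy compared with measurement.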

### Molecular modeling

Molecular modeling deals, for example, with building models of unknown macromolecules using similar, known molecules as templates (homology modeling), with the interaction between small and large molecules (receptor docking), which makes QSAR possible, with molecular dynamics, and with the development of energy-minimized 3-D structures of molecules (hill-climbing algorithm, simulated annealing, molecular mechanics, etc.). The aim is to develop models of unknown structures on the basis of known structures in order to enable QSAR.

## Related areas

There is a strong connection to analytical chemistry and chemometrics. Structure-property relationships (for example, spectrum correlation) play a central role. Because the working methods are comparable, there is a close relationship with computational physics, so a clear separation is often not possible.

## Software packages

Computational chemistry programs are based on various quantum chemical methods for solving the molecular Schrödinger equation. Essentially, two approaches can be distinguished: semi-empirical methods and ab initio methods.

All of the methods described are available in common software packages. Examples include ACES, GAUSSIAN, GAMESS, MOLPRO, Spartan, TURBOMOLE, Cerius2 and Jaguar. ArgusLab is a freely available program suitable for getting started in computational chemistry.

The challenge for users of this software is to find the most suitable model for their problem and to interpret the results within the range of validity of the models.