# Simulated annealing

Simulated annealing (also: simulated cooling) is a heuristic approximation method. It is used to find an approximate solution to optimization problems whose high complexity rules out both exhaustively testing all possibilities and exact mathematical optimization methods.

The Metropolis algorithm is the basis of simulated annealing. The basic idea is to simulate a cooling process, as for example during annealing in metallurgy. After a metal has been heated, slow cooling gives the atoms sufficient time to arrange themselves and form stable crystals; this achieves a low-energy state close to the optimum. Transferred to the optimization process, the temperature corresponds to a probability with which an intermediate result of the optimization may also deteriorate. In contrast to a simple local search algorithm, the method can therefore leave a local optimum again. Less favorable interim solutions are accepted because this offers the chance of finding a better local optimum.

The algorithm is used, for example, for floorplanning during chip design or for location and route planning.

There are also quantum versions of annealing (with tunneling between the minima), introduced in the 1990s. They have been implemented in hardware in the adiabatic quantum computers of D-Wave Systems.

## Motivation

The simulated annealing algorithm is motivated by physical considerations. We are looking for the energetically most favorable state of a system, which can be described with the help of Boltzmann statistics. According to Boltzmann statistics, the probability of encountering a microstate with energy $E_j$ is given by the probability distribution

$$p(E_j) \propto \exp\left(-\frac{E_j}{k_\mathrm{B} T}\right),$$

where $k_\mathrm{B}$ is the Boltzmann constant and $T$ the temperature. The energy of the energetically most favorable state is $E_0$. The above proportionality is preserved when multiplied by a factor independent of $E_j$:

$$p(E_j) \propto \exp\left(-\frac{E_j - E_0}{k_\mathrm{B} T}\right)$$

Since $E_0$ is the energy of the most favorable state, $E_j - E_0 \geq 0$ holds. Furthermore, $k_\mathrm{B} > 0$ and $T > 0$. Thus the exponent is non-positive, and its magnitude grows as the temperature decreases, which lowers the probability of encountering an excited state with energy $E_j > E_0$. If the temperature of the system is lowered slowly, the energetically most favorable state is therefore found with increasing probability.
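The influence of the temperature on this Boltzmann factor can be checked numerically. A minimal sketch (the energy gap and temperature values are illustrative; $k_\mathrm{B}$ is absorbed into the temperature scale):

```python
import math

def boltzmann_acceptance(delta_e: float, temperature: float) -> float:
    """Relative probability of a state that lies delta_e above the ground state."""
    return math.exp(-delta_e / temperature)

# For a fixed energy gap, the probability shrinks as the temperature decreases:
for T in (10.0, 1.0, 0.1):
    print(f"T = {T:5.1f}  ->  p = {boltzmann_acceptance(1.0, T):.6f}")
```

At high temperature excited states are almost as likely as the ground state; at low temperature they become exponentially rare.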

## Problem

Given are the solution space $D$, a fitness function $f\colon D \rightarrow \mathbb{R}$ that assigns a value to each solution in $D$, and a termination criterion.

We are looking for an approximation $x \in D$ of the global minimum of $f$ over $D$, i.e. one with the smallest possible value $f(x)$ (or the largest possible, which reduces to the previous case by negating $f$).

In addition, a neighborhood concept $U\colon D \rightarrow \mathcal{P}(D)$ (where $\mathcal{P}(D)$ denotes the power set of $D$) is required to generate a neighboring solution $y \in U(x)$ for a given $x \in D$.

## Algorithm

1. Initialization:
• choose a starting solution $x \in D$
• set $x_\mathrm{approx} = x$
• choose a sequence of positive temperature values $(T_t)_{t \in \mathbb{N}}$ that decreases monotonically towards zero
• set $t = 0$
2. Local change:
• randomly pick a neighbor $y \in U(x)$ of $x$
3. Selection:
• if $f(y) \leq f(x)$, set $x = y$
• otherwise, set $x = y$ only with probability $\exp\left(-\frac{f(y) - f(x)}{T_t}\right)$
4. Update best solution so far:
• if $f(x) < f(x_\mathrm{approx})$, set $x_\mathrm{approx} = x$
5. Increment:
• set $t = t + 1$
6. Abort or continue:
• if the termination condition is not met, go to step 2.
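The six steps above can be sketched in Python. The geometric cooling schedule, the example objective, and the real-valued neighborhood are assumptions for illustration; any schedule decreasing towards zero and any problem-specific neighborhood function work the same way:

```python
import math
import random

def simulated_annealing(f, x0, neighbor, t0=1.0, alpha=0.999, max_iter=10_000):
    """Minimize f starting from x0.

    f        -- fitness function to minimize
    x0       -- starting solution
    neighbor -- function returning a random neighbor of a solution
    t0       -- initial temperature (assumption: geometric schedule T_t = t0 * alpha**t)
    alpha    -- cooling factor in (0, 1)
    max_iter -- termination criterion: maximum number of iterations
    """
    x = x0
    x_approx = x0                           # best solution found so far
    for t in range(max_iter):
        temperature = t0 * alpha ** t       # decreases monotonically towards zero
        y = neighbor(x)                     # step 2: local change
        delta = f(y) - f(x)
        # step 3: always accept improvements; accept a deterioration
        # only with probability exp(-delta / T_t)
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            x = y
        if f(x) < f(x_approx):              # step 4: update best solution so far
            x_approx = x
    return x_approx

# Usage: a one-dimensional objective with many local minima.
def objective(x):
    return x * x + 10 * math.cos(3 * x)

random.seed(0)
result = simulated_annealing(
    objective,
    x0=5.0,
    neighbor=lambda x: x + random.uniform(-0.5, 0.5),
)
```

Because $x_\mathrm{approx}$ only ever improves, the returned solution is never worse than the starting point, even if the random walk ends up in a poor region.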

### Explanations

The greater the deterioration $f(y) - f(x)$, the smaller the probability $\exp\left(-\frac{f(y) - f(x)}{T_t}\right)$ that $x$ is replaced by a worse $y$. Because $(T_t)$ is a monotonically decreasing sequence, this acceptance probability also decreases over the course of a run. With decreasing $T_t$, the process behaves more and more like a hill climbing algorithm.

How a neighbor $y \in U(x)$ should be chosen depends on the problem at hand. In computer science, the solution space $D = \{0,1\}^n$ is often considered, i.e. a solution $x = (x_1, x_2, \dotsc, x_n)$ is a bit vector. A neighbor $y$ of $x$ can then be generated, for example, by flipping (inverting) one or a few bits (see Wegener 2005).
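For the bit-vector case $D = \{0,1\}^n$, the one-bit-flip neighborhood can be sketched as follows (a minimal example; flipping several bits at once is an equally valid design choice):

```python
import random

def flip_neighbor(x: list) -> list:
    """Return a neighbor of the bit vector x by inverting one randomly chosen bit."""
    y = x.copy()
    i = random.randrange(len(y))
    y[i] = 1 - y[i]
    return y

# A neighbor differs from x in exactly one position:
x = [0, 1, 0, 1]
y = flip_neighbor(x)
```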

Various termination conditions are conceivable: for example, a maximum number of iterations $t$ is allowed, a sufficient fitness is defined, a lower limit for the temperature is set, or a number of iterations is fixed during which $x_\mathrm{approx}$ has not changed.
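These conditions can be combined into a single predicate; a sketch, with all thresholds purely illustrative:

```python
def should_terminate(t, temperature, best_fitness, stale_iterations,
                     max_iter=10_000, t_min=1e-8,
                     target_fitness=float("-inf"), max_stale=500):
    """Return True as soon as any of the common termination conditions is met."""
    return (t >= max_iter                       # maximum number of iterations
            or temperature <= t_min             # lower limit for cooling
            or best_fitness <= target_fitness   # sufficient fitness reached
            or stale_iterations >= max_stale)   # no improvement for a while
```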

### Graphic clarification

*Figure: Graphic representation of a landscape in which a global minimum is to be found.*

The idea of simulated cooling can be illustrated graphically.

Suppose you are looking for the (global) lowest point in a two-dimensional landscape consisting of many dents of different depths. The simple search strategy (move to the nearest lower point) corresponds to the behavior of a ball released into this landscape: it rolls to the nearest local minimum and stays there. During simulated cooling, the ball is repeatedly given a kick that becomes weaker as the "cooling" progresses. Ideally this kick is strong enough to knock the ball out of a shallow dent (a local minimum), but not strong enough to let it escape from the global minimum.

Simulated annealing in the search for a maximum: while the "temperature" is still high, the strong noise of the temperature simulation quickly drives the search out of the numerous local maxima again. The global maximum, however, is reliably retained, since towards the end the falling "temperature" is no longer sufficient to leave it. This gives better results than a simple hill climbing algorithm.

## Literature

• Ingo Wegener: Simulated Annealing Beats Metropolis in Combinatorial Optimization. In: Lecture Notes in Computer Science, vol. 3580. Springer, Berlin/Heidelberg 2005, ISBN 978-3-540-27580-0, pp. 589–601, doi:10.1007/11523468. (For an easily described problem it is shown that simulated annealing is more efficient than the Metropolis algorithm, regardless of the temperature chosen for the latter.)