# Fast folding

The fast convolution is an algorithm for the calculation of the discrete, aperiodic convolution operation with the help of the fast Fourier transform (FFT). The computationally intensive aperiodic convolution operation in the time domain is replaced by a much simpler, functionally equivalent, multiplication in the frequency domain . Fast folding is used in the field of digital signal processing and the like. a. with non-recursive digital filters ( FIR filters ) to reduce the calculation effort.

## From discrete convolution to fast discrete convolution

In the time domain , the discrete convolution is defined as:

${\ displaystyle h (n) = (f * g) (n) = \ sum _ {k} f (k) \ cdot g (nk)}$

If the discrete convolution is discretely Fourier-transformed into the frequency domain , the following term results:

${\ displaystyle H (N) = {\ mathcal {F}} (h (n)) = {\ mathcal {F}} (f (n)) \ cdot {\ mathcal {F}} (g (n)) }$

${\ displaystyle H (N)}$is calculated and then transformed back into the time domain , resulting in the function you are looking for:

${\ displaystyle h (n) = {\ mathcal {F}} ^ {- 1} (H (N)) = (f * g) (n)}$.

## application

The areas of application of fast convolution primarily include filter applications in the area of non-recursive filters (FIR filters). The methods used here are called the overlap save method and overlap add method . Since the fast convolution using FFT and its grouping of the data in blocks is a cyclic convolution , but on the other hand FIR filters work with the aperiodic convolution in the time domain, procedures are necessary to equate the periodic with the aperiodic convolution to the data to "correct" in the overlapping areas between the data blocks. This is where the name overlap in the fast folding process is derived .

## Overlap failure

In the following, the effect of the overlap in fast (cyclic) convolution is discussed, and in which cases the result is identical to the discrete linear convolution . ${\ displaystyle h [n]}$

If the input sequence of length  K is folded with the impulse response of length  G , the linear convolution results in output values . ${\ displaystyle f [n]}$ ${\ displaystyle g [n]}$${\ displaystyle N = K + G-1}$${\ displaystyle h [n]}$

In order to be able to carry out the cyclic convolution via the DFT at all, the input sequence and impulse response must be of the same length. If this is not the case, the shorter vector must be balanced by adding zeros ( zero-padding ).

To simplify the following consideration, we assume that the impulse response is shorter than the input sequence ( ). ${\ displaystyle g [n]}$${\ displaystyle f [n]}$${\ displaystyle G

Since the inverse transformation using IFFT from the spectrum in this case delivers as a convolution product instead of also just K samples (see properties of the DFT), there is an overlap error at the first positions in the output sequence . In addition, it is too short, as the missing values add to the first. The error can be corrected by adding at least zeros to and zeros to (additional zeros at the end do not disturb the DFT algorithm , if the length is a power of two, some FFT implementations work much more efficiently). ${\ displaystyle N = K + G-1}$${\ displaystyle G-1}$${\ displaystyle G-1}$ ${\ displaystyle G-1}$${\ displaystyle f [n]}$${\ displaystyle K-1}$${\ displaystyle g [n]}$

With the help of zero padding , both vectors now have elements; the result of the fast convolution also provides values. As with linear convolution, there is no longer any overlap error. ${\ displaystyle N = K + G-1}$${\ displaystyle h [n]}$${\ displaystyle N = K + G-1}$

The use of fast convolution with the help of the FFT leads to a reduction in the computational effort , so that it no longer depends proportionally on the length (number of values) K of the function for each value (each sample) , but only logarithmically on the selected block size  N. , with the constraint that N must be greater than K , with which the FFT is carried out. ${\ displaystyle g (k)}$

For a set of N values ​​(samples) the following complexities result (measured as the number of basic arithmetic operations such as additions and multiplications):

• discrete folding
${\ displaystyle O (N \ cdot K)}$
• fast discrete folding
${\ displaystyle O (N \ cdot \ mathrm {log} (N))}$if .${\ displaystyle N> K}$

FIR filters implemented in practice can usually be implemented more efficiently from an order of approx. 20 to 40 using fast convolution than in the direct form.

One problem with fast discrete convolution is its latency , which is caused by waiting for a set of values ​​(samples) corresponding to the block size N to calculate the fast discrete convolution:

The block size must at least correspond to the length of the signal in the time domain with which the function is to be convolved, otherwise the result would be shorter than the impulse response of the convolution. In addition, when using the overlap-add or overlap-save method , if the creation of artifacts is to be completely avoided by the method, this minimum block length must also be extended by the distance between the individual packets in which the folding is to be carried out - which is why large block lengths tend to deliver results with greater efficiency. Furthermore, depending on the specific FFT implementation, the condition may still exist that the block size N must correspond to an integer power of 2, 4 or 8 - which together can lead to a very large block size N in the end .

Another disadvantage is the quantization noise over the entire convolution operation , which is more difficult to calculate . This tends to be higher with fast convolution than with discrete convolution. The problem of quantization noise has to be taken into account especially with signal processors with fixed point arithmetic .

## literature

• Karl-Dirk Kammeyer: Digital signal processing . 6th edition. Teubner, 2006, ISBN 3-8351-0072-6 , p. 248-260 .
• Steven W. Smith, Ph.D .: The Scientist and Engineer's Guide to Digital Signal Processing . 1st edition. Elsevier Ltd, Oxford 2002, ISBN 978-0-7506-7444-7 , Chapter 18 (English, dspguide.com ).