Cut detection

In multimedia technology, a branch of computer science, cut detection is the automatic detection of cuts in a digital video.

Intended use

Cut detection is a useful aid when post-processing film material on a computer, because it spares the user the time-consuming manual search for cuts. It is also one of the cornerstones of the automatic archiving of video material, where the aim is to create indexes for large video archives automatically. Here cut detection can help both with classifying a video and with selecting preview images.

Hard and soft cuts

A "hard cut".

In a "soft cut" (dissolve), the scenes merge into one another with a transparency effect.

Cut detection distinguishes between hard cuts (simply called a "cut" in film-making) and soft cuts (called a "fade" or "dissolve" in film-making). With a hard cut, one scene changes abruptly into the next, with no transition. With a soft cut, by contrast, one scene gradually merges into the next.

Modern cut detection algorithms achieve excellent results on hard cuts, but soft cuts remain a challenge. The abrupt change of the entire image content at a hard cut can be recognized easily even with very simple image processing methods (see, for example, the histogram difference under "Procedure"). The gradual change of the image content at a soft cut, however, is quite often misinterpreted by existing algorithms as movement of the filmed objects, so the cut is not recognized.

Procedure

Cut detection methods work in two stages:

Rating. Every image in the digital video is compared with the image that immediately follows it. Each pair of images is assigned a value that should be as high as possible when there is probably a cut between them and as low as possible when there is probably not.
Filtering. All pairs of images are then filtered against a threshold value: every pair whose value lies below the threshold is discarded. Between the two images of each remaining pair there is probably a cut.
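The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: frames are modelled as small grayscale images (lists of rows of integers), and the names `mean_abs_diff`, `rate_pairs` and `filter_by_threshold`, as well as the threshold of 50, are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence

# Assumption: a frame is a grayscale image given as rows of values in 0..255.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Placeholder rating: mean absolute pixel difference of two frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def rate_pairs(frames: Sequence[Frame],
               rate: Callable[[Frame, Frame], float]) -> List[float]:
    """Stage 1 (rating): compare every frame with its immediate successor."""
    return [rate(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def filter_by_threshold(scores: Sequence[float], threshold: float) -> List[int]:
    """Stage 2 (filtering): discard all pairs whose value is below the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
frames = [dark, dark, bright, bright]

scores = rate_pairs(frames, mean_abs_diff)          # one value per frame pair
cuts = filter_by_threshold(scores, threshold=50.0)  # indices of suspected cuts
print(scores, cuts)
```

An index `i` in `cuts` means that a cut is suspected between frame `i` and frame `i + 1`. Because the rating function is passed in as a parameter, the two stages can be exchanged independently of each other.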

Cut detection. (1) Detected hard cut. (2) Unrecognized soft cut (dissolve). (3) Soft cut (trick fade) that is misinterpreted as two hard cuts.

This procedure is error-prone. Because even a slight exceedance of the threshold is interpreted as a cut, the threshold must be chosen very carefully. As a rule, its value is determined with statistical methods from a large number of test runs.

A cut detection method thus consists of two parts that can be optimized independently of one another. The rating should be optimized to spread the values as widely as possible, that is, to make the difference between the values for cut and non-cut as large as possible. The filtering can be made more tolerant, so that soft cuts are not misinterpreted as several cuts in a row.

Rating methods

Optimizing the rating is not an easy task. To date, numerous algorithms have been developed that provide more or less reliable results.

The sum of absolute differences (SAD) is probably the most obvious way to measure the difference between two images: the color values of the two images are subtracted from one another pixel by pixel, and the absolute values of the differences are added up. The result is the SAD, a non-negative number that indicates how much the pixels of the two images differ overall. The SAD reacts very sensitively to even small changes in the image content and therefore often suspects cuts where there are none; fast tracking shots, explosions, or a light being switched on in a previously dark scene are misinterpreted particularly often. On the other hand, the SAD hardly reacts to most soft cuts, because the changes progress too slowly to raise the value enough. The method is nevertheless still used frequently, because it detects visible hard cuts very reliably and is also very fast.
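A minimal sketch of the SAD for two equally sized grayscale frames follows. The frame representation (lists of rows of integers) and the function name `sum_of_absolute_differences` are assumptions made for this example; for color video the same sum would be taken over every color channel.

```python
from typing import Sequence

# Assumption: a frame is a grayscale image given as rows of values in 0..255.
Frame = Sequence[Sequence[int]]


def sum_of_absolute_differences(a: Frame, b: Frame) -> int:
    """Subtract the frames pixel by pixel and add up the absolute differences."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same dimensions")
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


frame_a = [[0, 50], [100, 150]]
frame_b = [[0, 60], [90, 150]]        # slightly changed content
frame_c = [[255, 255], [255, 255]]    # completely different content

small_change = sum_of_absolute_differences(frame_a, frame_b)  # 0 + 10 + 10 + 0
large_change = sum_of_absolute_differences(frame_a, frame_c)  # 255 + 205 + 155 + 105
print(small_change, large_change)
```

The raw SAD grows with the image size, so in practice it may be convenient to divide it by the number of pixels before comparing it with a threshold.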

The histogram difference (HD) is a slight variation on the sum of absolute differences. Instead of comparing the images pixel by pixel, it compares the histograms of the two images. For each color in an image, a histogram contains the number of pixels that have that color. The histogram difference therefore does not examine directly how much the image contents differ, but how much the color distributions of the two images differ. This can be a drawback, because two completely different images may well have identical histograms: consider, for example, an image of sea and beach and one of a grain field and sky. There is therefore no guarantee that hard cuts are detected with certainty. On the other hand, the histogram difference is less sensitive to minor changes in the image content, such as object movement and tracking shots.
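The following sketch illustrates the histogram difference and its blind spot. It assumes grayscale frames and groups the values into a small number of bins; the bin count of 4 and the names `histogram` and `histogram_difference` are choices made for this example, not part of the method as described above.

```python
from collections import Counter
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of values in 0..255.
Frame = Sequence[Sequence[int]]


def histogram(frame: Frame, bins: int = 4, max_value: int = 256) -> List[int]:
    """Count how many pixels fall into each of `bins` equally wide value ranges."""
    width = max_value // bins
    counts = Counter(min(p // width, bins - 1) for row in frame for p in row)
    return [counts.get(i, 0) for i in range(bins)]


def histogram_difference(a: Frame, b: Frame, bins: int = 4) -> int:
    """Sum of the absolute bin-wise differences of the two histograms."""
    hist_a, hist_b = histogram(a, bins), histogram(b, bins)
    return sum(abs(x - y) for x, y in zip(hist_a, hist_b))


top_bright = [[250, 250], [5, 5]]
bottom_bright = [[5, 5], [250, 250]]   # different image, same value distribution
all_dark = [[5, 5], [5, 5]]

same_colors = histogram_difference(top_bright, bottom_bright)
different_colors = histogram_difference(top_bright, all_dark)
print(same_colors, different_colors)
```

The first comparison yields zero even though the two images are clearly different, which is exactly the weakness described above; the second comparison registers the change in the color distribution.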

The edge change ratio (ECR) tries to compare the actual image content of two images. For this purpose, the outlines of all objects in both images are detected first, producing so-called edge images. The two edge images are then compared with one another to determine the proportion of edges that disappear from the first image and the proportion that newly appear in the second; the aim is to measure how much the depicted objects differ between the two images. The edge change ratio is one of the most reliable indicators of a cut. It is sensitive to hard cuts and can identify some forms of soft cuts with great certainty. Nevertheless, it also reaches its limits when it comes to detecting wipes, for example black bars that "wipe away" the picture.
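A heavily simplified sketch of the idea is given below. Several points are assumptions of this example rather than part of the description above: edges are found with a naive neighbor-difference test instead of a real edge detector, no tolerance for slightly shifted edges is applied, and the two proportions are combined by taking the larger one. The names `edge_pixels`, `edge_change_ratio` and `min_step` are hypothetical.

```python
from typing import Sequence, Set, Tuple

# Assumption: a frame is a grayscale image given as rows of values in 0..255.
Frame = Sequence[Sequence[int]]
Pixel = Tuple[int, int]


def edge_pixels(frame: Frame, min_step: int = 64) -> Set[Pixel]:
    """Naive edge image: pixels whose right or lower neighbor differs strongly."""
    edges: Set[Pixel] = set()
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            right = row[x + 1] if x + 1 < len(row) else value
            below = frame[y + 1][x] if y + 1 < len(frame) else value
            if abs(value - right) >= min_step or abs(value - below) >= min_step:
                edges.add((y, x))
    return edges


def edge_change_ratio(a: Frame, b: Frame, min_step: int = 64) -> float:
    """Larger of: share of edges leaving `a`, share of edges entering `b`."""
    edges_a, edges_b = edge_pixels(a, min_step), edge_pixels(b, min_step)
    out_ratio = len(edges_a - edges_b) / len(edges_a) if edges_a else 0.0
    in_ratio = len(edges_b - edges_a) / len(edges_b) if edges_b else 0.0
    return max(in_ratio, out_ratio)


vertical_edge = [[0, 0, 255, 255], [0, 0, 255, 255], [0, 0, 255, 255]]
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [255, 255, 255, 255]]

unchanged = edge_change_ratio(vertical_edge, vertical_edge)
changed = edge_change_ratio(vertical_edge, horizontal_edge)
print(unchanged, changed)
```

A value near 0 means that the outlines in both images largely coincide, while a value near 1 means that most edges have disappeared or newly appeared.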

Another possibility is to combine several of these methods.

Filtering methods

Simple threshold filtering can be extended so that several threshold exceedances lying close together are combined into a single one. To do this, one chooses a minimum distance that two exceedances must have from each other in order to be interpreted as two separate cuts; within any such window only one exceedance is selected, usually the one with the highest value.
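One possible way to implement this extension is sketched below. The greedy left-to-right grouping, the function name `filter_with_min_distance` and the sample values are assumptions of this example; other grouping strategies would satisfy the description equally well.

```python
from typing import List, Sequence


def filter_with_min_distance(scores: Sequence[float], threshold: float,
                             min_distance: int) -> List[int]:
    """Threshold filter that merges exceedances closer than `min_distance`.

    Within a group of nearby exceedances only the one with the highest
    value is kept (ties keep the earlier one).
    """
    cuts: List[int] = []
    for i, score in enumerate(scores):
        if score < threshold:
            continue  # below the threshold: not a cut candidate
        if cuts and i - cuts[-1] < min_distance:
            # Too close to the previous candidate: keep the stronger one.
            if score > scores[cuts[-1]]:
                cuts[-1] = i
        else:
            cuts.append(i)
    return cuts


scores = [0, 9, 1, 0, 6, 8, 7, 0, 0, 9]

plain = [i for i, s in enumerate(scores) if s >= 5]
merged = filter_with_min_distance(scores, threshold=5, min_distance=3)
print(plain, merged)
```

In this example the three neighboring exceedances at indices 4, 5 and 6, as a soft cut might produce them, are reported as a single cut at the index with the highest value.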

Quality measures

Three measures are used to judge the quality of cut detection methods. If C denotes the number of correctly detected cuts, M the number of cuts that were missed, and F the number of falsely detected cuts, the quality measures are defined as follows:

Precision. The probability that a detected cut is actually a cut.

$P = \frac{C}{C + F}$

Recall. The probability that a real cut is detected.

$V = \frac{C}{C + M}$

F1. A combination of the other two quality measures that yields high values only if both precision and recall have high values.

$F_1 = \frac{2 \cdot P \cdot V}{P + V}$

All three quality measures take values between 0 and 1, and for all three a higher value indicates a better method.
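The three measures can be computed directly from the counts C, M and F. In the sketch below the function name `quality_measures` and the sample counts are made up for illustration, and returning 0 when a denominator is zero is an assumption of this example, since the formulas leave that case undefined.

```python
from typing import Tuple


def quality_measures(c: int, m: int, f: int) -> Tuple[float, float, float]:
    """Return (precision P, recall V, F1) for C correct, M missed, F false cuts."""
    precision = c / (c + f) if c + f else 0.0  # assumption: 0 if nothing detected
    recall = c / (c + m) if c + m else 0.0     # assumption: 0 if no real cuts
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical test run: 6 cuts found correctly, 2 missed, 4 false alarms.
p, v, f1 = quality_measures(c=6, m=2, f=4)
print(f"P = {p:.2f}, V = {v:.2f}, F1 = {f1:.2f}")
```

With these counts P is 6/10, V is 6/8, and F1 lies between the two at about 0.67, closer to the lower value.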

Literature

R. Steinmetz: Multimedia technology. Springer, Berlin, July 2000. ISBN 3-540-67332-6.
