Complexity (computer science)
The term complexity is used in various sub-areas in computer science . The complexity theory deals with the resource consumption of algorithms , whereas the information theory uses the term for the information content of data .
Complexity of Algorithms
Under the complexity (and cost or cost ) of an algorithm is understood in the complexity theory its resource requirements. This is often given as a function of the length of the input and estimated asymptotically for large ones using Landau symbols . Similarly, the complexity of a problem is defined by the resource consumption of an optimal algorithm for solving this problem. The difficulty is that one has to look at all the algorithms for a problem to determine the complexity of the problem.
The resources considered are almost always the number of computation steps required ( time complexity ) or the memory requirement ( space complexity ). The complexity can, however, also be determined in relation to another resource. It is not the effort of a specific program on a specific computer that is of interest , but rather how the resource requirements grow when more data has to be processed, e.g. B. whether the effort for twice the amount of data doubles or squares ( scalability ).
It is often difficult to provide a function that generally expresses the cost of resources associated with any input for a problem. Therefore, one is usually content with leaving out constant factors and summands and only dealing with the behavior depending on the input length . The input length is measured in the number of bits that are required to represent the information.
Another simplification is the assumption that all elementary operations take the same time, so that e.g. B. the addition of two numbers and the multiplication of two numbers costs exactly one time unit.
However, it is usually too difficult to specify a function with the above simplifications. Therefore one often restricts oneself to giving an upper and lower bound for the asymptotic behavior. To this end, Big O notation used.
In complexity theory, algorithms and problems are divided into so-called complexity classes according to their complexity . These are an important tool for determining which problems are "equally difficult" or which algorithms are "equally powerful". The question of whether two complexity classes are equivalent is often not easy to decide (for example P-NP problem ).
The complexity of a problem, for example, is decisive for cryptography and especially for asymmetric encryption : The RSA method , for example, relies on the assumption that the prime factorization of large numbers can only be calculated with a great deal of effort - otherwise it would be impossible easily calculate the private key from the public key .
A problem that even a computer the size of earth cannot solve is called a transcomputational problem .
Complexity of data
In information theory , the complexity of data or a message is understood to be its information content . In addition to the classic definition of this quantity according to Claude Shannon, there are various other approaches to determine how much information is contained in a data set:
On the one hand, there is the so-called Kolmogorow complexity (also algorithmic complexity or descriptive complexity ), which defines the information content as the size of the smallest program that is able to generate the viewed data. It describes an absolutely optimal compression of the data. A clarification of the approach Andrei Kolmogorov respect to the machine model offers the algorithmic information theory of Gregory Chaitin .
There are a number of software metrics that measure the complexity of data and algorithms in computer science in different ways. These are:
- Chapin's data metric
- Measures the proportion of condition and result data in all data used.
- Elshof's data flow metric
- Measures the number of times data has been used relative to the number of data. It is related to high cohesion . High cohesion corresponds to the frequent use of as few variables as possible.
- Cards data access metric
- Measures the ratio of the number of accesses to external files and databases relative to the number of them.
- Henry's Interface Metric
- Measures the number of accesses by external functions / methods to a module (English fan-in ) or the number of calls to external functions / methods from a module (English fan-out ) relative to all functions / methods of the module.
- McCabe metric or Euler's measure or cyclomatic complexity
- Measures the complexity of the flow graph as the ratio of the number of edges to the number of nodes.
- McClure's Decision Metric
- Measures the share of decisions in all instructions.
- Sneeds branching metric
- Measures the ratio of the number of branches of any kind to the sum of all instructions.
- Halstead metric
- Measures the number of different words (here instruction types) relative to the number of words used (here instructions). She claims that the fewer different types of statements you use, the simpler the code, which is very controversial.
- Counts the acyclic paths through functions
- Cognitive Complexity
- Recommended by SonarQube for measuring the complexity of programs and program parts against other complexity metrics.
- Brian A. Nejmeh: nPath: a measure of execution path complexity and its applications . In: Communications of the ACM . tape 31 , no. February 2 , 1988, doi : 10.1145 / 42372.42379 .
- G. Ann Campbell: Cognitive Complexity . A new way of measuring understandability. Ed .: SonarSource. 2016 (English, 20 p., Sonarsource.com [PDF; 214 kB ; accessed on March 10, 2020]).