A classification requires either abstraction or the formation of a multilayered structure, a complex (see complexity). In semiotics, these two methods are called “class-building” and “complex-building superation”.
Classification occurs in all areas of thought; however, in philosophy, psychology, ethnology and other anthropological sciences the term “categorization” is used instead. This designates the elementary ability to intuitively sort different entities (objects, living beings, processes, abstract concepts) and to subordinate them to corresponding collective terms (categories).
“Classification”, on the other hand, stands for the deliberately planned ordering of knowledge in the context of a concrete investigation according to objectifiable, uniform criteria (often in mathematics, natural science and technology).
In a classification, errors in the approach and/or peculiarities of the objects to be classified can lead to wrong decisions, so-called misclassifications (faulty or false classifications). In order to indicate how certain an assignment is, it is therefore advisable to include information about its reliability with every decision.
Definition of terms
This section provides a cross-article overview of the most important terms associated with classification.
Classification terms are often used inaccurately or even incorrectly, although most of them have a clearly defined meaning. Linguistic confusion is exacerbated by the fact that some concepts have multiple names:
- Classification (German: Klassifizierung): the process of creating class boundaries.
- Class or category: A class is a group of things that meet a number of conditions. A class is generally used to group things that are identical or similar in their characteristics.
- Class boundaries, decision boundaries: In order to decide which class an object belongs to, class boundaries, sometimes also called decision boundaries, are drawn between the classes. An object belongs to a class if it lies within that class's boundaries.
- Classification scheme (German: Klassifikation), class system, systematics: The entirety of all classes forms a classification scheme, also called a class system or systematics. Frequently used special schemes often have their own names: thesaurus, ontology, index, taxonomy, typology. The scheme is the end product of the classification process; mostly, however, no distinction is made, and both the process and its product are called classification.
- Classing (German: Klassierung): While the class boundaries are first created during classification, classing assigns objects to an existing class system. The distinction between the two is more theoretical; colloquial German and other languages combine both approaches under the term classification.
- Categorization: Classification and categorization are basically the same, but “classification” is the term used in mathematics and technology, whereas “categorization” is used in psychology and in questions of meaning. Categorization can also include defining the classes.
- Classifier (German: Klassifikator, Klassierer): the entity that carries out a classification or a classing.
- Classification procedure: The classification procedure determines how the classifier proceeds. Often no distinction is made between classifier and classification procedure.
- Assessment of a classifier: The quality of the classification produced by a classifier or a classification procedure can be assessed using statistical means.
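Assigning an object to an existing class system by means of class boundaries can be sketched as follows. This is only an illustration: the feature (fruit weight), the boundary values and the class names are assumptions made for the example and do not come from any particular classification.

```python
from bisect import bisect_right

# Hypothetical class system for fruit weight in grams (all values are assumptions).
# Two boundaries separate three classes: [0, 100), [100, 200), [200, infinity).
BOUNDARIES = [100.0, 200.0]
CLASSES = ["small", "medium", "large"]


def assign_class(weight_g: float) -> str:
    """Assign an object to the class whose boundaries enclose its feature value."""
    # bisect_right counts how many boundaries lie at or below the value,
    # which is exactly the index of the enclosing class.
    return CLASSES[bisect_right(BOUNDARIES, weight_g)]


print(assign_class(85.0))   # small
print(assign_class(100.0))  # medium (a value on a boundary goes to the upper class)
print(assign_class(250.0))  # large
```

Here the boundaries are given in advance, so the code only performs the assignment step; creating the boundaries would be a separate task.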
Classification is a fundamental and universal process on which countless more complex processes are built. Even the simplest organisms can divide stimuli from the outside world into classes such as “dangerous” and “harmless” or “edible” and “inedible”, and can differentiate between the important and the unimportant. In living beings with a nervous system, an initial classification is made by the neuron, which “decides” whether a stimulus is below the threshold and is ignored, or whether it is above the threshold and is processed further.
People classify sounds they hear into words and shapes they see into letters and symbols; classification is the basis of any understanding. The ability to classify is a prerequisite for concept formation and thus ultimately for intelligence. The article Categorization (Cognitive Science) goes into more detail on this group of meanings of classification.
Automatic classification is used in many technical applications. For example, classifiers evaluate products on assembly lines as “acceptable” or “unsatisfactory”, or computed tomography images as “tumor” or “harmless”. Classification is also of central interest for artificial intelligence.
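How the quality of such an automatic classifier might be assessed statistically can be sketched with a few common measures. The labels reuse the assembly-line example; the sample decisions are invented for illustration, and accuracy, precision and recall are just one possible choice of measures.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one class of interest."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def assess(actual, predicted, positive):
    """Return accuracy, precision and recall with respect to the positive class."""
    c = confusion_counts(actual, predicted, positive)
    total = sum(c.values())
    flagged = c["tp"] + c["fp"]   # objects the classifier assigned to the class
    present = c["tp"] + c["fn"]   # objects that really belong to the class
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": c["tp"] / flagged if flagged else 0.0,
        "recall": c["tp"] / present if present else 0.0,
    }


# Invented inspection results: true state vs. the classifier's decision.
actual = ["unsatisfactory", "acceptable", "acceptable",
          "unsatisfactory", "acceptable", "acceptable"]
predicted = ["unsatisfactory", "acceptable", "unsatisfactory",
             "acceptable", "acceptable", "acceptable"]

print(assess(actual, predicted, positive="unsatisfactory"))
```

In this invented sample, four of six decisions are correct, one acceptable product is rejected, and one unsatisfactory product slips through.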
The fundamental philosophical counter-concept to classification logic, or subsumption logic, is the procedure of dialectical logic.
A distinction is made between top-down and bottom-up approaches.
With the top-down approach, the classification process consists of three individual steps:
- Specify classes
- Select features
- Draw class boundaries
It is typical of classification that a fixed number of target classes is specified and that only their boundaries remain to be determined. Determining the number and type of classes is the task of category formation.
The selection of meaningful features is essential for a successful classification, since the number of required observations grows exponentially with the number of features. In practice, however, the number of observations is fixed, so that beyond a certain point the quality of the classifier decreases again with additional features (see also overfitting).
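The three steps of the top-down approach can be sketched with a very simple classifier. The nearest-centroid rule used here is only one of many possible ways of drawing class boundaries; the classes, the two features and all sample values are assumptions made for the example.

```python
import math

# Step 1: specify the classes (their number is fixed in advance).
CLASSES = ("apple", "orange")

# Step 2: select features. Here each object is described by two hypothetical
# features on a 0..1 scale: (redness, surface roughness).
TRAINING = {
    "apple": [(0.8, 0.10), (0.9, 0.20), (0.7, 0.15)],
    "orange": [(0.3, 0.70), (0.2, 0.80), (0.25, 0.75)],
}


def centroid(points):
    """Mean position of a list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


# Step 3: draw class boundaries. With the nearest-centroid rule the boundary is
# implicit: it consists of all points that are equally far from both centroids.
CENTROIDS = {label: centroid(TRAINING[label]) for label in CLASSES}


def classify(features):
    """Assign the class whose centroid is closest to the feature vector."""
    return min(CLASSES, key=lambda label: math.dist(features, CENTROIDS[label]))


print(classify((0.75, 0.20)))
print(classify((0.30, 0.60)))
```

The first object lies near the apple samples and the second near the orange samples, so they fall on opposite sides of the implicit boundary.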
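The exponential growth can be made concrete with a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each feature is divided into ten value ranges and that every combination of ranges (a cell of the feature space) should be covered by observations.

```python
def cells_to_cover(num_features: int, bins_per_feature: int = 10) -> int:
    """Number of cells in the feature space if each feature has the given number of bins."""
    return bins_per_feature ** num_features


def observations_per_cell(num_observations: int, num_features: int,
                          bins_per_feature: int = 10) -> float:
    """Average number of observations available per cell."""
    return num_observations / cells_to_cover(num_features, bins_per_feature)


# With a fixed sample of 1000 observations, coverage collapses quickly.
for d in (1, 3, 6):
    print(d, "features:", cells_to_cover(d), "cells,",
          observations_per_cell(1000, d), "observations per cell")
```

Under these assumptions, 1000 observations give 100 per cell for one feature, one per cell for three features, and only a thousandth of an observation per cell for six features, so most of the feature space is never observed.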
For classification it is therefore important to determine decisive characteristics. Various methods are used for this:
- Feature selection process
- Principal Component Analysis (PCA)
These methods vary in complexity and, depending on the application, deliver satisfactory results; the features may have to be selected again if the first selection proves unsuitable. Features that seem unimportant on their own can play a decisive role for the classification in combination with other features, so care must be taken not to select too few features.
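A very simple form of feature selection ranks each feature by how well it separates two classes on its own. The sketch below uses a Fisher-type score (squared distance of the class means divided by the sum of the class variances), which is an assumption of this example and not a method named in the text; the feature names and all sample values are invented.

```python
from statistics import mean, pvariance


def fisher_score(values_a, values_b):
    """Squared gap between the class means relative to the spread within the classes."""
    gap = (mean(values_a) - mean(values_b)) ** 2
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        return float("inf") if gap else 0.0
    return gap / spread


# Invented measurements for two classes and two candidate features.
SAMPLES = {
    "hair_darkness": {
        "human": [0.2, 0.9, 0.5, 0.7],
        "ape": [0.3, 0.8, 0.6, 0.6],
    },
    "arm_to_leg_ratio": {
        "human": [0.70, 0.72, 0.68, 0.71],
        "ape": [1.05, 1.10, 1.00, 1.08],
    },
}

scores = {name: fisher_score(groups["human"], groups["ape"])
          for name, groups in SAMPLES.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because the score looks at one feature at a time, it cannot detect features that only become informative in combination with others, which is exactly the caveat raised above.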
Choosing the right classification method and an efficient classifier is just as crucial.
The bottom-up approach is often carried out unconsciously, for example during first language acquisition, when concepts are formed. Wilhelm Kamlah puts it as follows:
“On the one hand, language seeks to adapt to the world and its imposing structure, while on the other hand it first gives the world a structure ... But that there is a world already familiar to us, in which each new individual thing is mostly encountered as an instance of a general that is already known, is not explained by language, but by the fact that in the world itself there is a recurrence of the same ...”
The following difficulties can arise when classifying:
If the conditions under which an object does or does not belong to a class are not clearly specified, it becomes difficult or even impossible to classify an object. This happens quite often in the everyday use of classification: Which criteria distinguish good from bad? What are the conditions that distinguish rock music from jazz? Clearly formulated and objectively measurable criteria are required for an unambiguous classification. Mathematics is usually used to achieve such a clear formulation.
It is only possible to assign objects to classes if the characteristics considered actually enable the classes to be differentiated. For example, it is not possible to classify living things into the human and ape classes based on their hair color; the hair color is generally not indicative of a living being's class.
Smooth transitions between classes contradict the idea of sharp class boundaries. For example, the boundaries of the class “red” in the color spectrum are very difficult to define. To enable classification, a sharp dividing line can be introduced artificially. Alternatively, fuzzy logic can be used to operate on these fuzzy sets, and a sharp decision can then be obtained by defuzzification. For smooth transitions in the field of language, cf. Vagueness (language).
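The fuzzy treatment of the class “red” can be sketched as follows. The membership function, its ramp limits of 590 nm and 630 nm, and the restriction to the orange-to-red part of the spectrum are assumptions chosen for illustration; taking the class with the highest membership is only one simple way of arriving at a sharp decision.

```python
def red_membership(wavelength_nm: float) -> float:
    """Degree (0..1) to which a wavelength counts as red; ramp limits are assumptions."""
    lo, hi = 590.0, 630.0
    if wavelength_nm <= lo:
        return 0.0
    if wavelength_nm >= hi:
        return 1.0
    return (wavelength_nm - lo) / (hi - lo)


def memberships(wavelength_nm: float) -> dict:
    """Fuzzy memberships in the two neighbouring classes of this toy example."""
    red = red_membership(wavelength_nm)
    return {"red": red, "orange": 1.0 - red}


def sharp_decision(wavelength_nm: float) -> str:
    """Turn the fuzzy memberships into one class by picking the highest membership."""
    degrees = memberships(wavelength_nm)
    return max(degrees, key=degrees.get)


print(memberships(600.0), sharp_decision(600.0))
print(memberships(620.0), sharp_decision(620.0))
```

A wavelength of 600 nm is thus “a quarter red” and is decided as orange, whereas 620 nm is “three quarters red” and is decided as red; the memberships themselves remain available for later processing.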
Inseparability occurs especially when too few or uninformative features are considered. Seen in terms of these features, the objects appear mixed up and a clear separation seems impossible. If apples and oranges are to be distinguished on the basis of color, size and weight, many apples and oranges could be so similar in these characteristics that a clear separation is almost impossible. Even when the features are chosen sensibly, there remains a gray area in which the decision is uncertain.
Unpredictable measurement errors or unusually pronounced individual specimens can lead to an object being incorrectly classified.
At the end of the classification, a group of residual objects can remain that does not fit into any of the existing classes and for which no new class can easily be created without making the entire classification system incoherent. An unsatisfactory residual category must then be set up for these objects.
Trustworthiness of a decision (confidence)
Even if all characteristics of an object are known, it can be classified incorrectly under certain circumstances (unless one regards the class itself as a characteristic). For example, a hazelnut would usually be classified as harmless, although it can kill allergy sufferers and, when shot from a slingshot, becomes a dangerous projectile. Likewise, not every X-ray image is correctly classified as diseased or healthy, because the image content may not allow any conclusions to be drawn about the class. If a decision is forced, as is usually the case in classification, the classification can become questionable or even wrong as a result of such effects.
For this reason, modern classifiers output a value in addition to each decision, which indicates the trustworthiness (confidence) of the decision made. This measure is commonly called reliability information. A large, red tomato would be classified as “ripe” with high reliability; a medium-sized red tomato with some green areas would also be classified as “ripe”, but with lower reliability. Indicating the reliability of a decision offers advantages in the processing that follows the classification. A mushroom recognized as edible only with low confidence will not be eaten, whereas one recognized as edible with high confidence will be.
In scenarios in which an incorrect classification has more serious disadvantages than none at all, it can also make sense to introduce an additional class “unclassifiable”.
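A decision with attached reliability information and an optional class “unclassifiable” can be sketched as follows. The logistic model, its steepness of 12, the single feature (share of the surface that is red) and the confidence threshold are all assumptions made for this illustration, not properties of any real classifier.

```python
import math


def ripe_probability(red_fraction: float) -> float:
    """Hypothetical logistic model: probability that a tomato is ripe,
    given the share of its surface that is red (0..1)."""
    return 1.0 / (1.0 + math.exp(-12.0 * (red_fraction - 0.5)))


def classify_with_confidence(red_fraction: float, min_confidence: float = 0.0):
    """Return (label, confidence); reject the decision if the confidence is too low."""
    p = ripe_probability(red_fraction)
    label, confidence = ("ripe", p) if p >= 0.5 else ("unripe", 1.0 - p)
    if confidence < min_confidence:
        return "unclassifiable", confidence
    return label, confidence


print(classify_with_confidence(0.95))                      # fully red tomato
print(classify_with_confidence(0.65))                      # red with green areas
print(classify_with_confidence(0.55, min_confidence=0.8))  # too uncertain to decide
```

The first two tomatoes are both labelled “ripe”, but with clearly different confidence; with a minimum confidence of 0.8, the third is not forced into either class.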
- Decision tree
- Decision table
- Attribute space
- International statistical classification of diseases and related health problems
- Hans Uszkoreit and Brigitte Jörg: Information science and information systems. Lecture notes, specialization in general linguistics, Saarland University.
- Hardwin Jungclaussen: Causal Computer Science: Introduction to the Teaching of Active Linguistic Modeling by Humans and Computers, Springer Fachmedien Wiesbaden, 2013, ISBN 978-3-322-81220-9, p. 57