Relief algorithm


The Relief algorithms, a family of algorithms for attribute weighting, belong to the supervised learning methods of machine learning. The Relief algorithm is not aimed at future decision-making processes; rather, it is a tool for retrospectively investigating which attribute had the greatest or least influence on a decision. This is particularly useful when decisions are not made arbitrarily but, as in technical contexts, physical, chemical or other parameters lead to a certain test result, and one wants to know which of them matter and which do not. From the attribute weighting produced by one of the Relief algorithms, conclusions can be drawn from decision processes that have already been carried out (e.g. tests) about similar processes (with different attribute values), possibly making their execution superfluous and thus saving money and effort.

To use the algorithms, it is first necessary to define a distance between two instances, which is derived from the differences between the instances' attribute values. The totality of all possible instances is also referred to as the instance space. In general, this is not a vector space in the mathematical sense, since usually neither a vector addition nor a multiplication by a number can be meaningfully defined. It is, however, a metric space, since a distance is defined for every two instances. The difference between the attribute values of two instances is also defined for nominal values, namely (in the original formulation) as 0 if the instances agree in this attribute and as 1 otherwise. As the distance between two instances, the so-called Manhattan distance is generally used, that is, the sum of the differences over all attributes.
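A minimal sketch of this definition in Python (function names chosen here for illustration, assuming instances are tuples of nominal attribute values):

    # Difference between two nominal attribute values: 0 if the
    # instances agree in this attribute, 1 otherwise (the original
    # definition described above).
    def diff(value_a, value_b):
        return 0 if value_a == value_b else 1

    # Manhattan distance between two instances: the sum of the
    # attribute-wise differences.
    def distance(instance_a, instance_b):
        return sum(diff(a, b) for a, b in zip(instance_a, instance_b))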

A simple example

To put some “meat” on the theoretical “skeleton” from the outset, a simple example with (nominal) attributes will be used here to clarify the terms:

| Attributes:      | Outlook    | Temperature | Humidity | Windy | Class:       | play
| Possible values: | sunny      | cool        | normal   | no    | Class value: | yes
|                  | changeable | mild        | high     | yes   |              | no
|                  | rainy      | hot         |          |       |              |

There are 4 attributes here, two of which can take 3 values and the other two 2 values each. An instance here is a concrete weather situation; in this example, 3 × 3 × 2 × 2 = 36 different weather situations are theoretically possible.

The decision-making process offers two possibilities: the target variable is the decision as to whether or not one can play outside in the given weather. This results in two classes of examples (instances): each instance belongs either to the class “play” or to the class “do not play”. The greatest possible distance between two instances is in this case 4, namely when they differ in all four attributes. If one chooses as instance I a weather situation in which one plays outside, then the nearest hit H must be a similar weather situation in which one also plays, and the nearest miss M a weather situation not far from I in which, however, one no longer plays (see the outline below).
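As a small worked example (the instance values here are picked purely for illustration), the distance simply counts the attributes in which two weather situations differ:

    # Sum of 0/1 attribute differences (the Manhattan distance from above).
    distance = lambda x, y: sum(u != v for u, v in zip(x, y))

    i = ("sunny", "cool", "normal", "no")   # played outside ("play")
    h = ("sunny", "mild", "normal", "no")   # also played: a possible nearest hit H
    m = ("rainy", "cool", "high", "yes")    # not played: a possible nearest miss M

    print(distance(i, h))  # 1 (differs only in Temperature)
    print(distance(i, m))  # 3 (differs in Outlook, Humidity and Windy)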

Brief outline of how Relief works

The Relief algorithms change the weight of each attribute A depending on how great the differences in A are between neighboring instances of the same and of different classes. From the weightings that Relief has assigned, one can finally read off the influence of each variable on the result of the decision-making process, the so-called class value of an instance. The simplest algorithm, Relief, only works for two classes, that is, the target variable can take exactly two values, and the data must be complete. Relief first picks out an instance I and then determines its closest neighbors: one from its own class (the nearest hit, denoted by H) and one from the other class (the nearest miss, denoted by M). The weights of all attributes in which I agrees with H or differs from M are increased; the weights of those in which I differs from H or agrees with M are decreased. If the attributes are numeric, the degree of this change depends on the difference between the attribute values.
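The following is a minimal sketch of this two-class loop for nominal attributes, assuming complete data; the function and variable names are illustrative, not part of any canonical implementation:

    import random

    def relief(instances, labels, n_attributes, n_iterations):
        # Basic two-class Relief for nominal attributes (minimal sketch).
        diff = lambda a, b: 0 if a == b else 1
        dist = lambda x, y: sum(diff(u, v) for u, v in zip(x, y))
        weights = [0.0] * n_attributes
        for _ in range(n_iterations):
            # Pick an instance I at random.
            idx = random.randrange(len(instances))
            i, cls = instances[idx], labels[idx]
            # Nearest hit H (same class) and nearest miss M (other class).
            hits = [x for j, x in enumerate(instances)
                    if labels[j] == cls and j != idx]
            misses = [x for j, x in enumerate(instances) if labels[j] != cls]
            h = min(hits, key=lambda x: dist(i, x))
            m = min(misses, key=lambda x: dist(i, x))
            # Increase the weight of attributes in which I agrees with H
            # or differs from M; decrease it in the opposite cases.
            for a in range(n_attributes):
                weights[a] += diff(i[a], m[a]) - diff(i[a], h[a])
        return weights

    # Hypothetical use with the weather example (4 attributes, class "play"):
    # weights = relief(weather_instances, play_labels, 4, n_iterations=20)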

Limits of Relief and its extensions

The prerequisites for the applicability of Relief are that exactly two classes are defined and that no data is missing. In addition, noisy data can impair the informative value of the algorithm. In such cases, Relief must be extended. These extensions are generally identified by a capital letter from A to F:

ReliefA has the same requirements as Relief, but searches not just for one but for the k nearest neighbors of a selected instance in its own and in the other class, where k is a user-defined parameter.
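A sketch of this modified neighbor search (the averaging of the contributions over the k neighbors is an assumption made here for illustration; the article only states that k neighbors are searched for):

    # ReliefA-style selection (sketch): take the k nearest neighbors
    # in each class and average their contribution to the weight update.
    def k_nearest(i, candidates, k, dist):
        return sorted(candidates, key=lambda x: dist(i, x))[:k]

    def averaged_update(weights, i, hits, misses, diff):
        for a in range(len(weights)):
            weights[a] += sum(diff(i[a], m[a]) for m in misses) / len(misses)
            weights[a] -= sum(diff(i[a], h[a]) for h in hits) / len(hits)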

ReliefB and ReliefC modify the definition of the (normalized) difference between two attribute values. This makes it possible to also include instances whose value for individual attributes is unknown. In this case the difference between two instances in an attribute A is defined as 1 − 1/(number of values of A).
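A sketch of this modified difference, with None standing in for an unknown attribute value (a representation chosen here, not prescribed by the algorithms):

    def diff_b(value_a, value_b, n_values):
        # If either value is unknown, use 1 - 1/(number of values of A),
        # as defined above; otherwise the usual 0/1 difference.
        if value_a is None or value_b is None:
            return 1.0 - 1.0 / n_values
        return 0.0 if value_a == value_b else 1.0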

ReliefD represents a further modification in which the difference function is somewhat more complicated but defined more “intelligently”. Here conditional probabilities play a major role; they are estimated from the relative frequencies of attribute values within the individual classes.

Finally, the class variable can take more than two and possibly even numerical values. In our weather example, in addition to the decision of whether to play outside at all, one could also allow the option of deciding how long to play. In this case, an algorithm can either randomly select one nearest neighbor from any one of the other classes (this is what ReliefE does) and, as before, one from the same class, or it can search for the k nearest neighbors of a selected instance in each of the other classes and in its own class (this is what ReliefF does).
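A sketch of the two selection strategies for the multi-class case (function names are illustrative):

    import random

    # ReliefE-style (sketch): one nearest neighbor from a single
    # randomly chosen other class, plus one from the own class.
    def pick_other_class_relief_e(classes, own_class):
        return random.choice([c for c in classes if c != own_class])

    # ReliefF-style (sketch): the k nearest neighbors from each of
    # the other classes (the k nearest hits are found analogously).
    def misses_per_class_relief_f(i, instances, labels, own_class, k, dist):
        result = {}
        for c in set(labels):
            if c == own_class:
                continue
            members = [x for x, lab in zip(instances, labels) if lab == c]
            result[c] = sorted(members, key=lambda x: dist(i, x))[:k]
        return result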
