Feature subset selection
Feature subset selection (FSS), also called feature selection, is an approach in machine learning in which only a subset of the available features is used by a learning algorithm. FSS is necessary because it is sometimes technically infeasible to include all features, or because discrimination problems arise when a large number of features but only a small number of training examples are available.
Filter approach
A measure is chosen that distinguishes between the classes. Each feature is weighted according to this measure, and the best-scoring features are selected; the learning algorithm is then applied to this feature subset. Filters can assess the intrinsic properties of the data either univariately (e.g. Euclidean distance, chi-squared test) or multivariately (e.g. correlation-based filters).
Advantages:
- Fast to compute
- Scalable
- Intuitively interpretable
Disadvantages:
- Redundant features are retained (correlated features receive similar weights)
- Ignores interactions with the learning algorithm
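A minimal sketch of the univariate filter idea: each feature is scored independently, here by the absolute Pearson correlation with the class label (one possible measure; the helper name and the toy data are illustrative, not from the source).

```python
import numpy as np

def filter_select(X, y, k):
    """Univariate filter: score each feature by the absolute Pearson
    correlation between its values and the class labels, then keep
    the indices of the k best-scoring features."""
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the labels
    scores = np.abs(Xc.T @ yc) / (
        np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    )
    return np.argsort(scores)[::-1][:k]

# Toy data: features 0 and 2 track the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([y + 0.1 * rng.standard_normal(6),
                     rng.standard_normal(6),
                     1 - y + 0.1 * rng.standard_normal(6)])
print(filter_select(X, y, 2))  # the two label-correlated features win
```

Note that the redundancy disadvantage above is visible here: features 0 and 2 carry the same information, yet both receive a high score and both are kept.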
Wrapper approach
The set of all possible feature subsets is searched, and the learning algorithm is applied to each candidate subset. The search can be either deterministic (e.g. forward selection, backward elimination) or stochastic (e.g. simulated annealing, genetic algorithms).
Advantages:
- Finds a feature subset that optimally fits the learning algorithm
- Considers combinations of features, not just each feature in isolation
- Removes redundant features
- Easy to implement
- Interacts with the learning algorithm
Disadvantages:
- Very time-consuming
- With heuristic search procedures there is a risk of finding only local optima
- Risk of overfitting the data
- Dependence on the learning algorithm
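The deterministic variant of the wrapper search can be sketched as greedy forward selection. The learner wrapped here is a nearest-centroid classifier evaluated by resubstitution accuracy; both choices, and all names and data, are illustrative assumptions to keep the example self-contained.

```python
import numpy as np

def accuracy(X, y, feats):
    """Resubstitution accuracy of a nearest-centroid classifier
    restricted to the given feature indices (the wrapped learner)."""
    Xs = X[:, feats]
    classes = np.unique(y)
    centroids = np.array([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(dists, axis=1)]
    return (pred == y).mean()

def forward_selection(X, y, k):
    """Wrapper: greedily add the feature whose addition gives the
    wrapped learner the best accuracy, until k features are chosen."""
    selected = []
    for _ in range(k):
        candidates = [f for f in range(X.shape[1]) if f not in selected]
        best = max(candidates, key=lambda f: accuracy(X, y, selected + [f]))
        selected.append(best)
    return selected

# Toy data: feature 0 tracks the label, features 1 and 2 are noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 10)
X = np.column_stack([y + 0.1 * rng.standard_normal(20),
                     rng.standard_normal(20),
                     rng.standard_normal(20)])
print(forward_selection(X, y, 2))  # the informative feature is picked first
```

Each step retrains and re-evaluates the learner for every remaining candidate feature, which is exactly why the wrapper approach is very time-consuming.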
Embedded approach
The search for an optimal subset is directly linked to the learning algorithm.
Advantages:
- Better runtime and lower complexity than the wrapper approach
- Dependencies between the data and the learning algorithm are modeled
Disadvantage:
- The choice of the subset strongly depends on the learning algorithm used.
Examples:
- Decision trees
- Weighted naive Bayes
- Selection of the subset via the weight vector of an SVM
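The last example above (selection via the weight vector of a linear model) can be sketched as follows. To keep the code self-contained, a simple perceptron stands in for the SVM; the data, names, and the top-k cutoff are illustrative assumptions.

```python
import numpy as np

def perceptron_weights(X, y, epochs=20):
    """Train a simple perceptron (labels in {-1, +1}) and return its
    weight vector; a stand-in here for a linear SVM's weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:   # misclassified: update the weights
                w += yi * xi
    return w

# Toy data: feature 0 tracks the label, feature 1 is weak noise.
rng = np.random.default_rng(2)
y = np.repeat([-1, 1], 10)
X = np.column_stack([y + 0.1 * rng.standard_normal(20),
                     0.1 * rng.standard_normal(20)])

# Embedded selection: keep the features with the largest absolute weight.
w = perceptron_weights(X, y)
selected = np.argsort(np.abs(w))[::-1][:1]
print(selected)  # the informative feature receives the largest weight
```

Here the subset falls out of training itself rather than from a separate search, which is the defining trait of the embedded approach; it also shows the stated disadvantage, since the chosen subset depends entirely on this particular linear learner.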