Shogun (toolbox)


Shogun is an open-source toolbox for machine learning. Among other things, it can be used to solve regression and classification problems and to learn hidden Markov models.

Application focus

Fig. 1: The Python interface of the Shogun toolbox, showing a support vector machine classification result (left) and a regression result (right).

The focus of the toolbox is on so-called kernel methods (see Kernel (machine learning)), with an emphasis on bioinformatics. To this end, a number of kernels on sequences (so-called string kernels) were implemented and specifically optimized for speed on large data sets. The toolbox can be used to solve problems with string kernels on very large data sets (up to 10 million examples).
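As a rough illustration of how such a string kernel is set up (a minimal sketch assuming the legacy modular Python interface, modshogun, and a handful of hypothetical DNA sequences; class names and constructors may differ between Shogun versions):

```python
# Minimal sketch: a Weighted Degree string kernel on DNA sequences.
# Assumes the legacy modular Python interface ("modshogun");
# class names and constructors may differ in other Shogun releases.
from modshogun import StringCharFeatures, DNA, WeightedDegreeStringKernel

# Hypothetical toy sequences (equal length, as the Weighted Degree kernel expects)
seqs = ["ACGTACGTACGT", "ACGTACGAACGT", "TTGTACGTACGA"]

feats = StringCharFeatures(seqs, DNA)                 # wrap raw strings as DNA features
kernel = WeightedDegreeStringKernel(feats, feats, 3)  # weighted degree kernel of degree 3
print(kernel.get_kernel_matrix())                     # 3x3 kernel matrix
```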

In particular, Shogun provides generic interfaces to many different implementations of support vector machines (SVMs), such as SVMlight and LibSVM. This allows all SVMs to share the same kernel implementations and makes it easier to add new kernel-based learning methods. In addition to the standard kernels (linear, polynomial, Gaussian and sigmoid kernels; see Kernel (machine learning)), Shogun provides efficient implementations of more recently published string kernels, such as the Locality Improved, Fisher, TOP, Spectrum and Weighted Degree kernels (with shifts). For the latter, the efficient LINADD optimizations were implemented.
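To illustrate this generic SVM interface (again a minimal sketch assuming the legacy modular Python interface, modshogun, and hypothetical toy data; names may differ in newer Shogun releases):

```python
import numpy as np
# Minimal sketch: training an SVM through Shogun's generic interface.
# Assumes the legacy modular Python interface ("modshogun");
# names may differ in other Shogun releases.
from modshogun import RealFeatures, BinaryLabels, GaussianKernel, LibSVM

# Hypothetical toy data: columns are examples, rows are feature dimensions
X_train = np.random.randn(2, 100)
y_train = np.sign(X_train[0, :] + X_train[1, :])

feats = RealFeatures(X_train)               # dense real-valued features
labels = BinaryLabels(y_train)              # +1 / -1 class labels
kernel = GaussianKernel(feats, feats, 1.0)  # Gaussian (RBF) kernel of width 1.0

svm = LibSVM(1.0, kernel, labels)           # regularization C = 1.0, LibSVM as the solver
svm.train()

# Swapping in another SVM implementation (e.g. SVMLight, where available)
# reuses the same kernel object through the generic interface.
X_test = np.random.randn(2, 10)
print(svm.apply(RealFeatures(X_test)).get_labels())
```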

Special features

Shogun also makes it possible to work with one's own precomputed kernels. One of the main features of the toolbox is the so-called combined kernel, which is composed of a weighted linear combination of sub-kernels, k(x, x′) = Σ_i β_i k_i(x, x′) with β_i ≥ 0. The sub-kernels do not necessarily have to operate on the same input space, but may work on different domains. Shogun can learn an optimal sub-kernel weighting, i.e. the coefficients β_i, via the multiple kernel learning (MKL) algorithm.

In addition to SVM two-class classification and regression problems, a number of linear methods are also implemented in Shogun. Examples include linear discriminant analysis (LDA), the Linear Programming Machine (LPM), (kernel) perceptrons and hidden Markov models. Shogun can process a wide range of data: not only dense input matrices, but also sparse matrices as well as strings, each of which can be of integer or floating-point type (single or double precision). Furthermore, chains of preprocessors can be attached to the inputs so that the data can be processed further on the fly by the learning algorithms.
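As a rough sketch of the combined kernel and MKL described above (again assuming the legacy modular Python interface, modshogun, hypothetical toy data, and two Gaussian sub-kernels of different widths; class names and default MKL settings may differ between versions):

```python
import numpy as np
# Minimal sketch: a combined kernel whose sub-kernel weights are learned via MKL.
# Assumes the legacy modular Python interface ("modshogun");
# names and defaults may differ in other Shogun releases.
from modshogun import (RealFeatures, BinaryLabels, CombinedFeatures,
                       CombinedKernel, GaussianKernel, MKLClassification)

# Hypothetical toy data
X = np.random.randn(2, 100)
y = np.sign(X[0, :] * X[1, :])

# Each sub-kernel receives its own feature object inside a CombinedFeatures container
feats = CombinedFeatures()
feats.append_feature_obj(RealFeatures(X))
feats.append_feature_obj(RealFeatures(X))

# Combined kernel k(x, x') = beta_1 * k_1(x, x') + beta_2 * k_2(x, x')
kernel = CombinedKernel()
kernel.append_kernel(GaussianKernel(10, 0.5))   # sub-kernel 1: narrow Gaussian
kernel.append_kernel(GaussianKernel(10, 2.0))   # sub-kernel 2: wide Gaussian
kernel.init(feats, feats)

mkl = MKLClassification()                       # learns the weights beta_i
mkl.set_kernel(kernel)
mkl.set_labels(BinaryLabels(y))
mkl.train()
print(kernel.get_subkernel_weights())           # learned sub-kernel weights
```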

Interfaces

Shogun is implemented in C++ and offers interfaces to Matlab, R, Octave and Python. These interfaces allow interactive experimentation with the learning algorithms (see Figure 1 for the Python interface) as well as batch processing via scripts on computer clusters.

Literature

Web links