Markov spam filter

from Wikipedia, the free encyclopedia

The Markov spam filter (named after Andrey Andreyevich Markov) is a spam filter based on a hidden Markov model and a further development of the Bayesian spam filter. It calculates the probability that the word chains of the checked text match the word chains of typical spam texts. While a Bayesian spam filter calculates the probability for individual words, the Markov spam filter uses word chains to determine the probability and weights each possible combination. If the word chains of the checked text resemble those of typical spam texts, the checked text is classified as spam.
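The idea of comparing word chains rather than single words can be sketched in Python. This is a minimal illustration, not the filter's actual algorithm: the names `chains` and `looks_like_spam`, the window of four words, and the plain overlap count used in place of a real probability are all assumptions made for the example.

```python
from itertools import product

SKIP = "<...>"  # placeholder for a skipped word (notation assumed for this sketch)


def chains(text, window=4):
    """Return the set of word chains (with skip placeholders) found in text."""
    words = text.lower().split()
    found = set()
    for start in range(len(words)):
        head, tail = words[start], words[start + 1:start + window]
        # Each following word is either kept or replaced by a placeholder.
        for mask in product((False, True), repeat=len(tail)):
            chain = [head] + [w if keep else SKIP for w, keep in zip(tail, mask)]
            # Drop trailing placeholders so that "the <...>" collapses to "the".
            while chain[-1] == SKIP:
                chain.pop()
            found.add(" ".join(chain))
    return found


def looks_like_spam(text, spam_examples, ham_examples):
    """Toy decision: does the text share more chains with spam than with ham?

    The raw overlap count is an illustrative stand-in for a probability.
    """
    spam_chains = set().union(*(chains(t) for t in spam_examples))
    ham_chains = set().union(*(chains(t) for t in ham_examples))
    candidate = chains(text)
    return len(candidate & spam_chains) > len(candidate & ham_chains)
```

A single-word filter would see only `buy`, `cheap` and `pills`; the chain-based sketch also records that they occur together and in which order, for example as `buy cheap pills` or `buy <...> pills`.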

Example of weighting the possible combinations

Using the sentence "The quick brown fox jumps ..." as an example, the possible combinations and their weightings $2^{2N}$ in the Markov spam filter can be illustrated:

Word chain               Weighting   N
The                      1           0
The quick                4           1
The <...> brown          4           1
The <...> <...> fox      4           1
The quick brown          16          2
The <...> brown fox      16          2
The quick <...> fox      16          2
The quick brown fox      64          3
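The weighting table can be reproduced with a short Python sketch. It assumes, as read off the table, that N counts the words kept in addition to the first word and that the weight is 2 to the power 2N; the function name `weighted_chains` and the `<...>` placeholder constant are hypothetical names chosen for this example.

```python
from itertools import product

SKIP = "<...>"  # placeholder for a skipped word, as in the table


def weighted_chains(words):
    """List (chain, weight, N) for every chain that starts with words[0]."""
    head, tail = words[0], words[1:]
    rows = []
    for mask in product((False, True), repeat=len(tail)):
        chain = [head] + [w if keep else SKIP for w, keep in zip(tail, mask)]
        # Trailing placeholders carry no information, so drop them.
        while chain[-1] == SKIP:
            chain.pop()
        n = sum(mask)  # assumption: N = words kept besides the first word
        rows.append((" ".join(chain), 2 ** (2 * n), n))
    rows.sort(key=lambda row: row[2])  # order by N, as in the table
    return rows


if __name__ == "__main__":
    for chain, weight, n in weighted_chains(["The", "quick", "brown", "fox"]):
        print(f"{chain:<24}{weight:<12}{n}")
```

Under these assumptions a chain that matches all four words counts 64 times as much as a match on the first word alone, so longer matches dominate the result.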

Formal representation of the probability calculation

For the Bayesian spam filter, the probability is given by

[formula missing]

whereas for the Markov spam filter it is given by

[formula missing]

Literature

  • Shalendra Chhabra, William S. Yerazunis, Christian Siefkes: Spam Filtering using a Markov Random Field Model with Variable Weighting Schemas. In: Fourth IEEE International Conference on Data Mining (ICDM'04). 2004, pp. 347–350, doi:10.1109/ICDM.2004.10031.
