Markov spam filter
The Markov spam filter (named after Andrey Andreyevich Markov) is a spam filter based on a hidden Markov model and is a further development of the Bayesian spam filter. It calculates the probability that the word chains of the checked text match the word chains of typical spam texts. While a Bayesian spam filter calculates the probability for individual words, the Markov spam filter uses word chains and weights each possible combination. If the word chains of the checked text are similar to those of typical spam texts, the checked text is classified as spam.
Example of weighting the possible combinations
The sentence "The quick brown fox jumps ..." illustrates the possible combinations and their weightings $2^{2N}$ in the Markov spam filter, where N is the number of words in the chain in addition to the first word:
Word chain | Weighting | N |
---|---|---|
The | 1 | 0 |
The quick | 4 | 1 |
The <...> brown | 4 | 1 |
The <...> <...> fox | 4 | 1 |
The quick brown | 16 | 2 |
The <...> brown fox | 16 | 2 |
The quick <...> fox | 16 | 2 |
The quick brown fox | 64 | 3 |
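The enumeration of word chains and their weightings can be sketched in a few lines of Python. This is only an illustration of the weighting 2^(2N) for a four-word window, not the implementation of any particular filter. The function name `weighted_chains` and the constant `SKIP` are hypothetical names chosen for this example.

```python
from itertools import product

SKIP = "<...>"  # placeholder for a skipped word (hypothetical constant)


def weighted_chains(words):
    """Return (chain, weight, n) tuples for all chains anchored at words[0]."""
    first, rest = words[0], words[1:]
    chains = []
    # Each mask decides, for every following word, whether it is kept or skipped.
    for mask in product([False, True], repeat=len(rest)):
        tokens = [first] + [w if keep else SKIP for w, keep in zip(rest, mask)]
        while tokens[-1] == SKIP:  # a chain always ends on a real word
            tokens.pop()
        n = sum(mask)              # words kept in addition to the first one
        chains.append((" ".join(tokens), 2 ** (2 * n), n))
    # Sort by N so that the shortest combinations come first.
    return sorted(chains, key=lambda c: c[2])


for chain, weight, n in weighted_chains(["The", "quick", "brown", "fox"]):
    print(f"{chain:22} weight={weight:2} N={n}")
```

For a window of four words this yields eight chains, one for each way of keeping or skipping the three words that follow the first.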
Formal representation of the probability calculation
Whereas the Bayesian spam filter calculates the probability from individual words, the probability calculation of the Markov spam filter additionally takes the weighting of the word chains into account.
Literature
- Shalendra Chhabra, William S. Yerazunis, Christian Siefkes: Spam Filtering using a Markov Random Field Model with Variable Weighting Schemas. In: Fourth IEEE International Conference on Data Mining (ICDM'04). 2004, pp. 347-350, doi:10.1109/ICDM.2004.10031.