Spam filter

A spam filter (advertising filter) is a computer program, or a module of a program, that filters unwanted electronic advertising (spam).

A classic area of application is filtering unwanted e-mails, as a module of an e-mail program or a mail server. More recent important applications are filtering advertising banners on web pages in the browser and filtering spam in blogs (blogspam) and wikis (linkspam).

Approaches to control

Verification of the sender based on their email address or URL
Control of the servers that send, forward or make the content available
Filtering based on the header
Filtering based on the text (content filter)

Methods of control

Blacklist method

This method checks the contents of the e-mail for certain expressions or phrases, or checks its sender, against the entries of a negative list (blacklist). If a listed expression occurs in the e-mail, the e-mail is sorted out. These blacklists generally have to be created manually and are correspondingly laborious to maintain, although many spam filters already ship with preset blacklists. In addition, the hit rate is not very high: spam is occasionally classified as legitimate e-mail, and legitimate e-mail as spam. Such filters are also easy to evade: if, for example, Viagra is on the blacklist, the filter will not recognize Vla*gr-a. If the filter accepts regular expressions, sufficiently sophisticated patterns can take all conceivable spellings into account, e.g. v.{0,1}[!iíì1\|l].{0,1}[aáàãå@].{0,1}g.{0,1}r.{0,1}[aáàãå@].
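The following minimal Python sketch contrasts a plain word blacklist with the regular-expression pattern quoted above. The set BLACKLIST and the helper names are hypothetical; the pattern is the one from the text, compiled with re.IGNORECASE (an assumption, so that a capital V also matches).

```python
import re

# Hypothetical plain-word blacklist, as a simple filter might store it.
BLACKLIST = {"viagra"}

# The spelling-tolerant pattern from the text; IGNORECASE is an assumption
# so that "V" at the start of a word also matches the lowercase "v".
OBFUSCATION_PATTERN = re.compile(
    r"v.{0,1}[!iíì1\|l].{0,1}[aáàãå@].{0,1}g.{0,1}r.{0,1}[aáàãå@]",
    re.IGNORECASE,
)


def is_spam_by_words(text: str) -> bool:
    """Flag the text if any blacklisted word appears as a whole word."""
    words = re.findall(r"\w+", text.lower())
    return any(word in BLACKLIST for word in words)


def is_spam_by_pattern(text: str) -> bool:
    """Flag the text if the regular expression matches anywhere in it."""
    return OBFUSCATION_PATTERN.search(text) is not None


for sample in ["Cheap Viagra here", "Cheap Vla*gr-a here", "Meeting at noon"]:
    print(f"{sample!r}: words={is_spam_by_words(sample)}, "
          f"pattern={is_spam_by_pattern(sample)}")
```

The word blacklist flags only the first sample, while the pattern flags both the first and the obfuscated second sample. Broad patterns like this one also raise the risk of matching harmless text, which is one reason the hit rate of blacklists stays limited.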

One of the best-known programs under Linux and other Unix derivatives is SpamAssassin, which scores every e-mail according to various criteria (obviously invalid senders, known spam text passages, HTML content, sending dates in the future, etc.) and classifies it as spam once a certain score is reached. SpamPal and SPAVI also work with blacklists; in addition to the e-mail itself, they examine the pages linked in the e-mail for suspicious terms. Razor and Pyzor, in turn, compute a hash value for each e-mail and query central databases to check whether other recipients of the same e-mail have classified it as spam.
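Score-based classification of the kind described for SpamAssassin can be illustrated with a minimal Python sketch. All rules, weights and the threshold REQUIRED_SCORE are hypothetical and are not SpamAssassin's actual rule set; the digest check against KNOWN_SPAM_DIGESTS is a simplified local stand-in for the shared databases queried by Razor and Pyzor, which use their own signature schemes and network protocols.

```python
import hashlib
import re
from datetime import datetime, timezone

# Hypothetical threshold: a message scoring at least this much counts as spam.
REQUIRED_SCORE = 5.0

# Hypothetical local stand-in for a shared database of digests reported as spam.
KNOWN_SPAM_DIGESTS = set()


def body_digest(body: str) -> str:
    """Hash a case- and whitespace-normalized body, so trivial reformatting
    does not change the digest."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def score_message(sender: str, body: str, sent: datetime, now: datetime) -> float:
    """Add up hypothetical rule weights for one message."""
    score = 0.0
    if "@" not in sender:  # obviously invalid sender address
        score += 2.5
    if re.search(r"act now|100% free", body, re.IGNORECASE):  # known spam phrases
        score += 2.0
    if re.search(r"<html", body, re.IGNORECASE):  # HTML content
        score += 0.5
    if sent > now:  # sending date lies in the future
        score += 1.5
    if body_digest(body) in KNOWN_SPAM_DIGESTS:  # already reported by others
        score += 3.0
    return score


def is_spam(sender: str, body: str, sent: datetime, now: datetime) -> bool:
    return score_message(sender, body, sent, now) >= REQUIRED_SCORE


now = datetime(2013, 7, 13, 12, 0, tzinfo=timezone.utc)
KNOWN_SPAM_DIGESTS.add(body_digest("Cheap watches, act now!"))
future = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(score_message("nobody", "CHEAP watches,   act NOW!", future, now))  # 9.0
print(is_spam("alice@example.org", "See you at lunch", now, now))  # False
```

No single rule decides on its own; only the combined score is compared with the threshold, which lets weak indicators add up.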

Bayesian classifier method

Alternatively, spam can be filtered with a self-learning Bayesian spam filter based on Bayesian probability. The user first has to classify the first 1,000 e-mails manually as spam or non-spam. The system then detects spam largely automatically, usually with a hit rate above 95%. The user has to re-sort manually any e-mails the system has sorted incorrectly; each correction further trains the filter, so the hit rate increases steadily. This method is usually clearly superior to the blacklist method.
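The training-and-correction loop can be illustrated with a minimal multinomial naive Bayes classifier in Python. This is a textbook sketch, not the algorithm of any of the programs named here; the class name, the labels "spam" and "ham" (a common term for legitimate mail), the tokenizer and the Laplace smoothing are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes spam filter with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Both the initial manual classification and later corrections go here.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior log P(label).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + max(len(vocab), 1)
        for word in words:
            # Smoothed log likelihood log P(word | label); words are assumed independent.
            score += math.log((counts[word] + 1) / denominator)
        return score

    def spam_probability(self, text):
        words = tokenize(text)
        diff = self._log_score(words, "ham") - self._log_score(words, "spam")
        # P(spam | words) = 1 / (1 + e^diff), computed without overflow.
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


f = NaiveBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("win money now click here", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch with the team on friday", "ham")
print(f.spam_probability("buy cheap pills"))      # well above 0.5
print(f.spam_probability("team meeting monday"))  # well below 0.5
```

Re-sorting a misclassified e-mail corresponds to calling train() again with the correct label, which shifts the word statistics the next decision is based on.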

Bogofilter, Mozilla Thunderbird and Spamihilator, which is particularly popular in German-speaking countries, use this mechanism in their current versions. Each of these programs must be trained by the user before it can reliably detect spam.

A method related to the Bayesian filter is the Markov spam filter. It is based on a Markov chain and is more effective than a Bayesian filter, as William Yerazunis was able to show with his spam filter CRM114.
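The difference from a Bayesian filter can be sketched with a first-order Markov chain over words: instead of treating words as independent, each class model estimates the probability of a word given the preceding word. This Python sketch is a simplified illustration under that assumption and does not reproduce CRM114's actual feature scheme; the class and function names, the start token and the add-one smoothing are hypothetical choices.

```python
import math
import re
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-message token


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class MarkovClassModel:
    """First-order word Markov chain for one class, with add-one smoothing."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # previous word -> next-word counts
        self.vocab = set()

    def train(self, text):
        prev = START
        for word in tokenize(text):
            self.transitions[prev][word] += 1
            self.vocab.add(word)
            prev = word

    def log_likelihood(self, text, vocab_size):
        score, prev = 0.0, START
        for word in tokenize(text):
            row = self.transitions.get(prev, Counter())
            # Smoothed transition probability P(word | previous word).
            score += math.log((row[word] + 1) / (sum(row.values()) + vocab_size))
            prev = word
        return score


def classify(text, spam_model, ham_model):
    """Pick the class whose Markov chain makes the word sequence more likely."""
    vocab_size = len(spam_model.vocab | ham_model.vocab) + 1  # +1 for unseen words
    spam_ll = spam_model.log_likelihood(text, vocab_size)
    ham_ll = ham_model.log_likelihood(text, vocab_size)
    return "spam" if spam_ll > ham_ll else "ham"


spam_model, ham_model = MarkovClassModel(), MarkovClassModel()
spam_model.train("buy cheap pills now")
spam_model.train("cheap pills for you")
ham_model.train("see you at the meeting")
ham_model.train("the meeting is at noon")
print(classify("cheap pills now", spam_model, ham_model))         # spam
print(classify("see you at the meeting", spam_model, ham_model))  # ham
```

Because transitions are conditioned on the previous word, the same words in a different order can receive a different score, which a bag-of-words Bayesian filter cannot express.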

Database-based solutions

As early as the 1990s, Usenet discussions proposed detecting spam by the URLs (and possibly phone numbers) advertised in the message. Spammers can modify and personalize their messages as they wish, but since the aim of unsolicited commercial e-mail (UCE) is always to entice the recipient into making contact, and the available contact addresses cannot be varied indefinitely, this approach can in theory achieve very good detection. Notably, it uses no heuristics, which always carry a risk of misclassification. However, because of the technical requirements and the necessary reaction speed, among other things, it was long considered impractical. The SpamStopsHere spam filter, a centrally hosted solution, is based on exactly this idea and shows that it can also work in practice.
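A URL-based check can be sketched in two steps: extract the host names of all advertised links, then look them up in a database of domains known to be advertised in spam. In this Python sketch, the set SPAM_ADVERTISED_DOMAINS is a hypothetical local stand-in for a centrally maintained database, the .example domains are placeholders, and the URL pattern only covers http(s) links (phone numbers are not handled).

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for a centrally maintained, continuously updated database.
SPAM_ADVERTISED_DOMAINS = {"cheap-pills.example", "win-money.example"}

# Deliberately simple pattern for http(s) links in plain-text bodies.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def advertised_domains(body):
    """Return the normalized host names of all http(s) URLs in the body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        host = urlparse(url).hostname  # lower-cased, port and credentials removed
        if host:
            # Drop a trailing sentence dot and a leading "www." before lookup.
            hosts.add(host.rstrip(".").removeprefix("www."))
    return hosts


def is_spam_by_url(body):
    """No heuristics: spam if and only if a listed domain is advertised."""
    return not advertised_domains(body).isdisjoint(SPAM_ADVERTISED_DOMAINS)


print(is_spam_by_url("Order at https://www.Cheap-Pills.example/offer?id=7 today!"))  # True
print(is_spam_by_url("Minutes at https://intranet.example.org/notes"))  # False
```

However the surrounding text is varied, the verdict depends only on whether a listed domain appears, which is what makes this approach independent of wording. (str.removeprefix requires Python 3.9 or later.)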

Problems

The sorting of e-mails always involves a certain error rate. Spam that is not recognized reaches the inbox as a "false negative"; if a wanted e-mail is classified as spam, this is called a "false positive". If the filter is trained for a sufficiently long time, false positives can be almost completely ruled out (for example, by using a whitelist), and false negatives can be reduced to between 10% and less than 1%. However, this takes a certain amount of effort. In addition, filters must constantly be improved to keep up with the new methods of spammers.
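The two error types can be made concrete by counting classification outcomes. The following Python sketch computes the false positive and false negative rates and shows how a whitelist overrides the filter's verdict for known senders; the class, the function and the counts in the usage example are hypothetical and serve only as illustration, not as measured results.

```python
from dataclasses import dataclass


@dataclass
class FilterResults:
    true_positives: int   # spam correctly sorted out
    false_positives: int  # wanted mail wrongly classified as spam
    true_negatives: int   # wanted mail correctly delivered
    false_negatives: int  # spam that reached the inbox

    def false_positive_rate(self):
        """Share of wanted mail that was wrongly sorted out."""
        wanted = self.false_positives + self.true_negatives
        return self.false_positives / wanted if wanted else 0.0

    def false_negative_rate(self):
        """Share of spam that slipped through to the inbox."""
        spam = self.false_negatives + self.true_positives
        return self.false_negatives / spam if spam else 0.0


def final_verdict(sender, filter_says_spam, whitelist):
    """A whitelisted sender is always delivered, which suppresses false
    positives for known contacts regardless of the filter's verdict."""
    if sender.lower() in whitelist:
        return False
    return filter_says_spam


# Hypothetical counts for 1,000 spam and 1,000 wanted messages.
results = FilterResults(true_positives=950, false_positives=2,
                        true_negatives=998, false_negatives=50)
print(results.false_positive_rate())  # 0.002
print(results.false_negative_rate())  # 0.05
print(final_verdict("Alice@Example.org", True, {"alice@example.org"}))  # False
```

The two rates use different denominators (all wanted mail versus all spam), so a filter can have a very low false positive rate while still letting a noticeable share of spam through.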

Example of a method of obfuscation

The following spam was sent to the same recipient list every few days.[1] It comes from the same sender and has the same content, and it shows how spammers use small variations to deceive spam filters and thus reach the recipients directly.

Element	First spam	Second spam
Subject	treat as urgent by Christopher	Greetings from Christopher
Reply address	jchrist1@____.org (domain redacted)	jchrist@____.org
Salutation line	good ay.	Hello friend.
First sentence	i am mr.christopher johnson head of accounting udit department of credit suisse bank london 38 strand, city west minister, london wc2n 5jb, here in england.	I am Mr Christopher Johnson Head of Accounting Audit at Credit Suisse Bank London 38 Strand, City of Westminster, LONDON WC2N 5JB, here in England.
Middle of text	This is very urgent please.	This is very URGENT PLEASE.
Middle of text	1. Full name, 2. Your direct cell phone number, your address, 4. Occupation, 5. Age, 6. Sex, 7. Nationality	1. Full name, 2. Your direct mobile number, 3. Your contact address, 4. Profession, 5. Age, 6. Sex, 7. Nationality
End of text	Please on your confirmation of this message and indicate your interest I will provide you with further information. make an effort to let me make your decision rather than make me wait. thank you in advance for your positive answer. Greetings, Mr. Christopher Johnson	Please on your confirmation of this message and indicate your interest I will provide you with further information. Endeavor, let me know your decision instead of making me wait. Thank you in anticipation of your positive answer. Greetings, Mr. Christopher Johnson
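The effect of such small variations on exact-match detection can be checked with the "End of text" rows of the table above. In this Python sketch, an exact SHA-256 digest (a simplified stand-in for hash-based lookups; Razor and Pyzor use their own signature schemes) differs between the two variants, while a simple word-overlap (Jaccard) similarity stays high. The helper names are hypothetical.

```python
import hashlib
import re

# The "End of text" rows of the two spams, copied from the table.
FIRST = (
    "Please on your confirmation of this message and indicate your interest "
    "I will provide you with further information. make an effort to let me "
    "make your decision rather than make me wait. thank you in advance for "
    "your positive answer. Greetings, Mr. Christopher Johnson"
)
SECOND = (
    "Please on your confirmation of this message and indicate your interest "
    "I will provide you with further information. Endeavor, let me know your "
    "decision instead of making me wait. Thank you in anticipation of your "
    "positive answer. Greetings, Mr. Christopher Johnson"
)


def digest(text):
    """Exact fingerprint: any changed character yields a different value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def word_set(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    sa, sb = word_set(a), word_set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


print(digest(FIRST) == digest(SECOND))   # False: exact lookup misses the variant
print(round(jaccard(FIRST, SECOND), 2))  # 0.69: most of the wording is shared
```

A rephrased sentence is enough to defeat an exact fingerprint, while a similarity measure still recognizes the two messages as closely related.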

External links

Wiktionary: Spamfilter - explanations of meanings, word origins, synonyms, translations

Link catalog on the topic of spam filters at curlie.org (formerly DMOZ)

References

[1] The two spams were sent on July 13th and 26th, 2013.