Cyber Matters: Spam

Wednesday, 7 December 2011

How Do They Do IT: Spam Filters

The current mechanisms for blocking junk email fall into one of the following categories:

User defined
Black lists
White lists
Bayesian filtering

User Defined
We are all probably aware that our email client has the ability to help block junk email depending on what we tell it. When email appears in your inbox, if you think it is junk, you can typically block that particular sender. This assumes that all email from that sender will be junk and so it is a rather blunt instrument.

The counterpart to blocking a sender is tagging a sender as "safe". All email from this sender will be classed as appropriate and will not be classed as junk by any of the other methods, if they are operating. This matters because the other methods can produce false positives, meaning that you can miss legitimate emails because they have been automatically diverted to a junk email folder. Many people would rather not have this happen and so make use of the "safe sender" functionality rather than blocking emails.

In Microsoft's Exchange-based email systems, users can monitor the Spam Confidence Level (SCL) score being assigned by the email server. If the threshold for the SCL is too low or too high you can then ask your administrator to adjust the level. This is always a tricky balance between receiving too many junk emails and missing legitimate emails. It is also not usually something for a general user, as setting up your email client to monitor SCL scores is not a trivial task.

Black Lists
This is what you might imagine: a centralised list of known spammers. Your email server (or potentially your email client) can refer to this list and block accordingly. One of the most popular forms is the Domain Name System Blacklist, also known as a DNSBL or DNS Blacklist.
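To make the lookup concrete, here is a minimal sketch of how a DNSBL is commonly queried for an IPv4 address: the octets of the sending address are reversed, the list's zone name is appended, and an ordinary DNS lookup is made. The zone "dnsbl.example.org" and the function names are placeholders for illustration, not a real list or a particular server's implementation.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    # Validate the address, then reverse its octets and append the zone.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list answers for this address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True   # any address record is treated here as "listed"
    except socket.gaierror:
        return False  # no record; note a DNS failure also ends up here


# 192.0.2.99 is a documentation address, the zone is hypothetical.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

A real mail server would usually consult several lists and distinguish "not listed" from "lookup failed", which this sketch does not.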

An issue here is the proliferation of DNSBLs: it is difficult to decide which to use. Rather like anti-virus checkers, people tend to gravitate to the better known names that they feel they can trust. A considerable benefit of most DNSBLs is that they tend to include "zombie" machines, which spammers use to avoid the simple user defined email blockers.

Recent developments have included listing email addresses that have sent to "honeypots" and ISPs that knowingly host spammers. However, organisations such as the Electronic Frontier Foundation (EFF) have expressed some concern about blacklists. These concerns are not so much about the technologies but about the specific policies implemented by those compiling the lists.

White Lists
Still the most common form is the user defined white list as described above. However, increasingly ISPs are supplying their customers with white lists, usually through an email client that is provided by the ISP. The ISP supplied white lists typically comprise email addresses of companies who apply to the ISP to be included as safe senders.

White lists can operate in one of two ways. They can let through only those on the list or, alternatively, the list prevents other junk email methods from deleting the message.
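The two modes can be sketched as follows. This is an illustration only: the function names, the Verdict type and the "flagged_by_filter" input (standing in for whatever the other junk email methods decided) are assumptions, not taken from any particular email client.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    DELIVER = "deliver"
    JUNK = "junk"


def exclusive_whitelist(sender: str, whitelist: set[str]) -> Verdict:
    """Mode 1: only senders on the list get through."""
    return Verdict.DELIVER if sender.lower() in whitelist else Verdict.JUNK


def override_whitelist(sender: str, whitelist: set[str], flagged_by_filter: bool) -> Verdict:
    """Mode 2: the list only shields listed senders from the other filters."""
    if sender.lower() in whitelist:
        return Verdict.DELIVER
    # Unlisted senders are left to the other methods' decision.
    return Verdict.JUNK if flagged_by_filter else Verdict.DELIVER


whitelist = {"news@example.com"}
print(exclusive_whitelist("stranger@example.net", whitelist))
print(override_whitelist("stranger@example.net", whitelist, flagged_by_filter=False))
```

The same unknown sender is junked in the first mode and delivered in the second, which is why the first mode is far stricter in practice.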

The concern about allowing commercial organisations to pay for inclusion on a white list is that they can effectively pay to avoid spam filters. The business models used to determine payment try to militate against this. For example, the ISP will charge depending on the number of complaints received. The ISPs argue that charging this way means that the funds can be used to invest in further spam filtering. It's not clear whether this actually happens.

There are some non-commercial white list providers. Inclusion on these lists is allowed only if the sender passes certain tests. For example, they must not allow unchecked relay of SMTP messages, which is a classic attack vector for spammers. Personally, I would recommend using one of these white lists.

Bayesian Filtering
This is a statistically based technique using, you guessed it, Bayesian probability. This approach determines how likely a given proposition is, i.e. whether an email is spam. The probability is determined using "evidence", i.e. it is learned from experience.

One particular form used in spam filtering is known as a "naive Bayesian classifier", which simply means that every feature you look for in the spam emails is treated as independent of every other feature. This would appear to restrict the ability of the system to learn about combinations of content that increase the likelihood of a message being spam. However, it is fast and has surprisingly high accuracy.
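As a rough illustration of the idea, the sketch below trains a tiny naive Bayesian classifier on word counts. The training messages, the function names and the use of add-one smoothing are all assumptions made for the example; a production filter would use far more data and more careful tokenising. It also assumes the training set contains at least one spam and one legitimate message.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Estimate P(spam | words), treating each word as independent evidence."""
    vocab = set(counts[True]) | set(counts[False])
    log_score = {}
    for label in (True, False):
        # Prior: the fraction of training messages in this class.
        log_p = math.log(totals[label] / (totals[True] + totals[False]))
        n_words = sum(counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero the product.
            log_p += math.log((counts[label][word] + 1) / (n_words + len(vocab)))
        log_score[label] = log_p
    # Turn the two log scores back into a probability of spam.
    top = max(log_score.values())
    spam, ham = (math.exp(log_score[k] - top) for k in (True, False))
    return spam / (spam + ham)


training = [
    ("cheap viagra offer", True),
    ("cheap loans now", True),
    ("meeting agenda attached", False),
    ("lunch on friday", False),
]
counts, totals = train(training)
print(spam_probability("cheap viagra", counts, totals))
print(spam_probability("friday meeting", counts, totals))
```

The "naive" part is the inner loop: each word contributes its own factor, with no attempt to model which words tend to appear together.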

Other forms use combinations of content as well as typical traffic patterns. For example, you may receive many emails with the word Viagra but you rarely send them. Hence, if you see a high proportion of email with a particular word passing across your network the likelihood of it being spam is raised.

One cannot rely totally on Bayesian Filtering as it is susceptible to "poisoning", where spammers send email using large amounts of text that is unlikely to be classed as spam. Hence, whilst individual words might raise an alert, when looked at as a whole, the message receives a lower spam score than would otherwise be the case.
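The effect of poisoning can be seen in a small sketch. It uses one common way of combining per-word spam probabilities under the independence assumption; the per-word figures in the table are invented for the example and do not come from any real filter.

```python
import math

# Hypothetical per-word spam probabilities, as a trained filter might hold them.
WORD_SPAM_PROB = {
    "viagra": 0.99, "cheap": 0.90,
    "meeting": 0.10, "agenda": 0.10, "minutes": 0.15, "regards": 0.20,
}


def combined_spam_score(word_probs):
    """Combine per-word probabilities, assuming the words are independent."""
    spam = math.prod(word_probs)
    ham = math.prod(1.0 - p for p in word_probs)
    return spam / (spam + ham)


def score(text):
    """Score a message using only the words the filter knows about."""
    probs = [WORD_SPAM_PROB[w] for w in text.lower().split() if w in WORD_SPAM_PROB]
    return combined_spam_score(probs) if probs else 0.5  # 0.5 means "no evidence"


plain = score("cheap viagra")
poisoned = score("cheap viagra meeting agenda minutes regards")
print(plain, poisoned)
```

With these made-up figures the padded message scores well below the plain one, even though the two alerting words are still present.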

Conclusion
The volumes of spam email are extraordinary. Between 70% and 80% of all email sent is spam. As none of the current methods described here are completely effective, there is still scope for much further research in this area.

Monday, 5 December 2011

It's Not Junk Email That Is The Worry But What Lies Behind It

It’s quite scary how many home computers are unwittingly aiding and abetting cyber criminals: 6% according to the latest study reported by the BBC. And it’s not just spam email that is the problem.

One of the issues that those tackling the problem have is that spammers are becoming ever more cunning in their use of email content. Whilst spam filters look for obvious content, often through key word monitoring, the spammers subtly change the content so that it might appear readable to a recipient but not to an automated process. The classic is replacing a letter (say “l”) with a number (say “1”).
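One defence is to undo the obvious substitutions before looking for key words. The sketch below is a deliberately simple illustration: the table of look-alike characters and the function names are assumptions for the example, and real filters use much richer techniques.

```python
# Hypothetical map of look-alike characters to the letters they imitate.
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "3": "e", "@": "a", "$": "s"})


def normalise(text: str) -> str:
    """Lower-case the text and undo simple character substitutions."""
    return text.lower().translate(LOOKALIKES)


def contains_keyword(text: str, keywords: set) -> bool:
    """Check whether any normalised word matches a watched key word."""
    words = normalise(text).split()
    return any(word.strip(".,!?") in keywords for word in words)


print(normalise("Cheap 1oans"))
print(contains_keyword("Get cheap 1oans today!", {"loans"}))
```

The obvious weakness is that genuine numbers are mangled as well, so a step like this would only ever feed a key word check, not replace the original text.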

With the latest estimates saying that spam and malware-laden emails account for over 70% of all email traffic, this is undoubtedly a significant problem, although attempts over the last year have made some inroads into reducing the volumes. Microsoft report in their latest Security Intelligence Report that machines running their software (and, despite the wishes of the Apple lobby, the vast majority of PCs run Microsoft operating systems) have seen a significant decrease in spam emails.

Having said that, there is a valid debate as to who should be trying to stop the email. With landmark cases such as that in the European Court of Justice two weeks ago, which relieved ISPs of responsibility for ensuring traffic does not contravene copyright laws, who is to say that the ISPs should stop spam? After all, the Post Office does not stop junk mail by default. There is a view that we should all take more responsibility for our own machines and have email clients that can stop junk email and catch malware before it jumps from our email to our PCs.

This volume of spam does not mean great economic loss through reading adverts for illegal Viagra, cheap loans or free legal advice. Rather, the criminal activity comes from so called “phishing” emails. You might think it rather daft to respond to, for example, someone calling themselves the ex-President of Nigeria who, if only you would deposit £1000 in his account, could release millions and would reward you tenfold. We’ve all had them. But if you send enough of them, then someone will fall for the scam.

There is a classic hacker trick where you obtain a phone book for a company. Then you ring around each number in the book saying you are “technical support” and that you have called to help them with their problem. Eventually you will reach someone who has a problem and has lodged a call for help. You then ask for the username and password, which of course they are happy to provide as you have proven you are technical support by responding to their call. How else would you have known to call them? The current equivalent is the email. We all receive emails claiming to be from banks, saying that they are responding to our call for assistance and would we just click this link and enter our details in the very authentic looking website. The medium is different but the con is the same. With billions of spam emails each day, the spammers can collect a frightening number of credentials.

However, in my opinion, the fact that such a large proportion of home machines host unknown malware hides a bigger threat than simply spreading large volumes of annoying and phishing emails. By hijacking so many PCs it is possible to mount a massive probing operation that can seek out high value targets that are susceptible to classic hacking attacks. A good example is what is known as a “SQL Injection” attack. If an attacker had to manually probe every system using SQL to see if it was vulnerable his/her arms would fall off before they found a victim. But automate the process across many thousands of “bots”, each of which is reporting success or failure back to some master criminal machine, and you’ll have an embarrassment of victims from which to choose. In fact, this is so effective that an industry is growing up in which one set of criminals will find the vulnerable machines and then sell the list to other criminals.
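For readers who have not met SQL Injection, the sketch below shows the underlying flaw and the standard fix, using an in-memory SQLite database. The table, the login functions and the probe string are invented for the illustration; the point is only that input pasted into the SQL text can change the meaning of the query, whereas input passed as a parameter cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_vulnerable(name, password):
    # The flaw: user input is pasted straight into the SQL text.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(name, password):
    # The fix: placeholders keep the input as data, never as SQL.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


probe = "' OR '1'='1"  # the kind of input an automated probe might try
print(login_vulnerable("alice", probe))  # the injected OR clause matches every row
print(login_safe("alice", probe))        # treated as an (incorrect) password
```

A system is only open to this kind of probing if it builds its queries the first way, which is why parameterised queries are the usual recommendation.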

So, am I worried about junk email? No. Am I worried about those same hijacked PCs supporting criminal hacking? Yes. The graphs show that the junk email is beginning to be tackled, but what is less clear is whether the hidden activity of these botnets is being tackled. My guess is not.