Back in mid-September, I received a comment from Chris Järnåker pointing me to this nifty project he'd written: a greylist SMTP transport sink for Exchange 2003, Exchange 2000, or the IIS SMTP service. Neat idea, I thought, especially once I noticed that it uses the .NET 2.0 framework. I didn't actually get around to installing it on my home Exchange server until last week, though.
For the past several months, my wife and I have noticed a steady rise in the amount of spam getting past the Exchange IMF and the Outlook Junk Filter. The logs made it pretty clear that the increase had less to do with spam getting better at slipping past the filters and more to do with a rise in the sheer volume of messages coming in. In fact, the same phenomenon has been noticeable across the Internet as a whole, as this graph demonstrates. We'd gotten to the point where 25 to 50 spam messages were landing in the Inbox each day, and that was far too many; back when I'd first switched on the IMF and done some judicious blocklisting, we'd been down to 4-5.
So, how does this greylist sink do?
One word: terrific. Yesterday, I got a single spam in my Inbox. That's right: one solitary spam. On average, we're back down to 2-3 spams per day. The sink doesn't seem to put much of a load on the Exchange server, and the default 2-minute delay hasn't caused any noticeable slowdown in message delivery.
If you're not familiar with greylisting, it's a spam-blocking strategy that relies on a simple fact: the vast majority of spam runs are produced by software that doesn't actually implement a full SMTP sender. Early spammers relayed spam through existing mail servers, but those are relatively easy to find and shut down (or block at the router if need be). Instead, the typical spammer now uses a large number of clients (often botnets of zombies: machines that have been infected with malware and can be remotely controlled by the miscreants) to perform a distributed run of messages. The spam literally comes from hundreds or thousands of discrete IP addresses, which makes it difficult to control with traditional IP blocklists.
Enter greylisting. A greylist engine keeps track of the properties of incoming connections, typically some combination of the source IP address, the envelope sender address, and the envelope recipient address. This combination is known as a triplet and is treated as a single data item, even though it's really a composite of three fields. The first time the engine sees a connection with a given triplet, it records that fact in its database and instructs the SMTP server to issue a temporary error. By design, when SMTP-compliant machines get a temporary error, they queue the message and try to send it again after a short delay. When that retry arrives after the greylist delay has passed, the triplet is already known and the message goes through.
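To make the mechanism concrete, here's a minimal Python sketch of the triplet check. It is not how Chris's sink is implemented: it assumes an in-memory dictionary instead of a database, a caller-supplied clock in seconds, and reply code 451 as the temporary error. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (source IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]

ACCEPT = 250    # 2xx: message accepted
TEMPFAIL = 451  # 4xx: transient failure; a compliant sender queues and retries


@dataclass
class Greylist:
    delay_seconds: float = 120.0  # assumed to match the 2-minute default
    first_seen: Dict[Triplet, float] = field(default_factory=dict)

    def check(self, ip: str, sender: str, recipient: str, now: float) -> int:
        """Return the SMTP reply code to give this delivery attempt."""
        triplet = (ip, sender, recipient)
        seen = self.first_seen.get(triplet)
        if seen is None:
            # Never seen this triplet: record it and ask the sender to retry.
            self.first_seen[triplet] = now
            return TEMPFAIL
        if now - seen < self.delay_seconds:
            # Retried before the delay elapsed: still greylisted.
            return TEMPFAIL
        # A retry after the delay looks like a real MTA: let it through.
        return ACCEPT


if __name__ == "__main__":
    gl = Greylist()
    args = ("192.0.2.10", "alice@example.com", "bob@example.org")
    print(gl.check(*args, now=0))    # 451: first sighting
    print(gl.check(*args, now=30))   # 451: too soon
    print(gl.check(*args, now=300))  # 250: retried after the delay
```

Note that a retry arriving before the delay is up gets another temporary error, so a sender can't get through just by hammering the server immediately.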
However, queuing and retrying is bad news for spamware, because it means messages have to be written to and read from disk. It's easy to open a few outbound SMTP connections without the user noticing when you're doing it all from memory, but if you start thrashing their disk, sooner or later they're going to catch on that something isn't right with their computer. So a lot of spam-sending malware treats temporary errors as equivalent to permanent errors; more accurately, it treats every error the same way, dropping the message and moving on to the next one.
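The difference in how the two kinds of sender react to a temporary error can be modeled in a few lines of Python. This is a toy under stated assumptions: the "server" is just a function from attempt time to reply code that temp-fails anything within 120 seconds of the first attempt, and all function names are hypothetical. The reply classes themselves (2xx success, 4xx transient, 5xx permanent) come from the SMTP standard.

```python
from typing import Callable, Optional

Server = Callable[[float], int]  # takes the attempt time, returns a reply code


def reply_class(code: int) -> str:
    """Classify an SMTP reply code by its first digit (RFC 5321)."""
    return {2: "success", 4: "transient", 5: "permanent"}.get(code // 100, "other")


def make_greylisted_server(delay: float = 120.0) -> Server:
    """Toy server: temp-fails any attempt made within `delay` seconds of the first."""
    first_seen: Optional[float] = None

    def server(now: float) -> int:
        nonlocal first_seen
        if first_seen is None:
            first_seen = now
            return 451
        return 250 if now - first_seen >= delay else 451

    return server


def compliant_sender(server: Server, retry_interval: float, max_attempts: int) -> bool:
    """Keep the message queued on a 4xx and try again later."""
    now = 0.0
    for _ in range(max_attempts):
        kind = reply_class(server(now))
        if kind == "success":
            return True
        if kind != "transient":
            return False  # permanent (or unexpected) failure: stop retrying
        now += retry_interval  # a real MTA holds the message in its on-disk queue
    return False


def spamware_sender(server: Server) -> bool:
    """Fire and forget: any error at all means drop the message and move on."""
    return reply_class(server(0.0)) == "success"


if __name__ == "__main__":
    print(compliant_sender(make_greylisted_server(), retry_interval=300, max_attempts=5))  # True
    print(spamware_sender(make_greylisted_server()))  # False
```

The compliant sender pays for its success with a queued message and a delayed second attempt, which is exactly the cost the fire-and-forget sender refuses to pay.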
Forcing a single resend the first time you see a particular triplet is, therefore, an excellent way of distinguishing ham from spam. Spammers know greylisting is in use, but many of them don't bother to change their software to accommodate it, because retrying slows down the resulting spew of crap. It's far cheaper for them to concentrate on pumping up the volume instead, hoping to make up in sheer numbers the revenue they lose to blocked spam.
Chris's greylist sink uses a local Access-format database file to store the triplets it has seen. Not only can you choose which fields it uses, you can also specify how many days it remembers them. I've configured mine for 90 days of retention to start and will tune it as necessary. I'm keying on the full triplet, but that's another setting I'll have to keep an eye on because of mailing lists. Some mail software (for good reasons too long to go into right now) generates a unique envelope sender and/or recipient address for each copy of each message it sends out; it's common to see this on mailing lists. Because those addresses are message-specific, every incoming message from that host generates a brand-new triplet and gets greylisted all over again.
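Here's a rough Python sketch of why the choice of key fields and the retention period matter. Everything in it is an assumption for illustration, not the sink's actual schema or settings: the field names, the in-memory store, the idea that retention is measured from the last time a key was seen, and the example list-server addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

DAY = 86400.0
Key = Tuple[str, ...]


@dataclass
class ConfigurableGreylist:
    # Which fields make up the key; drop "sender" to tolerate per-message senders.
    key_fields: Tuple[str, ...] = ("ip", "sender", "recipient")
    delay_seconds: float = 120.0
    retention_days: float = 90.0
    # key -> (first_seen, last_seen), both in seconds
    records: Dict[Key, Tuple[float, float]] = field(default_factory=dict)

    def make_key(self, ip: str, sender: str, recipient: str) -> Key:
        values = {"ip": ip, "sender": sender, "recipient": recipient}
        return tuple(values[name] for name in self.key_fields)

    def expire(self, now: float) -> None:
        """Forget keys that haven't been seen within the retention window."""
        cutoff = now - self.retention_days * DAY
        self.records = {k: v for k, v in self.records.items() if v[1] >= cutoff}

    def check(self, ip: str, sender: str, recipient: str, now: float) -> bool:
        """True = accept the message; False = issue a temporary error."""
        self.expire(now)
        key = self.make_key(ip, sender, recipient)
        if key not in self.records:
            self.records[key] = (now, now)
            return False
        first_seen, _ = self.records[key]
        self.records[key] = (first_seen, now)  # refresh so active senders stay known
        return now - first_seen >= self.delay_seconds


if __name__ == "__main__":
    # A list server that puts a unique envelope sender on every message.
    full = ConfigurableGreylist()
    relaxed = ConfigurableGreylist(key_fields=("ip", "recipient"))
    for n in range(3):
        sender = f"list-bounces+{n}@lists.example.net"
        now = n * 600.0
        print(full.check("198.51.100.5", sender, "me@example.org", now),
              relaxed.check("198.51.100.5", sender, "me@example.org", now))
    # False False
    # False True
    # False True
```

With the full triplet, every list message looks brand new and is temp-failed; dropping the sender from the key lets later messages through, at the cost of a coarser filter. Pruning on every check keeps the sketch short; a real store would more likely expire old entries on a schedule.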
The only possible problem I've noticed: a couple of days ago, my Exchange server went nuts. Various services were stopping and starting every few minutes, and as a result the box stopped accepting inbound SMTP for several hours before we noticed. A simple reboot solved the problem, and I'm not sure whether the greylist sink was the cause or merely one of the first victims. If it happens again, I'll drop Chris a note and see if he can't help me figure out what's going on.
Updated 11/10: Fixed a dropped quote mark in the URL. You should now be able to follow the link to the greylist homepage.
Updated 11 Sep 2007: The author has just let me know that the product is now called JEP(S) and is available from a new website. Link updated accordingly.