The volume of spam I receive has continued to rise inexorably, to the point where I’m averaging between 500 and 600 spam messages per day, and that’s just what’s getting past our rather conservative server-side filters (including the MAPS RBL+). Luckily, Michael Tsai’s SpamSieve is doing a swell job of identifying and extracting the spam from my Eudora mailboxes; it’s currently 99.6 percent correct in identifying spam. Unfortunately, that remaining 0.4 percent of mistakes includes some 230 false positives I’ve manually identified since September 2003, and although the likelihood of SpamSieve erroneously identifying good mail as spam continues to drop, the filter is not perfect.
I say all this in part to raise the alarm that where I am today with receiving spam, many of you will be in six months or a year. It’s gotten to the point where just retrieving the spam has become a burden if I’m traveling and don’t have a fast Internet connection. Also, I can no longer sort through my Junk mailbox for false positives; pulling a good message or two from a few thousand spam messages is just too hard and time-consuming. As a word of advice then, please try to write normally to make sure your mail to me doesn’t look like spam to a Bayesian filter; I simply can’t guarantee I’ll see anything that SpamSieve identifies as spam.
Interestingly, SpamSieve missed a few more messages than usual this week. When I checked to see why, I realized it was because I had set SpamSieve to honor the Habeas headers that can be used legally only by legitimate senders (such as TidBITS), but some spammer had forged those headers to sneak past SpamSieve and similar Habeas-specific filters. Habeas issued a statement saying that they were aggressively tracking down the spammer (the spam itself appears to have originated from a distributed set of zombie PCs taken over in a past virus attack). Although Habeas has reportedly sued some spammers successfully in the past, this seems to be the most flagrant misuse of the Habeas headers so far. Habeas must bring down this spammer – and any that try the same trick – in a timely fashion to maintain user confidence in the Habeas headers as a mark of legitimate mail.
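For the technically inclined, here is a minimal Python sketch of why a forged header defeats this kind of filter. It is not how SpamSieve works internally, which I haven't seen; the header name `X-Trust-Mark`, the word list, and the `classify` function are all hypothetical stand-ins chosen for illustration. The point is only that a filter which trusts a header skips its content check whenever the header is present, and the header is just text the sender supplies.

```python
from email import message_from_string

# Hypothetical header name, for illustration only (not the real Habeas header).
TRUST_HEADER = "X-Trust-Mark"

# Toy stand-in for a real Bayesian score: a fixed set of suspicious words.
SPAMMY_WORDS = {"viagra", "mortgage", "winner"}


def looks_like_spam(body: str) -> bool:
    """Flag the body if it contains any word from the toy list."""
    words = set(body.lower().split())
    return bool(words & SPAMMY_WORDS)


def classify(raw_message: str, honor_trust_header: bool = True) -> str:
    """Return "good" or "spam" for a raw single-part text message."""
    msg = message_from_string(raw_message)
    # The header is supplied by the sender, so its presence alone proves
    # nothing; trusting it skips the content check entirely.
    if honor_trust_header and msg[TRUST_HEADER] is not None:
        return "good"
    return "spam" if looks_like_spam(msg.get_payload()) else "good"


forged = (
    "From: spammer@example.com\n"
    "X-Trust-Mark: licensed\n"
    "\n"
    "You are a winner of a cheap mortgage\n"
)

print(classify(forged))                            # header honored
print(classify(forged, honor_trust_header=False))  # content check only
```

With the header honored, the forged message is waved through; with it ignored, the same message falls to the content check. That is why the scheme depends on legal enforcement rather than on anything technical in the header itself.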
We have the necessary technologies, ranging from Bayesian analysis and whitelists to challenge-response and real-time blackhole lists, to control aspects of spam at an individual user or even individual server level. Even so, the vast variety of email systems that speak the basic language of open, trusting SMTP has ensured that spam will overwhelm increasingly large chunks of the Internet email infrastructure. Our older mail servers are staggering under the load even now, and I cringe every time I hear the horror stories from one of my ISPs about their Herculean efforts to keep legitimate mail flowing while under the onslaught of thousands of zombie spam-delivering machines.
There are no easy solutions. In an upcoming issue, Brady Johnson will give us a look at the effect the U.S. CAN-SPAM Act of 2003 will likely have on the volume of spam. There are various Internet standards organizations working on the problem as well, but from my discussions with John Levine of the Anti-Spam Research Group of the Internet Research Task Force, they have no magic bullets in the works. And for all of us who think we have the answer (the so-called "Final Ultimate Solution to the Spam Problem," or FUSSP), it’s worth reading the final link: "You Might Be An Anti-Spam Kook If…," whose author’s tongue was only partially lodged in his cheek.