Series: Distributed Filtering
Use Mailsmith's unique filtering to manage your email, or adapt the ideas to other email clients!
Article 1 of 2 in series
We've never met, but I know something about you: you're getting more email this year than you did last year, possibly a lot more. If you simply let messages pile up in your incoming and outgoing mailboxes, sooner or later you'll have an organizational nightmare on your hands. The best way to prevent this nightmare (and the best way to deal with the mess if it has already developed) is to define and use email filters. Indeed, after allowing you to receive mail and send mail, helping you organize your mail is the single most useful thing an email client can do, and filtering is the number one tool for the job.
This article is a followup to "Mailsmith 1.5: Lean, Mean Email Machine," my review of Mailsmith in TidBITS-638. In that review, I stated my judgment that Mailsmith's filtering options are more powerful, more flexible, and more varied than those of any other Mac OS email client. Mailsmith's most distinctive feature, called "distributed filtering," is so novel that the editors of TidBITS have given me a chance to say a bit more about the subject, both so people considering Mailsmith come to appreciate what it might offer them, and so those already using Mailsmith can take full advantage of the power at their fingertips.
Distributed Filtering -- You can use, and people do use, Mailsmith's filters in the traditional way, simply sorting incoming messages into the appropriate destination mailboxes. Mailsmith's traditional filters are powerful, perhaps more so than those in any other email program. But Mailsmith also provides a completely different and wholly original way to approach filtering: distributed filtering.
If you use traditional filters, every message, as soon as it hits the incoming mailbox, is examined by each and every filter you have defined. Even if the message happens to meet the test in filter 29, it must usually continue to be tested against filters 30 through 50. When all the filters have had a chance to examine the incoming message, the program determines which tests, if any, have been satisfied, then decides how to process the message, resolving conflicts between filters if necessary. Normally, the result is that the message is sent directly to the mailbox where you want it to end up. Note that, in this scenario, the way your mailboxes are organized has no effect whatsoever upon filtering.
Not so with Mailsmith's distributed filtering, which uses the way your mailboxes are organized as a way of controlling and limiting the application of filters to incoming messages. Incoming messages are greeted initially by the mailboxes at the top level of the hierarchy, starting with the first one in alphabetical order. As soon as a mailbox "recognizes" an incoming message, that is, as soon as a test in one of the filters attached to a mailbox is met, that mailbox lays claim to the message. The message now continues to be examined by any mailboxes inside the one that claimed it; but the message will never be tested by filters attached to mailboxes at the first level of the hierarchy that come alphabetically after the mailbox that claimed it.
How Distributed Filtering Works -- My description above is by necessity a bit abstract, so let's look at a concrete example that shows the power of distributed filters.
Consider the following mailbox hierarchy, based loosely on my own setup. The (incoming) and (trash) mailboxes belong to Mailsmith - in other words, they are created by Mailsmith and cannot be moved, deleted or renamed. I created the other top level mailboxes - clients, lists & subscriptions, and personal - which correspond to the three main server accounts (POP mailboxes) from which I download email.
(incoming)
(trash)
clients
lists & subscriptions
- Mailsmith
-- Mailsmith / keep
- FileMaker
-- FileMaker / keep
- TidBITS
personal
Let's create filters that will catch messages from the first two server accounts:
If Server Account Contains "clients" [Then] Deposit
If Server Account Contains "lists" [Then] Deposit
Now attach the first filter to the clients mailbox and attach the second filter to the lists & subscriptions mailbox. (Note that my example filters look much like what you will see in the Mailsmith filter definition dialog. I've edited them only slightly to make them easier to understand here.)
When new mail arrives from the lists server account, it will be offered first to the clients mailbox for examination, because "clients" sorts alphabetically before "lists & subscriptions." But the message won't match the criterion in the clients filter, so it will be passed to lists & subscriptions. The filter attached to that mailbox will match the message, so the message will be deposited in lists & subscriptions. The personal mailbox will never see it.
But the message is not home yet. It may be filtered further by the mailboxes inside lists & subscriptions. There is a mailbox named "TidBITS" in there, and let's assume that this filter is attached to it:
If To Contains "tidbits" [Then] Deposit
If our imaginary message happens to meet this test, it will end up deposited in the TidBITS mailbox. Using distributed filters with the "deposit" action, messages percolate through the mailbox hierarchy in a straightforward and efficient way.
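For readers who think in code, the percolation just described can be sketched in a few lines of Python. This is only a rough model of the behavior described above, not Mailsmith's actual implementation. The class and function names, the case-insensitive sort, the message fields, and the address tidbits-talk@example.com are all assumptions made for illustration, and the child mailboxes other than TidBITS are left without filters to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]
Test = Callable[[Message], bool]


@dataclass
class Mailbox:
    name: str
    tests: List[Test] = field(default_factory=list)  # deposit-action filters
    children: List["Mailbox"] = field(default_factory=list)


def contains(header: str, text: str) -> Test:
    """Build an 'If <header> Contains <text>' test (hypothetical helper)."""
    return lambda msg: text.lower() in msg.get(header, "").lower()


def deposit(msg: Message, siblings: List[Mailbox], path: List[str]) -> List[str]:
    """Offer msg to sibling mailboxes in alphabetical order.

    The first mailbox with a matching filter claims the message, and only
    that mailbox's children get to examine it afterwards.
    """
    for box in sorted(siblings, key=lambda b: b.name.lower()):
        if any(test(msg) for test in box.tests):
            return deposit(msg, box.children, path + [box.name])
    return path  # nobody at this level claimed it, so it stays put


top_level = [
    Mailbox("clients", [contains("server_account", "clients")]),
    Mailbox("lists & subscriptions", [contains("server_account", "lists")], [
        Mailbox("Mailsmith"),
        Mailbox("FileMaker"),
        Mailbox("TidBITS", [contains("to", "tidbits")]),
    ]),
    Mailbox("personal"),
]

msg = {"server_account": "lists", "to": "tidbits-talk@example.com"}
route = deposit(msg, top_level, ["(incoming)"])
print(" > ".join(route))
```

In this sketch the message is claimed by lists & subscriptions and then by TidBITS; the personal mailbox is never consulted, which is the whole point of letting the hierarchy limit which filters run.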
Why is this approach better? Setting up distributed filters is concrete. You can visualize the way your filters will work by simply looking at your mailbox list. This makes troubleshooting easier, too. None of my mailboxes have more than one or two filters attached to them; my incoming mailbox has no filters attached to it at all. If mail does not end up where it is supposed to end up, I just observe where it does end up in my folder hierarchy, and climb back up the mailbox tree until I find the branch where things went wrong. This process almost never requires looking at more than one or two filters. In Microsoft Entourage, by contrast, if you have fifty filters and one isn't working, almost any of the other filters could potentially be causing the problem -- not to mention Entourage's mailing list rules and junk mail filters, both of which are located elsewhere in the program.
Filtering to the Max -- So far we've looked only at the basics of distributed filtering. What's most impressive about distributed filtering is not that it does what traditional filters do, just a little better, but rather that distributed filtering takes the whole idea of processing your mail to a new level. Consider the following:
I subscribe to the active and helpful Mailsmith Talk list. A filter initially deposits incoming mail from the list in a mailbox named "Mailsmith." When I find the time to read new messages, their status changes from unread to read automatically. I enjoy reading all the messages (traffic on the list is not so heavy that this is impossible) but I'm interested in saving only a handful each week. So as I read, if I want to keep a message for future reference, I use a simple keystroke I defined to mark the message with a custom label ("keep"). Now, inside my "Mailsmith" mailbox there is a child mailbox named "Mailsmith / keep," to which two filters are attached. Here is the first, named "Archiving."
If ((Label Is Equal To "keep" Or From Contains "email@example.com") Or Answered Is Equal to True) And Read is Equal to True [Then] Deposit
I've used parentheses above to show how Mailsmith interprets the criteria. This filter catches messages that meet at least one of the first three criteria (I applied the label "keep" to them, they're from me, or I replied to them) and that have also been read.
What happens to the rest of the messages? They are processed by the following simple filter named "Trash."
If Read Is Equal To True [Then] Transfer [to] "(trash)"
This filter simply takes everything that wasn't caught by the first filter and moves it into the trash mailbox.
Note that the alphabetization of the filter names matters here. If the Trash filter got to the messages before the Archiving filter, well, all my read mail would get routed into the trash. I could make the Trash filter safer by adding more tests to it, but I have come to trust this setup completely.
Of course, incoming messages are by definition unread, so these filters never catch new messages. They process messages after they have been read; most filters process messages before they are read. So how are these filters activated? Although I could automate the process by writing a simple AppleScript script that runs, say, every time I launch Mailsmith, I prefer to activate the filters manually, by using Mailsmith's Re-Apply Filters command on selected mailboxes. Messages that had already been filtered once when they arrived are now filtered again, and since their properties have changed, they meet filter tests that they didn't meet originally.
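The interplay between the Archiving and Trash filters can be modeled in a short Python sketch. It assumes, as described above, that filters attached to the same mailbox are tried in alphabetical order by name and that the first match wins. The message fields and the action strings are made up for the example, and this is not how Mailsmith itself stores filters.

```python
def archiving(msg):
    # ((Label is "keep" Or From contains my address) Or Answered) And Read
    return ((msg.get("label") == "keep"
             or "email@example.com" in msg.get("from", "")
             or msg.get("answered", False))
            and msg.get("read", False))


def trash(msg):
    # If Read Is Equal To True
    return msg.get("read", False)


# Filter name -> (test, action taken when the test passes).
FILTERS = {
    "Trash": (trash, "transfer to (trash)"),
    "Archiving": (archiving, "deposit in Mailsmith / keep"),
}


def reapply(msg):
    """Try the filters in alphabetical order by name; first match wins."""
    for name in sorted(FILTERS):
        test, action = FILTERS[name]
        if test(msg):
            return action
    return "stay in Mailsmith"


kept = {"from": "someone@example.org", "label": "keep", "read": True}
skimmed = {"from": "someone@example.org", "read": True}
unread = {"from": "someone@example.org", "read": False}

for m in (kept, skimmed, unread):
    print(reapply(m))
```

Because "Archiving" sorts before "Trash," the labeled message is archived before the Trash filter ever sees it. Rename the first filter so that it sorts after "Trash" and every read message, labeled or not, would be thrown away.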
And so all my list traffic - hundreds of messages a day - is processed from cradle to grave, so to speak, by Mailsmith's distributed filters. I don't bother deleting messages one by one. Instead, as I read, I focus on what I want to keep, rather than on what I want to trash. This is far more efficient, since in most cases, I want to keep far fewer messages than I want to delete.
Contextual Filtering -- But wait, distributed filtering is even cooler yet! You can attach the very same filter to many different folders, and its effect will be determined by the context in which it is applied.
All of my list mail is processed in exactly the same way as mail I receive from the Mailsmith Talk list. Mail from the various FileMaker lists I subscribe to is deposited initially in a "FileMaker" mailbox. Inside that mailbox, there is a child mailbox named "FileMaker / keep," to which are attached the same two filters attached to the "Mailsmith / keep" mailbox.
Look back at those two filters and you'll see they test for properties that have nothing to do with whether a message came to the Mailsmith list or the FileMaker list. You can test in Entourage to see if a particular message is in a particular folder and respond accordingly, but that isn't contextual filtering, because the test must be defined within the filter.
Filtering Multiple Accounts -- Distributed filtering works exceptionally well for users like me who have multiple email accounts. It lets me route all mail from one account directly into that account's top-level mailbox, and then filter further using content-based tests specific to the mail I get from that account. The content filtering works especially well for my list traffic, since list messages always come to the same address and are easy to match in a filter.
Unfortunately, not all of my incoming mail is so cooperative, and some of the uncooperative mail is extremely important. I try to encourage my clients to use a special email address when they write to me, so their mail ends up in a dedicated POP account. I can then snag it with this filter attached to the clients mailbox:
If Server Account Contains "clients" [Then] Deposit
Inside the clients mailbox, I have special mailboxes defined for clients with active projects. Each of these mailboxes has attached to it a filter that catches mail specifically from that client. For example, the filter attached to the mailbox for a client named Not So Big Company, Inc., might look like this:
If From Contains "@notsobig.com" [Then] Deposit
But as you might imagine, my clients do not always use the preferred address when they write to me. Sometimes client mail comes to my personal account instead. My solution is simply to attach the client-specific filters both to the top level "clients" mailbox and to the individual client mailboxes inside it. That way, if the first filter doesn't catch the message, the second filter will. Any given mailbox can have multiple filters attached to it.
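Here is a small Python sketch of that redundancy, again as an illustration rather than a description of Mailsmith's internals. The tree layout, the mailbox name "Not So Big," the filter on the personal mailbox, and the sample addresses are all assumptions; the only thing taken from the text is that the client-specific test is attached at both levels.

```python
def server_is_clients(msg):
    return "clients" in msg["server_account"]


def from_notsobig(msg):
    return "@notsobig.com" in msg["from"]


def server_is_personal(msg):
    return "personal" in msg["server_account"]


# Mailbox name -> (attached filter tests, child mailboxes).
TREE = {
    "clients": ([server_is_clients, from_notsobig], {
        "Not So Big": ([from_notsobig], {}),
    }),
    "personal": ([server_is_personal], {}),
}


def route(msg, boxes, path=()):
    """Walk the tree: the first mailbox (alphabetically) with a match claims msg."""
    for name in sorted(boxes, key=str.lower):
        tests, children = boxes[name]
        if any(test(msg) for test in tests):
            return route(msg, children, path + (name,))
    return path


# A client who wrote to my personal address instead of the preferred one.
stray = {"server_account": "personal", "from": "bob@notsobig.com"}
friend = {"server_account": "personal", "from": "pal@example.org"}

print(route(stray, TREE))
print(route(friend, TREE))
```

The stray client message is claimed by clients (thanks to the duplicated From test) before personal ever sees it, and then drops into the client's own mailbox. Ordinary personal mail is unaffected.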
Is this approach better than simply defining a transfer-action filter and attaching it to the incoming mailbox? I think so. Even when there is a certain amount of redundancy in the way they are applied, distributed filters are still easier to define and troubleshoot, although it would be nice if Mailsmith's filter list could show me to which mailboxes a given filter is currently attached.
Next week, I'll finish up this explanation of Mailsmith's innovative distributed filtering by examining how you can use distributed filtering to manage not just your incoming mail, but your outgoing mail as well. Plus, we'll look at how distributed filtering can help you stem the ever-increasing tide of spam.
Article 2 of 2 in series
Last week I explained how you can use Mailsmith's distributed filters to manage your incoming mail in flexible and efficient ways. This week I concentrate on outgoing mail, with a few tips on handling mail you do not expect - and may or may not want.
Filtering Outgoing Messages -- In most email programs, the mail you send is all lumped together in a single Out box on the assumption that you probably don't want to read something you've written. However, it's often important to go back and discover what you said to someone (did you really tell your client that job would be done by Friday?), or make sure you're not repeating yourself on a mailing list. With every outgoing message stored in one Out box, you must either perform a search or scan the messages by date to find the one you're looking for.
To work around this annoyance, I use Mailsmith to filter my outgoing mail into logical locations. For example, all messages between me and a client - incoming as well as outgoing - are grouped together in one mailbox, making it easy to experience the back-and-forth nature of our correspondence.
On the surface, it would appear that you can't use Mailsmith's distributed filtering to process outgoing messages. It's true that you can't get mail out of the outgoing mailbox using a deposit-action filter, because a move directly from the outgoing mailbox to any user-defined mailbox would be a lateral move, and deposit-action filters don't work this way. They must always drill further down inside a given mailbox. The trick to making this work is to get all your mail out of the outgoing mailbox and into one that contains the other mailboxes - you can do this by defining only one such filter. From that point on, deposit-action distributed filters can kick in.
The test to use for this filter is simple enough:
If Sent Is equal to True...
Two things to note about this test. First, the special Sent property of messages in Mailsmith applies only to outgoing messages; it fails (or is ignored) when applied to incoming messages. Second, since mail in the outgoing mailbox is filtered only after it has been sent, this test is strictly a formal requirement. You can't define an action without defining a test to trigger it, and this is the one test that all outgoing mail will satisfy.
So where do we transfer these messages? The incoming mailbox would seem like the obvious choice, since all other mailboxes, including the outgoing mailbox, sit logically inside it. But that hierarchy is precisely why the incoming mailbox won't work. Follow along for a minute: You send a message. Once it's on its merry way to the intended addressee, Mailsmith transfers it out of the outgoing mailbox and up to the incoming mailbox, and since the transfer action causes further filtering to be halted, the message just sits there. Later (seconds later, or weeks later) you reapply your filters to this message. It is offered first to the outgoing mailbox, which - you guessed it - kicks it back up to the incoming mailbox. It's like catching a fish that's too small, throwing it upstream, then catching it again, and throwing it back upstream. If you don't want to see that fish any more, you need to throw it in the other direction.
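The fish-throwing loop is easier to see in a toy simulation. The Python sketch below is a simplified model under stated assumptions: each mailbox has at most one filter, a filter simply names the message's new home, and the layout includes a catch-all mailbox (here called "my mail") sitting beside the outgoing mailbox. The address bob@notsobig.com is hypothetical.

```python
# Which mailboxes sit directly inside which (a simplified layout).
CHILDREN = {
    "(incoming)": ["(outgoing)", "my mail"],
    "my mail": ["clients", "lists & subscriptions", "personal"],
}


def make_filters(transfer_target):
    """Return {mailbox: test}; a test yields the message's new home or None."""
    return {
        # The one transfer-action filter, attached to (outgoing).
        "(outgoing)": lambda m: transfer_target if m["sent"] else None,
        # A deposit-action filter attached to the clients mailbox.
        "clients": lambda m: "clients" if "@notsobig.com" in m["to"] else None,
    }


def reapply(msg, location, filters):
    """Re-apply filters once to a message sitting in `location`."""
    for child in CHILDREN.get(location, []):
        test = filters.get(child)
        new_home = test(msg) if test else None
        if new_home:
            return new_home
    return location


def history(msg, transfer_target, passes=3):
    """Where the sent message sits after the initial transfer and each re-filter."""
    filters = make_filters(transfer_target)
    location = transfer_target  # the Sent filter has already moved it here
    seen = [location]
    for _ in range(passes):
        location = reapply(msg, location, filters)
        seen.append(location)
    return seen


sent_msg = {"sent": True, "to": "bob@notsobig.com"}
print(history(sent_msg, "(incoming)"))
print(history(sent_msg, "my mail"))
```

With the incoming mailbox as the target, every pass hands the message back to the outgoing mailbox, which throws it upstream again, so it never leaves. With a downstream target, the first re-filter lets the clients mailbox claim it and it stays there.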
The solution is to create a catch-all mailbox that lives downstream from the outgoing mailbox in the filtering hierarchy and contains all your other user-defined mailboxes. Then, you start filtering everything from there. Accordingly, my folders are set up something like this:
(incoming)
(outgoing)
my mail
- clients
- lists & subscriptions
- personal
The mailbox named "my mail" is one that I created. (I could have named it anything I wanted.) Only three filters are attached to it to catch mail from all of my server accounts. When messages are downloaded, they are moved here first. Nothing is ever left in my incoming mailbox.
This arrangement proves to be very flexible no matter what type of outgoing mail I have. I have outgoing messages to mailing lists deleted, since I know I'll get copies back from the list. I use a single filter for this purpose, with a test that catches outgoing messages to each of the lists I subscribe to:
If To Contains "firstname.lastname@example.org" Or To Contains "email@example.com" (etc.)
This filter does not need the "Sent Is equal to True" test. That test was simply a formality to catch outgoing messages that didn't match any more specific tests. Why don't I just test to see what server account is being used for outgoing mail and throw everything sent using the "lists" account to the trash? Because occasionally I write off-list messages to people using that account, and I may want to preserve them.
Outgoing messages not addressed to lists are then processed by the next filter:
If Sent Is equal to True [Then] Transfer [to] "my mail"
This moves everything that is not to a list into the "my mail" mailbox.
From time to time, I manually refilter that mailbox so the appropriate subordinate mailboxes in my hierarchy pull all the outgoing messages into themselves. Doing so ensures that messages to my mother end up in the same folder as messages from her.
And the neatest thing is that the same filter processes both incoming and outgoing mail. How is this possible? To use a modification of an example from last week, I use the following filter to catch correspondence from a certain imaginary client:
If (any address) Contains "@notsobig.com" [Then] Deposit
This filter catches not only mail to me from the guys at Not So Big Company, Inc., but also my mail back to them. The "(any address)" criterion first appeared in Mailsmith 1.5.3. So: one correspondent, one mailbox, one filter. Very efficient.
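To see why one test can serve both directions, consider this minimal Python sketch. Which headers Mailsmith actually examines for "(any address)" is not spelled out here, so the tuple of header names below is an assumption, as are the sample addresses.

```python
# Assumption: the address-bearing headers that "(any address)" looks at.
ADDRESS_HEADERS = ("from", "to", "cc", "bcc")


def any_address_contains(msg, text):
    """True if any address header of msg contains text (case-insensitive)."""
    needle = text.lower()
    return any(needle in msg.get(header, "").lower() for header in ADDRESS_HEADERS)


incoming = {"from": "bob@notsobig.com", "to": "me@example.com"}
outgoing = {"from": "me@example.com", "to": "bob@notsobig.com"}
unrelated = {"from": "pal@example.org", "to": "me@example.com"}

for m in (incoming, outgoing, unrelated):
    print(any_address_contains(m, "@notsobig.com"))
```

The client's domain shows up in the From header of his mail to me and in the To header of my replies, so a single test matches both and a single deposit-action filter can file the whole conversation.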
Are Transfer-Action Filters Obsolete? -- With the sole exception of the filter used to extract sent mail from the outgoing mailbox, all of the filters I have described use the deposit action, because it's integral to the concept of distributed filtering. If you use the deposit action to pull a message into a folder, you don't have to specify the folder's name, and that means you can use the same filter in many different contexts. Plus, the deposit action does not forestall additional movement of the message the way the transfer filter does. The deposit action - unique to Mailsmith - is so important to distributed filtering that it's easy to think they're one and the same thing.
Nevertheless, the transfer action remains useful, at times even necessary. As I pointed out above, you must use at least one transfer-action filter if you want to filter outgoing mail, since deposit-action filters can't get their hands on outgoing messages any other way.
The essential ideas of distributed filtering are, first, that different filters are attached to different mailboxes and second, that the filters are applied in conformity to the way you organize your mailboxes. Almost every mailbox in my hierarchy has at least one filter attached to it. The one exception is the incoming mailbox, which has absolutely none.
Dealing with Leftovers -- Because I filter the messages I expect so aggressively, almost all of my correspondence with lists, clients, family, and friends ends up in the right place instantly. But not all of it. Five to ten percent of the mail I receive is either (a) welcome but unexpected or (b) extremely unwelcome but increasingly expected - in other words, spam. The odds are heavily weighted in favor of (b), but not heavily enough that I can simply move all unfiltered messages into the trash without perusing them first.
There's not much you can do about the messages in group (a). You can't create filters for messages you don't see coming. A couple weeks ago, I received email from my best friend in high school. I hadn't heard from him in twenty-five years, so I didn't have a filter defined for him. Even some messages you do expect are hard to filter, for example, acknowledgments from online stores where you've just placed an order. These messages land in the "my mail" mailbox and I file them by hand.
And as for group (b) - spam - well, filtering spam turns out to be a constant and persnickety battle. I do want to note, however, that the fact that Mailsmith lacks a built-in spam-sniffing process like those in Microsoft Entourage and Apple's Mail does not mean that Mailsmith users are by any means defenseless against spammers. Distributed filters offer no particular advantage over traditional filtering techniques when it comes to spam, but Mailsmith still performs well thanks to its powerful grep pattern matching capabilities. The members of the Mailsmith Talk list love to share spam-catching tests, many of which make use of grep to tease out the subtle patterns that differentiate spam from legitimate messages.
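To give a flavor of what a pattern-based spam test looks like, here is a sketch using Python's re module. The two patterns are invented for this example and are not taken from the Mailsmith Talk list, and Python's regular expression syntax is not identical to Mailsmith's grep dialect, so treat this as an analogy rather than something to paste into a filter.

```python
import re

# Hypothetical patterns, chosen only to illustrate the idea.
SPAM_PATTERNS = [
    # "free" spelled with separators between the letters: F.R.E.E, f-r-e-e, f r e e
    re.compile(r"\bf[\W_]+r[\W_]+e[\W_]+e\b", re.IGNORECASE),
    # a run of three or more exclamation marks or dollar signs
    re.compile(r"[!$]{3,}"),
]


def looks_like_spam(subject):
    """True if any of the patterns matches somewhere in the subject line."""
    return any(pattern.search(subject) for pattern in SPAM_PATTERNS)


for subject in ("F.R.E.E vacation", "Act now!!!", "Re: free lunch on Friday"):
    print(subject, "->", looks_like_spam(subject))
```

The first pattern deliberately ignores the plain word "free," which appears in plenty of legitimate mail, and fires only on the obfuscated spellings that a simple "Contains" test would miss. That kind of distinction is what makes pattern matching worth the trouble.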
Honestly, though I initially wrote more about filtering spam, over the last few weeks I've stopped using most of my homegrown filters in favor of a new shareware utility for Mac OS X called SpamSieve. Written by developer Michael Tsai, SpamSieve employs Bayesian probability theory to identify junk mail (the first link below explains the theory behind Bayesian filtering). You have to train SpamSieve by feeding it both spam and legitimate messages, but once it has a satisfactory statistical base, you can ask it to start identifying and labeling spam, using Mailsmith's custom labels feature; then you filter the spam wherever you want. I've been using SpamSieve with excellent results - no false positives, and a growing success rate at identifying the mail that I personally regard as spam. And the best thing is that it merely extends Mailsmith's capabilities, so what happens to the spam remains entirely within my control. [We're planning a full review of SpamSieve soon - it currently supports Mailsmith, Entourage, and CTM Development's PowerMail; support for other email clients, including Eudora, is in the works. -Adam]
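For the curious, the general idea behind Bayesian filtering can be shown with a toy classifier in Python. This is a bare-bones naive Bayes sketch with add-one smoothing, written only to illustrate the train-then-classify cycle described above. It is not SpamSieve's algorithm, and the class name, training phrases, and whitespace tokenization are all assumptions.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-count classifier; train on both kinds of mail before use."""

    def __init__(self):
        self.words = {"spam": Counter(), "good": Counter()}
        self.messages = Counter()

    def train(self, text, label):
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["good"]))
        spam_total = sum(self.words["spam"].values())
        good_total = sum(self.words["good"].values())
        # Start from the prior odds, then add each word's evidence (in logs).
        log_odds = math.log(self.messages["spam"] / self.messages["good"])
        for word in text.lower().split():
            p_spam = (self.words["spam"][word] + 1) / (spam_total + vocab)
            p_good = (self.words["good"][word] + 1) / (good_total + vocab)
            log_odds += math.log(p_spam / p_good)
        return 1 / (1 + math.exp(-log_odds))


sieve = TinyBayes()
sieve.train("cheap pills buy now", "spam")
sieve.train("buy cheap watches now", "spam")
sieve.train("filter question for the list", "good")
sieve.train("lunch on friday", "good")

print(sieve.spam_probability("buy cheap pills") > 0.5)
print(sieve.spam_probability("question about the filter") < 0.5)
```

A probability above some threshold would translate into a label on the message, and from there an ordinary filter can take over and move the labeled mail wherever you like.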
Getting It -- Distributed filtering is so novel that it took me a while to "get it," and I have noticed other people going through a similar evolution on the Mailsmith Talk list. If you don't get distributed filtering, or if for some reason you decide you just don't like it, Mailsmith lets you work entirely with traditional filters, and even in this area, it's more powerful than any of its competitors. But if you stick with distributed filtering for a while, you will get it, and once you do, you won't want to go back.
PayBITS: Did learning about Mailsmith's distributed filtering
save you time? If so, why not drop Will a few bucks via PayPal?
Read more about PayBITS: <http://www.tidbits.com/paybits/>