In the shower after my walk tonight, I was thinking about Google's PageRank and realized that spam is actually the opposite problem. The more people "paying attention" to a particular email message, the more likely it is to be spam. So, here's the idea: strip off the headers and create an MD5 hash of the body. Put that hash in an associative array, keyed to a count. Every time someone sees the email, increment the count. Any message with a count over 1000 is likely spam (or a big mailing list). You could build this as a module in SpamAssassin and have it query a central clearinghouse. A single test-and-increment call would bump the count and return the new value.
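Here's a rough sketch of what that clearinghouse might look like, in Python rather than as a real SpamAssassin module. Everything named here (`ClearingHouse`, `test_and_increment`, `is_likely_spam`, `SPAM_THRESHOLD`) is made up for illustration, the counts live in an in-memory dictionary instead of a shared server, and header stripping is simplified to "everything after the first blank line."

```python
import hashlib
from collections import defaultdict

# Threshold taken from the idea above; the exact number is a guess.
SPAM_THRESHOLD = 1000


def body_of(raw_message: str) -> str:
    """Return the text after the first blank line (a simplified header strip)."""
    normalized = raw_message.replace("\r\n", "\n")
    _headers, _sep, body = normalized.partition("\n\n")
    return body


class ClearingHouse:
    """Hypothetical central counter: MD5 of the body -> times seen."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def test_and_increment(self, raw_message: str) -> int:
        """Increment the count for this body and return the new count."""
        digest = hashlib.md5(body_of(raw_message).encode("utf-8")).hexdigest()
        self._counts[digest] += 1
        return self._counts[digest]


def is_likely_spam(count: int) -> bool:
    """Flag bodies seen more than SPAM_THRESHOLD times (spam or a big list)."""
    return count > SPAM_THRESHOLD
```

Two copies of the same body with different headers land on the same counter, so the second call returns 2. A real service would also need the increment to be atomic across many clients, which this sketch doesn't attempt.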

So someone must have already tried this or worked out why it's a dumb idea. Which is it? One reason it might not work is that spammers could individualize each message in some tiny way so that the hashes no longer match.
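To make that worry concrete, here's a small Python illustration. The message text and the `ref=` suffix are invented, and `count_distinct_hashes` is just a helper name for this example. Appending a different token to each copy is enough to give every copy its own MD5 hash, so no single counter ever climbs.

```python
import hashlib


def body_hash(body: str) -> str:
    """MD5 hex digest of a message body."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def count_distinct_hashes(bodies: list[str]) -> int:
    """Number of different hashes produced by a batch of bodies."""
    return len({body_hash(b) for b in bodies})


# Hypothetical spam run: same pitch, one unique token per recipient.
base = "Cheap watches! Click here."
individualized = [f"{base} ref={i}" for i in range(5)]
identical = [base] * 5
```

With these inputs, the five identical copies collapse to one hash while the five individualized copies produce five, which is exactly the failure mode described above.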

Update: Pat Ekman writes to say that this is essentially what the Vipul's Razor module for SpamAssassin does. Very good. Does anyone care to comment on how well it works?