On Spam
This essay is a work in progress. Several people have
asked that I post it because they were interested in
the sections I have written. I hope to finish it
by November, 2009.
According to the Wikipedia article on spam, there are numerous types of spam: Usenet newsgroup spam, Web search engine spam, spam in blogs, instant messaging spam, mobile phone messaging spam, and e-mail spam. I don’t use text messaging or instant messaging very much, and so far, I’ve received no spam through either one of these methods. So I will discuss only e-mail spam, which affects almost all of us. I will particularly focus on conceptual definitions to differentiate ham (good e-mail) from spam. I’ll talk about:
- The Spamming Business
- What is Spam? In this section, I provide a core definition of spam.
- Additional Aspects (to be added)
- Reducing the Amount of Spam You Receive: I provide some advice about what you can do to reduce the amount of spam you read. Those who simply want my advice on this and nothing else can jump to Sections E and F and skip the rest of this essay.
- Spam Filters: Almost everyone should install a spam filter. I analyze ___, discuss what a Bayesian filter is, and recommend one in particular (InBoxer).
- Other Methods of Combating Spam: I focus on other approaches to combating spam outside of what you as an individual can do.
H. One Proposal (to be added)
I. My Parties: This is a discussion of whether I am a spammer in how I invite people to my parties.
J. Further Reading: For those really interested, I’ve assembled a list of worthwhile links.
A. The Spamming Business
Running a spamming enterprise is a business, subject to the laws of economics. The spamming business is definitely unethical, and in most cases it is illegal, but it is a business nonetheless, just as running a drug-trafficking operation is a business. Spammers must make sufficient revenue to cover their expenses, or they will go out of business. To drive spammers out of business, one should not focus on appealing to their ethics. Nor are laws likely to do much good. (Laws could be useful if Congress were smart enough to pass intelligent laws, but that is a low-probability event.) Laws in most cases will affect the more honest and law-abiding spammers; the less law-abiding will flout such laws. And the lobbyists will water down any legislation that would make a meaningful dent in spam. The most effective approach is to make the business of spamming unprofitable. As soon as that happens, I guarantee that you will receive little or no spam.
Spammers operate on numbers. They send a large number (strike that, an extremely large number) of e-mails to people, all or almost all of whom do not want to receive the message, hoping that a few will purchase the goods or services they are selling. Unlike direct mail solicitors, who need a reasonable response rate to pay for the cost of sending you junk mail, spammers can survive with an extremely low response rate because the cost of sending e-mail is so low. ____. For a spammer, there are three significant factors in determining the profitability of his business:
- the cost of sending e-mails
- the response rate
- the commission he receives.
According to Paul Graham, as of 2002, the lowest cost for sending spam was about $200 per million spams, which is 1/50 of a cent for each spam. In other words, one can send 50 spams for a penny. [1]
Let’s do the math. Assume a spammer is selling a book that costs $20. Let’s assume that what he is selling is a legitimate product; in fact, let’s go further and say it’s actually a very good book. (Many if not most spammers sell blatantly illegal products or services (e.g., child pornography), products that don’t work (e.g., enlargement of certain body parts), whose sale thus constitutes fraud, or scams where the stated goal (selling you a product or a service) is not the real goal (e.g., identity theft). In this case, I am giving them the benefit of the doubt.) Let’s assume he mails a million spams and that, with his other costs (e.g., graphic design of the e-mail, copyediting, etc.), it costs a total of $300 to send them. Let’s assume he receives a 25 percent commission for selling the book, or $5 per book. $300 / $5 = 60, so he would have to sell 60 books to break even. This is a response rate of 0.006 percent, a number so small that when I first calculated it in Excel, I had to alter the number formatting so it did not appear as 0. This is why the spamming business can be so profitable.
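For readers who want to try other numbers, the break-even arithmetic above can be sketched in a few lines of Python. The function and parameter names are my own, chosen for illustration; the figures passed in at the bottom are the ones used in the example above ($300 total cost, a $20 book, a 25 percent commission, one million e-mails).

```python
import math


def break_even(total_cost, price, commission_rate, emails_sent):
    """Return (sales needed to break even, required response rate).

    All names here are illustrative; the inputs are the three factors
    discussed in the text: cost of sending, response rate, commission.
    """
    commission_per_sale = price * commission_rate
    # A spammer cannot sell a fraction of a book, so round up.
    sales_needed = math.ceil(total_cost / commission_per_sale)
    response_rate = sales_needed / emails_sent
    return sales_needed, response_rate


# Figures from the example in the text.
sales, rate = break_even(total_cost=300.0, price=20.0,
                         commission_rate=0.25, emails_sent=1_000_000)
print(f"{sales} sales needed, response rate {rate:.4%}")
```

With the figures from the text, this reproduces the 60 sales and the 0.006 percent response rate. Doubling the cost or halving the commission doubles the required response rate, which is why raising the cost of sending is one lever against spam.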
B. What is Spam? — A Core Definition
U.S. Supreme Court Justice Potter Stewart is most famous for his definition of pornography: “I know it when I see it.” [2] This is also the case with spam, as Paul Graham notes:
To the recipient, spam is easily recognizable. If you hired someone to read your mail and discard the spam, they would have little trouble doing it.
So what is spam? Spam is unsolicited e-mail sent by automatic means to a large number of people, all or almost all of whom do not want to receive such e-mail. In order to be spam, an e-mail must have all four of the following characteristics:
- It was sent as part of a series of a large number of e-mails.
- The e-mail was unsolicited.
- The e-mail was unwanted.
- The e-mail was sent by automated means.
B-1 Large Number of E-mails
Since spam is unwanted by all or almost all of its recipients, spammers need to send a large number of e-mails as part of a spamming campaign, for two reasons. First, since almost no one is interested in reading these e-mails, by sending out a large number of e-mails, they hope that a few readers will be interested. Second, as spam filters become increasingly sophisticated (particularly with the advent of Bayesian filters), every year a smaller number of spam e-mails will bypass these filters and actually reach an inbox. Spammers typically send millions or tens of millions of e-mails as part of a spam campaign. Increasingly, such spams are being sent through zombie computers (computers that have been hijacked by the spammers to act as a sending station for spams).
B-2 Unsolicited
Spam e-mails are unsolicited in that the recipient did not ask to receive them. If you knowingly subscribe to an e-mail list, and it was clear what you would receive, such e-mails are not spam. As I note in Section D-2, “knowingly subscribe” can be an elastic term, since in many cases it is not clear that you are “subscribing” to an e-mail list or exactly which list you are subscribing to.
B-3 Unwanted
When one person says, “Stop spamming me!”, that often indicates that the person does not understand what spam is (e.g., they haven’t read this essay). What they are really saying is, “I don’t want to receive your e-mails.” But one recipient not wanting to receive a particular e-mail does not make that e-mail, or the series of e-mails that were sent, spam. For the e-mails to be spam, all or essentially all of the recipients have to feel the same way. The essential part of spam is that none or almost none of the recipients want to receive it, yet each recipient has to spend a few seconds or more dealing with it.
Assume that you send out 1 million e-mails, trying to sell the recipients brand new snow tires for $100. If 1 percent of the recipients (10,000) decides to purchase your snow tires, that would not be spam, since a meaningful number of the recipients find it worthwhile to read your e-mail and then purchase your tires. (I’m assuming that you’re selling good tires and that your e-mail is truthful.) The other 99 percent of the recipients did not find your offer worthwhile and had to spend time reading your e-mail, but that’s life. I would argue that the cost imposed on these other 99 percent is small enough to justify the benefit the 1 percent received by being able to purchase good snow tires at a fair price. So I would not call your e-mails spam.
At some point, the number of people who want to purchase your snow tires becomes so small that I would call your e-mails spam. At that point, the cost in time to the people who are not interested in purchasing your snow tires outweighs the benefit received by the few who want to purchase them. If only one out of a million people wants to purchase your snow tires, most people would agree that your e-mails are spam.
For e-mail to be spam, it must be unwanted by all or almost all of its recipients. The percentage of people who want to receive such e-mail must be very, very small. I can’t think of a precise cutoff number (e.g., if 1 out of 1 million want it, that’s spam, but if 10 out of 1 million want it, it’s not spam).
What if a set of e-mails was totally unsolicited but these e-mails turn out to be wanted by a meaningful number of recipients? Then they are not spam. Consider this example. Bill Gates announces he will take a small portion ($1 billion) of his fortune and will award $1000 to 1 million lucky recipients, each of whom will be notified by e-mail. All they have to do is to respond within 24 hours by e-mail, and their $1000 will be wired into their bank account the next day. Assume this offer is legitimate and the e-mail you receive was legitimate. Would such e-mails be spam? No. Such e-mails were sent in large numbers (1 million), they were unsolicited, and they were sent by automated means. But — assuming the recipients believed the e-mails were legitimate — I suspect that most of the recipients would welcome such an e-mail. If a series of e-mails is not unwanted by any meaningful percentage of the people receiving such e-mails, then such e-mails are by definition not spam.
My definition is essentially a cost-benefit analysis — what is the benefit to those who want to receive such e-mails compared with the cost imposed on the recipients who do not want to receive such e-mails. Thus, one should consider two additional factors.
First, how much do such e-mails bother those who do not want to receive such e-mails? Everything else being equal, most of us would be more bothered to receive solicitations for hard core pornography than solicitations for flowers. (Of course, some of the readers of this essay may want to receive solicitations for hard core pornography, but that’s a different matter.) Frequency would also be an issue. A few times I have received 50 almost identical e-mails (except for the subject line of the e-mails) from a spammer. This is more bothersome than receiving one spam.
Being bothered obviously runs on a spectrum. Assume I send you an e-mail trying to sell you a book. You read the e-mail, think about it, seriously consider purchasing the book, and eventually decide not to purchase it. On a strict did they buy / did they not buy analysis, your choice would be in the latter category. But presumably you saw some possible benefit to receiving this e-mail, because you almost purchased the book. If that’s true of a meaningful number of people, that would make such e-mails ham.
Second, what is the benefit to those who want to receive such e-mails? Consider the following hypothetical. Assume I have a cure for a rare form of cancer: all one has to do is to eat 15 Argentinian carrots a day for the next year. Assume I send such e-mails to 1000 people. One of the recipients has such cancer, follows my advice, and his life is saved. The other 999 find such e-mail irrelevant to their lives: they don’t have such cancer, they do not expect to ever have it, and they do not know anyone who does. They delete such e-mails and mutter to themselves, “That bastard Mitchell is spamming me.”
Under those circumstances, am I a spammer? One of 1000 wanted to receive such an e-mail, a percentage much higher than what we normally think of as spam. But in addition to the higher percentage than is normal for spam, the one person who did want to receive such e-mail really wanted to receive it — presumably an e-mail that saves your life is more wanted than one advertising a chainsaw you end up purchasing. So I would argue that on any rational cost-benefit basis, the benefit gained by the one person outweighs the inconvenience of the 999 who did not want to receive such e-mail, and thus I would argue that such e-mails are not spam.
So how far does one go? What if it is 1 out of 10,000? 1 out of 100,000? What if I send an e-mail to every person in the world that has an e-mail account, and by doing so, I save the life of one person? Some would consider that worthwhile, some would not. However you analyze it, I would argue that the only correct analysis is cost-benefit analysis — what is the benefit to those who want to receive such e-mail compared with the aggravation suffered by those who did not?
Let’s turn to a more mundane example. What if I am selling goods rather than telling you how to save your life? Suppose I am selling a book. What if the book changes your life, as opposed to a supermarket romance novel that entertains you for the evening? In that case, the threshold number of people who must want to receive such e-mail should be lower.
If you’re selling goods, the kind of good would also matter. If I send out e-mails advertising expensive goods (say, a Rolls-Royce), the number of people who need to purchase the item in order for the e-mails not to be spam would be lower, I would argue, than if I were selling pencils.
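The cost-benefit framing in this section can be made concrete with a small Python sketch. Everything numeric here is an assumption made up for illustration: the essay does not put a dollar value on the benefit to an interested recipient or on the annoyance to an uninterested one, and no precise cutoff is being proposed. The sketch only shows how the balance tips as the number of interested recipients shrinks or as the benefit per interested recipient grows.

```python
def net_benefit_cents(recipients, interested, benefit_cents, cost_cents):
    """Total benefit to interested recipients minus total cost to the rest.

    benefit_cents: assumed value to each recipient who wanted the e-mail.
    cost_cents:    assumed cost (time, annoyance) to each one who did not.
    Both values are hypothetical placeholders, in whole cents.
    """
    uninterested = recipients - interested
    return interested * benefit_cents - uninterested * cost_cents


# Snow tire example: 1 percent of a million recipients are interested.
# Assumed: $10.00 of benefit per buyer, 2 cents of cost per non-buyer.
print(net_benefit_cents(1_000_000, 10_000, 1000, 2))

# Same mailing, but only one recipient in a million is interested.
print(net_benefit_cents(1_000_000, 1, 1000, 2))
```

Under these made-up values the first mailing comes out positive and the second negative. The carrot example corresponds to a very large benefit_cents for a single recipient, which is why a mailing wanted by only 1 in 1000 can still come out ahead.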
B-4 Automated Means
At first blush, one could argue that for a series of e-mails to be spam, they do not need to have been sent by automated means. If I send 1 million e-mails, typing in the 1 million e-mail addresses and pressing the send button on my e-mail package 1 million times, to those who have not solicited such e-mails, and such e-mails are unwanted by almost all of the recipients, are not such e-mails spam? They are, of course, but in the real world, no one would send 1 million e-mails manually. Because the spamming business depends on volume in order to make up for the fact that almost nobody wants to receive such e-mails, the only practical way to operate a spamming business is to send your spams through automated means.
C. Other Characteristics of Spam
Note: Is “Other Characteristics” the best name for this section? If not, what do you suggest?
“Bulk mailers” are companies that send a lot of e-mails. Presumably they do so via automated means — it simply is not practical to send 1 million e-mails manually. If a bulk mailer sends lots of e-mails that are unsolicited and unwanted by almost everyone who receives them, then they are spammers.
There are legitimate bulk mailers and illegitimate bulk mailers (i.e., spammers). It’s not black or white; it is a continuum, with many shades of gray. In order to properly classify someone who sends a large number of e-mails, one must consider many factors. In this essay, I will use “bulk mailers” to describe businesses that send large numbers of e-mails legitimately, while “spammers” describes those who send large numbers of e-mails illegitimately. In addition to the four factors listed above that constitute my core definition of spam, there are numerous other factors one could consider in differentiating bulk mailers from spammers.
C-1 Commercial
Some people’s definition of spam is that the e-mails are commercial in nature: they are trying to sell you something or otherwise obtain some of your money. It certainly is true that almost all spam is commercial in nature. Spammers are businessmen, after all; they just happen to be illegitimate businessmen. But a set of e-mails can certainly be non-commercial in nature and still be spam, if they meet the criteria specified in Section B.
C-2 Disguised Identity
A bulk mailer will not conceal his identity. If Sears sends you an e-mail, you know it is from Sears. Spammers almost always hide their identity. First, merely by sending such e-mails, they might be violating certain laws, such as the CAN-SPAM Act of 2003. [3] Second, in many cases what they are advertising is fraudulent or illegal. ***
There are a few cases where disguising one’s identity would be ethical and legal: a battered woman who does not want her attacker to know her whereabouts, or a whistleblower who is disclosing improper conduct by his employer. In such cases, it would be very unusual for a large number of e-mails to be sent, so by that criterion alone, such e-mails would be ham. So if someone is disguising his identity while sending lots of e-mails, he almost certainly is not a legitimate bulk mailer. When Sears attempts to sell you snow tires, you know the e-mail came from Sears.
C-3 Disguised Identity II
[To be added.]
C-4 Misspelled Words
When a bulk mailer sends you an e-mail, it would be extremely rare for it to contain typos, since typos make a bad impression. Most spams, on the other hand, contain numerous typos, particularly for “sensitive” words that spam filters are looking for. When you receive a spam that spells Viagra as VIag*a, it’s not because the spammer does not know how to spell, but rather because he is trying to evade spam filters.
Spammers increasingly are sending e-mails in which they attempt to disguise the content of the message, not from you but from anti-spam filters, in an attempt to slip through such filters. This is why you now see the strangest spellings of certain words that one can imagine. It’s not that the spammers are illiterate (on the contrary, most are well educated); it’s that they are trying to slip through the spam filters that you may have installed on your computer and that your ISP almost certainly has installed on its server. If Pfizer (the manufacturer of Viagra) sends you an e-mail about Viagra, I guarantee you they will spell the name of their product correctly, rather than V!agra. Spammers, on the other hand, would almost never spell “Viagra” properly, since any e-mail containing Viagra will almost certainly set off warning bells in even the most simplistic spam filter.
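To see why the odd spellings work against a simplistic filter, and why they stop working once a filter normalizes what it reads, consider the following Python sketch. It is a toy under stated assumptions, not a description of how any real spam filter (Bayesian or otherwise) is implemented: the substitution table, the one-word block list, and the function names are all invented for illustration.

```python
import re

# Hypothetical table of look-alike characters a spammer might substitute.
LOOKALIKES = str.maketrans({"!": "i", "1": "i", "0": "o", "@": "a", "$": "s"})
# Hypothetical block list with a single sensitive word.
BLOCKED_WORDS = ("viagra",)


def naive_hit(text):
    """Simplistic filter: flag only exact (case-insensitive) occurrences."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def token_pattern(token):
    """Turn one word into a tiny regex: letters stay, '*' matches any one
    character, and all other punctuation is dropped."""
    parts = []
    for ch in token.lower().translate(LOOKALIKES):
        if ch.isascii() and ch.isalpha():
            parts.append(ch)
        elif ch == "*":
            parts.append(".")
    return "".join(parts)


def normalized_hit(text):
    """Flag a message if any word, once normalized, matches a blocked word."""
    for token in text.split():
        pattern = token_pattern(token)
        # Allow at most one wildcard so that "******" does not match.
        if not pattern or pattern.count(".") > 1:
            continue
        if any(re.fullmatch(pattern, word) for word in BLOCKED_WORDS):
            return True
    return False


print(naive_hit("Buy V!agra now"), normalized_hit("Buy V!agra now"))
```

The naive check misses both V!agra and VIag*a, while the normalized check catches them. A spammer can of course invent substitutions that are not in the table, which is one reason filters that learn from whole messages tend to do better than fixed word lists.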
C-5 Disguised Message
[More about disguised messages.]
C-6 Method of Transmission
The method of sending such e-mails is another factor. Spammers almost always use an illegitimate method of sending e-mails.
Until recently, most spammers would open a new account with an Internet Service Provider (“ISP”), send out a ton of e-mails, and then quickly close the account or wait for the account to be closed once the ISP figured out what was going on. Many ISPs now have active monitoring software. If a new account is sending 500 e-mails a minute, many ISPs will close that account within a few minutes or hours, unless prior arrangements have been made.
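The kind of monitoring described above can be pictured as a sliding-window counter. The Python sketch below is only an illustration of the idea; I do not know how any particular ISP implements its monitoring, and the class name, the method name, and the 60-second window are assumptions. The 500-per-minute threshold is the figure used in the paragraph above.

```python
from collections import deque


class SendRateMonitor:
    """Track one account's recent sends and flag it when the rate is too high."""

    def __init__(self, max_per_window=500, window_seconds=60):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record_send(self, now):
        """Record one e-mail sent at time `now` (in seconds).

        Returns True if the account has reached the limit within the window.
        """
        self.timestamps.append(now)
        # Forget sends that have fallen out of the window.
        while now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.max_per_window


# Hypothetical new account sending 10 e-mails per second.
monitor = SendRateMonitor()
flagged_at = None
for i in range(1000):
    if monitor.record_send(now=i // 10):
        flagged_at = i + 1
        break
print("flagged after", flagged_at, "e-mails")
```

An account sending one e-mail a second never has more than 60 sends in the window and is never flagged. A real system would presumably also account for the "prior arrangements" mentioned above, for example by raising the limit for known bulk mailers.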
So spammers have turned to [****].
So if someone is using a subterfuge to send a large number of e-mails, they almost certainly are a spammer. To use Sears again, Sears does not surreptitiously plant software on your computer as a platform to use your computer to send e-mails to other people.
C-7 Types of Products or Services Offered
Some would classify e-mail as spam based on what is being sold. If one looks at spam, it’s remarkable how limited the range of products and services typically offered is:
- Pornography
- Products to enlarge certain body parts
- Credit repair
- Mortgage refinancings
- Viagra and other sexual enhancements
- How to obtain a grant
- Printer toner cartridges
- Weight loss
- Quick loans
- HGH therapy
- Debt relief / enhancing your credit rating / credit counseling
- Beautiful women who want to sleep with me sight unseen. (It is so disappointing when one learns this is a scam.)
- Rolex watches
- Penny stocks whose price will triple in the next week
- College and graduate degrees I can receive without attending any classes.
This by itself does not make a set of e-mails spam. Assume, for example, that 1000 people attend a conference on Viagra and each one asks (“opts in”) to receive e-mails on this drug. Assume that I assemble such a list, and by mistake one additional e-mail address (yours) slips in. You have no interest in receiving e-mails about Viagra. (You have no problems in that department.) So I send 1001 e-mails. You’re receiving an unsolicited, unwanted e-mail about a product that spammers regularly try to sell. Surely such e-mails are spam. Sorry, you couldn’t be more wrong, because every other recipient wants to receive such e-mails. Remember that in deciding whether an e-mail is spam, you have to look at the entire series of e-mails that were sent, not just one e-mail. So the kind of product or service being offered can often be correlated with spam, but by itself that does not make a series of e-mails spam.
C-8 Legitimacy of the Product
C-9 Legality of the Product
[To be added.]
C-10 Illegal Conduct
[To be added.]
C-11 Illegitimate URLs
[To be added.]
C-12 Frequency of Emails
Another factor is how often the organization will contact me. I purchase a lot of software, often on-line. At the bottom of the screen is a box (usually checked “Yes” by default) asking me if I want to receive e-mails from the software publisher.
At one end of the continuum, they only contact me when (i) there is a patch or bug I presumably want to know about or (ii) there is a new version. Presumably if I am running their software, I want to know this. (Unless their software updates itself automatically, which I usually prefer.) New versions do not come out that often and I certainly want to know about bug fixes. So presumably every time I receive one of these e-mails, I’m interested.
Moving along the continuum, this publisher may send me e-mails about other products of theirs. I’ve purchased Dreamweaver from Adobe, and I probably want to know about their other products. This is OK as long as I can opt out of these and only receive the bug fixes and announcements of new versions of the product I purchased.
At the other end of the continuum, I receive e-mails several times a week about every gory detail of the product I purchased, other products sold by that company, and what the CEO’s dog ate for breakfast that morning. I purchased a utility called Diskeeper, which defragments my hard disk in the background. At least once a week, Executive Software (the publisher of Diskeeper) was telling me about stuff I had no interest in. So I unsubscribed from their entire mailing list, figuring I’ll just look at their Web site every 6 months or so to check on patches and upgrades. (Of course, I never remember to do so, but that’s another matter.)
The desired frequency depends in part ***.
D. Opting In and Opting Out
D-1 Opting-in
Everyone agrees that e-mail must be unsolicited to be called spam. If you request that company A send you e-mails about snow tires, and they do so, they are not spamming you.
The legitimate organizations attempting to deal with spam fall into two camps. The first consists of those who are concerned solely with the recipients of e-mails. The second consists of trade associations that represent bulk mailers. Interestingly, the trade associations are as concerned with reducing spam as you are. The more spam there is, and the more outrageous the conduct of spammers, the more difficult it is to run a legitimate bulk mailing business, for several reasons:
- The more spam that is sent, the stronger the anti-spam filters will be, and the tougher ISPs will be, even on bulk mailers.
- If the majority of the e-mail you received attempting to sell you products or services was from legitimate bulk mailers rather than spammers, you would be much more likely to read such e-mail, rather than reflexively reaching for the Delete key.
- The more spam there is, the more draconian will be the laws passed by U.S. Congress and the states, laws which the legitimate bulk mailers will abide by and which the spammers will not.
The first camp, representing the recipients of e-mail, argues that recipients should be required to “opt in” in order to receive e-mails; i.e., they must proactively request (through some means) to be placed on your list. In their eyes, simply by sending unsolicited e-mails, you are a spammer.
The trade associations representing legitimate bulk mailers (as far as I know, there is no trade association for spammers) argue for a lower standard: “opt out.” Basically, they are permitted to send you unsolicited e-mails provided they have a legitimate (and presumably not too difficult) method to opt out. (When I say “legitimate,” I mean that when you ask to be taken off their list, they actually comply, rather than saying, “I now have a confirmed e-mail address which I can send more spam to, as well as sell to another spammer.”)
Needless to say, these standards are very different. Very few people opt in (legitimately opt in; see below) to many e-mail lists. Those that do presumably want to receive e-mails. These people are the “gold standard” from a bulk mailer’s point of view: someone who has asked to be placed on a pet supply mailing list is much more likely to purchase pet supplies than someone who has not asked to receive such e-mails. As much as bulk mailers would love to have only those who have opted in on their mailing list, the economic reality is that there are not enough such people. Most bulk mailers would be out of business if they could only e-mail those who opted in.
Both camps have dug in for a political fight that will last decades. Both sides have clever marketing and slogans. (It’s almost like the abortion debate, where both sides have chosen labels that are hard to be against. How can someone not be pro-life? Life is good. On the other hand, how can someone be anti-choice? Choice is good.) The trade associations (surprise, surprise) have a lot of money and have hired lobbyists who are well connected. The other side has the emotional advantage: is there anyone who favors anything short of the death penalty for spammers? [4]
Notice that the bulk mailers do want some regulation. If you run a legitimate business, you are usually in favor of some government regulation. Why? Because otherwise some of your competitors will engage in certain practices that you do not condone, which reduce their cost of business and give them an economic advantage over you. If you’re Wal-Mart (which provides lousy health benefits to its employees), you are opposed to legislation mandating better benefits. If you’re General Motors (which provides great health benefits to its employees), you might be in favor of such legislation.
So where do I stand on opt in and opt out? All of us are formed by our experiences. Through my parties, I have learned that people are really, really lazy (see Section __). As I note in Section __, about ___ of my list has opted in and about __ percent of my list has not. Many people in the latter group have become core members of my social group, and I shudder to think that I would not have been able to invite them to my parties.
I have also learned that sending one initial e-mail, asking people if they want to be placed on a list, does not work very well. About 7 percent will write back. Of those that do, about 60 percent say they are interested and 40 percent say they are not. What about the remainder? [***]
So I side with the trade associations on this issue. As long as there is a legitimate and easy way to opt out, I think bulk mailers should be permitted to send unsolicited e-mail to those who have not opted out.
D-2 Disguised Opt-in
There are different forms of opting in. A key takeaway of this essay is that many of these issues are not black or white, but rather run along a continuum. That is certainly the case with opt in. Some examples:
- I go on-line to a Web site and have to answer a zillion questions in order to be placed on a mailing list. This is the case with certain “controlled circulation” magazines. “Controlled circulation” means there is no cost to subscribe to the magazine; it is supported by its advertisers. For example, I subscribe to eWeek and InfoWorld, two excellent computer industry publications. In both cases, I have to answer 50 or so questions before I can be placed on the mailing list. By the time I do so, I think it is clear that I know I am signing up to receive their publications, and thus I know I am opting in. (In their case, they are doing this in part to demonstrate to their advertisers that all of their subscribers want to receive their magazine, and in part to gather data on how many millions of dollars a year I spend on, say, 8-processor rack-mounted servers. Some advice to those who want to subscribe to such a publication: If you’re simply interested in the industry and don’t spend a dime buying products in that industry, they may not put you on their circulation list, since their advertisers will have little interest in you.)
- Out of the blue, I send Sears an e-mail, “Put me on your mailing list for dishwashers.” (To be candid, I have never done this.)
- I visit the Sears Web site and proactively check a box to be added to the Sears mailing list — i.e., the default is “No” and I have to choose “Yes” to be placed on their list.
- I visit the Web site, fill out the form, and the default is “Yes” to be placed on their list; i.e., if I do not want to be placed on their e-mail list, I have to change the default choice.
- Same as above, but the type indicating I will be placed on a list is small and is a similar color to the background color, making the text hard to read.
- Same as above, but in really small type, they let me know that I am authorizing them to provide my e-mail address to their “affiliates” (which could mean anyone they trade e-mail addresses with).
As you can see, opt-in is not black or white. I would argue that the degree to which one is legitimately opting in depends on how aware a person of average intelligence is that he will receive the e-mails he is about to receive. Thus, he needs to know not only that he is signing up but what he is signing up for. In the first example, I clearly want to receive eWeek. In the last example, do I really understand that I am agreeing to receive e-mails from any “affiliate” of Sears?
D-3 Difficulty of Opting Out
The CAN-SPAM Act of 2003 requires that bulk mailers provide a method of opting out. [MORE]
A legitimate bulk mailer follows the law and always provides a method of opting out. When you request to opt out, he actually takes you off his list. Spammers, on the other hand, usually do not follow the law and do not provide a real method of opting out. Notice I said real, for many spammers do not really offer opt-out. Instead, if you request to opt out, all you are doing is confirming that your e-mail address is valid, which means the spammer can make more money selling your address to another spammer, since confirmed e-mail addresses are much more valuable.
How difficult it is to opt out runs along a continuum. The more difficult it is to opt out (assuming you really are opting out, rather than confirming your e-mail address for the spammer), the more the company sending you the e-mail becomes a spammer. Some opt-out instructions are clear and easy. The type is legible and all you have to do is click on one link, or put “unsubscribe” in the subject line and send an e-mail to a certain address. At the other extreme are instructions where you have to go to a certain Web site, which is often down, enter various user names and passwords, and follow a complex series of instructions. One can safely assume that in almost all cases, the complexity inherent in such a process is intentional.
Time frame could be a factor, although so far not in any practical sense. I have received unsolicited e-mails where I followed the opt-out instructions and was told it might take as long as 5 days to go into effect. Given that we’re talking about an instantaneous update of a database, why 5 days? The only legitimate explanation I can think of is that the organization will notify other organizations (perhaps the one that sold it your e-mail address), and it will take some time for the second organization to remove your e-mail address. Even so, I would argue that systems should be established before the e-mails are sent, so that your request is immediately transmitted electronically to the second organization. Five minutes is OK in my book; five days is not.
D-4 Should You Unsubscribe?
Unsubscribing is a two-edged sword. The CAN-SPAM Act requires most bulk mailers to offer a method to opt out. Of course, “unsubscribe” is in most cases an inaccurate term, since you usually never subscribed in the first place. Or if you did, it was because it wasn’t clear that you were agreeing to receive commercial e-mails from “affiliates” of the bulk mailers. But unsubscribe is the term that is used, so I’ll use it.
The problem is that many spammers not only do not unsubscribe your e-mail address from future mailings, but by attempting to unsubscribe you confirm to the spammer that yours is a legitimate e-mail address. In ____ ***
Should you use the unsubscribe method listed (usually) at the bottom of the e-mail? It depends on whether the sender is a bulk mailer or a spammer. If Ford Motor Company sends you an e-mail, the odds are extremely high that if you unsubscribe you will no longer receive e-mails from Ford, and that Ford will not sell your confirmed e-mail address to someone else. So for anyone that appears to be a bulk mailer rather than a spammer, I would follow the unsubscribe instructions.
One caveat: Some spammers send deceptive e-mails that purport to be from a legitimate bulk mailer, but in fact are not. In many cases, the return e-mail address or the URL in the e-mail is very similar to that of a legitimate company. Bank of America, for example, owns the bankofamerica.com domain name. Assume you receive an e-mail asking you to go to www.bankofamerica.co.uk. You might quickly glance at the domain name, notice bankofamerica.co, and assume that’s all one needs to know. The .uk at the end, however, makes this a different domain name, and nothing guarantees that it is owned by Bank of America. It may be, but Bank of America may not have registered every variation of bankofamerica.com.
At the .uk URL, there may be a site that looks very similar to the real Bank of America site, particularly since one can copy another site just by downloading the .htm and .jpg files from it. You enter your login information. Since the fake site doesn’t know what your login information is, it simply accepts whatever anyone enters. Then this legitimate-looking Web site asks you to reenter certain financial information, such as your Social Security number and bank account information.
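The lookalike-domain problem can be made concrete with a small check. This is only a sketch: the function name host_matches is made up for illustration, and it assumes you already know the exact domain the real company uses. It compares the full host name of a link against that domain, rather than glancing at the first few characters.

```python
from urllib.parse import urlsplit

def host_matches(url: str, expected_domain: str) -> bool:
    """Return True only if the URL's host is expected_domain or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    expected = expected_domain.lower()
    # Compare the whole host name; a shared prefix is not enough.
    return host == expected or host.endswith("." + expected)

# The real domain and a subdomain of it pass the check.
print(host_matches("https://www.bankofamerica.com/login", "bankofamerica.com"))
# A different top-level domain is a different domain name.
print(host_matches("https://www.bankofamerica.co.uk/login", "bankofamerica.com"))
# So is a host that merely starts with the real name.
print(host_matches("https://bankofamerica.com.example.net/", "bankofamerica.com"))
```

Only the first call prints True. The check says nothing about who owns a domain; it only shows that the two addresses are not the same place.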
E. Reducing the Amount of Spam You Read
There is a difference between receiving spam and reading spam. If you’re like most people with an active e-mail address, spammers will find your address (sometimes using brute force attacks) and send you spam. You can reduce the amount of spam that is sent to you, and you can also reduce the amount of spam you actually read. In addition to the recommendations in this section, I recommend that almost everyone install a spam filter; spam filters are covered in Section F.
E-1 Throw Away E-mail Addresses
[To be added.]
E-2 Complaining
I know what you are thinking — sure, complaining might benefit society in the long run. If enough people complain to the spammers’ ISP, then the ISP might shut down the spammers’ e-mail account (assuming the ISP is a legitimate ISP). But how will it benefit me, in the short run?
Actually, ….
E-3 E-mail Clients
[To be added.]
F. Spam Filters
A spam filter is a piece of software that reads your incoming e-mail and then guesses whether each incoming e-mail is ham or spam. Almost everyone should install a spam filter, at the ___, ___ and/or client level. I say “almost everyone” because there are a few people who have a very unobvious e-mail address, who never provide their e-mail address to commercial enterprises, and who receive very few e-mails overall. For these people, it’s possible that the number of spams they receive in a year is so small that it’s not a problem.
F-1 Approaches
[To be added.]
F-2 False Negatives and False Positives
No spam filter is perfect, even the best ones. Spam filters can make two kinds of mistakes:
- False negatives occur when a spam filter fails to categorize an e-mail as spam even though it is.
- False positives occur when a spam filter incorrectly flags an e-mail as spam even though it is not.
False positives are what one should worry about, rather than false negatives. If you install a good Bayesian filter, after you train it, more than 99 percent of spams will be correctly identified as spam; so few spams will slip by the filter that the false negatives are not worth worrying about. It’s when a spam filter incorrectly decides that a valid e-mail (ham) is spam that one has a problem. One should seek a spam filter with an extremely low false positive rate (less than 0.1 percent, or fewer than 1 out of 1,000), even at the expense of more false negatives.
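To make the two error rates concrete, here is a minimal sketch of how they are computed from a filter's results. The function name and the counts are made up for illustration; they are not measurements from any real filter.

```python
def error_rates(spam_caught, spam_missed, ham_passed, ham_blocked):
    """Return (false_negative_rate, false_positive_rate) as fractions."""
    # False negatives: spam the filter let through, as a share of all spam.
    fn_rate = spam_missed / (spam_caught + spam_missed)
    # False positives: ham the filter blocked, as a share of all ham.
    fp_rate = ham_blocked / (ham_passed + ham_blocked)
    return fn_rate, fp_rate

# Hypothetical counts: 10,000 spams and 10,000 hams run through a filter.
fn, fp = error_rates(spam_caught=9950, spam_missed=50,
                     ham_passed=9997, ham_blocked=3)
print(f"false negatives: {fn:.2%}")
print(f"false positives: {fp:.2%}")
print("meets the 0.1 percent false positive target:", fp < 0.001)
```

Note that each rate is measured against its own kind of e-mail: missed spam against all spam, blocked ham against all ham.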
F-3 Actions Taken by Spam Filters
Once a spam filter decides an e-mail is likely to be spam, it typically does one of three things:
- _
- _
- _
F-4 Rules-Based Filters
[To be added.]
F-5 Bayesian Filters
Thomas Bayes was a famous statistician in the 18th century who developed a formula to determine the probability of an event occurring based on the probabilities of two or more independent evidentiary events. Bayesian filters are based on this theory. They have to be trained by the user as to which e-mails are ham and which are spam. During training, they extract “tokens” (separate words) and store them in a database. When an incoming e-mail is analyzed, the message is split into tokens and each token is given a value according to the following criteria:
- The number of tokens in the message that are also contained in the sample of ham provided to the Bayesian filter. These are called “good words.”
- The number of tokens in the message that are also contained in the sample of spam provided to the Bayesian filter. These are called “bad words.”
One fundamental aspect of Bayesian filters is that everyone has their own definition of spam. For most of us, e-mails trying to sell us Viagra are spam. But if you use Viagra — and if the Viagra being sold is really Viagra (it rarely is) — you might want to receive such e-mails. Since such e-mails are wanted as far as you are concerned, for you they are not spam and you would not want your spam filter to filter out such e-mails.
Most of us do not want to receive e-mails offering low-interest mortgage refinancings. By training a Bayesian filter, we would be telling it that e-mails with “low interest,” “mortgage,” “refinancings,” and other such words are likely to be spam. Unless such e-mails had a meaningful number of good words, the spam filter would flag such an e-mail as spam. But what if you’re a loan officer at a bank specializing in residential home mortgages? For you, ham (good e-mail) would often contain such words. The last thing you would want your spam filter to do is screen out such e-mails; otherwise, you would quickly go out of business.
Good words are particularly important to spam filters, and the list is different for each of us. I, for example, run a social group (Boston Convivium) that gives cocktail parties at venues such as the Four Seasons Hotel, Bar 10, 33 restaurant and Great Bay. We have a dress code. I write a lot of essays, some of which have discussed Microsoft software, romance and dating, and ____. I also do leveraged buyouts, which are a form of mergers and acquisitions. Now that I have trained my Bayesian spam filter (InBoxer), it looks for such words as:
- large cocktail parties
- dress code
- Microsoft
- romance
- dating
- leveraged buyouts
- mergers.
For me, those are good words (tokens). If a sufficient number of such good words appear in an e-mail, InBoxer is very likely to identify an e-mail as ham (unless there are also a sufficient number of bad words, according to my definition).
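The idea of good and bad tokens can be sketched in a few lines. This is a toy, not InBoxer's or Graham's actual algorithm: the class name TinyBayes, the tokenizer, the add-one smoothing, and the assumption that ham and spam are equally likely beforehand are all simplifications chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split a message into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

class TinyBayes:
    def __init__(self):
        self.ham_counts = Counter()   # messages each token appeared in (ham)
        self.spam_counts = Counter()  # messages each token appeared in (spam)
        self.n_ham = 0
        self.n_spam = 0

    def train(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        """Combine per-token evidence, treating tokens as independent."""
        log_odds = 0.0
        for tok in tokenize(text):
            # Add-one smoothing so unseen tokens do not produce zeros.
            p_tok_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
            p_tok_ham = (self.ham_counts[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_tok_spam / p_tok_ham)
        return 1 / (1 + math.exp(-log_odds))

f = TinyBayes()
f.train("dress code for the cocktail party at the hotel", is_spam=False)
f.train("notes on the leveraged buyouts and mergers essay", is_spam=False)
f.train("low interest mortgage refinancing, act now", is_spam=True)
f.train("cheap viagra, low prices, act now", is_spam=True)

ham_score = f.spam_probability("what is the dress code for the party")
spam_score = f.spam_probability("refinance your mortgage at low interest")
print(round(ham_score, 3), round(spam_score, 3))
```

With this tiny training set the first message scores close to 0 and the second close to 1. A loan officer who trained the same code on his own mail would mark mortgage e-mails as ham, and the same words would then count as good tokens for him.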
Different writers of Bayesian filters take different approaches. Graham, for example, is so worried about false positives that he doubles *** ____.
F-6 Evolution of Bayesian Filters
Spam filtering is a subset of text classification, which is a well-established field. The first papers about Bayesian spam filtering per se were two papers given at the same conference in 1998, both of which attracted little notice. In August 2002, Graham published ****“A Plan for Spam,” which has led to an orgy of discussion, analysis, refinement, and research. Graham’s paper ignited the current wave of thinking for three reasons. [5] The first is his (deservedly) awesome reputation in the computer science field: when a famous person says the same thing as an unfamous person, the comment by the famous person is usually more noticed. (Graham is one of the most influential thinkers in computer science and entrepreneurship.) Second, Graham is not only brilliant but is also a very good writer. Let’s face it: most of us would rather read a tightly written, humorous essay explaining a difficult concept than a dry academic paper. Third, Graham’s approach has significantly lower error rates (see Section F-7), low enough, particularly for false positives, that the approach became workable.
Five months later, Graham ___ SpamBayes project, which consists of both basic research into Bayesian filtering algorithms and open source software that runs on a variety of platforms. The SpamBayes approach has a middle category, Unsure (some call this Review), which is probably essential if you want an extremely low false positive rate. I’ll avoid a discussion of their refinements and simply say that in choosing a Bayesian filter, I would look for one based on the SpamBayes approach. [6]
F-7 Error Rates
One of the reasons Graham’s approach has attracted so much attention is that he has remarkably low error rates. His filter caught 99.5 percent of spam (0.5 percent false negatives, i.e., one out of 200 spams slipped through the filter). More important, only 0.03 percent (3 out of 10,000) were false positives (good e-mails incorrectly identified as spam). [7] Of the two papers presented at the 1998 conference (see Section F-6), the more accurate caught only 92 percent of spam and had a false positive rate of 1.16 percent, which most people agree is too high. Graham’s error rates were so low that his Bayesian filter was practical for day-to-day e-mail filtering.
I’ve been using InBoxer since September 2006. It has three classifications: Ham, Review and Spam. “Review” is a middle category where InBoxer suspects the e-mail might be spam, but the probability is not high enough to definitely classify it as such, so it is placed in the Review folder. So far, my Spam folder has contained no false positives, and only a few of the e-mails in the Review folder have turned out to be ham.
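A three-way classification like this can be sketched as two cutoffs applied to a spam probability. The cutoff values below are assumptions chosen for illustration, not InBoxer's actual settings, and the function name is made up.

```python
def classify(spam_probability, ham_cutoff=0.20, spam_cutoff=0.90):
    """Sort a message into Ham, Review or Spam by its spam probability."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability >= spam_cutoff:
        return "Spam"
    if spam_probability <= ham_cutoff:
        return "Ham"
    # Anything in between is uncertain, so a person should look at it.
    return "Review"

for p in (0.02, 0.55, 0.97):
    print(p, classify(p))
```

Raising spam_cutoff pushes more borderline e-mails into Review. That is the trade the middle category makes: a little more manual checking in exchange for fewer false positives in the Spam folder.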
F-8 Training
A Bayesian filter must be trained by each user. The easiest and fastest way to do this is to have two folders ready before you install the filter, one consisting of a large number of hams and the other consisting of a large number of spams. You tell the filter that the e-mails in Folder A are ham and the e-mails in Folder B are spam. Assuming you have a large number of e-mails in both folders (e.g., 1000 in each folder), in a few minutes your newly installed filter will assemble a database of good and bad tokens. If the filter is a good one, it will from the start provide an extremely accurate appraisal of new e-mails, with very little spam slipping through (false negatives) and, more important, an astonishingly small number of false positives.
F-9 Tricking Bayesian Filters
Since Graham’s essay was published in 2002, substantial refinements have been made in Bayesian filters, and a meaningful number of people have now installed such filters. Just as HIV is remarkably adaptable in mutating into ever more potent strains (often in reaction to HIV drugs), spammers are also highly adaptable. Many spams are written primarily with a view to circumventing Bayesian filters.
Graham argues that so far, the Bayesian filters are winning and that the spammers have not figured out how to circumvent the better Bayesian filters. To circumvent such a filter, the spammer either must add good tokens or reduce the number of bad tokens in the e-mail he sends.
The problem with inserting more good tokens is that they are individually determined and it’s not economically possible for a spammer to know my good token list and then write an e-mail just for me. As noted in Section F-5, e-mails sent to me containing the words “large cocktail parties,” “Microsoft” and “leveraged buyouts” have a very high probability of being ham. That may not be the case with you. A spammer cannot know the individual preferences of the 10 million people he spams.
If the spammer chooses words at random, such words are probably just as likely to occur in spams as in ham, as Graham argues: [8]
*** Check this quote
Appending chunks of articles or books doesn’t seem to work any better, at least in the cases I’ve seen so far. The appended text doesn’t look like spam, but it doesn’t look much like the email I get either, so it tends not to have any effect, statistically.
So that leaves the spammer with the choice of using fewer bad tokens. The spammer can try to conceal the bad tokens, or rewrite the e-mail to use less spammy language. So far, trying to conceal bad tokens is a complete failure. All the tricks I’ve seen so far make the spams easier to catch, not harder. These include misspellings (V1agra), breaking up words with spaces (S E X), sending the spam as an image instead of text, and sending a Javascript program that generates the spam.
So that leaves rewriting the spam in less spammy language.
But that takes a lot of work. It may not even be possible for some spams. How do you rewrite a mortgage spam without using terms like “refinance” ([probability of being spam is] .9612), “lenders” (.9862), or “mortgage” (.9995)? And remember, whatever euphemisms you use, they have to be different from the ones used by every mortgage spammer before you. Surely at this point it would be less work for the spammer to switch to some more legitimate business. ***
And of course, spams won’t work so well if they have to be rewritten in more neutral language. People who respond to spams are presumably pretty dull-witted, and have to be hit over the head with a lot of capital letters and exclamation points to get them to do anything.
Graham argues that in the spam of the future “the sales pitch is pushed one step back. Instead of being contained in the email itself, as in an ordinary spam, it is waiting a click away on a Web site.” ****
F-10 Recommendations
When I started to write this essay (Summer of 2006), I assumed that there were several Web sites that provided objective reviews of spam filters, and that such reviews would be based on cutting-edge approaches (i.e., focusing on Bayesian filters rather than rules-based filters). Boy, was I wrong. Most of the reviews focused on rules-based filters. Few Web sites had what appeared to be objective reviews, carefully comparing the strengths and weaknesses of various products. Even the computer magazines that ordinarily write excellent reviews (e.g., PC Magazine) had mediocre reviews of spam filters.
The only site that seemed somewhat decent was www.WhichSpamFilter.com. The reviews are fairly detailed and technical and I did not detect any bias towards or against any particular product. Since I use Microsoft Outlook as my e-mail client, I was only interested in Outlook plug-ins. They gave their highest rating (5 stars) to three products, all of which were Outlook plug-ins:
- Qurb is a whitelist filter that only lets through e-mails sent from e-mail addresses that are on your whitelist. Anything else is quarantined, and you can look at the quarantined e-mails at your leisure to let Qurb know which e-mail addresses are legitimate. Qurb can also be set to challenge/response mode.
For me, this approach simply would not work. Each month, I receive more than 1000 e-mails from e-mail addresses that have never sent me e-mail before. If you’re a different type of e-mail user, and do not receive that many e-mails from new e-mail addresses, Qurb could make sense for you.
- SpamBully, a Bayesian filter that appears to be quite good. I chose InBoxer, but you may want to compare SpamBully and InBoxer before you decide on one of the two.
- InBoxer, which is discussed immediately below in Section F-11.
F-11 InBoxer
I chose InBoxer. It is a Bayesian filter that is based on the SpamBayes project, which is currently the most advanced research project in Bayesian filtering. (The company also sells an Anti-risk appliance, which is a completely different product and would not be of interest to you unless you are a reasonably large company.) InBoxer’s most compelling feature is that it has three categories:
- Good, which is simply kept in your Outlook inbox.
- Bad, which is moved to a folder titled InBoxer - Blocked.
- Review, which is moved to a folder titled InBoxer - Review.
As with all Bayesian filters, if you want a high rate of accuracy (both for false positives and false negatives), you must train the filter from scratch, rather than using a pre-defined database of good and bad tokens. Training InBoxer took me over a month. (I obviously did not spend full-time on this; it was probably ten minutes a day.) During that time, a substantial number of spams were not filtered and thus went into my Outlook inbox. If I were to do it over again, I would accumulate several thousand spams I had previously received and place them in a folder that InBoxer could analyze, as well as several thousand hams to be stored in a different folder. If I had done this, InBoxer would have been extremely effective on the first day.
I did not do this, so I ended up repeatedly training InBoxer, flagging various e-mails as ham or spam. Within a month, InBoxer became highly functional. If you define a false positive as a ham being placed in the InBoxer - Blocked folder, my false positive rate is zero (that is not a typo); InBoxer simply has not done this. Once a day, I check my Review folder. Almost all of these e-mails are spam, but a few hams are placed there. I simply flag these as Good, and InBoxer learns over time. (Flagging is quite easy: you simply check one of two boxes at the top of Outlook.)
After a month, only one type of spam evades InBoxer on a regular basis. These are e-mails claiming that a certain stock will triple in price the next week. These consistently slip through, even though I am religious about flagging all of these as spam. Most of these spams contain graphic images rather than text and it appears that InBoxer is unable to analyze graphic images.
Overall, I am quite happy with InBoxer. You can try the product for 21 days for free. During that time, you’re using the actual product rather than a crippled trial version. After 21 days, InBoxer will cost you $30. If you also want InBoxer for your Blackberry or Treo mobile phone, the combined price is $50. (That’s the option I chose, since I don’t want spam on my Treo.) It is unclear what upgrades will cost when they release a major upgrade (minor upgrades are usually free from most software publishers). For me, telephone support is essential, since when I have technical support questions, they tend to involve more complex issues. They offer both e-mail and telephone support, neither of which I have used, but my experience is that small software companies such as this one almost always have good telephone support (unlike, say, Microsoft, where in most cases you have to have your call escalated to a Level II engineer to find someone who knows their stuff).
InBoxer works with Outlook alone and with Outlook plus Exchange Server (a Microsoft e-mail product that is server based and is likely to be used by companies). It does not work with Outlook Express (a completely different e-mail package published by Microsoft, despite its similar name), nor does it work with Web-based e-mail sites such as Gmail or Hotmail.
Bottom line — If you’re using Outlook as your e-mail package (which I suggest you do), you’re willing to spend the time necessary to train a filter, and spam is an issue for you, I would either (i) purchase InBoxer or (ii) compare InBoxer and SpamBully and choose one of them. If you’re not using Outlook, I don’t have any recommendations for you, as I simply was not interested in exploring that segment of the market.
As noted above, for any Bayesian filter, I would recommend that you collect a large sample of hams and spams into two folders, so that when you install the filter, you can instruct it to analyze these two folders right away and be effective from the start. If you use a Bayesian filter, I would recommend that you turn off any Junk mail filtering provided by Outlook and let your Bayesian filter perform all of the spam filtering.
G. Other Methods of Combating Spam
G-1 Telephone Solicitors
I used to receive a lot of calls from telephone solicitors. It took me a while to figure out how to stop this, and I still wonder why it took me so long. Being someone who thinks he knows something about business, I thought about the economics of running a telephone solicitation operation. Basically, it’s a numbers game. You hire lots of people (paying them very little, and perhaps paying them solely on commission), make a ton of telephone calls, and hope that a few people are interested in whatever you are selling. It occurred to me that these businesses operate with low margins: not necessarily low profit margins (which I don’t know), but rather a low response rate, since only a small percentage of the people they approach are interested.
They can afford this because the costs of making the phone calls are very low — telephone rates are dropping every day (with IP telephony, an even lower cost threshold will be reached) and they aren’t paying their people very much. Even so, they do have some costs, and it occurred to me that the way to fight back was to increase their costs, to make them high enough that they would not want to call me again.
So I thought about how to make the telephone call more expensive, both in terms of the cost they paid to their telephone company and (much more important) the cost they pay to their employees. I realized that if I could make the telephone call last much longer than expected, along with no sale, I was on the right path.
So when a telephone solicitor calls, I do not hang up. Instead, I quickly say something like, “Wow, that sounds interesting. Hold on, I want to turn down the television. I’ll be right back.” And I put the phone down, and go back to what I was doing. I wait a few minutes, pick it up, and say, “So sorry, I’ll be right back.” Then I put the phone down. And I repeat this process of picking up the phone every few minutes and telling them I’ll be right back.
When I am bored, I’ll pay some attention to how long I can get them to wait before they hang up. My all-time record is 22 minutes, with my only getting back on the phone three times.
Think about the economics for the telephone solicitor. On average, I get the telephone solicitor to wait about 10 to 15 minutes before he gives up. (I find the more enthusiasm I initially express, the longer they will wait.) If everyone did that, they would only be able to make 4 to 6 calls an hour. Given that very few telephone calls lead to a sale, the economics simply don’t work.
So what does the telephone solicitor do? He marks me in his database as “Do not ever call this *%$&* guy again.”
G-2 Filters That Fight Back
[To be added.]
H. One Proposal
Why is spam more bothersome than junk mail? There’s simply a lot more of it, and the reason for that is that it costs almost nothing to send spam, while it costs money to send junk mail. Because it costs money to send junk mail, you don’t receive that much of it (as compared with how much spam you receive) and those who send junk mail would not attempt, for example, to sell snow tires to Florida residents. They are more careful about who they send junk mail to, because each junk mail piece they mail costs them money. Since sending spam costs almost nothing, spammers are not concerned about sending completely unwanted solicitations. If you send a million spam e-mails which lead to ten sets of snow tires being sold, you’ve made a profit.
With both junk mail and spam, the recipient pays a small cost in the time he must spend in throwing the junk mail away and in deleting the spam. The crucial difference between junk mail and spam is that junk mail costs the sender some money while sending spam is basically free for the spammer.
If you’re a spammer, since almost no one wants your e-mails, you must send millions of them to get any response at all. The other defining characteristic of spammers is that they almost always operate by opening up a new e-mail account, sending millions of e-mail messages (or more), and then closing down the account or having it closed down for them by their ISP. In short, their e-mail accounts stay open for a very short period of time.
These two characteristics of spammers (that they must send millions of e-mails, and that they do so from new e-mail accounts that stay open for a short period of time) combine in a nice way to permit a simple, technological solution to spam. Simply put, the idea is to impose a certain cost for sending e-mail that would make it uneconomical to send spam unless a meaningful number of the recipients wanted to receive it.
The cost could be monetary. One could simply impose a tax of, say, 1/100 of one cent ($.0001) for every e-mail sent. This would be politically infeasible, however, and could impose problems for low-budget non-profit foundations and for poor people.
Microsoft Research (“MR”) has proposed a solution that involves imposing a cost in computer time. Within one year every ISP would be required to install certain software on its e-mail services. (The algorithms have been written by MR, which would put them into the public domain.) The first time any e-mail account sends a message to a new recipient, it would have to create a unique token number, which it would do by running an algorithm in the background on the sender’s computer. The first run of the algorithm during a 24-hour period would take about 30 seconds of computer time, the second would take 45 seconds, the third would take 60 seconds, etc.
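The essay does not describe the MR algorithm itself, so the following is not it. It is a generic proof-of-work sketch in the style of hashcash, offered only to show how a token can be expensive to create yet cheap to check. The function names, the token layout, and the difficulty setting are all assumptions made for illustration.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint_token(sender: str, recipient: str, difficulty_bits: int) -> str:
    """Search for a nonce whose hash has enough leading zero bits (slow)."""
    for nonce in count():
        candidate = f"{sender}|{recipient}|{nonce}"
        digest = hashlib.sha256(candidate.encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return candidate

def verify_token(token: str, sender: str, recipient: str,
                 difficulty_bits: int) -> bool:
    """Check a token with a single hash (fast)."""
    parts = token.split("|")
    if len(parts) != 3 or parts[0] != sender or parts[1] != recipient:
        return False
    digest = hashlib.sha256(token.encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits

# 12 bits is a toy difficulty so the example finishes quickly.
token = mint_token("alice@example.com", "bob@example.com", 12)
print(token)
print(verify_token(token, "alice@example.com", "bob@example.com", 12))
```

Each extra bit of difficulty roughly doubles the expected work for the sender, while the recipient's check stays a single hash. A scheme like the one described here would also have to raise the difficulty with each new recipient, which this sketch does not model.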
This algorithm would need to be run only the first time an e-mail account sent an e-mail to a certain recipient. After that, every time you sent an e-mail to that recipient, you would not need to run this algorithm. Assume this policy has just been implemented. You send 100 e-mails the first day to 100 different recipients. Since this was the first time your e-mail account had sent an e-mail to each of these recipients, this algorithm would have to be run 100 different times, in the background, for a total of 77,250 seconds, which is about 21.5 hours. Thus, during the first 24 hours you would be able to send your 100 e-mails, with this program running in the background. But you couldn’t send 1,000 e-mails.
The second day, you send another 100 e-mails. But 75 of these are to people you contacted during the first day, and only 25 people are new. So your algorithm would be running in the background for 5,250 seconds, or about 1.46 hours. Within a month, you’re probably not writing to more than 10 new people a day, so you’re not burning up much computer time.
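The arithmetic in these examples is easy to check. The sketch below assumes exactly the schedule described above: 30 seconds for the first new recipient of the day and 15 seconds more for each one after that. The function name is made up for illustration.

```python
def total_seconds(new_recipients, first_cost=30, step=15):
    """Computer time needed to e-mail this many new recipients in one day."""
    # Costs run 30, 45, 60, ... seconds, one term per new recipient.
    return sum(first_cost + step * i for i in range(new_recipients))

for n in (25, 100, 1000):
    secs = total_seconds(n)
    print(n, secs, round(secs / 3600, 2))
# 25 5250 1.46
# 100 77250 21.46
# 1000 7522500 2089.58
```

One thousand new recipients would need about 2,090 hours, or roughly 87 days, of computer time. Because the cost per message keeps rising, the total grows with the square of the number of new recipients, which is why an ordinary user barely notices the scheme while a spammer cannot keep up.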
Now look at it from the spammer’s point of view. He’s sending e-mails to millions of recipients. Each new recipient costs him an ever increasing amount of computer time. And he can’t say, “Well, I’ll pay this price once for each recipient and then I’m home free” because his e-mail account won’t stay open that long. ISPs don’t want to have millions of messages sent through their e-mail servers, or to have their IP addresses blocked by anti-spam packages, so they’ll quickly close his account. So he has to open a new account and start over.
There’s simply no way for the spammer to afford to do this; he would have to purchase tens of thousands of computers. He could afford to do this only if a meaningful number of his recipients wanted to receive his e-mails and purchase his products. By then he wouldn’t be a spammer, because my definition of a spammer is someone that sends lots of e-mails that no one wants to receive.
If the spammer is able to find a product that a meaningful number of people want to buy, then he can afford to stay in business. One way he could stay in business would be to be careful about who he sends e-mails to. If you’re a spammer and you purchase the e-mail addresses of every person who subscribes to Samoyed Magazine and you send an e-mail trying to sell a book on Samoyeds to such subscribers, it might make economic sense to purchase the computer power to do so, because there might be enough readers of Samoyed Magazine that want to purchase a book on Samoyeds. If so, that’s OK with me, because then it is no longer spam, since a meaningful number of people want to buy your book.
MR’s proposal would not cripple legitimate commercial enterprises. If Sears wants to send an e-mail to its customers, that’s OK. Sears operates from the same e-mail address, unlike spammers, whom you usually can’t find. Presumably a meaningful number of Sears’ customers want to receive e-mails from Sears. And Sears’ customers would be able to easily opt out of receiving such e-mails.
This proposal would not eliminate all spam, but would so radically reduce the number of spam e-mails you received that spam would no longer be a problem. This proposal does not suffer from the very difficult problem of defining exactly what is spam, since it applies to every e-mail that an e-mail account sends to a recipient for the first time. And it permits those who send e-mails that a meaningful number of people want to receive to stay in business. It harms only those who send millions of unwanted e-mails from newly created e-mail accounts, i.e., the classic spammer.
(This is almost entirely the idea of five researchers at Microsoft Research. I have tweaked it a little, adding the discussion about what is spam and the 15-second increase for each additional e-mail. See also “How to Stop Junk E-Mail: Charge for the Stamp” by Randall Stross, The New York Times, February 13, 2005, Business section, p. 5, and “Spam is Different” by Paul Graham.)
I. My Parties
I-1 Human Behavior
[To be added.]
I-2 How I Use Laziness to My Advantage
[To be added.]
I-3 How I Obtain Names
[To be added.]
I-4 Opt-In
[To be added.]
J. Further Reading
J-1 Paul Graham
[To be added.]
J-2 Wikipedia
[To be added.]
J-3 Ending Spam
[To be added.]
J-4 CAN-SPAM Act of 2003
[To be added.]
1. “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description [hard-core pornography]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.” Jacobellis v. Ohio, 378 U.S. 184 (1964).
2. “A Plan for Spam” by Paul Graham, August 2002. www.PaulGraham.com/spam.html.
3. “Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003.”
4. It would be interesting to see where the ISPs and other Web hosting companies are on this issue. They fall into at least two camps. On the one hand are those who basically receive e-mail. AOL would be the ISP that most falls into this category, as well as the major Web-based e-mail services: Hotmail, Gmail, Yahoo, Excite. Since such e-mail imposes a substantial cost on them, presumably they would be in favor of fairly tough laws. The second group would be ISPs that receive a lot of e-mail but also sell accounts to send a lot of e-mail. Presumably they would want less restrictive laws. (There is a third category: ISPs that are primarily used by spammers. This is usually a short-lived business, as the larger ISP they connect to cuts them off as the complaints mount. The Internet is simply a network of networks, with each ISP connected to other, larger ISPs, up to the dozen or so major backbone providers.)
5. “A Plan for Spam” by Paul Graham, August 2002. www.PaulGraham.com/spam.html.
6. The SpamBayes software is published under an open source license that allows others to modify and enhance the software and to publish the resulting code as a software package.
7. “Better Bayesian Filtering” by Paul Graham, January 2003. www.PaulGraham.com/better.html. The 0.03 percent false positive statistic is actually false: Graham has never had a false positive. He took the next legitimate e-mail, as scored by his spam filter, and flagged it as a false positive in order to compute a false positive rate.
8. Reference is needed.