Random Thoughts

Some people call this kind of page a blog. Well, I don't know. This is only a place where I write something from time to time.

Email Marketing isn't spam! Really??? LOL.

CAPTCHAs as anti-spam: why it's a bad idea!

There are some anti-spam techniques which are really bad. This is one of them. The reason I'm talking about it is that this method is really harmful. Some people call CAPTCHAs a “Turing test”.

Take a look at the Wikipedia page, which explains the general idea of CAPTCHAs (not only for spam filtering), and at a note from w3.org, Inaccessibility of CAPTCHA - Alternatives to Visual Turing Tests on the Web, which presents the official position of w3.org.

How it works: if you send a message to someone for the first time, an automated system replies with a message pointing you to a web page where a CAPTCHA is presented. You're asked to solve the CAPTCHA to prove that you're a human and that you are the one who sent the message. If you do, the message is released and delivered to the recipient, and your email address is added to a whitelist, so future messages will pass without a confirmation request.
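The flow just described can be sketched in a few lines of Python. This is only a toy model, not any vendor's implementation: the class name, the method names and the token mechanism are all made up for illustration. What it does show is that the filter's single input is the sender address the message claims to come from.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    """Toy model of a CAPTCHA-based (challenge-response) mail filter.

    All names here are hypothetical; real products differ in the details.
    """
    whitelist: set = field(default_factory=set)
    quarantine: dict = field(default_factory=dict)       # token -> (sender, message)
    inbox: list = field(default_factory=list)
    challenges_sent: list = field(default_factory=list)  # (claimed sender, token)

    def receive(self, sender: str, message: str) -> str:
        # The only thing checked is the *claimed* sender address.
        if sender in self.whitelist:
            self.inbox.append((sender, message))
            return "delivered"
        # Unknown sender: hold the message and send a confirmation request
        # to whoever the message claims to be from.
        token = secrets.token_hex(8)
        self.quarantine[token] = (sender, message)
        self.challenges_sent.append((sender, token))
        return "challenged"

    def confirm(self, token: str) -> bool:
        # Called when someone solves the CAPTCHA on the web page.
        held = self.quarantine.pop(token, None)
        if held is None:
            return False
        sender, message = held
        self.whitelist.add(sender)
        self.inbox.append((sender, message))
        return True
```

A first message from an unknown address returns "challenged" and stays in quarantine until confirm() is called with the matching token. After that, every message carrying the same sender address returns "delivered" directly.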

Harming innocents: from time to time, I see spam campaigns using my email address as the supposed sender.

The last time this happened, I received around 30,000 bounces from non-existent users, plus something like 500 requests (from spam filters of this kind) to confirm that it was really me who sent them the Viagra spam.

So, I took the time to answer around 200 of these requests. And for each request answered, I sent another message to the recipient explaining why their anti-spam protection is really stupid. Surely they could at least have a trivial filter to avoid asking for confirmation of such obvious spam.

This example shows how this “spam filter” can be used to generate a DoS against innocents.

Undetected spam: here is one big lie! Vendors of this method claim that it blocks 100% of spam. Let's show why that's false.

Imagine I know you and I send you a legitimate message. I'll follow all steps to confirm the message, you'll get it and my email address will be added to the whitelist.

After this first message, all subsequent messages using my address as the sender address will get into your mailbox without being filtered, whether they are legitimate messages, viruses or spam.

Incidentally, this is how many viruses work: when they infect a computer, they collect all the addresses in the address book and send themselves to everyone in it. I'll come back later to another way of letting viruses and spam pass through this kind of filter.

Losing legitimate messages: someone looking for a job sent his curriculum vitae to someone here. The recipient was interested and sent him a job proposal. The job seeker's mailbox was protected by this kind of filter, so the person who sent the proposal received a confirmation request. He decided not to confirm the message.

So, this kind of spam protection is useful only if you don't care about messages sent to you.

Another way of losing legitimate messages? Suppose I'm on holiday and I send you a message using a small terminal (a Blackberry or an iPhone). You send me a confirmation request asking me to visit a web page which is only readable on a terminal of reasonable size and resolution… Well, even if I'd like to confirm the message, I won't be able to solve the CAPTCHA.

This is another big design flaw of the method. It was originally intended to confirm interactions on the web, and people thought it could work the same way against spam. But there is a big difference: when confirming access to some web resource, the device requesting access and the device on which the requester is asked to prove he's a human being are the same. This isn't the case for spam.

Vendors of this kind of solution say that there is no message loss and that no legitimate message is misclassified. They “play with words”. In fact, some of these solutions put unconfirmed messages in a quarantine zone, and it's up to the recipient to look there for legitimate messages. In fact, no matter what filter is being used, a legitimate message which isn't put in your Inbox is an error.

People with visual impairments: people with visual impairments may not be able to decode even trivial CAPTCHAs. This issue is well explained in the w3.org report referenced above or in CAPTCHAs on Social Networking Sites Shut Out Blind Users. To learn more, just google “captcha blind”.

Generating useless traffic: some sites indicate that spam accounts for 80-95% of overall SMTP traffic worldwide. The Sophos report Security threat report: 2009 indicates that spam accounted for 97% of professional email in 2008.

So, imagine that this method is widely deployed and assume, as an example, that spam accounts for 90% of overall traffic (a lower-end assumption). For each spam sent, a confirmation request will be sent, so the spam-generated traffic will double and the overall traffic will increase by another 90%.

The global breakdown will be, roughly speaking, 47.5% spam, 47.5% confirmation requests and the remaining 5% legitimate messages. So the share of useful traffic will be divided by two.

Well, this is a simplified evaluation, but not too far from reality. Fortunately, this method isn't widely deployed.
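The arithmetic behind this breakdown can be checked with a few lines of Python. The function name is made up, and the model keeps the same simplifications as the text: every spam triggers exactly one confirmation request, a confirmation request counts as one message, and confirmation requests triggered by legitimate first contacts are ignored.

```python
def traffic_breakdown(spam_share: float) -> dict:
    """Share of each kind of traffic once every spam triggers one confirmation request.

    spam_share is the fraction of spam in the original traffic (0.90 in the text).
    """
    legit = 1.0 - spam_share
    confirmations = spam_share          # assumption: one request per spam received
    total = spam_share + confirmations + legit
    return {
        "spam": spam_share / total,
        "confirmation requests": confirmations / total,
        "legitimate": legit / total,
    }


if __name__ == "__main__":
    for kind, share in traffic_breakdown(0.90).items():
        print(f"{kind}: {share:.1%}")
```

With 90% spam, the total volume becomes 190% of the original, which gives about 47.4% spam, 47.4% confirmation requests and 5.3% legitimate messages: the rounded figures quoted above.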

Loops: suppose Alice (alice@alice.com) subscribed to a service of this kind at Provider A, and Bob (bob@bob.com) uses the same kind of service but subscribed to it at Provider B. Alice and Bob have never talked before. If Alice sends a message to Bob, Provider B will send a confirmation request to Alice, but Alice probably won't receive it, as her own provider will, in turn, send a confirmation request back to Bob. Both sides will be stuck in a waiting state unless one of the two recipients retrieves the message from the quarantine zone, when one is available (hard…). This behaviour was verified with two providers of this kind of service.

Also, the address used by Provider B to send the confirmation request is a problem. Some use something like bob@providerb.com and others, even worse, use an address which rejects all messages sent to it, e.g., antispam7@providerb.com. mailinblack.com, a French provider of this kind of solution, is one of those whose confirmation request address doesn't accept any message.

Discussion lists: most of the time, servers running this kind of anti-spam filter aren't able to identify discussion list and newsletter messages, and they send confirmation requests to the list you've subscribed to, so everybody receives the confirmation request.

People selling this kind of filter usually say that you can “use whitelists to manage this”. Sure!!!

Another problem with discussion lists and newsgroups is the sender address. Many list managers use a message-dependent envelope sender address to manage bounces (VERP, BATV, SRS), so the sender changes with each message. The only solution is then to whitelist an entire domain. Not a good idea if the discussion list is hosted at a big domain such as yahoogroups.com, …
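A small simulation makes the deadlock visible. It is only a sketch under stated assumptions: each mailbox challenges an unknown sender once, the confirmation request is sent from the protected user's own address, and neither provider automatically whitelists the people its user writes to. The class and function names are made up.

```python
from collections import deque


class Mailbox:
    """One user protected by a challenge-response filter (hypothetical model)."""

    def __init__(self, address):
        self.address = address
        self.whitelist = set()
        self.pending = set()     # senders that were already challenged
        self.quarantine = []
        self.inbox = []


def simulate(first_message, mailboxes, max_steps=20):
    """Exchange messages until none is in flight; return how many were sent."""
    in_flight = deque([first_message])
    sent = 0
    while in_flight and sent < max_steps:
        sender, recipient, body = in_flight.popleft()
        sent += 1
        box = mailboxes[recipient]
        if sender in box.whitelist:
            box.inbox.append((sender, body))
            continue
        # Unknown sender: hold the message, and challenge the sender once.
        box.quarantine.append((sender, body))
        if sender not in box.pending:
            box.pending.add(sender)
            in_flight.append((recipient, sender, "please solve this CAPTCHA"))
    return sent


if __name__ == "__main__":
    boxes = {a: Mailbox(a) for a in ("alice@alice.com", "bob@bob.com")}
    simulate(("alice@alice.com", "bob@bob.com", "Hi Bob"), boxes)
    for box in boxes.values():
        print(box.address, "inbox:", box.inbox, "quarantined:", len(box.quarantine))
```

In this model three messages are exchanged and both inboxes stay empty: Alice's original message and both confirmation requests end up in quarantine, and nothing moves until a human goes looking there.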
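To see why an address whitelist doesn't cope with this, here is a small sketch. The envelope sender format below is made up, in the spirit of VERP-like schemes that encode a message number and the subscriber address; real list managers each have their own format. The function names and the example domains are hypothetical too.

```python
def envelope_sender(list_name, list_domain, msg_number, subscriber):
    """Build an illustrative per-message envelope sender (not a real list manager's format)."""
    local, _, domain = subscriber.partition("@")
    return f"{list_name}-return-{msg_number}-{local}={domain}@{list_domain}"


def is_whitelisted(sender, exact_addresses, domains):
    """Accept a sender if its full address or its whole domain is whitelisted."""
    domain = sender.rpartition("@")[2].lower()
    return sender.lower() in exact_addresses or domain in domains


if __name__ == "__main__":
    first = envelope_sender("users", "lists.example.org", 101, "me@example.net")
    second = envelope_sender("users", "lists.example.org", 102, "me@example.net")

    # Whitelisting the first address doesn't help for the next message...
    print(is_whitelisted(second, {first}, set()))
    # ...so the whole domain gets whitelisted, and anyone claiming it gets in.
    print(is_whitelisted("anybody@lists.example.org", set(), {"lists.example.org"}))
```

The first check fails because the address changed between the two messages. The second succeeds for any address at the list's domain, which is exactly the problem when that domain hosts thousands of lists.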

Delays on message delivery: the first message will be delayed by the sum of:

the delay to send back a confirmation message,
the delay for the sender to read it (one should not assume that he is sitting at his computer waiting for the confirmation message),
the delay for him to confirm, and
finally, the delay needed to deliver the message.

Delays generated by human interaction can't be neglected.

Viruses: some types of virus use the address book of the computer they've just infected to select their next victims. So, people in whitelists are the next potential victims. Chances are that the user of the infected computer is whitelisted by the people in his address book who subscribed to this kind of anti-spam filter, so it will be easier to infect their computers. That means: if your email address is in my address book (we communicate frequently, so my address is also in your whitelist) and my computer gets infected by some virus, chances are that it will send you viruses and spam, and your anti-spam filter will let them into your mailbox. The problem here is that the management of the whitelist is based on a single, untrusted piece of information: the sender address, without any kind of authentication.

This isn't an exhaustive list of the flaws in this method. It isn't my goal to enumerate them all; there are surely many others.

— Jose-Marcio Martins da Cruz 2009/02/16 13:23

About shared libraries

Some people have asked why ze-filter is linked against static versions of some libraries, mainly BerkeleyDB and PCRE, which are bundled with ze-filter. Some arguments I've read are:

Using static libraries is a waste of disk and memory space.
Some want to be able to link ze-filter against a library they've compiled themselves.
Some think they should be *free* to do so.

These may be interesting arguments, but there are some reasons not to do so.

It's quite usual for these libraries to change from time to time. Fortunately, they evolve. But changes may result in incompatibilities, both at the programming interface level and at the data specification level. One way to make sure the filter is compatible with the library it's linked with is to bundle a copy of the library and make sure the filter is linked against it.

Another reason is bugs or incompatibilities in some versions of these libraries. The version bundled with the filter has already been thoroughly tested, stressed and validated with the filter by the author.

For the reasons above, I can't help people using libraries which weren't tested and validated by me. Sorry to put it this way, but my goal is to provide a useful tool for those wanting to filter spam, not to provide a “game tool”. So, replacing the bundled libraries with libraries provided by your OS or some particular distribution isn't supported at all.

— Jose-Marcio Martins da Cruz 2009/02/16 11:00

About binary distributions and packages

Why don't I encourage creating binary distributions and packages?

There are three main reasons:

packages and binary distributions must be kept up to date, and most of the time maintainers don't have enough time to update their package at the same rate as the software developer.
some package maintainers decide to create a package for some software and, after some time, stop maintaining it. Binary distributions of very old, outdated releases of the filter are useless and confusing.
packages are usually specific to one particular distribution of each release of each OS. Packages and binary distributions are useless if they aren't available, at the same time, in all contexts.

A secondary reason is that software sometimes runs better if it is configured and compiled on the target machine (as the FreeBSD ports system does). This is particularly true for applications where final tuning may be important.

Unless these three issues are handled in a good enough way, I think it's better to use the old “configure; make; make install” procedure.

ze-filter includes a Perl script, ze-easy-install, which lets you almost completely automate ze-filter installation and upgrade operations in a smooth way on every operating system. ze-easy-install checks whether a new release exists and, if so, downloads it, compiles it and installs it and, as far as possible, tells you if there are points you should take care of (configuration changes, errors, …).

But I don't discourage anyone from packaging ze-filter if the package is really maintained and isn't just a random binary version of the filter available somewhere. Everything done to make the filter easy to use is always welcome.

— Jose-Marcio Martins da Cruz 2009/01/25 17:17
