URLBL filtering extracts all URLs from the body of the message and checks the domain part against a blacklist.
Although you can create your own blacklist of URLs, it is hard to maintain. A good alternative is SURBL, a blacklist whose priority is a very low false positive rate.
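The extraction-and-lookup step described above can be illustrated with a minimal Python sketch. It is not ze-filter's actual implementation: the regular expression, the function names and the in-memory `BLACKLIST` set are assumptions made for the example, and a real filter would query SURBL or a local database instead.

```python
import re
from urllib.parse import urlsplit

# Hypothetical in-memory blacklist; a real one would come from SURBL
# or from a local database.
BLACKLIST = {"130kg.com", "20fr.com", "2288.org"}

# Deliberately simple pattern: http(s) URLs up to the next space or delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def extract_hosts(body):
    """Return the lower-cased host names of all http(s) URLs found in body."""
    hosts = []
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname
        if host:
            hosts.append(host.lower())
    return hosts


def blacklisted_domain(host, blacklist):
    """Return the first blacklisted suffix of host, or None.

    Tries the full host name, then strips leading labels one at a time,
    so www.130kg.com matches an entry for 130kg.com.
    """
    labels = host.split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in blacklist:
            return candidate
    return None


body = "Cheap pills at http://www.130kg.com/buy now, see also https://example.org/"
hits = [d for d in (blacklisted_domain(h, BLACKLIST) for h in extract_hosts(body)) if d]
print(hits)
```

Only the domain part of each URL is compared; the path and query string play no role in the lookup.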
The blacklist of URLs can be queried by the filter in two ways:
- DNS - the filter queries a public DNS server. If you run a high-volume mail server (e.g. several hundred thousand messages a day), you can host a copy of the SURBL zone on your own DNS server. See http://www.surbl.org/faq.html#high-volume.
- Local BerkeleyDB database - BerkeleyDB queries are much faster than DNS queries, but this option is more complicated to put in place, as you must set up cron tasks to fetch the database and rebuild it.
- The list distributed by ze-filter (BerkeleyDB format) also contains a few domains added by the ze-filter maintainer, seen locally at his domain.
- Before enabling SURBL data, check the SURBL policy to determine whether you qualify for Free Use or Sponsored Use.
- Enable it:
    # SPAM_URLBL
    #   Do pattern matching
    #   Syntax : -----
    #   VALUES : NO YES
    SPAM_URLBL        YES
- If you want to use the Berkeley DB format, enable this line:
    # DB_URLBL
    #   Database Real-Time URL Blacklist (used for content checking)
    #   Syntax : -----
    DB_URLBL          ze-urlbl.db
- If you prefer the DNS format, enable this one:
    <DNS-URLBL>
    multi.surbl.org   score=20.000;code=all;onmatch=stop;recurse=yes
    </DNS-URLBL>
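To make the DNS variant more concrete, here is a hedged Python sketch of how a DNS-based list such as multi.surbl.org is typically queried: the domain is prepended to the list zone, and an answer in 127.0.0.0/8 means the domain is listed. The function names and the `fake_resolve` stand-in are hypothetical and exist only so the example runs without network access; this is not ze-filter's own resolver code.

```python
import socket


def surbl_query_name(domain, zone="multi.surbl.org"):
    """Build the DNS name to look up: the domain prepended to the list zone."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """Return True if the zone answers with a 127.x.x.x address for domain.

    A name that does not resolve (socket.gaierror) is treated as not listed.
    """
    try:
        answer = resolve(surbl_query_name(domain, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")


# Offline demonstration with a stand-in resolver (hypothetical data),
# so the example runs without touching the network.
def fake_resolve(name):
    table = {"130kg.com.multi.surbl.org": "127.0.0.2"}
    if name not in table:
        raise socket.gaierror("name not found")
    return table[name]


print(is_listed("130kg.com", resolve=fake_resolve))
print(is_listed("example.org", resolve=fake_resolve))
```

Passing the resolver as a parameter keeps the lookup logic testable; leaving the default in place performs a real DNS query through the system resolver.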
When to choose DNS format or BerkeleyDB format?
- BerkeleyDB queries are much faster.
- BerkeleyDB removes your server's dependency on external resources.
- BerkeleyDB needs a local copy of all the data: you'll need to set aside three times the size of the database, roughly 300 MBytes.
- With the DNS format, the only thing you need to do is to configure the DNS-URLBL part of
- When using DNS queries, you can use more than one list.
URL blacklist database
URL blacklist databases are stored in the /var/ze-filter/cdb directory. You'll find two files there:
- ze-urlbl.txt - text version of the blacklist.
- ze-urlbl.db - Berkeley DB version of the text file. This is the format actually used by the filter.
The ze-urlbl.txt file MUST be kept there, as it's used during database updates to save bandwidth. If the file is there, only the differences will be transferred. Network bandwidth may not be a problem for you, but it may be for the rsync server.
The content of the text file is as follows:
    URLBL:130kg.com 20:0:127.2.0.1:ze-filter
    URLBL:20fr.com 20:0:127.2.0.1:ze-filter
    URLBL:2288.org 20:0:127.2.0.1:ze-filter
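The record layout shown above can be read with a few lines of Python. This is only a sketch: the field names `score`, `code` and `source` are inferred from the options of the mk_dbin script (`-s`, `-c`, `-o`), and the meaning of the second value field (`0` in the samples) is not documented here, so it is kept as an opaque `flag`.

```python
from typing import NamedTuple


class UrlblEntry(NamedTuple):
    domain: str
    score: float
    flag: str      # second value field; its meaning is an unknown here
    code: str
    source: str


def parse_urlbl_line(line):
    """Parse one 'URLBL:<domain> <score>:<flag>:<code>:<source>' record."""
    key, value = line.split(None, 1)
    prefix, domain = key.split(":", 1)
    if prefix != "URLBL":
        raise ValueError(f"unexpected key prefix: {prefix!r}")
    score, flag, code, source = value.strip().split(":", 3)
    return UrlblEntry(domain, float(score), flag, code, source)


entry = parse_urlbl_line("URLBL:130kg.com 20:0:127.2.0.1:ze-filter")
print(entry)
```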
Just run make in that directory after editing the text database manually.
Getting and Syncing database
You can find a script in the source directory: etc/get-urlbl.org. Rename that file and put it wherever you like, then run it once a day from crontab. The resulting file is a large text file of 1.2M lines.
URLBL database hacking
- Modifying original database - You can modify scores of the original database or use a whitelist with the cvt-urlbldb script. The whitelist file is a text file with one domain per line.
- Syntax and Example
    Syntax  : cvt-urlbldb [-s newscore] [-w whitelist] [-o source] inputfile > outputfile
    Example : cvt-urlbldb -s 30 -w urlwl.txt -o multi ze-urlbl.txt > ze-urlbl-local.txt
- Creating your own database of URLs - To create your own database, you can use the mk_dbin script. The input file is a text file with one domain per line.
- Syntax and Example
    Syntax  : mk_dbin [-s score] [-c code] -o source
    Example : mk_dbin -s 25 -c 127.1.0.1 -o local localbl.txt > ze-urlbl-local.txt
- You can concatenate multiple URLBL text files. If an entry appears in more than one file, only the first occurrence is taken into account, so order is important.
- Look at the /var/ze-filter/cdb/get-urlbl files. You'll probably need to modify the Makefile you'll find inside.
- You may also need to modify the /etc/ze-filter/ze-filter.cf file to point to the new database file (DB_URLBL configuration option).
- Take care not to move or change the ze-urlbl.txt file used by rsync.
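The "first entry wins" rule for concatenated text files can be sketched in Python as follows. This illustrates the rule only; it is not the logic of the distributed Makefile, and the function name and the sample `local` record are hypothetical.

```python
def merge_urlbl(*sources):
    """Merge record lines from several sources, keeping the first entry per key.

    Each source is an iterable of text lines; the key is the first
    whitespace-separated token (e.g. 'URLBL:130kg.com').
    """
    seen = set()
    merged = []
    for lines in sources:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            key = line.split(None, 1)[0].lower()
            if key not in seen:
                seen.add(key)
                merged.append(line)
    return merged


# Local entries come first, so they take precedence over the upstream list.
local = ["URLBL:130kg.com 30:0:127.1.0.1:local"]
upstream = [
    "URLBL:130kg.com 20:0:127.2.0.1:ze-filter",
    "URLBL:20fr.com 20:0:127.2.0.1:ze-filter",
]
for line in merge_urlbl(local, upstream):
    print(line)
```

Putting the locally maintained file first is what lets a local score override the one from the downloaded list.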
Look for the DBURLBL line, which shows why this mail was rejected:
    Mar 4 17:08:18 mx0 ze-filter: [ID 000000 local5.info] 47CD73F2.001 Connect from emailer99-151.emv1.net
    Mar 4 17:08:21 mx0 ze-filter: [ID 000000 local5.info] 47CD73F2.001 Bayes filter score : 0.685
    Mar 4 17:08:21 mx0 ze-filter: [ID 000000 local5.notice] 47CD73F2.001 DBURLBL : trc1.emv2.com : 20 BLACKLISTED in DBURLBL:ze-filter
    Mar 4 17:08:21 mx0 ze-filter: [ID 000000 local5.notice] 47CD73F2.001 SPAM CHECK - M02 NB HTML > PLAIN : 1 0
    Mar 4 17:08:21 mx0 ze-filter: [ID 000000 local5.info] 47CD73F2.001 ORACLE - M02 text/html without text/plain ( 0.2)
    Mar 4 17:08:21 mx0 ze-filter: [ID 000000 local5.notice] 47CD73F2.001 : SMQID=(NOID), Callback=(eom), Why=(Content Check : B=0.685 U=20 R=0 O=0 -> G=1.082), PeerAddr=(126.96.36.199), PeerName=(emailer99-151.emv1.net), MAIL=(<email@example.com>), NbRCPT=(1/1), RCPT=(<l XXX@univ.fr>), HeaderFrom=('Club-prive.fr' <firstname.lastname@example.org>), Scores=(R=0 U=20 O=0 B=0.685 -> 1.082), Size=(6437), Reply =(550 5.7.1 Sorry, this message is being rejected as it seems to be a spam !)
You can see the scores in the Why field of the last log line:
B=0.685 U=20 R=0 O=0 → G=1.082
This means that the URLBL check contributed a score of 20 (U=20).
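When scanning many log lines, the scores can be pulled out programmatically. The following Python sketch assumes the `Scores=(... -> ...)` field has the layout seen in the log excerpt above; the regular expressions and the function name are illustrative, and the value after the arrow is stored under `G` to match the notation used here.

```python
import re

SCORES_RE = re.compile(r"Scores=\(([^)]*)\)")
PAIR_RE = re.compile(r"([A-Z])=([0-9.]+)")


def parse_scores(log_line):
    """Return the per-check scores and the global score from a log line.

    Expects a field like 'Scores=(R=0 U=20 O=0 B=0.685 -> 1.082)'.
    Returns None when the line has no Scores field.
    """
    match = SCORES_RE.search(log_line)
    if match is None:
        return None
    parts, _, total = match.group(1).partition("->")
    scores = {name: float(value) for name, value in PAIR_RE.findall(parts)}
    if total.strip():
        scores["G"] = float(total)
    return scores


line = "... Scores=(R=0 U=20 O=0 B=0.685 -> 1.082), Size=(6437) ..."
print(parse_scores(line))
```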