URL filtering

Introduction

URLBL filtering extracts all URLs from the body of the message and checks the domain part against a blacklist.
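
The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ze-filter's actual code: the regular expression, the in-memory BLACKLIST set and the function names are assumptions made for the example, and the real filter reads its list from a database or from DNS.

```python
import re
from urllib.parse import urlsplit

# Hypothetical in-memory blacklist, used only for this illustration.
BLACKLIST = {"130kg.com", "20fr.com", "2288.org"}

# Deliberately simple pattern: http(s) URLs up to the next space, quote or bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body):
    """Return the lowercased host name of every http(s) URL found in body."""
    hosts = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, ignore it
        if host:
            hosts.append(host.lower())
    return hosts


def candidate_domains(host):
    """Yield the host and its parent domains, e.g. a.b.c -> a.b.c, b.c."""
    labels = host.split(".")
    for i in range(len(labels) - 1):
        yield ".".join(labels[i:])


def blacklisted_domains(body, blacklist=BLACKLIST):
    """Return the blacklisted domains referenced by URLs in body, in order."""
    hits = []
    for host in extract_hosts(body):
        for domain in candidate_domains(host):
            if domain in blacklist and domain not in hits:
                hits.append(domain)
    return hits
```

For example, blacklisted_domains("Visit http://www.130kg.com/offer") returns ["130kg.com"], because the parent domain of the host is in the list.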

Although you can create your own blacklist of URLs, it is hard to keep one up to date. A good alternative is SURBL, a blacklist whose priority is a very low false positive rate.

The blacklist of URLs can be queried by the filter in two ways:

  • The list distributed by ze-filter (BerkeleyDB format) contains a few domains added by the ze-filter maintainer, as seen locally at his domain.
  • Before enabling SURBL data, check the SURBL policy to determine whether you are eligible for Free Use or Sponsored Use.

Configuration

# SPAM_URLBL
#     Do pattern matching
#  Syntax : -----
#     VALUES :  NO  YES 
SPAM_URLBL                         YES
# DB_URLBL
#     Database Real-Time URL Blacklist (used for content checking)
#  Syntax : -----
DB_URLBL                  ze-urlbl.db     
<DNS-URLBL>
multi.surbl.org  score=20.000;code=all;onmatch=stop;recurse=yes
</DNS-URLBL>
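
To give an idea of what a DNS lookup against multi.surbl.org involves, here is a minimal Python sketch of the usual DNS blacklist convention: the domain is prepended to the zone name, and an answer in 127.0.0.0/8 means the domain is listed. How ze-filter itself performs the query, and how it interprets code=all or recurse=yes, is not shown here. The function names and the resolver parameter are assumptions made for the example.

```python
import socket


def surbl_query_name(domain, zone="multi.surbl.org"):
    """Build the DNS name to look up for a domain in a URL blacklist zone."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone="multi.surbl.org", resolver=socket.gethostbyname):
    """Return True if the zone answers with a 127.x.y.z address for domain.

    resolver is injectable (an assumption of this sketch) so the logic can
    be exercised without network access.
    """
    try:
        answer = resolver(surbl_query_name(domain, zone))
    except OSError:
        return False  # no such record: the domain is not listed
    return answer.startswith("127.")
```

With this sketch, surbl_query_name("130kg.com") gives "130kg.com.multi.surbl.org".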

When to choose DNS format or BerkeleyDB format ?

URL blacklist database

URL blacklist databases are saved in the /var/ze-filter/cdb directory. You'll find two files there:

Don't remove the ze-urlbl.txt file. This file MUST be kept there, as it is used during database updates to save bandwidth: if the file is present, only the differences are transferred. Network bandwidth may not be a problem for you, but it may be for the rsync server.

The content of the text file is as follows:

URLBL:130kg.com                                20:0:127.2.0.1:ze-filter
URLBL:20fr.com                                 20:0:127.2.0.1:ze-filter
URLBL:2288.org                                 20:0:127.2.0.1:ze-filter
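
If you need to process this file with your own tools, the lines above can be split as in the following Python sketch. The field names are assumptions: score, code and source are inferred from the options of the conversion tools, while the meaning of the second value (0 in the sample) is not documented here, so it is kept as an opaque string.

```python
from typing import NamedTuple


class UrlblEntry(NamedTuple):
    domain: str
    score: float
    flag: str    # second value of the record; meaning not documented here
    code: str
    source: str


def parse_urlbl_line(line):
    """Parse one 'URLBL:domain  score:flag:code:source' line."""
    key, value = line.split(None, 1)
    prefix, domain = key.split(":", 1)
    if prefix != "URLBL":
        raise ValueError(f"unexpected record type: {prefix!r}")
    score, flag, code, source = value.strip().split(":", 3)
    return UrlblEntry(domain, float(score), flag, code, source)
```

Applied to the first sample line, this returns an entry with domain "130kg.com", score 20.0, code "127.2.0.1" and source "ze-filter".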

Just run make in that directory after editing the text database manually.

Getting and Syncing database

You can find a script in the source directory: etc/get-urlbl.org. Rename that file, put it wherever you like, and run it once a day from crontab. The resulting file is a large text file of about 1.2M lines.

URLBL database hacking

cvt-urlbldb
Syntax :
cvt-urlbldb [-s newscore] [-w whitelist] [-o source] inputfile > outputfile
 
Example :
cvt-urlbldb -s 30 -w urlwl.txt -o multi ze-urlbl.txt > ze-urlbl-local.txt

mk_dbin
Syntax :
mk_dbin [-s score] [-c code] -o source inputfile > outputfile
 
Example :
mk_dbin -s 25 -c 127.1.0.1 -o local localbl.txt > ze-urlbl-local.txt
  • You can concatenate multiple URLBL text files. If an entry appears in more than one file, the first occurrence is the one taken into account, so order matters.
  • Look at the /var/ze-filter/cdb/Makefile and /var/ze-filter/cdb/get-urlbl files. You will probably need to modify the Makefile in /var/ze-filter/cdb.
  • You may also need to modify the /etc/ze-filter/ze-filter.cf file to point to the new database file (DB_URLBL configuration option).
  • Take care not to move or change the ze-urlbl.txt file used by rsync.
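
The "first entry wins" rule for concatenated files can be previewed with a short Python sketch. It only mimics the behaviour described above so that you can check which entry will be kept before building the database. It is not the code used by the Makefile, and the function name is an assumption.

```python
def merge_urlbl(*line_groups):
    """Merge several lists of URLBL lines; the first line seen for a key wins.

    The key is the first whitespace-separated token (e.g. 'URLBL:130kg.com').
    """
    seen = {}
    for lines in line_groups:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            key = line.split(None, 1)[0]
            seen.setdefault(key, line)  # keep the earliest occurrence only
    return list(seen.values())


local = ["URLBL:130kg.com   30:0:127.1.0.1:local"]
distributed = [
    "URLBL:130kg.com   20:0:127.2.0.1:ze-filter",
    "URLBL:20fr.com    20:0:127.2.0.1:ze-filter",
]
merged = merge_urlbl(local, distributed)
```

Here merged keeps the local entry for 130kg.com, because the local list comes first, followed by the distributed entry for 20fr.com.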

Logging

Look for the DBURLBL line, which shows why this mail was rejected:

Mar  4 17:08:18 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD73F2.001 Connect from emailer99-151.emv1.net
Mar  4 17:08:21 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD73F2.001 Bayes filter score :  0.685
Mar  4 17:08:21 mx0 ze-filter[7771]: [ID 000000 local5.notice] 47CD73F2.001 DBURLBL : trc1.emv2.com :  20 BLACKLISTED in DBURLBL:ze-filter
Mar  4 17:08:21 mx0 ze-filter[7771]: [ID 000000 local5.notice] 47CD73F2.001 SPAM CHECK - M02 NB HTML > PLAIN : 1 0
Mar  4 17:08:21 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD73F2.001 ORACLE - M02 text/html without text/plain (   0.2)
Mar  4 17:08:21 mx0 ze-filter[7771]: [ID 000000 local5.notice] 47CD73F2.001 : SMQID=(NOID), Callback=(eom), Why=(Content Check : B=0.685 U=20 R=0
           O=0 -> G=1.082), PeerAddr=(84.14.99.151), PeerName=(emailer99-151.emv1.net), MAIL=(<email@club-prive.emv1.net>), NbRCPT=(1/1), RCPT=(<l
           XXX@univ.fr>), HeaderFrom=('Club-prive.fr' <email@club-prive.emv1.net>), Scores=(R=0 U=20 O=0 B=0.685 ->  1.082), Size=(6437), Reply
           =(550 5.7.1 Sorry, this message is being rejected as it seems to be a spam !)

The individual scores appear in the Why field of the last log entry:

B=0.685 U=20 R=0 O=0 → G=1.082

This means that the URLBL check contributed a score of 20 (U=20).
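
If you want to extract these values from the logs automatically, a small Python sketch such as the one below may help. It assumes that each score is written as a single capital letter followed by "=" and a number, as in the excerpt above. The function name is an assumption, and only B (Bayes) and U (URLBL) are explained on this page.

```python
import re

# One capital letter, '=', then an integer or decimal number.
SCORE_RE = re.compile(r"\b([A-Z])=([0-9]+(?:\.[0-9]+)?)")


def parse_scores(text):
    """Return a dict mapping each score letter to its value as a float."""
    return {name: float(value) for name, value in SCORE_RE.findall(text)}


scores = parse_scores("B=0.685 U=20 R=0 O=0 -> G=1.082")
```

With the line above, scores["U"] is 20.0 and scores["B"] is 0.685.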