“The Merriam-Webster Dictionary”
oracle : one held to give divinely inspired answers or revelations
ze-filter's oracle is a set of tests about weak spam indicators. Some examples are :
Heuristics include, but are not limited to, looking for regular expressions inside some parts of messages.
The main goal of these heuristics is not to detect spam on their own, since they are only weak spam indicators. The heuristic filter isn't a main filtering method, but it can help confirm the results of the two main filtering methods : the Bayes filter and URL filtering.
The number of tests is small : fewer than 40 at present. Only really relevant checks are integrated into the oracle.
The Oracle has 4 check categories.
Just enable it !
    # SPAM_ORACLE
    #   Do heuristic filtering
    #   Syntax : -----
    #   VALUES : NO YES
    SPAM_ORACLE              YES
If you want to use RBLs with the Oracle, take a look at the “Expert users” section.
ze-filter's oracle uses two configuration files :
/etc/ze-filter/ze-tables
- this file is used to enable/disable each Oracle test and to assign odds to it.
/etc/ze-filter/ze-oradata
- this file is used to define unwanted things and to assign odds to them. Unwanted things may be one of : CHARSET | BAD-EXPR | BOUNDARY | MAILER | HTML-TAG
To change the names of these files, edit the ze-filter.cf file :
    # ORACLE_DATA_FILE
    #   Some oracle definitions
    #   Syntax : -----
    ORACLE_DATA_FILE         ze-oradata

    # ORACLE_SCORES_FILE
    #   Oracle scores
    #   Syntax : -----
    ORACLE_SCORES_FILE       ze-tables
If you want to enable/disable tests or change their odds, edit the ze-tables configuration file :
    C05  DISABLE  odds=1.000  SMTP client sending mail to spamtrap
    C06  DISABLE  odds=1.000  Bad EHLO parameter
    C07  DISABLE  odds=1.000  Myself EHLO parameter - forged
    M01  ENABLE   odds=1.000  No HTML nor TEXT parts
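As an illustration, the sample above can be read with a few lines of Python, for instance to list which tests are currently enabled. This is only a sketch: it assumes every line follows the layout of the sample (test id, ENABLE or DISABLE, odds=value, description), and `parse_tables` is a hypothetical helper, not part of ze-filter.

```python
import re

# Assumed line layout, taken from the sample above (not from ze-filter's source):
#   <test id> <ENABLE|DISABLE> odds=<float> <free-text description>
LINE_RE = re.compile(r"^(\S+)\s+(ENABLE|DISABLE)\s+odds=([0-9.]+)\s+(.*)$")


def parse_tables(text):
    """Return a list of (test_id, enabled, odds, description) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE_RE.match(line)
        if m:
            test_id, state, odds, desc = m.groups()
            entries.append((test_id, state == "ENABLE", float(odds), desc))
    return entries


SAMPLE = """\
C05 DISABLE odds=1.000 SMTP client sending mail to spamtrap
C06 DISABLE odds=1.000 Bad EHLO parameter
C07 DISABLE odds=1.000 Myself EHLO parameter - forged
M01 ENABLE odds=1.000 No HTML nor TEXT parts
"""

enabled = [entry[0] for entry in parse_tables(SAMPLE) if entry[1]]
print(enabled)
```

With the four sample lines, only M01 is reported as enabled.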
If you want to modify the list of unwanted things used by some Oracle checks ( CHARSET | BAD-EXPR | BOUNDARY | MAILER | HTML-TAG ), edit the ze-oradata file :
    HTML-TAGS  odds=1.66     <script[^<>]*>
    HTML-TAGS  odds=1.40     <script[^<>]+src=[^<>]+>
    HTML-TAGS  odds=1.45     <span[^<>]*>
    BAD-EXPR   odds=20.88    http[s]?://[^ /#]*#[0-9a-f]
    BAD-EXPR   odds=1.00     http[s]?://[^ /&]*&#[0-9]{1,3}
    BAD-EXPR   odds=1.03     http[s]?://[^ /@>\\n]*@
    BAD-EXPR   odds=6.92     http[s]?://[^ /]*[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}
    BAD-EXPR   odds=3.91     http[s]?://[^>\n\r *]+\\*http[s]?://
    CHARSET    odds=13.00    ^big5$
    CHARSET    odds=9.00     ^euc-kr$
    CHARSET    odds=4519.00  ^gb2312$
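Before adding a new expression to ze-oradata, it can help to try it against a sample string. The Python sketch below does this for a few of the entries above. It rests on assumptions: the line layout (category, odds=value, expression) is inferred from the sample, Python's `re` module is used as a stand-in for whatever regular expression engine ze-filter actually uses, and case-insensitive matching is a guess. `parse_oradata` and `matching_entries` are hypothetical helpers.

```python
import re

# Assumed layout, from the sample above: <category> odds=<float> <regex>
ENTRY_RE = re.compile(r"^(\S+)\s+odds=([0-9.]+)\s+(.+)$")


def parse_oradata(text):
    """Return a list of (category, odds, pattern) tuples."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            category, odds, pattern = m.groups()
            entries.append((category, float(odds), pattern))
    return entries


def matching_entries(entries, category, value):
    """Entries of `category` whose pattern is found in `value`.

    Case-insensitive matching is an assumption, not documented behaviour.
    """
    return [
        (odds, pattern)
        for cat, odds, pattern in entries
        if cat == category and re.search(pattern, value, re.IGNORECASE)
    ]


SAMPLE = r"""
HTML-TAGS odds=1.66 <script[^<>]*>
BAD-EXPR odds=6.92 http[s]?://[^ /]*[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}
CHARSET odds=9.00 ^euc-kr$
"""

entries = parse_oradata(SAMPLE)
# A URL whose host is a raw IPv4 address triggers the BAD-EXPR entry.
print(matching_entries(entries, "BAD-EXPR", "see http://192.0.2.10/offer"))
# The charset expressions are anchored, so only an exact name matches.
print(matching_entries(entries, "CHARSET", "EUC-KR"))
print(matching_entries(entries, "CHARSET", "utf-8"))
```

The sketch only reports which entries match. How ze-filter combines the odds of several matching entries is not covered here.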
In probability theory and statistics, the odds in favour of an event or a proposition are the quantity p / (1 − p), where p is the probability of the event or proposition. In other words, an event with odds of m to n in favour has probability m / (m + n). For example, if you chose a random day of the week, the odds that you would choose a Sunday would be 1/6 (1 to 6), not 1/7, which is its probability. These 'odds' are actually relative probabilities.
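The relation between odds and probability can be checked with a short Python sketch. The two function names are hypothetical and exist only for this illustration. The code shows the arithmetic of the definition above, not how ze-filter uses the odds internally.

```python
import math


def odds_from_probability(p):
    """Odds in favour of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Inverse conversion: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Picking a Sunday among 7 days: probability 1/7, odds 1/6.
sunday_odds = odds_from_probability(1 / 7)
print(math.isclose(sunday_odds, 1 / 6))

# Odds of 1 correspond to a probability of one half.
print(probability_from_odds(1.0))
```

Odds above 1 correspond to probabilities above 0.5, and odds below 1 to probabilities below 0.5.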
OBS : /var/log/ze-filter shows the tests that were run when checking a message, which is useful if something gets rejected : you will find the reason there.
    Mar 4 17:08:46 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD740E.001 ORACLE - M02 text/html without text/plain ( 0.2)
    Mar 4 17:08:46 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD740E.001 ORACLE - M13 RFC2822 headers compliance ( 1.0)
    Mar 4 17:08:46 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD740E.001 ORACLE - H06 HTML tag/text ratio ( 0.5)
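To find out which tests fired for a given message, the ORACLE lines can be extracted from the log with a small script. The Python sketch below assumes the layout of the three sample lines above (an identifier, the word ORACLE, a test id, a description, and a number in parentheses). `parse_oracle_line` is a hypothetical helper, and the meaning given to each field is inferred from the sample only.

```python
import re

# Assumed layout, inferred from the sample log lines above:
#   ... <identifier> ORACLE - <test id> <description> ( <value>)
ORACLE_RE = re.compile(
    r"(\S+)\s+ORACLE\s+-\s+(\S+)\s+(.*?)\s*\(\s*([0-9.]+)\s*\)\s*$"
)


def parse_oracle_line(line):
    """Return (identifier, test_id, description, value), or None if no match."""
    m = ORACLE_RE.search(line)
    if m is None:
        return None
    ident, test_id, description, value = m.groups()
    return ident, test_id, description, float(value)


SAMPLE = """\
Mar 4 17:08:46 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD740E.001 ORACLE - M02 text/html without text/plain ( 0.2)
Mar 4 17:08:46 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD740E.001 ORACLE - M13 RFC2822 headers compliance ( 1.0)
Mar 4 17:08:46 mx0 ze-filter[7771]: [ID 000000 local5.info] 47CD740E.001 ORACLE - H06 HTML tag/text ratio ( 0.5)
"""

results = [parse_oracle_line(line) for line in SAMPLE.splitlines()]
for row in results:
    print(row)
```

Lines that do not contain an ORACLE entry yield `None` and can be filtered out before further processing.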
    $ ze-filter -t oradata
    $ ze-filter -t oracle-checks